MDER-DR: Multi-Hop Question Answering with Entity-Centric Summaries

TL;DR

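The MDER-DR framework uses entity-centric summaries to improve multi-hop question answering over knowledge graphs, with gains of up to 66% over standard RAG baselines.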

cs.CL · 🔴 Advanced · 2026-03-12

Riccardo Campi, Nicolò Oreste Pinciroli Vago, Mathyas Giudici, Marco Brambilla, Piero Fraternali

Knowledge Graphs · Multi-Hop QA · Information Retrieval · Entity Summaries · Cross-Lingual

Key Findings

Methodology

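MDER-DR combines two components: Map-Disambiguate-Enrich-Reduce (MDER) and Decompose-Resolve (DR). MDER runs while the knowledge graph is being built and produces entity-level summaries, so the QA retrieval phase never has to traverse the graph explicitly. DR decomposes the user query into resolvable triples and grounds them in the knowledge graph through iterative reasoning. Together, the two components make MDER-DR robust to sparse, incomplete, and complex relational data.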
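The indexing side can be illustrated with a toy Python sketch. It is not the authors' implementation: it assumes the Map step has already extracted triples together with their source sentences, and the `Triple` type, the alias table, and the `disambiguate`, `enrich`, and `reduce_to_summaries` helpers are hypothetical. Plain string operations stand in for the LLM calls MDER would make at each stage. The point it shows is that after reduction, a single entity's summary can already hold both hops of a two-hop question.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A triple plus the sentence it was extracted from (assumed Map output)."""
    subject: str
    relation: str
    obj: str
    context: str


# Hypothetical alias table standing in for LLM-based disambiguation.
ALIASES = {"Nolan": "Christopher Nolan", "C. Nolan": "Christopher Nolan"}


def disambiguate(name: str) -> str:
    """Disambiguate: map a surface form to its canonical entity name."""
    return ALIASES.get(name, name)


def enrich(t: Triple) -> str:
    """Enrich: describe a triple using its source context.

    An LLM would rewrite the triple in context; here we just attach the sentence.
    """
    return f"{t.subject} --{t.relation}--> {t.obj} (context: {t.context})"


def reduce_to_summaries(triples: list[Triple]) -> dict[str, str]:
    """Reduce: fold each triple description into the summaries of both its entities."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for t in triples:
        canon = Triple(disambiguate(t.subject), t.relation,
                       disambiguate(t.obj), t.context)
        description = enrich(canon)
        grouped[canon.subject].append(description)
        grouped[canon.obj].append(description)
    # Placeholder for LLM summarization: concatenate the descriptions.
    return {entity: " ".join(descs) for entity, descs in grouped.items()}


triples = [
    Triple("Inception", "directed by", "Nolan",
           "Inception (2010) was directed by Nolan."),
    Triple("C. Nolan", "born in", "London",
           "C. Nolan was born in London in 1970."),
]
index = reduce_to_summaries(triples)
print(index["Christopher Nolan"])
```

Both facts end up in the `Christopher Nolan` summary, so a question such as "Where was the director of Inception born?" can be answered from one retrieved summary instead of by walking two edges at query time.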

Key Results

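On standard and domain-specific benchmarks, MDER-DR improves on standard RAG baselines by up to 66%. On WikiQA, for example, it reaches a Soft EM of 0.800, against 0.538 for the strongest baseline, Vector-RAG (a relative gain of about 49%).
On HotpotQA, MDER-DR scores 0.515 under LLM-as-a-Judge evaluation, a clear improvement over the other RAG architectures.
On BenchEE, MDER-DR receives high scores in human evaluation by domain experts, indicating that it handles specialized domain content well.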
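To make the reported numbers concrete, here is a small sketch of one common Soft EM definition (the normalized gold answer appears somewhere in the prediction) and of the relative-gain arithmetic. The paper's exact normalization may differ, and the function names are illustrative. Note that substring matching is lenient: it can also match inside longer words.

```python
import re
import string


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def soft_em(prediction: str, gold: str) -> float:
    """1.0 if the normalized gold answer occurs inside the normalized prediction."""
    return float(normalize(gold) in normalize(prediction))


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


print(soft_em("The film was shot in London.", "london"))  # 1.0
print(f"{relative_gain(0.800, 0.538):.1%}")  # WikiQA Soft EM vs Vector-RAG: 48.7%
```

The WikiQA comparison works out to roughly 49%, so the "up to 66%" headline must come from a different benchmark, metric, or baseline comparison.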

Significance

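MDER-DR is relevant to both research and industry. It targets long-standing pain points of multi-hop QA over knowledge graphs: the context lost when text is reduced to triples, and the difficulty of reasoning over complex relations. Because relational information is compressed at indexing time, MDER-DR removes graph traversal from the retrieval step while maintaining cross-lingual robustness, which makes it a promising basis for multilingual, multi-domain QA systems.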

Technical Contribution

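MDER-DR's main technical contribution is to compress multi-hop relational information at indexing time, so no explicit graph traversal is needed at inference time. This differs fundamentally from existing state-of-the-art methods and opens up new engineering possibilities. Through entity-centric summaries, MDER-DR retrieves and reasons over complex relations efficiently.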

Novelty

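MDER-DR is the first framework to compress multi-hop relational information into entity summaries at indexing time. Unlike existing multi-hop QA methods, it needs no explicit graph traversal during inference, which substantially improves both efficiency and accuracy.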

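Limitations

When relational networks are very complex, MDER-DR's entity summaries may oversimplify them and lose information.
Because it relies on large language models, MDER-DR may hit performance bottlenecks on very long texts.
In some specialized domains, MDER-DR may need domain-specific parameter tuning to perform best.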

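Future Directions

Future work includes improving MDER-DR's performance on very large knowledge graphs and applying it to more domains and languages. Combining it with other NLP techniques, such as graph neural networks, may improve its performance further.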

AI Overview

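Knowledge graphs (KGs) are valuable for organizing structured information, but existing retrieval-augmented generation (RAG) methods over KGs often lose important context when text is reduced to triples. The loss hurts most in multi-hop QA, where an answer must be composed from multiple entities, facts, or relations.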

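To address this, the paper proposes MDER-DR, a KG-based QA framework that covers both the indexing and the retrieval/inference phases. It has two main components: Map-Disambiguate-Enrich-Reduce (MDER) and Decompose-Resolve (DR). During KG construction, MDER generates context-derived triple descriptions and integrates them into entity-level summaries, which removes the need to traverse graph edges explicitly during QA retrieval. DR decomposes user queries into resolvable triples and grounds them in the KG through iterative reasoning.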
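The Decompose-Resolve loop can be sketched in the same spirit. In this hypothetical Python example, `decompose` returns a hand-written plan instead of calling an LLM, and entity summaries are simplified to relation-to-value maps rather than free text; `SUMMARIES`, `decompose`, and `resolve` are illustrative names, not the authors' code. Each step substitutes variables bound by earlier steps, so the second hop is grounded using the answer to the first.

```python
from typing import Optional

# Hypothetical entity summaries, simplified to relation -> value maps.
SUMMARIES: dict[str, dict[str, str]] = {
    "Inception": {"directed by": "Christopher Nolan", "released in": "2010"},
    "Christopher Nolan": {"born in": "London", "directed": "Inception"},
    "London": {"capital of": "United Kingdom"},
}

Step = tuple[str, str, str]  # (subject or ?variable, relation, ?variable to bind)


def decompose(question: str) -> list[Step]:
    """Decompose: stand-in for the LLM that returns a hand-written plan.

    The plan below answers "Where was the director of Inception born?".
    """
    return [("Inception", "directed by", "?x"), ("?x", "born in", "?y")]


def resolve(plan: list[Step],
            summaries: dict[str, dict[str, str]]) -> Optional[dict[str, str]]:
    """Resolve: ground each step in turn, reusing earlier bindings."""
    bindings: dict[str, str] = {}
    for subject, relation, var in plan:
        entity = bindings.get(subject, subject)  # substitute a bound variable
        value = summaries.get(entity, {}).get(relation)
        if value is None:
            return None  # this step cannot be grounded in the index
        bindings[var] = value
    return bindings


plan = decompose("Where was the director of Inception born?")
print(resolve(plan, SUMMARIES))  # {'?x': 'Christopher Nolan', '?y': 'London'}
```

When a step cannot be grounded, this sketch simply returns `None`; a full system would presumably let the LLM reformulate the step and try again.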

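MDER-DR performs well on several multi-hop QA benchmarks, including cross-lingual and domain-specific settings. Experiments show clear gains over standard RAG baselines, especially on WikiQA and HotpotQA. Its entity-centric summaries preserve the details needed to extract precise answers.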

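However, when relational networks are very complex, MDER-DR's entity summaries may oversimplify them and lose information. Its reliance on large language models may also create performance bottlenecks on very long texts. Future work includes improving its performance on very large knowledge graphs and applying it to more domains and languages.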

In-Depth Analysis

Original Abstract

Retrieval-Augmented Generation (RAG) over Knowledge Graphs (KGs) suffers from the fact that indexing approaches may lose important contextual nuance when text is reduced to triples, thereby degrading performance in downstream Question-Answering (QA) tasks, particularly for multi-hop QA, which requires composing answers from multiple entities, facts, or relations. We propose a domain-agnostic, KG-based QA framework that covers both the indexing and retrieval/inference phases. A new indexing approach called Map-Disambiguate-Enrich-Reduce (MDER) generates context-derived triple descriptions and subsequently integrates them with entity-level summaries, thus avoiding the need for explicit traversal of edges in the graph during the QA retrieval phase. Complementing this, we introduce Decompose-Resolve (DR), a retrieval mechanism that decomposes user queries into resolvable triples and grounds them in the KG via iterative reasoning. Together, MDER and DR form an LLM-driven QA pipeline that is robust to sparse, incomplete, and complex relational data. Experiments show that on standard and domain specific benchmarks, MDER-DR achieves substantial improvements over standard RAG baselines (up to 66%), while maintaining cross-lingual robustness. Our code is available at https://github.com/DataSciencePolimi/MDER-DR_RAG.

cs.CL cs.AI cs.IR

References (20)

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Darren Edge, Ha Trinh, Newman Cheng, et al.

2024 · 1184 citations · ⭐ High impact

PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text

Haitian Sun, Tania Bedrax-Weiss, William W. Cohen

2019 · 433 citations

WikiQA: A Challenge Dataset for Open-Domain Question Answering

Yi Yang, Wen-tau Yih, Christopher Meek

2015 · 997 citations

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

H. Trivedi, Niranjan Balasubramanian, Tushar Khot, et al.

2022 · 854 citations

How to Mitigate Information Loss in Knowledge Graphs for GraphRAG: Leveraging Triple Context Restoration and Query-Driven Feedback

Manzong Huang, Chenyang Bu, Yi He, et al.

2025 · 6 citations

The Web as a Knowledge-Base for Answering Complex Questions

Alon Talmor, Jonathan Berant

2018 · 733 citations

QA Is the New KR: Question-Answer Pairs as Knowledge Bases

Wenhu Chen, William W. Cohen, Michiel de Jong, et al.

2022 · 9 citations

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, et al.

2023 · 7365 citations

A Comprehensive Survey on Automatic Knowledge Graph Construction

Lingfeng Zhong, Jia Wu, Qian Li, et al.

2023 · 254 citations

Unifying Large Language Models and Knowledge Graphs: A Roadmap

Shirui Pan, Linhao Luo, Yufei Wang, et al.

2023 · 1274 citations

ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao, Dian Yu, et al.

2022 · 6322 citations

ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science

Sai Munikoti, Anurag Acharya, S. Wagle, et al.

2023 · 17 citations

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

Johannes Welbl, Pontus Stenetorp, Sebastian Riedel

2017 · 552 citations

Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

O. Khattab, Keshav Santhanam, Xiang Lisa Li, et al.

2022 · 363 citations

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus, et al.

2020 · 11984 citations

Knowledge Graphs

Aidan Hogan, E. Blomqvist, Michael Cochez, et al.

2020 · 2221 citations

Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

Chen Liang, Jonathan Berant, Quoc V. Le, et al.

2016 · 423 citations

Reified Input/Output logic: Combining Input/Output logic and Reification to represent norms coming from existing legislation

Livio Robaldo, Xin Sun

2017 · 44 citations

Fine-tuning Language Models for Triple Extraction with Data Augmentation

Yujia Zhang, Tyler Sadler, Mohammad Reza Taesiri, et al.

2024 · 11 citations

A Survey on RAG with LLMs

Muhammad Arslan, Hussam Ghanem, Saba Munawar, et al.

2024 · 170 citations