MDER-DR: Multi-Hop Question Answering with Entity-Centric Summaries
The MDER-DR framework enhances multi-hop QA with entity-centric summaries, achieving up to a 66% improvement over standard RAG baselines.
Key Findings
Methodology
The MDER-DR framework combines the Map-Disambiguate-Enrich-Reduce (MDER) and Decompose-Resolve (DR) methods. MDER generates entity-level summaries during KG construction, avoiding explicit graph traversal during QA retrieval. DR decomposes user queries into resolvable triples and grounds them in the KG through iterative reasoning. Together, the two components make MDER-DR robust to sparse, incomplete, and complex relational data.
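The indexing idea can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's implementation: the `EntityIndex` class and its methods are invented here, and where MDER uses an LLM to enrich and compress descriptions, this sketch simply concatenates strings. The point it shows is that once context-derived triple descriptions are attached to both endpoint entities, a bridging entity's summary already contains every hop that passes through it.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the MDER indexing idea: each entity accumulates
# context-derived triple descriptions, so later retrieval can match on
# entity summaries instead of traversing graph edges.

@dataclass
class EntityIndex:
    summaries: dict = field(default_factory=dict)  # entity -> list of descriptions

    def add_triple(self, subj: str, rel: str, obj: str, context: str) -> None:
        # "Map" + "Enrich": attach a context-derived description to both endpoints
        desc = f"{subj} {rel} {obj} ({context})"
        for entity in (subj, obj):
            self.summaries.setdefault(entity, []).append(desc)

    def summary(self, entity: str) -> str:
        # "Reduce": in the paper an LLM compresses these; here we just join them
        return " ".join(self.summaries.get(entity, []))

index = EntityIndex()
index.add_triple("Marie Curie", "born in", "Warsaw", "1867")
index.add_triple("Warsaw", "capital of", "Poland", "present day")

# Multi-hop lookup without edge traversal: the bridging entity's summary
# already covers both hops.
print(index.summary("Warsaw"))
```

At QA time, a retriever only needs to score these summaries against the question; no edge of the graph is walked during inference.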
Key Results
- On standard and domain-specific benchmarks, MDER-DR achieved up to a 66% improvement over traditional RAG baselines; on WikiQA, its Soft EM score was 0.800, versus 0.538 for the strongest baseline, Vector-RAG.
- On HotpotQA, MDER-DR led the LLM-as-a-Judge evaluation with a score of 0.515, a marked improvement over other RAG architectures.
- On the BenchEE dataset, MDER-DR scored highly in human evaluations by domain experts, demonstrating strong handling of expert-level content.
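For context on the Soft EM numbers above: Soft EM commonly counts a prediction as correct if the normalized gold answer appears inside the normalized prediction, relaxing strict string equality. A minimal sketch of that common definition follows (the paper's exact normalization may differ):

```python
import re
import string

def normalize(text: str) -> str:
    # Lowercase, strip punctuation and articles, collapse whitespace
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def soft_em(prediction: str, gold: str) -> int:
    # Soft EM: credit if the normalized gold answer is contained in the prediction
    return int(normalize(gold) in normalize(prediction))

print(soft_em("The answer is Warsaw, Poland.", "Warsaw"))  # 1
print(soft_em("It is Krakow.", "Warsaw"))                  # 0
```

A dataset-level Soft EM score such as 0.800 is then just the mean of this 0/1 credit over all questions.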
Significance
The MDER-DR framework has significant implications for both academia and industry. It addresses long-standing challenges in multi-hop QA over KGs, particularly regarding information loss and complex relational reasoning. By compressing relational information during indexing, MDER-DR not only improves retrieval efficiency but also enhances cross-lingual robustness. This approach opens new possibilities for multilingual and multi-domain QA systems.
Technical Contribution
MDER-DR's central technical contribution is compressing multi-hop relational information into entity summaries during indexing, so that no explicit graph traversal is needed at inference time. This departs from existing SOTA methods, which typically walk graph edges at query time, and opens new theoretical and engineering possibilities. By using entity-centric summaries, MDER-DR achieves efficient retrieval and reasoning over complex relations.
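The efficiency claim can be made concrete with a toy retrieval step. This is an illustrative sketch, not the paper's retriever: real systems would score dense embeddings, while bag-of-words cosine similarity keeps the example self-contained. What it shows is that once multi-hop context lives inside entity summaries, inference reduces to scoring summaries against the query.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity over term-count vectors
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy entity summaries standing in for MDER's compressed output
summaries = {
    "Warsaw": "Warsaw is the capital of Poland and the birthplace of Marie Curie",
    "Paris": "Paris is the capital of France",
}

def retrieve(query: str) -> str:
    # Inference-time retrieval: score summaries, no graph traversal
    q = Counter(query.lower().split())
    return max(summaries, key=lambda e: cosine(q, Counter(summaries[e].lower().split())))

print(retrieve("in which capital was marie curie born"))  # "Warsaw"
```

The multi-hop question is answered by a single similarity lookup because the bridging facts were folded into the Warsaw summary at indexing time.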
Novelty
MDER-DR is the first framework to compress multi-hop relational information into entity summaries during indexing. Compared to existing multi-hop QA methods, it significantly improves efficiency and accuracy by eliminating the need for explicit graph traversal during inference.
Limitations
- When dealing with highly complex relational networks, MDER-DR may face issues with overly simplified entity summaries, leading to information loss.
- Due to its reliance on large language models, MDER-DR may encounter performance bottlenecks when processing very long texts.
- In certain specific domains, MDER-DR may require parameter adjustments to achieve optimal performance.
Future Work
Future research directions include further optimizing MDER-DR's performance in handling large-scale KGs and exploring its applications in more domains and languages. Additionally, integrating other advanced NLP technologies, such as deep learning and graph neural networks, may further enhance its performance.
AI Executive Summary
Knowledge Graphs (KGs) play a crucial role in structuring information, but existing Retrieval-Augmented Generation (RAG) methods often lose important contextual nuances when text is reduced to triples. This information loss is particularly detrimental in multi-hop QA tasks, which require composing answers from multiple entities, facts, or relations.
To address this issue, we propose a KG-based QA framework called MDER-DR, which covers both the indexing and retrieval/inference phases. MDER-DR consists of two main components: Map-Disambiguate-Enrich-Reduce (MDER) and Decompose-Resolve (DR). MDER generates context-derived triple descriptions during KG construction and integrates them with entity-level summaries, avoiding the need for explicit traversal of edges in the graph during QA retrieval. DR decomposes user queries into resolvable triples and grounds them in the KG via iterative reasoning.
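The Decompose-Resolve loop described above can be sketched in a few lines. This is a hypothetical toy: in MDER-DR an LLM performs the decomposition and the grounding against the KG, whereas here a plain dict stands in for the graph and the triples are written by hand. The sketch shows the iterative structure: each triple may contain unknowns, and resolving one hop binds a variable that the next hop consumes.

```python
# Toy KG standing in for the entity-summary index
kg = {
    ("Marie Curie", "born in"): "Warsaw",
    ("Warsaw", "capital of"): "Poland",
}

def resolve(triples, kg):
    # Iteratively ground each decomposed triple, carrying variable bindings
    bindings = {}
    for subj, rel, obj in triples:
        subj = bindings.get(subj, subj)  # substitute bindings from earlier hops
        bindings[obj] = kg[(subj, rel)]  # ground the triple in the KG
    return bindings

# "Which country's capital is the birthplace of Marie Curie?"
triples = [("Marie Curie", "born in", "?x"), ("?x", "capital of", "?ans")]
print(resolve(triples, kg)["?ans"])  # "Poland"
```

The first hop binds `?x` to Warsaw; the second hop substitutes that binding and grounds the remaining triple, yielding the final answer.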
MDER-DR performs exceptionally well across multiple multi-hop QA benchmarks, including cross-lingual and domain-specific settings. Experimental results demonstrate consistent improvements over standard RAG baselines, particularly in the WikiQA and HotpotQA datasets. MDER-DR's entity-centric summaries effectively preserve the details needed for exact answer extraction.
These results carry weight for both academia and industry: by compressing relational information during indexing, MDER-DR tackles long-standing problems of information loss and complex relational reasoning in multi-hop QA over KGs, while improving retrieval efficiency and cross-lingual robustness, and pointing toward multilingual and multi-domain QA systems.
However, MDER-DR may face challenges with overly simplified entity summaries when dealing with highly complex relational networks, leading to information loss. Additionally, due to its reliance on large language models, MDER-DR may encounter performance bottlenecks when processing very long texts. Future research directions include further optimizing MDER-DR's performance in handling large-scale KGs and exploring its applications in more domains and languages.
Deep Dive
Abstract
Retrieval-Augmented Generation (RAG) over Knowledge Graphs (KGs) suffers from the fact that indexing approaches may lose important contextual nuance when text is reduced to triples, thereby degrading performance in downstream Question-Answering (QA) tasks, particularly for multi-hop QA, which requires composing answers from multiple entities, facts, or relations. We propose a domain-agnostic, KG-based QA framework that covers both the indexing and retrieval/inference phases. A new indexing approach called Map-Disambiguate-Enrich-Reduce (MDER) generates context-derived triple descriptions and subsequently integrates them with entity-level summaries, thus avoiding the need for explicit traversal of edges in the graph during the QA retrieval phase. Complementing this, we introduce Decompose-Resolve (DR), a retrieval mechanism that decomposes user queries into resolvable triples and grounds them in the KG via iterative reasoning. Together, MDER and DR form an LLM-driven QA pipeline that is robust to sparse, incomplete, and complex relational data. Experiments show that on standard and domain-specific benchmarks, MDER-DR achieves substantial improvements over standard RAG baselines (up to 66%), while maintaining cross-lingual robustness. Our code is available at https://github.com/DataSciencePolimi/MDER-DR_RAG.
References (20)
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Darren Edge, Ha Trinh, Newman Cheng et al.
PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text
Haitian Sun, Tania Bedrax-Weiss, William W. Cohen
WikiQA: A Challenge Dataset for Open-Domain Question Answering
Yi Yang, Wen-tau Yih, Christopher Meek
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
H. Trivedi, Niranjan Balasubramanian, Tushar Khot et al.
How to Mitigate Information Loss in Knowledge Graphs for GraphRAG: Leveraging Triple Context Restoration and Query-Driven Feedback
Manzong Huang, Chenyang Bu, Yi He et al.
The Web as a Knowledge-Base for Answering Complex Questions
Alon Talmor, Jonathan Berant
QA Is the New KR: Question-Answer Pairs as Knowledge Bases
Wenhu Chen, William W. Cohen, Michiel de Jong et al.
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng et al.
A Comprehensive Survey on Automatic Knowledge Graph Construction
Lingfeng Zhong, Jia Wu, Qian Li et al.
Unifying Large Language Models and Knowledge Graphs: A Roadmap
Shirui Pan, Linhao Luo, Yufei Wang et al.
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu et al.
ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science
Sai Munikoti, Anurag Acharya, S. Wagle et al.
Constructing Datasets for Multi-hop Reading Comprehension Across Documents
Johannes Welbl, Pontus Stenetorp, Sebastian Riedel
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
O. Khattab, Keshav Santhanam, Xiang Lisa Li et al.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
Knowledge Graphs
Aidan Hogan, E. Blomqvist, Michael Cochez et al.
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Chen Liang, Jonathan Berant, Quoc V. Le et al.
Reified Input/Output logic: Combining Input/Output logic and Reification to represent norms coming from existing legislation
Livio Robaldo, Xin Sun
Fine-tuning Language Models for Triple Extraction with Data Augmentation
Yujia Zhang, Tyler Sadler, Mohammad Reza Taesiri et al.
A Survey on RAG with LLMs
Muhammad Arslan, Hussam Ghanem, Saba Munawar et al.