Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval

TL;DR

An enhanced entity page format built on structured linked data improves accuracy by 29.6% in standard RAG and by 29.8% in the full agentic pipeline; JSON-LD markup alone yields only modest gains.

cs.IR · Advanced · 2026-03-11
Andrea Volpini, Elie Raad, Beatrice Gamba, David Riccitelli
structured data, knowledge graph, RAG system, retrieval enhancement, AI agent

Key Findings

Methodology

This paper proposes using structured linked data as a memory layer to improve retrieval accuracy and answer quality in Retrieval-Augmented Generation (RAG) systems. It relies on Schema.org markup and dereferenceable entity pages, with Vertex AI Vector Search 2.0 for retrieval and the Google Agent Development Kit (ADK) for agentic reasoning. The experimental design tests seven conditions: three document representations (plain HTML, HTML with JSON-LD, and an enhanced agentic-optimized entity page) crossed with two retrieval modes (standard RAG and agentic RAG with multi-hop link traversal), plus an Enhanced+ condition.
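The seven-condition design can be enumerated in a few lines. This is a minimal sketch rather than the authors' evaluation code: the identifier names are illustrative, and the retrieval mode assigned to Enhanced+ is an assumption, since the summary only states that it is an additional condition.

```python
from itertools import product

# Labels paraphrase the summary; the identifiers themselves are illustrative.
REPRESENTATIONS = ["plain_html", "html_jsonld", "enhanced_entity_page"]
RETRIEVAL_MODES = ["standard_rag", "agentic_rag"]


def build_conditions():
    """Return the 3 x 2 grid of conditions plus the extra Enhanced+ condition."""
    grid = [
        {"representation": rep, "retrieval_mode": mode}
        for rep, mode in product(REPRESENTATIONS, RETRIEVAL_MODES)
    ]
    # Assumption: Enhanced+ runs in agentic mode. The summary does not say
    # which retrieval mode the seventh condition uses.
    grid.append({"representation": "enhanced_plus", "retrieval_mode": "agentic_rag"})
    return grid


conditions = build_conditions()
for i, cond in enumerate(conditions, start=1):
    print(i, cond["representation"], cond["retrieval_mode"])
```

Crossing three representations with two modes gives six conditions, and Enhanced+ brings the total to seven.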

Key Results

  • Result 1: In standard RAG systems, using an enhanced entity page format (including llms.txt-style agent instructions, breadcrumbs, and neural search capabilities) improved retrieval accuracy by 29.6%.
  • Result 2: In the full agentic pipeline, the enhanced entity page format improved retrieval accuracy by 29.8%, demonstrating the advantage of structured data in multi-hop link traversal.
  • Result 3: The Enhanced+ variant achieved the highest absolute scores in accuracy and completeness (accuracy: 4.85/5, completeness: 4.55/5), though the incremental gain over the base enhanced format is not statistically significant.
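For readers who want to see how a percentage gain of this kind is computed, the sketch below assumes the reported figures are relative improvements over a baseline score on the 1-5 scale. The summary does not state whether the gains are relative or absolute, and the example scores are hypothetical, not values from the paper.

```python
def relative_gain(baseline: float, treated: float) -> float:
    """Percentage change of `treated` with respect to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treated - baseline) / baseline * 100.0


# Hypothetical scores on a 1-5 scale, NOT taken from the paper.
hypothetical_baseline = 3.0
hypothetical_enhanced = 3.9
print(round(relative_gain(hypothetical_baseline, hypothetical_enhanced), 1))
```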

Significance

The study indicates that structured linked data can improve RAG performance, particularly in retrieval tasks that integrate information from multiple sources. Because many websites already publish Schema.org markup, the approach builds on data that often exists already, which makes it relevant to applications that need high precision and completeness.

Technical Contribution

The paper's technical contribution is the use of structured linked data as a memory layer, which improves accuracy and answer quality over RAG on plain text. Its enhanced entity page format lets an agent follow links between entity pages (multi-hop link traversal) without requiring a separate graph to be constructed.

Novelty

The paper positions structured linked data as a memory layer for RAG systems. Unlike RAG systems that index flat text, it exposes knowledge graph data through an enhanced entity page format, giving the retriever and the agent an additional route for integrating information.

Limitations

  • Limitation 1: Although the Enhanced+ variant achieves the best absolute scores, the incremental gain over the base enhanced format is not statistically significant, indicating that further navigational affordances may not always lead to significant performance improvements.
  • Limitation 2: The experimental results may be influenced by specific domain datasets, with more pronounced effects in domains rich in knowledge graph information.
  • Limitation 3: Due to the complexity of the experimental design, the reproducibility of results may be limited, especially across different knowledge graphs and data platforms.

Future Work

Future research directions include exploring the application of structured linked data as a memory layer in broader domains, further optimizing the enhanced entity page format, and developing more efficient multi-hop link traversal algorithms. Additionally, research could expand to other types of structured data and knowledge graphs to verify the generalizability of this method.

AI Executive Summary

In the field of information retrieval, Retrieval-Augmented Generation (RAG) systems have become the dominant architecture, yet most systems treat documents as unstructured text, ignoring the rich structured metadata and linked relationships that knowledge graphs provide. This paper proposes a method using structured linked data as a memory layer to enhance retrieval accuracy and answer quality.

By employing Schema.org markup and dereferenceable entity pages, combined with Vertex AI Vector Search 2.0 and Google Agent Development Kit (ADK), the paper conducts experiments across four domains (editorial, legal, travel, e-commerce). The experimental design includes seven conditions, covering three document representations and two retrieval modes, plus an Enhanced+ condition.

The results show that while JSON-LD markup alone provides only modest improvements, the enhanced entity page format significantly improves retrieval accuracy: +29.6% for standard RAG and +29.8% for the full agentic pipeline. The Enhanced+ variant achieves the highest absolute scores in accuracy and completeness, though the incremental gain over the base enhanced format is not statistically significant.

This method offers a new perspective for information retrieval and generation, potentially impacting academia and industry, especially in high-precision and high-completeness applications. By leveraging structured linked data, the paper demonstrates how to use knowledge graphs for multi-hop link traversal without constructing graphs, offering new engineering possibilities.

Deep Analysis

Background

With the rise of generative AI, the way users access information has fundamentally changed. Search engines increasingly augment traditional results with AI-generated summaries, a paradigm exemplified by Google's AI Mode, which retrieves, reasons over, and synthesizes information from multiple web sources. Understanding and optimizing for this new retrieval paradigm is critical for content creators, marketers, and organizations that depend on search visibility.


Retrieval-Augmented Generation (RAG) has emerged as the dominant architecture for grounding large language model (LLM) outputs in factual, up-to-date information. However, most RAG implementations treat documents as unstructured text, discarding the rich structured metadata that many websites already provide via Schema.org markup and knowledge graph representations. This paper investigates whether structured linked data can improve RAG system performance and proposes a novel method to leverage this data.

Core Problem

The prevalent issue with current RAG systems is that they treat documents as flat text, ignoring the structured metadata and linked relationships provided by knowledge graphs. This approach leads to inadequate retrieval accuracy and completeness, especially in complex retrieval tasks requiring multi-source information integration. Addressing this problem is crucial for enhancing RAG system performance, particularly in applications requiring high precision and completeness.

Innovation

The core innovation is the use of structured linked data as a memory layer in RAG systems. The paper exposes knowledge graph data through an enhanced entity page format that incorporates llms.txt-style agent instructions, breadcrumbs, and neural search capabilities. Compared with plain HTML, this format improves accuracy and answer quality.
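To make the idea of an enhanced entity page concrete, the following sketch renders one entity as text with agent instructions, a breadcrumb trail, JSON-LD, and links to related entities. The layout, the `EntityPage` class, the section titles, and the example.com URLs are all hypothetical; the paper's actual templates are not reproduced in this summary, and the neural search component is left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EntityPage:
    """Hypothetical container for one entity; not the paper's data model."""
    url: str
    name: str
    description: str
    breadcrumbs: list = field(default_factory=list)  # ordered (label, url) pairs
    related: dict = field(default_factory=dict)      # label -> url of linked entities


def render_enhanced_page(page: EntityPage) -> str:
    """Render instructions, breadcrumbs, structured data, and links as text."""
    jsonld = {
        "@context": "https://schema.org",
        "@type": "Thing",
        "@id": page.url,
        "name": page.name,
        "description": page.description,
    }
    trail = " > ".join(label for label, _ in page.breadcrumbs)
    links = "\n".join(f"- {label}: {url}" for label, url in page.related.items())
    return "\n".join([
        "## Agent instructions",
        "Follow the related links below when this page does not answer the question.",
        "",
        "## Breadcrumbs",
        trail,
        "",
        "## Structured data (JSON-LD)",
        json.dumps(jsonld, indent=2),
        "",
        "## Related entities",
        links,
    ])


page = EntityPage(
    url="https://example.com/entity/rome-guide",
    name="Rome Travel Guide",
    description="An overview of places to visit in Rome.",
    breadcrumbs=[("Home", "https://example.com/"), ("Guides", "https://example.com/guides")],
    related={"Colosseum": "https://example.com/entity/colosseum"},
)
rendered = render_enhanced_page(page)
print(rendered)
```

Keeping instructions, position in the site, and outgoing links in one document means a single retrieved chunk already tells an agent where to look next.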

Methodology

The methodology of this paper includes the following key steps:

  • Use Schema.org markup and dereferenceable entity pages as sources of structured linked data.
  • Combine Vertex AI Vector Search 2.0 for retrieval with Google Agent Development Kit (ADK) for agentic reasoning.
  • The experimental design tests seven conditions, covering three document representations (plain HTML, HTML with JSON-LD, and an enhanced agentic-optimized entity page) and two retrieval modes (standard RAG and agentic RAG with multi-hop link traversal), plus an Enhanced+ condition.
  • Experiments are conducted across four domains (editorial, legal, travel, e-commerce) to validate the method's effectiveness.
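The "HTML with JSON-LD" representation depends on Schema.org data embedded in a page. As a rough illustration of how that data can be read back out, the sketch below pulls JSON-LD blocks from HTML using only the Python standard library. The sample page is invented, and this is not the paper's ingestion pipeline.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD object found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.items


# Invented sample page for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Example headline"}
</script>
</head><body><p>Body text.</p></body></html>
"""

items = extract_jsonld(SAMPLE_HTML)
print(items[0]["@type"], "-", items[0]["headline"])
```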

Experiments

The experiments span four domains (editorial, legal, travel, e-commerce), with Vertex AI Vector Search 2.0 handling retrieval and the Google Agent Development Kit (ADK) handling agentic reasoning. Seven conditions are tested: three document representations crossed with two retrieval modes, plus an Enhanced+ condition. The evaluation comprises 2,443 individual query evaluations, and the authors release their dataset, evaluation framework, and enhanced entity page templates to support reproducibility.

Results

The experimental results show that the enhanced entity page format significantly improves retrieval accuracy: +29.6% for standard RAG and +29.8% for the full agentic pipeline. The Enhanced+ variant achieves the highest absolute scores in accuracy and completeness (accuracy: 4.85/5, completeness: 4.55/5), though the incremental gain over the base enhanced format is not statistically significant. This demonstrates the advantage of structured data in multi-hop link traversal.

Applications

This method can be directly applied to high-precision and high-completeness applications such as legal document retrieval, travel information integration, and e-commerce product recommendation. By leveraging structured linked data, the method can significantly enhance retrieval accuracy and completeness, especially in complex retrieval tasks requiring multi-source information integration.

Limitations & Outlook

Despite the excellent performance of this method in experiments, there are some limitations. First, although the Enhanced+ variant achieves the best absolute scores, the incremental gain over the base enhanced format is not statistically significant, indicating that further navigational affordances may not always lead to significant performance improvements. Additionally, the experimental results may be influenced by specific domain datasets, with more pronounced effects in domains rich in knowledge graph information. Finally, due to the complexity of the experimental design, the reproducibility of results may be limited, especially across different knowledge graphs and data platforms.

Plain Language Accessible to non-experts

Imagine searching for a specific book in a large library. Traditional RAG systems are like a librarian who relies solely on the book title catalog, only able to find books by their titles without leveraging the relationships between them. The method proposed in this paper is like a super librarian who has access to a complete bibliography and related information, able to find not only the book by its title but also related books and materials.

This method uses structured linked data, allowing the RAG system to act like the super librarian, utilizing the relationships between books to improve retrieval accuracy and completeness. In this way, the system can better integrate multi-source information, providing more comprehensive and accurate answers.

This method is particularly suitable for complex tasks requiring the integration of large amounts of information, such as legal document retrieval, travel information integration, and e-commerce product recommendation. By leveraging structured linked data, the system can better understand and integrate information, enhancing retrieval accuracy and completeness.

In summary, this method allows RAG systems to act like a super librarian, utilizing the relationships between books to improve retrieval accuracy and completeness. This approach offers a new perspective for information retrieval and generation, potentially impacting academia and industry.

ELI14 Explained like you're 14

Hey there! Imagine you're in a huge library trying to find a book about dinosaurs. A regular librarian can only help you find the book by its title, but if they're a super librarian, they can use the connections between books to find even more related books and information!

That's what this paper's method does! By using structured linked data, the system acts like that super librarian, using the connections between books to improve retrieval accuracy and completeness. This way, you get more comprehensive and accurate answers!

This method is especially useful for complex tasks that need a lot of information, like legal document retrieval, travel information integration, and e-commerce product recommendation. By using structured linked data, the system can better understand and integrate information, enhancing retrieval accuracy and completeness.

So, next time you're in a library looking for a book, think of this super librarian! They're the superheroes of information retrieval!

Glossary

RAG System (Retrieval-Augmented Generation)

An architecture combining information retrieval and generation to enhance the accuracy and completeness of generated results using retrieved information.

The paper explores how to use structured linked data to improve RAG system performance.

Structured Linked Data

Data represented through Schema.org markup and dereferenceable entity pages, providing rich metadata and linked relationships.

The paper uses structured linked data as a memory layer in RAG systems.

Schema.org Markup

A markup format used to embed structured data in web pages, helping search engines better understand page content.

The paper uses Schema.org markup as a source of structured linked data.

Knowledge Graph

A graph structure representing entities and their relationships, widely used in information retrieval and semantic understanding.

The paper investigates how to leverage structured data from knowledge graphs to improve RAG system performance.

Vertex AI Vector Search 2.0

An AI-native search engine designed for efficient information retrieval, combining dense semantic search and sparse keyword search.

The paper uses Vertex AI Vector Search 2.0 for information retrieval.
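Combining dense and sparse results requires some way of merging two ranked lists. Reciprocal rank fusion (RRF) is one widely used option and is shown here purely as an illustration; this summary does not state which fusion method the paper's retrieval setup uses, and the document ids are invented.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; a higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["doc_a", "doc_b", "doc_c"]   # hypothetical semantic-search ranking
sparse = ["doc_b", "doc_d", "doc_a"]  # hypothetical keyword-search ranking
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

Documents that appear high in both lists accumulate the largest scores, so `doc_b` and `doc_a` rank ahead of documents found by only one retriever.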

Google Agent Development Kit (ADK)

A production framework for building multi-tool agents, supporting complex multi-step reasoning and tool use.

The paper uses ADK for agentic reasoning.

Enhanced Entity Page Format

An optimized document representation format including llms.txt-style agent instructions, breadcrumbs, and neural search capabilities.

The paper proposes an enhanced entity page format to improve retrieval performance.

Multi-hop Link Traversal

A retrieval method integrating information through multiple link hops, mimicking the behavior of AI-powered search systems.

The paper explores the application of multi-hop link traversal in RAG systems.
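A simplified way to picture multi-hop link traversal is a bounded breadth-first walk over linked pages. In the paper an LLM agent decides which links to follow; the sketch below replaces that decision with a plain hop limit, and the `PAGES` dictionary is a hypothetical in-memory stand-in for dereferenceable entity pages.

```python
from collections import deque

# Hypothetical stand-in for entity pages: URL -> (page text, outgoing links).
PAGES = {
    "https://example.com/hotel": ("Hotel Aurora is in Rome.", ["https://example.com/rome"]),
    "https://example.com/rome": ("Rome is the capital of Italy.", ["https://example.com/italy"]),
    "https://example.com/italy": ("Italy uses the euro.", []),
}


def traverse(start_url, max_hops=2, fetch=PAGES.get):
    """Breadth-first walk from start_url, following links up to max_hops away."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    collected = []
    while queue:
        url, hops = queue.popleft()
        page = fetch(url)
        if page is None:
            continue  # unresolved link: skip it
        text, links = page
        collected.append((hops, url, text))
        if hops < max_hops:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, hops + 1))
    return collected


for hops, url, text in traverse("https://example.com/hotel", max_hops=2):
    print(hops, url, text)
```

The hop limit and the `seen` set keep the walk finite even when pages link back to each other.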

llms.txt-style Agent Instructions

An instruction format that gives LLM agents explicit guidance, helping them understand and use structured data.

The enhanced entity page format in the paper includes llms.txt-style agent instructions.

Breadcrumb Navigation

A navigation tool helping users understand their position within a website's structure, often used to enhance user experience.

The enhanced entity page format in the paper includes breadcrumb navigation.
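Breadcrumbs can also be expressed as structured data using the Schema.org `BreadcrumbList` type. The sketch below builds such an object from an ordered trail; the helper name and the example.com URLs are hypothetical, and the summary does not say whether the paper encodes its breadcrumbs this way.

```python
import json


def breadcrumb_jsonld(trail):
    """Build a Schema.org BreadcrumbList from ordered (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }


crumbs = breadcrumb_jsonld([
    ("Home", "https://example.com/"),
    ("Guides", "https://example.com/guides"),
    ("Rome", "https://example.com/guides/rome"),
])
print(json.dumps(crumbs, indent=2))
```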

Open Questions Unanswered questions from this research

  • 1. How can structured linked data as a memory layer be applied in broader domains? Current research focuses on specific domains, and future exploration is needed for its potential in other areas.
  • 2. How can the enhanced entity page format be further optimized? Although the current format significantly improves retrieval performance, there is room for improvement, especially in navigational features.
  • 3. How can more efficient multi-hop link traversal algorithms be developed? Existing algorithms may be inefficient in some cases, requiring further optimization to improve retrieval efficiency.
  • 4. How can the generalizability of this method be verified across different knowledge graphs and data platforms? Current experimental results may be influenced by specific domain datasets, requiring validation on broader platforms.
  • 5. How can the reproducibility issue caused by the complexity of experimental design be addressed? More standardized experimental frameworks are needed to improve the reproducibility of results.

Applications

Immediate Applications

Legal Document Retrieval

By leveraging structured linked data, the system can more accurately retrieve and integrate legal documents, enhancing retrieval accuracy and completeness.

Travel Information Integration

Using structured linked data, the system can better integrate travel information, providing users with more comprehensive and accurate travel recommendations.

E-commerce Product Recommendation

By leveraging structured linked data, the system can more accurately recommend e-commerce products, enhancing the shopping experience for users.

Long-term Vision

Cross-domain Information Integration

By further optimizing the application of structured linked data, the system can achieve information integration across broader domains, providing more comprehensive solutions.

Intelligent Search Engines

By leveraging structured linked data, future search engines can more intelligently understand and integrate information, providing users with more accurate and comprehensive search results.

Abstract

Retrieval-Augmented Generation (RAG) systems typically treat documents as flat text, ignoring the structured metadata and linked relationships that knowledge graphs provide. In this paper, we investigate whether structured linked data, specifically Schema.org markup and dereferenceable entity pages served by a Linked Data Platform, can improve retrieval accuracy and answer quality in both standard and agentic RAG systems. We conduct a controlled experiment across four domains (editorial, legal, travel, e-commerce) using Vertex AI Vector Search 2.0 for retrieval and the Google Agent Development Kit (ADK) for agentic reasoning. Our experimental design tests seven conditions: three document representations (plain HTML, HTML with JSON-LD, and an enhanced agentic-optimized entity page) crossed with two retrieval modes (standard RAG and agentic RAG with multi-hop link traversal), plus an Enhanced+ condition that adds rich navigational affordances and entity interlinking. Our results reveal that while JSON-LD markup alone provides only modest improvements, our enhanced entity page format, incorporating llms.txt-style agent instructions, breadcrumbs, and neural search capabilities, achieves substantial gains: +29.6% accuracy improvement for standard RAG and +29.8% for the full agentic pipeline. The Enhanced+ variant, with richer navigational affordances, achieves the highest absolute scores (accuracy: 4.85/5, completeness: 4.55/5), though the incremental gain over the base enhanced format is not statistically significant. We release our dataset, evaluation framework, and enhanced entity page templates to support reproducibility.

cs.IR cs.AI
