OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
OpenSeeker democratizes frontier search agents by fully open-sourcing training data, utilizing controllable QA synthesis and denoised trajectory synthesis.
Key Findings
Methodology
OpenSeeker achieves democratization of frontier search agents through two technical innovations: 1) fact-grounded, scalable, controllable QA synthesis, which reverse-engineers web graphs via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks with controllable coverage and complexity; and 2) denoised trajectory synthesis, which employs a retrospective summarization mechanism to strip noise from trajectories, enabling teacher LLMs to generate high-quality actions.
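To make the QA-synthesis idea concrete, here is a minimal sketch of the two ingredients named above: topological expansion (walking outward from a seed entity over a toy web graph to form a multi-hop chain) and entity obfuscation (replacing the seed with a vague descriptor so the agent must first resolve it). All entities, relations, and descriptors below are hypothetical illustrations, not the paper's actual data or pipeline.

```python
# Toy web graph: entity -> list of (relation, neighbor) edges.
# Hypothetical example data, not taken from the OpenSeeker dataset.
WEB_GRAPH = {
    "Marie Curie": [("born_in", "Warsaw"), ("won", "Nobel Prize in Physics")],
    "Warsaw": [("capital_of", "Poland")],
    "Poland": [("member_of", "European Union")],
}

# Vague descriptors used to obfuscate the seed entity (assumed, for illustration).
OBFUSCATIONS = {"Marie Curie": "a two-time Nobel laureate scientist"}

def expand_chain(graph, seed, hops):
    """Topological expansion: walk outward from the seed to build a multi-hop chain."""
    chain, node = [], seed
    for _ in range(hops):
        edges = graph.get(node)
        if not edges:
            break
        relation, neighbor = edges[0]  # deterministic pick keeps the sketch reproducible
        chain.append((node, relation, neighbor))
        node = neighbor
    return chain

def synthesize_qa(graph, seed, hops=3):
    """Compose a multi-hop question whose answer is the chain's final entity.
    Entity obfuscation swaps the seed for a vague description, forcing the
    agent to first identify the entity before following the hops."""
    chain = expand_chain(graph, seed, hops)
    subject = OBFUSCATIONS.get(seed, seed)
    relations = " -> ".join(rel for _, rel, _ in chain)
    question = f"Starting from {subject}, follow {relations}: what do you reach?"
    answer = chain[-1][2]
    return question, answer

q, a = synthesize_qa(WEB_GRAPH, "Marie Curie")
print(a)  # European Union
```

Controlling `hops` controls task complexity, and growing the graph controls coverage, which is the sense in which the synthesis is "controllable" under these assumptions.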
Key Results
- OpenSeeker, trained on only 11.7k synthesized samples in a single run, achieves state-of-the-art performance across multiple benchmarks, including BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch. On BrowseComp, OpenSeeker significantly outperforms the second-best open-source agent DeepDive, scoring 29.5% compared to 15.3%.
- On the BrowseComp-ZH benchmark, OpenSeeker scores 48.4%, surpassing industrial competitors like Alibaba's Tongyi DeepResearch (46.7%), which is trained via extensive continual pre-training, SFT, and RL.
- OpenSeeker fully open-sources the complete training dataset and model weights to democratize frontier search agent research and foster a more transparent, collaborative ecosystem.
Significance
OpenSeeker breaks the monopoly of industrial giants in high-performance search agent development by providing complete training data and model weights. This research offers a powerful tool for academia and the open-source community to develop industrial-grade search agents without extensive resources. It not only democratizes search intelligence but also provides an open and collaborative platform for future research.
Technical Contribution
OpenSeeker's technical contributions lie in its innovative QA synthesis and trajectory denoising methods, which improve data quality and complexity and enable the model to excel in complex search tasks. Compared to existing state-of-the-art methods, OpenSeeker opens new engineering possibilities, particularly in data synthesis and denoising techniques.
Novelty
OpenSeeker is the first fully open-source search agent providing complete training data and model weights. Its innovations in QA synthesis and trajectory denoising significantly enhance data quality and complexity, allowing the model to excel in complex search tasks compared to existing work.
Limitations
- Due to resource constraints, OpenSeeker's effectiveness is validated in only a single training run, so its performance on more challenging data remains unverified.
- The current training data volume is relatively small; although high-quality, it may still be insufficient to cover all possible scenarios in some complex tasks.
- Due to resource constraints, the English data has not yet been updated to the latest QA standards, resulting in slightly lower difficulty compared to the Chinese data.
Future Work
Future research directions include expanding OpenSeeker's training dataset, exploring different parameters and data filtering strategies to further enhance model performance. Additionally, with increased resources, more training runs can be conducted to validate its effectiveness on more complex data.
AI Executive Summary
In the era of information explosion, obtaining accurate, real-time, and reliable information from the internet has become a fundamental pillar of modern decision-making. However, the development of high-performance search agents has been dominated by industrial giants, primarily due to the lack of transparent, high-quality training data. To break this monopoly, OpenSeeker emerges as the first fully open-source search agent, achieving frontier-level performance by open-sourcing training data and model weights.
OpenSeeker's core technical innovations include fact-grounded scalable controllable QA synthesis and denoised trajectory synthesis. The former reverse-engineers web graphs via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks with controllable coverage and complexity. The latter employs a retrospective summarization mechanism to denoise trajectories, enabling teacher LLMs to generate high-quality actions.
Experimental results demonstrate that OpenSeeker, trained on only 11.7k synthesized samples in a single run, achieves state-of-the-art performance across multiple benchmarks, including BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch. Notably, on the BrowseComp-ZH benchmark, OpenSeeker scores 48.4%, surpassing industrial competitors like Alibaba's Tongyi DeepResearch (46.7%).
OpenSeeker's success lies not only in its technical innovations but also in its contribution to the democratization of search intelligence. By providing complete training data and model weights, OpenSeeker offers a powerful tool for academia and the open-source community to develop industrial-grade search agents without extensive resources.
However, due to resource constraints, OpenSeeker's effectiveness is validated in only a single training run, leaving its performance on more challenging data unverified. Future research directions include expanding the training dataset and exploring different parameters and data filtering strategies to further enhance model performance.
Deep Analysis
Background
In the field of information retrieval, the capabilities of search agents have significantly improved with the development of large language models (LLMs). However, the development of high-performance search agents has been monopolized by a few industrial giants, primarily due to the lack of transparent and high-quality training data. Existing open-source models, while providing model weights, often lack transparency in training data, or the data quality is insufficient to support complex reasoning tasks. This data scarcity has severely hindered the broader research community's development and innovation in this domain.
Core Problem
The core problem is how to break the monopoly of industrial giants in high-performance search agent development. Specifically, the lack of transparent and high-quality training data is a major bottleneck. This not only limits academic progress in this field but also hinders the open-source community from developing industrial-grade search agents. Therefore, providing a complete open-source solution, including high-quality training data and model weights, is a pressing issue that needs to be addressed.
Innovation
OpenSeeker's core innovations lie in its data synthesis and denoising techniques. First, fact-grounded scalable controllable QA synthesis reverse-engineers web graphs via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks with controllable coverage and complexity. Second, denoised trajectory synthesis employs a retrospective summarization mechanism to denoise trajectories, enabling teacher LLMs to generate high-quality actions. These innovations not only enhance data quality and complexity but also enable the model to excel in complex search tasks.
Methodology
- Fact-grounded scalable controllable QA synthesis: Reverse-engineers web graphs via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks.
- Denoised trajectory synthesis: Employs a retrospective summarization mechanism to denoise trajectories, enabling teacher LLMs to generate high-quality actions.
- Dataset generation: Synthesizes 10.3k English and 1.4k Chinese samples for supervised fine-tuning (SFT).
- Experimental validation: Performance evaluation on benchmarks such as BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch.
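The trajectory-denoising step can be sketched as follows: each raw tool response in an agent trajectory is replaced by a retrospective summary, so the teacher LLM conditions on clean context when generating the next action. The trajectory contents and the keyword-based summarizer below are stand-in assumptions; in the actual system an LLM produces the summaries.

```python
# Toy trajectory: alternating (action, raw_tool_response) steps. The raw
# responses mimic noisy web-page dumps (ads, navigation chrome, footers).
# Hypothetical example data, not taken from the OpenSeeker pipeline.
trajectory = [
    ("search('OpenSeeker benchmark')",
     "Ad: buy now!\nOpenSeeker scores 29.5% on BrowseComp.\nCookie notice..."),
    ("open(result_1)",
     "Nav menu | Login | Share\nDeepDive scores 15.3% on BrowseComp.\nFooter"),
]

def summarize(raw_response, keywords=("scores",)):
    """Stand-in summarizer: keep only lines relevant to the task keywords.
    In the real system a retrospective LLM summary plays this role."""
    return "\n".join(
        line for line in raw_response.splitlines()
        if any(k in line for k in keywords)
    )

def denoise_trajectory(traj):
    """Replace every raw tool response with its summary, yielding the
    denoised trajectory the teacher LLM sees when proposing actions."""
    return [(action, summarize(resp)) for action, resp in traj]

for action, observation in denoise_trajectory(trajectory):
    print(action, "->", observation)
```

The design point is that denoising happens retrospectively on the observation side, leaving the action sequence untouched while shrinking and cleaning the context each subsequent action is generated from.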
Experiments
The experimental design includes evaluating OpenSeeker's performance on multiple benchmarks, such as BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch. The training dataset comprises 10.3k English and 1.4k Chinese samples. The benchmarks primarily assess the model's performance in multi-step navigation and complex information location tasks. The experiments also include performance comparisons with other open-source and closed-source models to validate OpenSeeker's superiority.
Results
Experimental results show that OpenSeeker achieves state-of-the-art performance across multiple benchmarks. On BrowseComp, OpenSeeker significantly outperforms the second-best open-source agent DeepDive, scoring 29.5% compared to 15.3%. On the BrowseComp-ZH benchmark, OpenSeeker scores 48.4%, surpassing industrial competitors like Alibaba's Tongyi DeepResearch (46.7%). These results validate OpenSeeker's innovations in data synthesis and denoising techniques.
Applications
OpenSeeker's application scenarios include academic research and industrial applications. In academic research, OpenSeeker provides a powerful tool for researchers to develop industrial-grade search agents without extensive resources. In industrial applications, OpenSeeker can be used to develop high-performance search engines and information retrieval systems, improving the efficiency and accuracy of information acquisition.
Limitations & Outlook
Despite OpenSeeker's excellent performance across multiple benchmarks, its effectiveness is validated in only a single training run due to resource constraints, leaving its performance on more challenging data unverified. Additionally, the current training data volume is relatively small; although high-quality, it may still be insufficient to cover all possible scenarios in some complex tasks. Future research directions include expanding the training dataset and exploring different parameters and data filtering strategies to further enhance model performance.
Plain Language (Accessible to non-experts)
Imagine you're in a massive library searching for a specific book. This library has countless shelves, each with thousands of books. To find the book you need, you require a very smart assistant who not only knows the location of every book but can also quickly find related information. This is what OpenSeeker does. It's like a super-intelligent library assistant that can quickly find the information you need on the internet. By using complex algorithms and techniques, OpenSeeker can find the most relevant information from vast amounts of data and present it to you in an easy-to-understand way. Just like finding a book in a library, OpenSeeker can help you find the information you need on the internet, no matter where it's hidden.
ELI14 (Explained like you're 14)
Hey there! Did you know OpenSeeker is like a super-smart internet detective? Imagine you're trying to find some really hard-to-find info online, like hunting for treasure in a giant maze. Regular search engines might get lost, but OpenSeeker has a super brain that can quickly analyze all kinds of info, just like a detective using a magnifying glass to find clues. It breaks down complex problems into smaller ones, solves them one by one like a puzzle, and finally helps you find the answer! Plus, it records all the steps, so next time it faces a similar problem, it can find the answer even faster. Isn't that cool?
Glossary
OpenSeeker
OpenSeeker is a fully open-source search agent providing complete training data and model weights, aiming to achieve frontier-level performance.
In the paper, OpenSeeker is used as an example to demonstrate how to achieve high-performance search agents by open-sourcing training data.
Large Language Model (LLM)
A large language model is an AI model capable of understanding and generating natural language text, typically with billions of parameters.
In OpenSeeker, LLMs are used as teacher models to generate high-quality actions.
Controllable QA Synthesis
Controllable QA synthesis is a method for generating complex, multi-hop reasoning tasks with controllable coverage and complexity.
In OpenSeeker, this method is used to generate training data.
Denoised Trajectory Synthesis
Denoised trajectory synthesis is a method that employs a retrospective summarization mechanism to denoise trajectories, enabling teacher LLMs to generate high-quality actions.
In OpenSeeker, this method is used to enhance data quality and complexity.
BrowseComp
BrowseComp is a benchmark for evaluating model performance in multi-step navigation and complex information location tasks.
In experiments, OpenSeeker performs excellently on BrowseComp.
BrowseComp-ZH
BrowseComp-ZH is a Chinese benchmark for evaluating model performance in multi-step navigation and complex information location tasks.
In experiments, OpenSeeker surpasses industrial competitors on BrowseComp-ZH.
xbench-DeepSearch
xbench-DeepSearch is a benchmark for evaluating model performance in complex deep research capabilities.
In experiments, OpenSeeker performs excellently on xbench-DeepSearch.
WideSearch
WideSearch is a benchmark for evaluating model reliability in broad information seeking across extensive sources.
In experiments, OpenSeeker performs excellently on WideSearch.
Supervised Fine-Tuning (SFT)
Supervised fine-tuning is a training technique that uses labeled data to fine-tune a model, improving its performance on specific tasks.
In OpenSeeker, SFT is used as a training technique.
Entity Obfuscation
Entity obfuscation is a method that increases the difficulty of reasoning tasks by obfuscating entity nodes.
In OpenSeeker, entity obfuscation is used to generate complex QA tasks.
Retrospective Summarization Mechanism
A retrospective summarization mechanism is a method that summarizes tool responses during trajectory generation to remove noise.
In OpenSeeker, this mechanism is used for trajectory denoising.
Topological Expansion
Topological expansion is a method that generates complex reasoning tasks by expanding web graphs.
In OpenSeeker, topological expansion is used to generate training data.
Data Synthesis
Data synthesis is a method that enhances datasets by generating new training samples.
In OpenSeeker, data synthesis is used to generate high-quality training data.
Trajectory Denoising
Trajectory denoising is a method that improves data quality by removing irrelevant information.
In OpenSeeker, trajectory denoising is used to enhance data quality and complexity.
Open Questions (Unanswered questions from this research)
1. How can OpenSeeker's performance be further improved with limited resources? Although OpenSeeker performs excellently across multiple benchmarks, its effectiveness is validated in only a single training run due to resource constraints. Future research needs to explore different parameters and data filtering strategies to further enhance model performance.
2. How can OpenSeeker's training dataset be expanded to cover more complex scenarios? The current training data volume is relatively small; although high-quality, it may still be insufficient to cover all possible scenarios in some complex tasks. Future research needs to expand the training dataset to improve model performance in complex tasks.
3. How can OpenSeeker's training efficiency be improved without increasing resource consumption? The current training process requires substantial computational resources. Future research needs to explore more efficient training methods to reduce resource consumption.
4. How can OpenSeeker's adaptability be improved across different languages and cultural contexts? The current training data mainly focuses on English and Chinese. Future research needs to explore how to improve the model's adaptability across different languages and cultural contexts.
5. How can the diversity of training data be increased while maintaining data quality? The current data synthesis method primarily focuses on data quality. Future research needs to explore how to increase the diversity of training data while maintaining quality.
Applications
Immediate Applications
Academic Research
OpenSeeker provides a powerful tool for academia, enabling researchers to develop industrial-grade search agents without extensive resources.
Information Retrieval Systems
OpenSeeker can be used to develop high-performance search engines and information retrieval systems, improving the efficiency and accuracy of information acquisition.
Educational Applications
OpenSeeker can be used in education to help students quickly find relevant study materials, enhancing learning efficiency.
Long-term Vision
Intelligent Assistants
OpenSeeker can be used to develop intelligent assistants that help users quickly find the information they need in complex tasks, improving work efficiency.
Cross-Language Search
In the future, OpenSeeker can be used to develop cross-language search systems, helping users quickly find relevant information across different languages and cultural contexts.
Abstract
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet the development of high-performance search agents remains dominated by industrial giants due to a lack of transparent, high-quality training data. This persistent data scarcity has fundamentally hindered the progress of the broader research community in developing and innovating within this domain. To bridge this gap, we introduce OpenSeeker, the first fully open-source search agent (i.e., model and data) that achieves frontier-level performance through two core technical innovations: (1) Fact-grounded scalable controllable QA synthesis, which reverse-engineers the web graph via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks with controllable coverage and complexity. (2) Denoised trajectory synthesis, which employs a retrospective summarization mechanism to denoise the trajectories, thereby enabling the teacher LLMs to generate high-quality actions. Experimental results demonstrate that OpenSeeker, trained (in a single training run) on only 11.7k synthesized samples, achieves state-of-the-art performance across multiple benchmarks including BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch. Notably, trained with simple SFT, OpenSeeker significantly outperforms the second-best fully open-source agent DeepDive (e.g., 29.5% vs. 15.3% on BrowseComp), and even surpasses industrial competitors such as Tongyi DeepResearch (trained via extensive continual pre-training, SFT, and RL) on BrowseComp-ZH (48.4% vs. 46.7%). We fully open-source the complete training dataset and the model weights to democratize frontier search agent research and foster a more transparent, collaborative ecosystem.
References (20)
WebSailor: Navigating Super-human Reasoning for Web Agent
Kuan Li, Zhongwang Zhang, Huifeng Yin et al.
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
Rui Lu, Zhenyu Hou, Zihan Wang et al.
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese
Peilin Zhou, Bruce Leon, Xiang Ying et al.
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
Jason Wei, Zhiqing Sun, Spencer Papay et al.
Tongyi DeepResearch Technical Report
Tongyi Li, Bo Zhang, Dingchu Zhang et al.
WideSearch: Benchmarking Agentic Broad Info-Seeking
Ryan Wong, Jiawei Wang, Junjie Zhao et al.
WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking
Zhengwei Tao, Haiyang Shen, Baixuan Li et al.
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Kuan Li, Zhongwang Zhang, Huifeng Yin et al.
REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
Zheng Chu, Xiao Wang, Jack Hong et al.
WebDancer: Towards Autonomous Information Seeking Agency
Jialong Wu, Baixuan Li, Runnan Fang et al.
Scaling Agents via Continual Pre-training
Liangcai Su, Zhen Zhang, Guangyu Li et al.
GLM-5: from Vibe Coding to Agentic Engineering
GLM-4.5 Team: Aohan Zeng, Xin Lv, Zhenyu Hou et al.
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
GLM-4.5 Team: Aohan Zeng, Xin Lv, Qinkai Zheng et al.
AgentFold: Long-Horizon Web Agents with Proactive Context Management
Rui Ye, Zhongwang Zhang, Kuan Li et al.
Qwen3 Technical Report
An Yang, Anfeng Li, Baosong Yang et al.
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu et al.
Information Seeking in Electronic Environments
G. Marchionini
Kimi K2.5: Visual Agentic Intelligence
Kimi Team: Yifan Bai, Yiping Bao et al.
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
MiroMind Team: Song Bai, Lidong Bing et al.
OpenAI GPT-5 System Card
Aaditya K. Singh, A. Fry, Adam Perelman et al.