OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
OpenSeeker democratizes frontier search agents by fully open-sourcing training data, utilizing controllable QA synthesis and denoised trajectory synthesis.
Key Findings
Methodology
OpenSeeker achieves democratization of frontier search agents through two technical innovations: 1) fact-grounded, scalable, controllable QA synthesis, which reverse-engineers web graphs via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks with controllable coverage and complexity; and 2) denoised trajectory synthesis, which employs a retrospective summarization mechanism to strip noise from trajectories, enabling teacher LLMs to generate high-quality actions.
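To make the QA-synthesis idea concrete, here is a minimal sketch of the two ingredients named above: topological expansion (walking outward from a seed entity over a toy web graph to form a multi-hop chain) and entity obfuscation (replacing the seed with a vague descriptor so the agent must first resolve it). All entities, relations, and descriptors below are hypothetical illustrations, not the paper's actual data or pipeline.

```python
# Toy web graph: entity -> list of (relation, neighbor) edges.
# Hypothetical example data, not taken from the OpenSeeker dataset.
WEB_GRAPH = {
    "Marie Curie": [("born_in", "Warsaw"), ("won", "Nobel Prize in Physics")],
    "Warsaw": [("capital_of", "Poland")],
    "Poland": [("member_of", "European Union")],
}

# Vague descriptors used to obfuscate the seed entity (assumed, for illustration).
OBFUSCATIONS = {"Marie Curie": "a two-time Nobel laureate scientist"}

def expand_chain(graph, seed, hops):
    """Topological expansion: walk outward from the seed to build a multi-hop chain."""
    chain, node = [], seed
    for _ in range(hops):
        edges = graph.get(node)
        if not edges:
            break
        relation, neighbor = edges[0]  # deterministic pick keeps the sketch reproducible
        chain.append((node, relation, neighbor))
        node = neighbor
    return chain

def synthesize_qa(graph, seed, hops=3):
    """Compose a multi-hop question whose answer is the chain's final entity.
    Entity obfuscation swaps the seed for a vague description, forcing the
    agent to first identify the entity before following the hops."""
    chain = expand_chain(graph, seed, hops)
    subject = OBFUSCATIONS.get(seed, seed)
    relations = " -> ".join(rel for _, rel, _ in chain)
    question = f"Starting from {subject}, follow {relations}: what do you reach?"
    answer = chain[-1][2]
    return question, answer

q, a = synthesize_qa(WEB_GRAPH, "Marie Curie")
print(a)  # European Union
```

Controlling `hops` controls task complexity, and growing the graph controls coverage, which is the sense in which the synthesis is "controllable" under these assumptions.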
Key Results
- OpenSeeker, trained on only 11.7k synthesized samples in a single run, achieves state-of-the-art performance across multiple benchmarks, including BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch. On BrowseComp, OpenSeeker significantly outperforms the second-best open-source agent DeepDive, scoring 29.5% compared to 15.3%.
- On the BrowseComp-ZH benchmark, OpenSeeker scores 48.4%, surpassing industrial competitors like Alibaba's Tongyi DeepResearch (46.7%), which is trained via extensive continual pre-training, SFT, and RL.
- OpenSeeker fully open-sources the complete training dataset and model weights to democratize frontier search agent research and foster a more transparent, collaborative ecosystem.
Significance
OpenSeeker breaks the monopoly of industrial giants in high-performance search agent development by providing complete training data and model weights. This research offers a powerful tool for academia and the open-source community to develop industrial-grade search agents without extensive resources. It not only democratizes search intelligence but also provides an open and collaborative platform for future research.
Technical Contribution
OpenSeeker's technical contributions lie in its innovative QA synthesis and trajectory denoising methods, which improve data quality and complexity and enable the model to excel in complex search tasks. Compared to existing state-of-the-art methods, OpenSeeker opens new engineering possibilities, particularly in data synthesis and denoising techniques.
Novelty
OpenSeeker is the first fully open-source search agent providing complete training data and model weights. Its innovations in QA synthesis and trajectory denoising significantly enhance data quality and complexity, allowing the model to excel in complex search tasks compared to existing work.
Limitations
- Due to resource constraints, OpenSeeker's effectiveness is validated in only a single training run, so its performance on more challenging data remains unverified.
- The current training data volume is relatively small; although high-quality, it may still be insufficient to cover all possible scenarios in some complex tasks.
- Due to resource constraints, the English data has not yet been updated to the latest QA standards, resulting in slightly lower difficulty compared to the Chinese data.
Future Work
Future research directions include expanding OpenSeeker's training dataset, exploring different parameters and data filtering strategies to further enhance model performance. Additionally, with increased resources, more training runs can be conducted to validate its effectiveness on more complex data.
AI Executive Summary
In the era of information explosion, obtaining accurate, real-time, and reliable information from the internet has become a fundamental pillar of modern decision-making. However, the development of high-performance search agents has been dominated by industrial giants, primarily due to the lack of transparent, high-quality training data. To break this monopoly, OpenSeeker emerges as the first fully open-source search agent, achieving frontier-level performance by open-sourcing training data and model weights.
OpenSeeker's core technical innovations include fact-grounded scalable controllable QA synthesis and denoised trajectory synthesis. The former reverse-engineers web graphs via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks with controllable coverage and complexity. The latter employs a retrospective summarization mechanism to denoise trajectories, enabling teacher LLMs to generate high-quality actions.
Experimental results demonstrate that OpenSeeker, trained on only 11.7k synthesized samples in a single run, achieves state-of-the-art performance across multiple benchmarks, including BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch. Notably, on the BrowseComp-ZH benchmark, OpenSeeker scores 48.4%, surpassing industrial competitors like Alibaba's Tongyi DeepResearch (46.7%).
OpenSeeker's success lies not only in its technical innovations but also in its contribution to the democratization of search intelligence. By providing complete training data and model weights, OpenSeeker offers a powerful tool for academia and the open-source community to develop industrial-grade search agents without extensive resources.
However, due to resource constraints, OpenSeeker's effectiveness is validated in only a single training run, leaving its performance on more challenging data unverified. Future research directions include expanding the training dataset and exploring different parameters and data filtering strategies to further enhance model performance.
Deep Analysis
Background
In the field of information retrieval, the capabilities of search agents have significantly improved with the development of large language models (LLMs). However, the development of high-performance search agents has been monopolized by a few industrial giants, primarily due to the lack of transparent and high-quality training data. Existing open-source models, while providing model weights, often lack transparency in training data, or the data quality is insufficient to support complex reasoning tasks. This data scarcity has severely hindered the broader research community's development and innovation in this domain.
Core Problem
The core problem is how to break the monopoly of industrial giants in high-performance search agent development. Specifically, the lack of transparent and high-quality training data is a major bottleneck. This not only limits academic progress in this field but also hinders the open-source community from developing industrial-grade search agents. Therefore, providing a complete open-source solution, including high-quality training data and model weights, is a pressing issue that needs to be addressed.
Innovation
OpenSeeker's core innovations lie in its data synthesis and denoising techniques. First, fact-grounded scalable controllable QA synthesis reverse-engineers web graphs via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks with controllable coverage and complexity. Second, denoised trajectory synthesis employs a retrospective summarization mechanism to denoise trajectories, enabling teacher LLMs to generate high-quality actions. These innovations not only enhance data quality and complexity but also enable the model to excel in complex search tasks.
Methodology
- Fact-grounded scalable controllable QA synthesis: Reverse-engineers web graphs via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks.
- Denoised trajectory synthesis: Employs a retrospective summarization mechanism to denoise trajectories, enabling teacher LLMs to generate high-quality actions.
- Dataset generation: Synthesizes 10.3k English and 1.4k Chinese samples for supervised fine-tuning (SFT).
- Experimental validation: Performance evaluation on benchmarks such as BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch.
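The trajectory-denoising step can be sketched as follows: each raw tool response in an agent trajectory is replaced by a retrospective summary, so the teacher LLM conditions on clean context when generating the next action. The trajectory contents and the keyword-based summarizer below are stand-in assumptions; in the actual system an LLM produces the summaries.

```python
# Toy trajectory: alternating (action, raw_tool_response) steps. The raw
# responses mimic noisy web-page dumps (ads, navigation chrome, footers).
# Hypothetical example data, not taken from the OpenSeeker pipeline.
trajectory = [
    ("search('OpenSeeker benchmark')",
     "Ad: buy now!\nOpenSeeker scores 29.5% on BrowseComp.\nCookie notice..."),
    ("open(result_1)",
     "Nav menu | Login | Share\nDeepDive scores 15.3% on BrowseComp.\nFooter"),
]

def summarize(raw_response, keywords=("scores",)):
    """Stand-in summarizer: keep only lines relevant to the task keywords.
    In the real system a retrospective LLM summary plays this role."""
    return "\n".join(
        line for line in raw_response.splitlines()
        if any(k in line for k in keywords)
    )

def denoise_trajectory(traj):
    """Replace every raw tool response with its summary, yielding the
    denoised trajectory the teacher LLM sees when proposing actions."""
    return [(action, summarize(resp)) for action, resp in traj]

for action, observation in denoise_trajectory(trajectory):
    print(action, "->", observation)
```

The design point is that denoising happens retrospectively on the observation side, leaving the action sequence untouched while shrinking and cleaning the context each subsequent action is generated from.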
Experiments
The experimental design includes evaluating OpenSeeker's performance on multiple benchmarks, such as BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch. The training dataset comprises 10.3k English and 1.4k Chinese samples. The benchmarks primarily assess the model's performance in multi-step navigation and complex information location tasks. The experiments also include performance comparisons with other open-source and closed-source models to validate OpenSeeker's superiority.
Results
Experimental results show that OpenSeeker achieves state-of-the-art performance across multiple benchmarks. On BrowseComp, OpenSeeker significantly outperforms the second-best open-source agent DeepDive, scoring 29.5% compared to 15.3%. On the BrowseComp-ZH benchmark, OpenSeeker scores 48.4%, surpassing industrial competitors like Alibaba's Tongyi DeepResearch (46.7%). These results validate OpenSeeker's innovations in data synthesis and denoising techniques.
Applications
OpenSeeker's application scenarios include academic research and industrial applications. In academic research, OpenSeeker provides a powerful tool for researchers to develop industrial-grade search agents without extensive resources. In industrial applications, OpenSeeker can be used to develop high-performance search engines and information retrieval systems, improving the efficiency and accuracy of information acquisition.
Limitations & Outlook
Despite OpenSeeker's excellent performance across multiple benchmarks, its effectiveness is validated in only a single training run due to resource constraints, leaving its performance on more challenging data unverified. Additionally, the current training data volume is relatively small; although high-quality, it may still be insufficient to cover all possible scenarios in some complex tasks. Future research directions include expanding the training dataset and exploring different parameters and data filtering strategies to further enhance model performance.
Plain Language (Accessible to non-experts)
Imagine you're in a massive library searching for a specific book. This library has countless shelves, each with thousands of books. To find the book you need, you require a very smart assistant who not only knows the location of every book but can also quickly find related information. This is what OpenSeeker does. It's like a super-intelligent library assistant that can quickly find the information you need on the internet. By using complex algorithms and techniques, OpenSeeker can find the most relevant information from vast amounts of data and present it to you in an easy-to-understand way. Just like finding a book in a library, OpenSeeker can help you find the information you need on the internet, no matter where it's hidden.
ELI14 (Explained like you're 14)
Hey there! Did you know OpenSeeker is like a super-smart internet detective? Imagine you're trying to find some really hard-to-find info online, like hunting for treasure in a giant maze. Regular search engines might get lost, but OpenSeeker has a super brain that can quickly analyze all kinds of info, just like a detective using a magnifying glass to find clues. It breaks down complex problems into smaller ones, solves them one by one like a puzzle, and finally helps you find the answer! Plus, it records all the steps, so next time it faces a similar problem, it can find the answer even faster. Isn't that cool?
Glossary
OpenSeeker
OpenSeeker is a fully open-source search agent providing complete training data and model weights, aiming to achieve frontier-level performance.
In the paper, OpenSeeker is used as an example to demonstrate how to achieve high-performance search agents by open-sourcing training data.
Large Language Model (LLM)
A large language model is an AI model capable of understanding and generating natural language text, typically with billions of parameters.
In OpenSeeker, LLMs are used as teacher models to generate high-quality actions.
Controllable QA Synthesis
Controllable QA synthesis is a method for generating complex, multi-hop reasoning tasks with controllable coverage and complexity.
In OpenSeeker, this method is used to generate training data.
Denoised Trajectory Synthesis
Denoised trajectory synthesis is a method that employs a retrospective summarization mechanism to denoise trajectories, enabling teacher LLMs to generate high-quality actions.
In OpenSeeker, this method is used to enhance data quality and complexity.
BrowseComp
BrowseComp is a benchmark for evaluating model performance in multi-step navigation and complex information location tasks.
In experiments, OpenSeeker performs excellently on BrowseComp.
BrowseComp-ZH
BrowseComp-ZH is a Chinese benchmark for evaluating model performance in multi-step navigation and complex information location tasks.
In experiments, OpenSeeker surpasses industrial competitors on BrowseComp-ZH.
xbench-DeepSearch
xbench-DeepSearch is a benchmark for evaluating model performance in complex deep research capabilities.
In experiments, OpenSeeker performs excellently on xbench-DeepSearch.
WideSearch
WideSearch is a benchmark for evaluating model reliability in broad information seeking across extensive sources.
In experiments, OpenSeeker performs excellently on WideSearch.
Supervised Fine-Tuning (SFT)
Supervised fine-tuning is a training technique that uses labeled data to fine-tune a model, improving its performance on specific tasks.
In OpenSeeker, SFT is used as a training technique.
Entity Obfuscation
Entity obfuscation is a method that increases the difficulty of reasoning tasks by obfuscating entity nodes.
In OpenSeeker, entity obfuscation is used to generate complex QA tasks.
Retrospective Summarization Mechanism
A retrospective summarization mechanism is a method that summarizes tool responses during trajectory generation to remove noise.
In OpenSeeker, this mechanism is used for trajectory denoising.
Topological Expansion
Topological expansion is a method that generates complex reasoning tasks by expanding web graphs.
In OpenSeeker, topological expansion is used to generate training data.
Data Synthesis
Data synthesis is a method that enhances datasets by generating new training samples.
In OpenSeeker, data synthesis is used to generate high-quality training data.
Trajectory Denoising
Trajectory denoising is a method that improves data quality by removing irrelevant information.
In OpenSeeker, trajectory denoising is used to enhance data quality and complexity.
Open Questions (Unanswered questions from this research)
1. How can OpenSeeker's performance be further improved with limited resources? Although OpenSeeker performs excellently across multiple benchmarks, its effectiveness is validated in only a single training run due to resource constraints. Future research needs to explore different parameters and data filtering strategies to further enhance model performance.
2. How can OpenSeeker's training dataset be expanded to cover more complex scenarios? The current training data volume is relatively small; although high-quality, it may still be insufficient to cover all possible scenarios in some complex tasks. Future research needs to expand the training dataset to improve model performance in complex tasks.
3. How can OpenSeeker's training efficiency be improved without increasing resource consumption? The current training process requires substantial computational resources. Future research needs to explore more efficient training methods to reduce resource consumption.
4. How can OpenSeeker's adaptability be improved across different languages and cultural contexts? The current training data mainly focuses on English and Chinese. Future research needs to explore how to improve the model's adaptability across different languages and cultural contexts.
5. How can the diversity of training data be increased while maintaining data quality? The current data synthesis method primarily focuses on data quality. Future research needs to explore how to increase the diversity of training data while maintaining quality.
Applications
Immediate Applications
Academic Research
OpenSeeker provides a powerful tool for academia, enabling researchers to develop industrial-grade search agents without extensive resources.
Information Retrieval Systems
OpenSeeker can be used to develop high-performance search engines and information retrieval systems, improving the efficiency and accuracy of information acquisition.
Educational Applications
OpenSeeker can be used in education to help students quickly find relevant study materials, enhancing learning efficiency.
Long-term Vision
Intelligent Assistants
OpenSeeker can be used to develop intelligent assistants that help users quickly find the information they need in complex tasks, improving work efficiency.
Cross-Language Search
In the future, OpenSeeker can be used to develop cross-language search systems, helping users quickly find relevant information across different languages and cultural contexts.
Abstract
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet the development of high-performance search agents remains dominated by industrial giants due to a lack of transparent, high-quality training data. This persistent data scarcity has fundamentally hindered the progress of the broader research community in developing and innovating within this domain. To bridge this gap, we introduce OpenSeeker, the first fully open-source search agent (i.e., model and data) that achieves frontier-level performance through two core technical innovations: (1) Fact-grounded scalable controllable QA synthesis, which reverse-engineers the web graph via topological expansion and entity obfuscation to generate complex, multi-hop reasoning tasks with controllable coverage and complexity. (2) Denoised trajectory synthesis, which employs a retrospective summarization mechanism to denoise the trajectories, thereby enabling the teacher LLMs to generate high-quality actions. Experimental results demonstrate that OpenSeeker, trained (in a single training run) on only 11.7k synthesized samples, achieves state-of-the-art performance across multiple benchmarks including BrowseComp, BrowseComp-ZH, xbench-DeepSearch, and WideSearch. Notably, trained with simple SFT, OpenSeeker significantly outperforms the second-best fully open-source agent DeepDive (e.g., 29.5% vs. 15.3% on BrowseComp), and even surpasses industrial competitors such as Tongyi DeepResearch (trained via extensive continual pre-training, SFT, and RL) on BrowseComp-ZH (48.4% vs. 46.7%). We fully open-source the complete training dataset and the model weights to democratize frontier search agent research and foster a more transparent, collaborative ecosystem.
References (20)
WebSailor: Navigating Super-human Reasoning for Web Agent
Kuan Li, Zhongwang Zhang, Huifeng Yin et al.
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
Rui Lu, Zhenyu Hou, Zihan Wang et al.
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese
Peilin Zhou, Bruce Leon, Xiang Ying et al.
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
Jason Wei, Zhiqing Sun, Spencer Papay et al.
Tongyi DeepResearch Technical Report
Tongyi Li, Bo Zhang, Dingchu Zhang et al.
WideSearch: Benchmarking Agentic Broad Info-Seeking
Ryan Wong, Jiawei Wang, Junjie Zhao et al.
WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking
Zhengwei Tao, Haiyang Shen, Baixuan Li et al.
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Kuan Li, Zhongwang Zhang, Huifeng Yin et al.
REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
Zheng Chu, Xiao Wang, Jack Hong et al.
WebDancer: Towards Autonomous Information Seeking Agency
Jialong Wu, Baixuan Li, Runnan Fang et al.
Scaling Agents via Continual Pre-training
Liangcai Su, Zhen Zhang, Guangyu Li et al.
GLM-5: from Vibe Coding to Agentic Engineering
GLM-4.5 Team: Aohan Zeng, Xin Lv, Zhenyu Hou et al.
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
GLM-4.5 Team: Aohan Zeng, Xin Lv, Qinkai Zheng et al.
AgentFold: Long-Horizon Web Agents with Proactive Context Management
Rui Ye, Zhongwang Zhang, Kuan Li et al.
Qwen3 Technical Report
An Yang, Anfeng Li, Baosong Yang et al.
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu et al.
Information Seeking in Electronic Environments
G. Marchionini
Kimi K2.5: Visual Agentic Intelligence
Kimi Team: Yifan Bai, Yiping Bao et al.
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
MiroMind Team: Song Bai, Lidong Bing et al.
OpenAI GPT-5 System Card
Aaditya K. Singh, A. Fry, Adam Perelman et al.