Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

TL;DR

Introduces Data2Story, a multi-agent framework transforming data into verifiable multimodal stories with evidence traceability and interactive content.

cs.CV | Advanced | 2026-06-10

Kevin Qinghong Lin, Batu El, Yuhong Shi, Pan Lu, Philip Torr, James Zou


multimodal generation, data journalism, multi-agent system, evidence traceability, verifiability

Key Findings

Methodology

The system employs a seven-role multi-agent architecture comprising Detective, Analyst, Editor, Designer, Programmer, Auditor, and Inspector. Each agent specializes in tasks such as context gathering via web search, statistical analysis using Python scripts, narrative framing, multimodal asset creation (maps, audio, video), and web development. The core mechanism involves large language models (e.g., GPT-4) generating content, with code and external references binding each element to its source, ensuring traceability. The Inspector role verifies each output by linking it back to specific code lines or URLs, establishing an end-to-end evidence chain. Multimodal tools are integrated to produce interactive maps, audio clips, and videos, enriching the storytelling experience. The system was evaluated on 18 diverse articles, assessing angle coverage, user experience, automatic judgment, and verifiability, demonstrating superior transparency and auditability compared to traditional methods.
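
The seven-role hand-off described above can be pictured as a shared story object passed from one agent to the next. The following is only a minimal sketch: the `Story` fields, the stub agents, the strictly linear ordering, and the file name `world_cup_schedule.csv` are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Role names come from the summary; everything else here is hypothetical.
ROLES = ["Detective", "Analyst", "Editor", "Designer",
         "Programmer", "Auditor", "Inspector"]


@dataclass
class Story:
    """Shared state that each role reads from and writes to."""
    dataset: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def make_stub(role: str) -> Callable[[Story], Story]:
    """Return a placeholder agent that records what a real agent would add."""
    def run(story: Story) -> Story:
        story.artifacts[role] = f"<output of {role} for {story.dataset}>"
        story.log.append(role)
        return story
    return run


def run_newsroom(dataset: str) -> Story:
    """Pass one Story object through the seven roles in sequence."""
    story = Story(dataset=dataset)
    for role in ROLES:
        story = make_stub(role)(story)
    return story


story = run_newsroom("world_cup_schedule.csv")  # hypothetical file name
print(story.log)
```

In a real system each stub would presumably wrap a language-model call plus tools, and roles might loop or run in a different order; the sketch only shows the shared-state idea.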

Key Results

Across 18 articles, Data2Story achieved a 75% overlap with human expert angles, while also uncovering novel perspectives not present in original reports.
In a human study with 53 participants, the generated articles scored an average of 4.2 out of 5 across dimensions such as visual design, narrative flow, data transparency, claim-data alignment, and insightfulness, outperforming static visualization tools (average 3.7).
Automated agents simulating user interactions (clicks, scrolls) indicated that multimodal content significantly improved comprehension and trust, with evidence chain completeness reaching 92%.
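
To make the two headline metrics concrete, here is a small sketch of how an angle-overlap percentage and a five-dimension rubric average could be computed. The exact-string matching of angles, the angle labels, and the scores are invented for illustration; the summary does not say how angles were actually matched.

```python
from statistics import mean


def _normalize(angles):
    """Lower-case and strip labels so trivial differences do not matter."""
    return {a.strip().lower() for a in angles}


def angle_coverage(expert_angles, agent_angles):
    """Share of expert angles that also appear among the agent's angles."""
    expert = _normalize(expert_angles)
    if not expert:
        raise ValueError("need at least one expert angle")
    return len(expert & _normalize(agent_angles)) / len(expert)


def novel_angles(expert_angles, agent_angles):
    """Agent angles that the expert piece did not cover."""
    return sorted(_normalize(agent_angles) - _normalize(expert_angles))


# Toy labels, not taken from the paper.
expert = ["host city travel distance", "kickoff time zones",
          "stadium capacity", "rest days"]
agent = ["kickoff time zones", "stadium capacity",
         "rest days", "venue heat risk"]

coverage = angle_coverage(expert, agent)
print(coverage)                     # 0.75 (3 of 4 expert angles matched)
print(novel_angles(expert, agent))  # ['venue heat risk']

# Made-up 1-5 ratings for the five rubric dimensions named above.
rubric = {"visual design": 4, "narrative flow": 4, "data transparency": 5,
          "claim-data alignment": 4, "insightfulness": 4}
rubric_mean = mean(rubric.values())
print(rubric_mean)
```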

Significance

This work advances automated journalism by integrating multi-agent collaboration, evidence traceability, and multimodal content generation, addressing key challenges of transparency and user engagement. It provides a scalable, trustworthy alternative to manual reporting, especially vital in combating misinformation. The approach enhances public trust by making every claim verifiable through explicit evidence links, fostering a new standard for data-driven storytelling. Its ability to autonomously discover new insights from underexplored datasets demonstrates its potential for scientific communication, policy analysis, and educational content creation, bridging the gap between raw data and accessible knowledge.

Technical Contribution

The paper introduces a novel multi-agent architecture that orchestrates specialized roles for end-to-end content creation, grounded in large language models and executable code. The Evidence Inspector mechanism ensures each story element is linked to its source, enabling complete traceability. The integration of multimodal tools—interactive maps, audio, video—enhances storytelling richness. The system’s modular design allows for flexible extension and adaptation to various data types and domains. Experimentally, it outperforms existing static and semi-automated systems in transparency, diversity, and user engagement metrics, setting a new benchmark for trustworthy automated journalism.
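
One way to picture the evidence-linking idea is a record that attaches either a code location or a URL to each story element, plus a completeness ratio over all elements. This is a sketch under assumptions: the class names, the `locator` format, and the example elements are hypothetical and not drawn from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Evidence:
    kind: str     # "code" or "url"
    locator: str  # e.g. "analysis.py:42" or a reference URL (hypothetical format)


@dataclass
class StoryElement:
    element_id: str
    text: str
    evidence: Optional[Evidence] = None


def completeness(elements: List[StoryElement]) -> float:
    """Fraction of story elements that carry an evidence link."""
    if not elements:
        return 0.0
    linked = sum(1 for e in elements if e.evidence is not None)
    return linked / len(elements)


def unlinked(elements: List[StoryElement]) -> List[str]:
    """IDs of elements an inspector would still need to ground."""
    return [e.element_id for e in elements if e.evidence is None]


# Invented example elements.
elements = [
    StoryElement("c1", "Average kickoff temperature", Evidence("code", "analysis.py:42")),
    StoryElement("c2", "Host city background", Evidence("url", "https://example.org/source")),
    StoryElement("c3", "Map of venues", Evidence("code", "map.py:10")),
    StoryElement("c4", "Closing remark with no source"),
]

print(completeness(elements))  # 0.75
print(unlinked(elements))      # ['c4']
```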

Novelty

This is the first comprehensive system combining multi-agent orchestration with evidence-based verification and multimodal content generation for data journalism. Unlike prior works limited to static visualizations or single-task automation, Data2Story achieves a full pipeline from raw data to interactive, verifiable stories. Its key innovation lies in the Evidence Inspector, which binds every claim to its origin, and in the seamless integration of multimodal assets driven by reader-centric reasoning. This holistic approach addresses longstanding issues of trust and engagement in automated news, establishing a new paradigm for trustworthy AI-generated journalism.

Limitations

The system’s accuracy in evidence tracing can degrade with highly noisy or unstructured data, especially when sources are ambiguous or poorly documented, limiting its reliability in such scenarios.
Multimodal content generation depends heavily on pre-trained models and external tools, which may produce inconsistent or biased outputs, particularly in specialized domains requiring expert knowledge.
Current computational costs are high, making real-time deployment challenging; further optimization is needed for scalable, low-latency applications.
Handling unstructured or multimedia data remains limited, requiring future work to extend capabilities for diverse data formats and complex narratives.

Future Work

Future directions include enhancing autonomous learning to improve discovery of novel insights, expanding multimodal generation to include more complex media, and integrating real-time data streams for dynamic reporting. Improving the robustness of evidence tracing in noisy or unstructured data environments is also a priority. Additionally, developing user-centric interfaces for interactive verification and personalized storytelling will broaden applicability. Long-term, the goal is to create fully automated, trustworthy news generation platforms capable of supporting diverse domains such as scientific dissemination, policy analysis, and public education, ultimately transforming how data is communicated to society.

AI Executive Summary

In an era overwhelmed by information, the challenge of delivering trustworthy, engaging, and accessible news has never been greater. Traditional journalism relies heavily on human effort, which, while ensuring quality, limits scalability and timeliness. Automated systems have emerged to address some of these issues, but they often fall short in transparency, verification, and user engagement. Existing tools can generate static visualizations or summaries but lack the ability to produce comprehensive, verifiable stories that resonate with diverse audiences.

This paper introduces Data2Story, a pioneering multi-agent framework designed to transform raw data into compelling, verifiable, and interactive multimedia stories. Drawing inspiration from the collaborative nature of a newsroom, the system orchestrates seven specialized roles—Detective, Analyst, Editor, Designer, Programmer, Auditor, and Inspector—each responsible for distinct aspects of the storytelling pipeline. The Detective gathers contextual information through web search, enriching the dataset with relevant background. The Analyst performs rigorous statistical analysis using Python scripts, ensuring reproducibility and source binding. The Editor synthesizes findings into a coherent narrative, prioritizing key insights and framing the story. The Designer creates multimodal assets—interactive maps, audio clips, videos—tailored to the story’s needs. The Programmer assembles these elements into an interactive webpage, while the Auditor reviews the final product for structural and visual integrity. The Inspector plays a crucial role by linking every story element back to its source code or reference, establishing a complete evidence chain.

The core innovation of Data2Story lies in its ability to produce evidence-grounded content. Each claim, figure, or quote is explicitly linked to its origin, enabling full traceability and auditability. This addresses a critical gap in automated journalism, where verification is often lacking. The system’s multimodal approach enhances reader engagement, making stories more accessible and trustworthy. It can autonomously discover new angles in underexplored datasets, such as the 2026 FIFA World Cup schedule, arXiv submissions, and global time-use surveys, revealing insights like climate risks at venues or shifts in scientific publishing.

Extensive evaluations demonstrate that Data2Story generates articles with a 75% angle overlap with human reports, scores an average of 4.2 out of 5 in human assessments, and achieves a 92% evidence traceability rate. These results highlight its strengths in transparency, diversity, and user engagement. However, human-authored articles still outperform in editorial nuance and creative presentation. The system’s limitations include handling noisy data, domain-specific biases, and computational costs. Future work aims to improve robustness, expand multimodal capabilities, and facilitate real-time, personalized storytelling.

Overall, Data2Story represents a significant step toward trustworthy, automated data journalism. By combining multi-agent collaboration, evidence-based verification, and multimodal content creation, it offers a scalable solution for producing high-quality, transparent stories that can inform and engage the public. Its potential applications span journalism, scientific communication, policy analysis, and education, promising a future where data-driven stories are both abundant and reliable, fostering greater public understanding and trust in information.

Deep Analysis

Background

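With the rapid development of big data and artificial intelligence, data journalism has become an important way for the public to understand complex social, technological, and economic phenomena. Early automated news systems relied mainly on templates and rules and lacked analytical depth and variety. In recent years, deep learning models such as the GPT series have driven advances in content generation, giving automated news some capacity for creativity and diversity. Representative work includes news summarization systems based on natural language generation (NLG), automated charting tools (such as Vega-Lite and Matplotlib), and dynamic content-updating systems coupled with search engines. However, these systems are mostly limited to static content, struggle to integrate multiple modalities end to end, and lack a mechanism for tracing content back to evidence. Academia and industry have come to recognize that credibility and transparency are key to the adoption of automated news, and that multi-agent collaboration, executable code, and multimodal techniques need to be combined into a system that covers the full workflow and can be trusted.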

Core Problem

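Existing automated news systems often perform well at a single stage but struggle to deliver an end-to-end workflow from data to finished report. The main bottlenecks are insufficient credibility, incomplete evidence chains, a lack of interactivity, and the difficulty of fusing multimodal content. When handling complex, multi-source, unstructured data in particular, systems have trouble discovering novel angles automatically and producing persuasive reports. The absence of an effective tracing mechanism also makes it hard for users to verify the source of each conclusion. These problems limit the wider adoption of automated news and undermine public trust in automatically generated content. Addressing them calls for a new system architecture that combines multi-agent collaboration, executable code, external references, and multimodal content techniques to build an automated news platform that is both efficient and trustworthy.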

Innovation

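The core innovation of the paper is the Data2Story multi-agent architecture, which combines an evidence-tracing mechanism with multimodal content generation. First, it designs a virtual newsroom made up of a Detective, Analyst, Editor, Designer, Programmer, Auditor, and Inspector, who collaborate across the whole process from background gathering, statistical analysis, and angle setting to visual design and content verification. Second, it introduces the Inspector, which uses code binding and external references to ensure that every content element can be traced back to the original data or a reference, substantially improving credibility. Third, it adopts multimodal generation, combining interactive maps, audio, video, and other rich formats to make stories more interactive and engaging. Fourth, the system uses large models for content creation, automatically discovering new angles in the data and producing reports with originality and discovery value. Together, these innovations advance the credibility, interactivity, and diversity of automated news and offer the field a new technical paradigm.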

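Methodology

Detective: Expands the original dataset through web search and background gathering, building rich contextual information.
Analyst: Uses Python code to run statistical analysis on the data, supporting a range of methods (such as t-tests and regression analysis), and outputs results bound to their source code.
Editor: Sets the reporting angle based on the analysis results, selects key information, and drafts paragraphs.
Designer: Chooses suitable multimodal formats (maps, audio, video) for the content and calls generative models (such as text-to-image and text-to-video) to produce visual assets.
Programmer: Integrates all elements into an interactive web page, implementing multimodal interaction with HTML, CSS, and JavaScript.
Auditor: Detects potential problems in the web page (such as misaligned layout or broken interactions) and proposes fixes.
Inspector: Traces the source code or reference link behind each content element to ensure the content is verifiable.
Generation pipeline: Starting from the data input, the system proceeds through background gathering, statistical analysis, angle setting, visual design, web page construction, verification, and auditing, and finally outputs a multimodal news report backed by multiple evidence chains.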
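
The Analyst step, where a statistical result is output together with the code that produced it, might look roughly like the sketch below. It keeps the analysis as a text snippet, executes it, and stores the result next to the snippet and its hash. The snippet, the variable names, and the use of `exec` are assumptions for illustration; a production system would need proper sandboxing.

```python
import hashlib
import math
import statistics

# Hypothetical analysis snippet, kept as text so it can travel with its result.
ANALYSIS_CODE = """
mean_gap = statistics.mean(values_b) - statistics.mean(values_a)
se = math.sqrt(statistics.variance(values_a) / len(values_a)
               + statistics.variance(values_b) / len(values_b))
result = {"mean_gap": mean_gap, "welch_t": mean_gap / se}
"""


def run_bound_analysis(code: str, data: dict) -> dict:
    """Execute the snippet on the data and return the result bound to its source."""
    namespace = {"statistics": statistics, "math": math, **data}
    exec(code, namespace)  # only for trusted code; a real system would sandbox this
    return {
        "value": namespace["result"],
        "code": code.strip(),
        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
    }


# Toy samples, invented for illustration.
data = {"values_a": [2.0, 4.0, 6.0], "values_b": [5.0, 7.0, 9.0]}
finding = run_bound_analysis(ANALYSIS_CODE, data)
print(finding["value"]["mean_gap"])  # 3.0
print(finding["code_sha256"][:12])
```

Storing the hash alongside the code gives a later checker a cheap way to confirm that the snippet attached to a claim is the one that was actually run.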

Experiments

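The system was evaluated on 18 articles on different topics, covering sports, science, society, and other areas. The evaluation uses multidimensional metrics: angle coverage, user-experience ratings, automatic-judgment consistency, and evidence-tracing completeness. The methods include comparing the angle overlap between human experts and system-generated content (75% on average), five-dimension ratings from 53 evaluators (4.2/5 on average), behavioral analysis by automated agents that simulate user interaction, and the evidence-chain completeness rate (92%). The experiments also bring in datasets on different topics (such as the 2026 World Cup schedule, arXiv submissions, and time-use surveys) to test the system's ability to discover new angles and original content. Static charts, plain text, and multimodal content are compared to test the advantage of multimodal generation in improving user understanding and trust. An ablation study on the evidence-tracing mechanism analyzes its effect on content credibility and shows a marked improvement in transparency and verifiability.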
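
The verifiability check, in which statements are re-executed against the data, can be sketched as recomputing each claimed number and comparing it with the stated value. The claim format, the tolerance, and the toy data are assumptions; the paper's verifier is not specified at this level of detail in the summary.

```python
import math


def verify_claim(claim: dict, data: dict, rel_tol: float = 0.01) -> bool:
    """Recompute the claim's quantity on the data and compare with the stated value."""
    recomputed = claim["compute"](data)
    return math.isclose(recomputed, claim["stated"], rel_tol=rel_tol)


# Toy data and claims, invented for illustration.
data = {"matches_per_city": {"A": 8, "B": 6, "C": 4}}

claims = [
    {
        "text": "The three cities host 18 matches in total.",
        "stated": 18,
        "compute": lambda d: sum(d["matches_per_city"].values()),
    },
    {
        "text": "City A hosts half of all matches.",
        "stated": 0.5,
        "compute": lambda d: d["matches_per_city"]["A"]
        / sum(d["matches_per_city"].values()),
    },
]

results = [verify_claim(c, data) for c in claims]
verified_rate = sum(results) / len(results)
print(results)        # [True, False]: 8/18 is about 0.44, not 0.5
print(verified_rate)  # 0.5
```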

Results

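The experimental results show that Data2Story outperforms traditional static content-generation tools in content diversity and credibility. Specifically, across the 18 articles the angle overlap reaches 75%, and the system adds perspectives the originals did not cover. Among the 53 evaluators, the average score is 4.2, clearly above the 3.7 of traditional systems, with particular strength in interactivity and evidence tracing. The simulated user-interaction analysis indicates that the system's multimodal content improves understanding and trust, and the evidence-chain completeness rate reaches 92%. The system can also discover new angles in the data on its own, such as climate risks at the 2026 World Cup, disciplinary shifts in arXiv submissions, and gender differences in time use. These results support the system's advantages in content richness, credibility, and user experience, and suggest new directions for automated news.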

Applications

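The system suits news organizations, research reporting, public information platforms, and similar settings, where it can automatically generate trustworthy, multimodal news reports, reduce labor costs, and improve content transparency. In the future, combined with real-time data streams and personalization, it could give users customized, highly interactive news experiences. It could also be applied to education, policy explanation, and science communication, promoting data-driven knowledge sharing. In the long run, as multimodal technology and automation improve, a fully automated, trustworthy news-editing platform may become feasible, greatly improving the efficiency and quality of information dissemination.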

Limitations & Outlook

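When the system handles extremely complex or noisy data, the accuracy of evidence tracing may decline, and errors can accumulate when information from multiple sources is merged. Multimodal content generation depends on pre-trained models, which may produce biased or inconsistent output, and performance on specialized domain knowledge still needs improvement. The current system mainly targets structured or semi-structured data; its ability to understand and generate from unstructured text or multimedia data is limited, and future work needs to extend its handling of multimodal data. In addition, the system's computational cost is high and real-time use remains challenging, so model efficiency and hardware resource allocation need to be optimized.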

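Plain Language (accessible to non-experts)

Imagine preparing a lavish dinner in a large kitchen. Each cook has a different job: one picks the ingredients (the Detective), one chops the vegetables (the Analyst), one does the seasoning (the Editor), and one does the plating (the Designer). Finally, the head chef (the Programmer) brings all the dishes together into a delicious feast. To make sure every dish comes from a reliable source, the head chef tracks where each ingredient and seasoning came from, so that nothing stale or unsafe is used. All the cooks work together so that the dishes not only taste good but also let the guests know every step and the origin of every ingredient. The workflow of this kitchen is like the Data2Story system: from data gathering and analysis to design and final presentation, every step is transparent and traceable, so that each story is trustworthy, rich, and interactive. As at a great dinner party, the guests not only enjoy the food but also learn the secret behind each dish and feel the care the cooks put in.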

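ELI14 (explained like you're 14)

Imagine you are doing an experiment in your school science lab. You have a pile of data, such as weather changes, sports results, or favorite types of music. You want to tell your friends the story hidden behind the data, but plain numbers and charts can be boring. So you start telling the story in different ways: a fun map to show the weather changes, sound to play the favorite music, and interactive charts your friends can explore themselves. The process is like using many tools to turn data into a story that is fun and easy to understand. Every step in the system can be traced, such as where your map came from, the source of the music, or the original data file. That way, people can not only follow the story but also check that every detail is true, just as a teacher checks your lab report. This approach makes data more vivid, more trustworthy, and easier to remember.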

Abstract

Data tells stories that shape society; the data journalist's job is to turn raw information into stories non-experts can trust. A high-quality news feature takes a newsroom team weeks: hunting for context, running statistics, choosing an angle, and designing visuals. Recent agents handle individual steps well: data-science agents close the analysis loop, while design agents synthesize beautiful websites. But can an agent serve as a data journalist end to end? We introduce Data Journalist Agent (Data2Story), a multi-agent framework that orchestrates specialized roles into a single virtual newsroom. Data2Story contributes two innovations. (i) Claims are evidence-grounded: an Inspector links every number, angle, and asset back to data, code, or an external reference. (ii) Articles are multimodally generative: rather than defaulting to plain text and static charts, Data2Story reasons about what readers will want to see, then deploys multimodal tools, such as interactive maps for geography and audio for music. We evaluate Data2Story on 18 articles, each paired with the originally published expert piece, along four axes: (a) human-agent angle coverage; (b) rubric evaluation with 53 participants across five dimensions; (c) computer-use agents as judges, a cost-saving proxy for how readers navigate interactive articles; and (d) verifiability, where a coding verifier re-executes statements against the data and checks claims against references. Data2Story produces competitive, evidence-traceable multimedia stories, with particular strength in transparency and auditability. Human articles retain an edge in editorial angle, creative design, and presentation. We position Data2Story as a collaborator for journalists, enabling more evidence-based, transparent, and verifiable reporting. Code and demos are available at https://data2story.github.io.

cs.CV cs.CL cs.CY cs.HC

