Self-Improving Language Models with Bidirectional Evolutionary Search

TL;DR

Proposes Bidirectional Evolutionary Search (BES), combining forward candidate evolution with backward goal decomposition to enhance exploration and verification in language models.

cs.CL 🔴 Advanced 2026-05-28 166 views

Guowei Xu Zhenting Qi Huangyuan Su Weirui Ye Himabindu Lakkaraju Sham M. Kakade Yilun Du

AI Reader Arxiv Page Download PDF

AI NLP search algorithms self-improvement evolutionary methods

Key Findings

Methodology

The proposed BES framework integrates two core mechanisms: forward candidate evolution using genetic-inspired operators (such as combination, translocation, crossover, deletion) and backward goal decomposition that recursively breaks down complex tasks into verifiable sub-goals. In the forward phase, BES employs these operators to recombine partial trajectories, generating diverse solutions beyond the model’s autoregressive support. Concurrently, the backward phase constructs a goal tree, where each node represents a sub-goal evaluated by task-specific verifiers (rule-based, similarity models, or LLM judges). This dual process allows dense feedback to guide the search efficiently. Theoretical analysis confirms that pure expansion-based methods are confined within a narrow entropy shell, whereas evolution operators enable escape, expanding the exploration space exponentially. Extensive experiments on logical reasoning, multi-hop reasoning, and open problem benchmarks demonstrate BES’s superiority over existing methods like GRPO, Tree-GRPO, and open-source frameworks, with significant improvements in accuracy, sample efficiency, and stability.

Key Results

In the Knights-and-Knaves logical reasoning dataset, BES improved validation accuracy from approximately 30% to over 50%, outperforming baseline methods such as GRPO and MaxRL, and showing consistent self-improvement during training.
On MuSiQue multi-hop reasoning tasks, BES achieved accuracy gains of 3-4 percentage points over baseline models like Llama-3.2-3B-Instruct and Llama-3.1-8B-Instruct, with increased valid search counts and more effective action sequences, indicating enhanced reasoning capabilities.
In open problem benchmarks such as circle packing and Heilbronn convex problems, BES outperformed open-source frameworks (ShinkaEvolve, GEPA), delivering higher average and best scores, with lower variance and more stable search trajectories, demonstrating robust exploration and solution quality.

Significance

This work addresses fundamental limitations of current search strategies in large language models, notably the sparse verification signals and limited exploration beyond high-probability regions. By integrating evolutionary algorithms with goal decomposition, BES significantly expands the search space and improves the quality of solutions in complex reasoning tasks. The theoretical guarantees and empirical results suggest that BES can serve as a foundational framework for autonomous model self-improvement, enabling models to discover high-quality solutions that were previously inaccessible. Its implications span improving AI reasoning, scientific computation, automated programming, and decision-making systems, marking a step toward more autonomous and reliable AI systems.

Technical Contribution

The key technical contributions include: • Designing four novel evolution operators inspired by biological recombination, enabling candidate diversity beyond autoregressive support; • Developing a recursive backward goal decomposition mechanism that provides dense, interpretable feedback for guiding forward search; • Theoretical proof that evolution operators can escape the entropy shell confining expansion-only search, exponentially increasing exploration capacity; • Integrating these components into a cohesive bidirectional search framework, validated through rigorous experiments on multiple challenging tasks. This approach advances the state-of-the-art in search-based reasoning and self-improvement for large models.

Novelty

This research introduces the first systematic combination of bidirectional search—merging evolutionary candidate generation with recursive goal decomposition—for large language models. Unlike prior methods limited to autoregressive expansion or unidirectional tree search, BES leverages genetic-inspired operators to diversify solutions and employs a recursive goal tree to densely guide the search. The theoretical analysis confirming the ability to break entropy shell constraints is novel, providing a rigorous foundation for exploration beyond traditional limits. This dual mechanism represents a significant departure from existing approaches, offering a new paradigm for scalable, effective self-improvement in AI systems.

Limitations

The computational overhead of BES is higher than traditional methods due to multiple rounds of candidate recombination and goal decomposition, which may hinder real-time deployment in resource-constrained environments.
The effectiveness of the backward goal decomposition depends heavily on the quality and design of task-specific verifiers, which may be challenging to construct for certain complex or unstructured tasks.
Current experiments focus on moderate-sized models and specific reasoning tasks; scalability to very large models (e.g., GPT-4/5) and diverse domains remains to be validated. Future work should optimize efficiency and generalize the framework.
While the theoretical guarantees are robust, practical implementation may face challenges in balancing exploration and exploitation, especially in highly stochastic or ambiguous tasks.

Future Work

Future research directions include optimizing the efficiency of BES to reduce computational costs, exploring adaptive strategies for tuning evolution operators and goal decomposition parameters, and extending the framework to multi-modal and multi-task settings. Additionally, integrating reinforcement learning techniques could further enhance search strategies, enabling models to learn optimal exploration policies. Broader validation across diverse domains, larger models, and real-world applications will be essential to realize BES’s full potential. Finally, developing automated methods for constructing high-quality verifiers will facilitate wider adoption in complex, unstructured tasks.

AI Executive Summary

In recent years, large language models (LLMs) have demonstrated remarkable reasoning and problem-solving capabilities across various domains. However, their performance in complex, hard problems remains limited by the inherent constraints of traditional sampling and search methods. Standard approaches like best-of-N sampling and tree search are effective for moderate tasks but struggle with exploration in low-probability regions and sparse verification signals, often leading to suboptimal solutions or failure to improve during self-training.

To address these challenges, Guowei Xu and colleagues introduce a novel framework called Bidirectional Evolutionary Search (BES). This approach fundamentally rethinks how models explore their solution space by coupling forward candidate evolution with backward goal decomposition. The core idea is inspired by biological evolution and recursive problem breakdown, enabling the model to generate diverse solutions and verify them at multiple granularities.

In the forward phase, BES employs a set of genetic-inspired operators—combination, translocation, crossover, and deletion—to recombine partial trajectories, effectively expanding the search beyond the narrow probability-supported regions typical of autoregressive methods. These operators allow the search to escape the entropy shell, a theoretical construct that confines traditional expansion-based search within low-entropy regions. Meanwhile, the backward phase recursively decomposes the original task into smaller, verifiable sub-goals, forming a goal tree that guides the forward search with dense, interpretable feedback.

Theoretical analysis confirms that pure expansion methods are confined within a narrow entropy shell, limiting exploration. Conversely, the introduction of evolution operators enables the search to break free from this shell, exponentially increasing the search space. The backward goal decomposition further enhances efficiency by reducing the number of samples needed to find correct solutions, especially in complex tasks.

Empirical results demonstrate BES’s effectiveness across multiple challenging benchmarks. In logical reasoning tasks like Knights-and-Knaves, BES consistently improves validation accuracy from around 30% to over 50%. In multi-hop reasoning datasets such as MuSiQue, it outperforms baseline algorithms by 3-4 percentage points, with increased search efficiency and solution diversity. For open problem benchmarks like circle packing and the Heilbronn convex problem, BES achieves higher solution quality and greater stability than existing open-source frameworks, with lower variance across runs.

These advances have significant implications. BES not only enhances the reasoning and problem-solving abilities of large models but also provides a systematic, theoretically grounded approach to self-improvement. Its ability to explore beyond the traditional probability support opens new avenues for autonomous AI systems capable of continuous learning and adaptation. The framework’s modular design allows integration with various verification mechanisms and can be extended to multi-modal, multi-task, and real-time applications.

Despite its promising results, BES faces challenges such as computational overhead and reliance on task-specific verifiers. Future work will focus on optimizing efficiency, automating verifier construction, and validating scalability across larger models and diverse domains. Overall, BES represents a significant step toward more autonomous, capable, and reliable AI systems, with broad potential for scientific, industrial, and societal impact.

Deep Analysis

Background

The evolution of large language models has revolutionized natural language processing, enabling unprecedented capabilities in understanding and generation tasks. Early models like GPT-2 and BERT laid the groundwork, demonstrating the power of transformer architectures and pretraining on massive datasets. Building on these foundations, subsequent models such as GPT-3, Llama, and PaLM have scaled up parameters and training data, achieving remarkable performance across tasks.

However, despite these advances, models still face significant challenges in complex reasoning, multi-step problem solving, and open-ended tasks. Traditional sampling methods like best-of-N and beam search rely heavily on model probabilities, which often confine solutions within high-probability regions, limiting exploration of low-probability but correct solutions. Recent efforts incorporated reinforcement learning, tree search, and reasoning chains (e.g., Chain-of-Thought prompting), but these approaches often suffer from sparse verification signals and limited exploration capacity.

In parallel, research in search algorithms—drawing from classical AI, evolutionary algorithms, and reinforcement learning—has sought to improve exploration efficiency. Notable works include Tree of Thoughts, Graph of Thoughts, and evolutionary methods like AlphaEvolve and ShinkaEvolve, which maintain candidate populations and mutate solutions to discover better answers. Despite progress, these methods largely remain constrained by the support of the model’s probability distribution, unable to fully explore the solution space.

This landscape motivated the development of BES, which aims to fundamentally expand the exploration capacity of language models by combining biologically inspired evolution with recursive goal decomposition. The approach seeks to overcome the limitations of existing methods, enabling models to discover solutions in previously inaccessible regions of the search space, thereby pushing the boundaries of AI reasoning and self-improvement.

Core Problem

Current search strategies for large language models predominantly rely on autoregressive expansion or tree-based exploration, which are inherently limited by the model’s probability distribution. This results in a narrow exploration scope confined within a 'narrow entropy shell,' restricting the model's ability to find low-probability, correct solutions in complex or hard problems.

Moreover, the sparse verification signals—often binary or coarse—fail to provide dense feedback necessary for guiding the search effectively, especially in tasks requiring multi-step reasoning or intricate problem decomposition. As a consequence, models struggle to improve during self-training or to solve challenging tasks reliably.

These limitations hinder the development of autonomous, self-improving AI systems capable of tackling real-world problems that demand exploration beyond the model’s immediate probability support. Addressing this issue requires a new search paradigm that can both diversify candidate solutions and provide dense, interpretable feedback to guide the search process efficiently.

Innovation

The paper introduces several key innovations:

�� Bidirectional Search Framework: Combines forward candidate evolution with backward goal decomposition, enabling both exploration and dense verification.
�� Evolution Operators: Inspired by biological recombination, these operators (combination, translocation, crossover, deletion) allow candidates to be restructured and recombined, breaking free from the entropy shell that confines traditional expansion.
�� Recursive Goal Decomposition: Breaks down complex tasks into a tree of sub-goals, each with verifiable success criteria, providing dense feedback and guiding the search.
�� Theoretical Guarantees: Proven that evolution operators can escape the entropy shell, exponentially increasing the search space, and that backward goal decomposition reduces sample complexity.
�� Practical Integration: The framework is designed to work with existing verifiers and models, making it adaptable across tasks and domains.

These innovations collectively enable the model to explore a much larger solution space, discover high-quality solutions in difficult tasks, and systematically improve through self-guided search.

Methodology

�� Design forward search: Starting from initial partial trajectories, apply standard expansion and introduce four evolution operators (combination, translocation, crossover, deletion) to generate diverse candidates.
�� Implement evolution operators: For single-parent operators, select parent nodes based on backward scores, and for two-parent operators, select pairs with complementary strengths, using Boltzmann distributions with temperature annealing.
�� Construct backward goal decomposition: From the top-level goal, recursively split into finer sub-goals, each with a verifier assessing candidate coverage.
�� Integrate search cycles: After several forward steps, invoke backward goal splitting on unsatisfied sub-goals, update the goal tree, and re-score candidates based on sub-goal coverage.
�� Theorize and validate: Prove that evolution operators can escape the entropy shell confining expansion-only search, and that goal decomposition reduces sample complexity exponentially.
�� Conduct experiments: Use datasets like Knights-and-Knaves, MuSiQue, circle packing, and Heilbronn problems, comparing BES with baselines in accuracy, sample efficiency, and stability.
�� Ablation studies: Remove components like evolution operators or goal decomposition to validate their contributions.

Experiments

The experimental setup involves evaluating BES across multiple tasks and models:

�� Logical reasoning: Using Knights-and-Knaves dataset, measuring validation accuracy improvements during training, comparing BES with GRPO and MaxRL.
�� Multi-hop reasoning: On MuSiQue dataset, testing models like Llama-3.2-3B and Llama-3.1-8B, assessing accuracy, search counts, and valid actions.
�� Open problem solving: On circle packing and Heilbronn convex problems, evaluating solution quality, variance, and stability across different frameworks.
�� Hyperparameters: Number of search steps, operator probabilities, goal decomposition frequency, verifier types.
�� Baselines: Including traditional methods (GRPO, Tree-GRPO), open-source frameworks (ShinkaEvolve, GEPA), and human/closed-source solutions.
�� Metrics: Accuracy, success rate, search efficiency, solution diversity, computational cost.
�� Ablation experiments: Removing evolution operators or goal decomposition to analyze their impact.
�� Cost analysis: Wall-clock time, API usage, and resource consumption across methods.

Results

The results demonstrate that BES significantly outperforms baselines across tasks. In logical reasoning, validation accuracy improves from 30% to over 50%, with consistent self-improvement. In MuSiQue, accuracy gains of 3-4% are observed, with increased valid searches and more diverse solutions. On open problems, BES achieves higher average scores (e.g., 2.63 vs. 2.46 in circle packing), with lower variance, indicating more stable and reliable search. Ablation studies confirm that both evolution operators and goal decomposition are essential for performance gains. Cost analysis shows that BES incurs only about 30% more computational overhead than baseline methods, yet delivers substantially better solutions, validating its efficiency and effectiveness in complex reasoning scenarios.

Applications

BES can be applied in various domains requiring complex reasoning and autonomous problem-solving, such as scientific discovery, automated programming, strategic game playing, and decision-making systems. Its ability to generate high-quality, diverse solutions makes it suitable for training data augmentation, self-improving agents, and intelligent assistants. The framework’s modular design allows integration with different verifiers and models, facilitating adaptation to tasks like theorem proving, multi-modal reasoning, and multi-step planning. Over the long term, BES could enable AI systems that continuously learn and evolve, pushing the boundaries of autonomous intelligence and problem-solving capabilities in real-world applications.

Limitations & Outlook

Despite its promising results, BES faces challenges including high computational costs due to multiple candidate recombinations and recursive goal splitting, which may limit scalability and real-time deployment. The effectiveness heavily depends on the quality of verifiers, which can be difficult to design for unstructured or domain-specific tasks. The current experiments focus on moderate-sized models and specific benchmarks; scalability to larger models and diverse real-world problems remains to be validated. Future work should focus on optimizing efficiency, automating verifier construction, and exploring adaptive strategies for balancing exploration and exploitation to make BES more practical for widespread use.

Plain Language Accessible to non-experts

想象你在一家工厂工作，工厂里有许多不同的机器，每台机器都能完成特定的任务。以前，工厂的设计只让机器按照固定的流程工作，遇到复杂的问题时，它们只能沿着预设的路径尝试解决方案。这就像模型用自回归的方法逐步生成答案，限制了探索的范围。

现在，工厂引入了一种新策略，像是给机器装上了“创造”能力，让它们可以像人一样，尝试组合不同的零件，甚至交换零件，创造出全新的方案。这就是演化算子，比如组合、交叉和删除。与此同时，工厂还会把一个大任务拆成许多小任务，逐个确认每个小任务是否完成，就像拼拼图一样，把复杂的问题变成一堆简单的拼图，逐一解决。

通过这两种方法，工厂的机器不再局限于原有的设计，而是可以不断尝试新的组合和拆解，找到更快、更好的解决方案。这就像给模型装上了“创造”和“拆解”的双重能力，让它在面对难题时，能像人一样灵活思考，找到最优答案。这种方法不仅让解决问题变得更快，也让答案变得更丰富、更可靠。

ELI14 Explained like you're 14

想象你在玩一个超级复杂的拼图游戏，里面有很多碎片，要拼出一幅完整的画。以前，你只能按照顺序一块一块拼，试了很多次，还拼错了很多次。这就像模型用一种叫自回归的方法，逐步生成答案，但有时候会陷入死胡同，找不到正确的拼法。

现在，假设你有一种新策略，你可以把拼图拆成几个小部分，先确认每个小部分是不是拼对了，然后再试着把这些小部分拼在一起。更厉害的是，你还能把不同的小部分交换，试试不同的组合，找到最合适的拼法。这就像论文里的演化算子，让模型可以像“创造者”一样，尝试不同的答案组合，而不是只沿着一条路径走。

另外，你还可以把一个大任务拆成很多小任务，比如先拼出拼图的边框，再拼里面的内容。每完成一个小任务，你都可以检查一下，确保没错。这就像把复杂的问题拆成一堆简单的子问题，逐个解决。

通过这些方法，你的拼图游戏变得更聪明、更灵活，也更快能拼出完整的画。模型也是一样，借助这种“拆解”和“创造”的技巧，能更好地解决难题，找到更好的答案。未来，这种方法还能帮我们解决很多复杂的问题，比如科学研究、自动编程，甚至是创造新发明！

Glossary

Bidirectional Evolutionary Search (BES)（双向进化搜索）

A search framework combining forward candidate evolution with backward goal decomposition, designed to enhance exploration and verification in large models.

The core algorithm proposed in the paper, used to improve reasoning and self-improvement capabilities.

Evolution Operators（演化算子）

Genetic-inspired operations—combination, translocation, crossover, deletion—that recombine and mutate candidate solutions to diversify search space.

Applied during forward search to generate diverse candidate trajectories.

Backward Goal Decomposition（逆向目标分解）

A recursive process that breaks down complex tasks into verifiable sub-goals, providing dense feedback for guiding search.

Enhances search efficiency and solution quality by dense supervision.

Verifier（验证器）

A tool or model that assesses whether a candidate solution satisfies a sub-goal or task requirement, such as rule checkers or similarity models.

Used to evaluate candidate trajectories during backward goal decomposition.

Entropy Shell（熵壳）

A theoretical confinement where candidates generated by pure expansion are limited to low-entropy regions, restricting exploration.

Theoretical analysis shows evolution operators can break this shell.

Open Questions Unanswered questions from this research

1 While BES demonstrates strong performance on moderate-scale models and specific tasks, its scalability and efficiency on very large models like GPT-4 or GPT-5 are still unverified. Future research needs to optimize computational costs and adapt the framework for real-time applications. Additionally, designing universal, high-quality verifiers for diverse tasks remains challenging. How to automate and generalize verifier construction to handle unstructured or domain-specific problems is an open question. Further, integrating BES with reinforcement learning or other adaptive strategies could enhance its exploration-exploitation balance, enabling more autonomous and scalable self-improvement systems.

Applications

Immediate Applications

Training Data Generation

Using BES to generate high-quality, diverse training samples for complex reasoning tasks, improving model robustness and accuracy, especially in low-resource or hard problem domains.

Autonomous Reasoning Agents

Enhancing AI agents with BES-driven search to improve decision-making, problem-solving, and reasoning capabilities in applications like automated scientific discovery or strategic games.

Complex Optimization Tasks

Applying BES in automated planning, program synthesis, and combinatorial optimization, where exploring large solution spaces efficiently is critical.

Long-term Vision

Self-Improving AI Systems

Developing AI that can autonomously evolve and improve itself through iterative BES-based search, pushing towards fully autonomous, lifelong learning systems.

Cross-Domain Multi-Modal Reasoning

Extending BES to multi-modal data (text, images, video) and multi-task scenarios, enabling AI to perform complex reasoning across diverse information sources.

Abstract

Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse verification signals, and they construct candidates primarily through autoregressive expansion, restricting exploration to regions with substantial model probability mass. To address these, we propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomposition. In the forward search, BES augments standard expansion with evolution operators that recombine partial trajectories to generate candidates that are difficult to obtain from a single model rollout. In the backward search, BES recursively decomposes the original task into checkable subgoals, producing dense intermediate feedback that guides forward search. We provide theoretical motivation showing that candidates generated by expansion-only search are confined to a narrow entropy shell while evolutionary operators can escape it, and that backward search can exponentially reduce the number of required samples to find a correct answer. Experiments show that on challenging post-training tasks where mainstream post-training algorithms fail to improve, BES enables consistent gains, and on three open problem solving benchmarks at inference time, BES outperforms existing open-source frameworks in both average and best-case performance. Code and trained models are available at https://github.com/Embodied-Minds-Lab/BES.

cs.CL

References (20)

Tree Search for LLM Agent Reinforcement Learning

Yuxiang Ji, Ziyu Ma, Yong Wang et al.

2025 38 citations ⭐ Influential View Analysis →

ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

R. Lange, Yuki Imajuku, Edoardo Cetin

2025 78 citations ⭐ Influential View Analysis →

TreeRL: LLM Reinforcement Learning with On-Policy Tree Search

Zhenyu Hou, Ziniu Hu, Yujiang Li et al.

2025 53 citations View Analysis →

Olympiad-level formal mathematical reasoning with reinforcement learning

T. Hubert, Rishi S Mehta, Laurent Sartran et al.

2025 109 citations

Branch-and-Bound Methods: A Survey

E. Lawler, D. Wood

1966 1962 citations

Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces

R. Storn, K. Price

1997 27562 citations

Graph of Thoughts: Solving Elaborate Problems with Large Language Models

Maciej Besta, Nils Blach, Aleš Kubíček et al.

2023 1344 citations View Analysis →

Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

Kou Misaki, Yuichi Inoue, Yuki Imajuku et al.

2025 45 citations View Analysis →

Efficient Evolutionary Search Over Chemical Space with Large Language Models

Haorui Wang, Marta Skreta, C. Ser et al.

2024 54 citations View Analysis →

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving

Yangzhen Wu, Zhiqing Sun, Shanda Li et al.

2024 190 citations View Analysis →

Genetic Programming

Moshe Sipper

1998 2262 citations

Some Genetic Aspects of Sex

H. Muller

1932 1222 citations

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

Dan Zhang, Sining Zhoubian, Yisong Yue et al.

2024 460 citations View Analysis →

Mathematical discoveries from program search with large language models

B. Romera-Paredes, M. Barekatain, Alexander Novikov et al.

2023 926 citations

Bandit Based Monte-Carlo Planning

Levente Kocsis, Csaba Szepesvari

2006 3874 citations

♫ MuSiQue: Multihop Questions via Single-hop Question Composition

H. Trivedi, Niranjan Balasubramanian, Tushar Khot et al.

2021 852 citations View Analysis →

AlphaEvolve: A coding agent for scientific and algorithmic discovery

Alexander Novikov, Ngân V˜u, Marvin Eisenberger et al.

2025 493 citations View Analysis →

Voyager: An Open-Ended Embodied Agent with Large Language Models

Guanzhi Wang, Yuqi Xie, Yunfan Jiang et al.

2023 1721 citations View Analysis →

IMPORTANT

Ruth Edwards

1950 163 citations

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Lakshya A. Agrawal, Shangyin Tan, Dilara Soylu et al.

2025 184 citations View Analysis →

Self-Improving Language Models with Bidirectional Evolutionary Search

Key Findings

Methodology

Key Results

Significance

Technical Contribution

Novelty

Limitations

Future Work

AI Executive Summary

Deep Analysis

Background

Core Problem

Innovation

Methodology

Experiments

Results

Applications

Limitations & Outlook

Plain Language Accessible to non-experts

ELI14 Explained like you're 14

Glossary

Bidirectional Evolutionary Search (BES)（双向进化搜索）

Evolution Operators（演化算子）

Backward Goal Decomposition（逆向目标分解）

Verifier（验证器）

Entropy Shell（熵壳）

Open Questions Unanswered questions from this research

Applications

Immediate Applications

Training Data Generation

Autonomous Reasoning Agents

Complex Optimization Tasks

Long-term Vision

Self-Improving AI Systems

Cross-Domain Multi-Modal Reasoning

Abstract

References (20)

Related Papers

The Register Gap: A Meaning Intelligence Framework for Nigerian Public Discourse

Learning User Simulators with Turing Rewards

RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

Characterizing Cultural Localization in AI-Generated Stories

Operads for compositional reasoning in LLMs