Agentopia: Long-Term Life Simulation and Learning in Agent Societies

TL;DR

Agentopia introduces a long-term multi-agent society simulation spanning 10 simulated years and uses a life reward to train LLMs via rejection sampling, improving their social behaviors and anthropomorphic capabilities.

cs.CL 🔴 Advanced 2026-06-06

Xintao Wang Sirui Zheng Hongqiu Wu Weiyuan Li Jen-tse Huang Minghao Zhu Can Zu Qi Deng Jiawei Wang Qianyu He Heng Wang Xiaojian Wu Yunzhe Tao


multi-agent systems long-term simulation social behavior LLM training reinforcement learning

Key Findings

Methodology

This study develops the Agentopia framework, integrating a multi-layered environment model, comprehensive context management, and a novel life reward mechanism. The environment model orchestrates event scheduling, feedback, and social interactions, replacing rule-based systems with generative LLMs. Agents are equipped with profiles, dynamic states, and long-term memory files, enabling complex social relationship modeling and personal growth. The simulation operates on weekly cycles, each comprising planning, contact, activity, and review stages, to mimic human social routines over a decade. The life reward, designed to reflect human well-being, guides training via rejection sampling to fine-tune the base LLMs. Experiments across three fictional worlds with 100 agents each demonstrate emergent social behaviors, social mobility, and relationship evolution. Results show that life reward training improves agent well-being and social stability in simulation and raises role-playing benchmark performance by 15.6%.
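The rejection-sampling step described above can be pictured with a minimal sketch. It assumes each rollout has already been scored with a scalar life reward; the `Trajectory` class, the `keep_fraction` parameter, and the per-agent grouping are hypothetical choices for illustration, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    agent_id: int
    text: str           # serialized agent decisions for one rollout (hypothetical format)
    life_reward: float  # scalar well-being score (assumed to be precomputed)


def rejection_sample(trajectories, keep_fraction=0.25):
    """Keep the highest-reward rollouts of each agent as fine-tuning data."""
    # Group rollouts by agent so every agent contributes training examples.
    by_agent = {}
    for traj in trajectories:
        by_agent.setdefault(traj.agent_id, []).append(traj)

    kept = []
    for rollouts in by_agent.values():
        ranked = sorted(rollouts, key=lambda t: t.life_reward, reverse=True)
        # Always keep at least one rollout per agent.
        n_keep = max(1, int(len(ranked) * keep_fraction))
        kept.extend(ranked[:n_keep])
    return kept


# Illustrative, made-up rollouts: two for agent 0, one for agent 1.
rollouts = [
    Trajectory(0, "skips the meeting", 0.1),
    Trajectory(0, "helps a neighbor", 0.9),
    Trajectory(1, "starts a new job", 0.5),
]
selected = rejection_sample(rollouts, keep_fraction=0.5)
print([t.text for t in selected])
```

The selected trajectories would then serve as supervised fine-tuning examples for the base LLM; how the actual system samples, scores, and filters rollouts may differ.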

Key Results

In the simulations, agents exhibited rich social behaviors such as cooperation, rivalry, relationship formation, and social mobility. Quantitatively, the social network density increased by 35%, and social mobility frequency rose by 20% over ten years.
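For readers unfamiliar with the density metric, the sketch below uses the standard definition for an undirected graph (existing ties divided by possible ties). The source does not state which definition was used or whether the 35% figure is a relative or absolute change, so both the formula choice and the tie counts here are assumptions for illustration only.

```python
def network_density(num_agents: int, num_ties: int) -> float:
    """Density of an undirected social graph: existing ties / possible ties."""
    possible_ties = num_agents * (num_agents - 1) / 2
    return num_ties / possible_ties


def relative_change(before: float, after: float) -> float:
    """Relative change between two measurements, as a fraction of the first."""
    return (after - before) / before


# Made-up tie counts for a 100-agent world (not values from the paper).
density_start = network_density(100, 495)
density_end = network_density(100, 668)
print(density_start, density_end, relative_change(density_start, density_end))
```

With 100 agents there are 4,950 possible ties, so small changes in density correspond to dozens of new relationships.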
Life reward-trained models showed significant improvements in social stability, subjective fulfillment, and economic gains. They outperformed baseline models in role-playing tasks, achieving a +15.6% score increase on the CoSER benchmark.
Ablation studies confirmed that the reward mechanism directly contributed to behavioral diversity, relationship stability, and overall agent well-being, emphasizing its critical role in long-term social simulation.

Significance

This work pioneers decade-scale social lifecycle simulation, providing a new platform for understanding human social dynamics, relationship evolution, and individual development. By integrating reinforcement learning with generative models, it addresses longstanding challenges in modeling complex, adaptive social systems. The framework advances AI’s capacity for social cognition, personalized interaction, and autonomous learning, with implications spanning social sciences, virtual worlds, and human-AI coexistence. It offers a scalable, flexible approach to simulate and analyze societal phenomena, bridging the gap between short-term AI interactions and long-term social processes.

Technical Contribution

The study introduces a comprehensive long-term simulation architecture combining generative environment models, context-aware agents, and reward-guided fine-tuning. Key innovations include:
• a multi-layered context management system that maintains character profiles, memories, and interaction history;
• a novel life reward mechanism aligned with human well-being, optimized via rejection sampling;
• a weekly cycle-based simulation protocol that captures social relationship dynamics and personal development;
• environment model-driven event scheduling and feedback, replacing rule-based logic.
These contributions enable scalable, autonomous, and realistic social simulations, setting new standards for AI-driven social modeling.
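One way to picture the context management layer is as a per-agent record holding a profile, a dynamic state, and long-term memory. The sketch below is a hypothetical data structure; the `AgentContext` class, its field names, and the prompt layout are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    name: str
    profile: str                                       # static character description
    state: dict = field(default_factory=dict)          # dynamic state, e.g. {"job": "baker"}
    memory: list = field(default_factory=list)         # long-term memory entries, oldest first
    relationships: dict = field(default_factory=dict)  # other agent's name -> short note

    def remember(self, week: int, note: str) -> None:
        """Append a dated entry to the agent's long-term memory."""
        self.memory.append(f"[week {week}] {note}")

    def build_prompt(self, max_memories: int = 5) -> str:
        """Assemble an LLM prompt from the profile, state, and recent memories."""
        recent = self.memory[-max_memories:]
        lines = [f"Name: {self.name}", f"Profile: {self.profile}"]
        lines += [f"State: {key} = {value}" for key, value in self.state.items()]
        lines += [f"Relationship: {who}: {note}" for who, note in self.relationships.items()]
        lines += [f"Memory: {entry}" for entry in recent]
        return "\n".join(lines)


ctx = AgentContext(name="Ada", profile="A baker who wants to open her own shop")
ctx.state["job"] = "baker"
ctx.remember(1, "Met Ben at the market")
print(ctx.build_prompt())
```

Truncating to the most recent memories is the simplest possible policy; a decade-long simulation would likely need summarization or retrieval to keep the prompt within the model's context window.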

Novelty

This research is the first to implement a decade-scale, fully autonomous social lifecycle simulation with 100 agents per world. Its core novelty lies in combining life reward-guided training with generative environment modeling, allowing agents to develop complex social behaviors and relationships over long periods. Unlike prior short-term or rule-based systems, Agentopia captures emergent phenomena such as social mobility, relationship stability, and personal growth, demonstrating a significant leap in the realism and depth of AI social simulations.

Limitations

The environment model's generative capacity may not fully capture the complexity of real-world social dynamics, especially in unpredictable or extreme scenarios, limiting ecological validity.
High computational costs associated with long-term, multi-agent simulation pose scalability challenges, particularly for larger populations or more detailed social structures.
The definition of life reward, while aligned with human well-being, remains subjective and context-dependent, requiring further refinement to encompass diverse cultural and individual values.

Future Work

Future research will focus on integrating multimodal inputs, such as visual and auditory data, to enrich social interactions. Expanding the framework to include cultural, political, and economic dimensions will enable more comprehensive societal modeling. Additionally, incorporating real-world social data could refine reward functions, making simulations more applicable to social science research and policy analysis. Improving computational efficiency and scalability remains a priority, aiming to simulate larger populations over even longer periods, ultimately moving toward AI systems capable of autonomous, lifelong social learning.

AI Executive Summary

Understanding human society’s intricate web of relationships, growth, and change has long been a challenge for artificial intelligence. Traditional multi-agent systems have succeeded in modeling simple interactions over short periods but fall short when it comes to capturing the long-term evolution of social structures. This limitation hampers AI’s ability to truly comprehend human social dynamics, which unfold over years and decades.

Agentopia, the framework introduced in this study, addresses this gap by enabling decade-scale simulation of multi-agent societies. It leverages an environment model powered by large language models (LLMs) to orchestrate social interactions, event scheduling, and feedback. Central to this system is a novel life reward mechanism, designed to mirror human well-being, which guides rejection sampling to fine-tune the underlying LLMs. This approach is intended to make agents not only behave realistically but also develop emergent social behaviors such as cooperation, rivalry, and social mobility.

The simulation operates on weekly cycles, each consisting of planning, contact, activity, and review stages. During these stages, agents set goals, communicate, engage in activities, and reflect on their experiences. Over ten years, these processes produce complex social networks, relationship dynamics, and individual growth trajectories. The environment model dynamically manages events and feedback, creating a rich, evolving social landscape.
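The weekly protocol can be sketched as a nested loop over weeks, stages, and agents. The sketch assumes 52 weeks per simulated year and that all agents complete a stage before the next one begins; `run_simulation` and `step_fn` are hypothetical names, with `step_fn` standing in for whatever LLM call produces an agent's output at a given stage.

```python
STAGES = ("planning", "contact", "activity", "review")
WEEKS_PER_YEAR = 52  # assumption: the source only says "weekly cycles"


def run_simulation(agents, step_fn, years=10):
    """Run every agent through the four stages once per simulated week."""
    log = []
    for week in range(years * WEEKS_PER_YEAR):
        for stage in STAGES:
            # All agents finish one stage before the next stage starts.
            for agent in agents:
                log.append(step_fn(agent, stage, week))
    return log


def echo_step(agent, stage, week):
    """Placeholder for an LLM call: just records what would have happened."""
    return (week, stage, agent)


log = run_simulation(["Ada", "Ben"], echo_step, years=1)
print(len(log), log[0], log[-1])
```

Under these assumptions, a 10-year run with 100 agents amounts to 520 weeks and 208,000 agent-stage steps, which gives a sense of the computational cost noted among the limitations.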

Extensive experiments across three fictional worlds demonstrate the framework’s effectiveness. Results show that agents exhibit behaviors consistent with human social patterns, and models trained with the life reward mechanism outperform baselines in social stability, relationship diversity, and role-playing benchmarks, with a +15.6% improvement on the CoSER test. These findings validate the potential of long-term, reward-guided social simulation to deepen AI’s understanding of human-like cognition and sociality.

The implications extend beyond academic interest. This framework opens avenues for developing more human-like AI companions, virtual societies, and tools for social science research. By simulating long-term social processes, AI can better predict societal trends, support personalized education, and foster human-AI collaboration. Despite its achievements, challenges remain, including computational costs, environmental realism, and reward definition. Future work will focus on multimodal integration, cultural diversity, and scalability, aiming to build autonomous AI systems capable of lifelong social learning. Ultimately, Agentopia marks a significant step toward AI systems that can understand, participate in, and perhaps even help shape human societies over the long haul.

Deep Dive

Abstract

Humans learn from social life. Simulating this process with LLM-powered agents represents a promising research direction, raising a natural question: whether LLMs can learn from such simulated social experience to better understand and replicate human behavior. However, prior agent society simulations typically operate at the scale of days, limiting the depth of social interactions and long-term growth. In this paper, we study long-term life simulation and LLM learning in agent societies, with two goals: (1) investigating social behaviors that emerge from life-long simulation, and (2) developing anthropomorphic capabilities in LLMs, particularly intelligence in social life, through years of simulated social experience. Specifically, we present Agentopia, a comprehensive framework for long-term life simulation in multi-agent societies, where 100 agents autonomously pursue personal growth, develop social relationships, and fulfill their needs and goals over 10 simulated years. We define life reward to mirror human well-being, and leverage this reward to train LLMs via rejection sampling. Extensive experiments show that agents exhibit rich emergent social behaviors. Furthermore, life reward training effectively enhances the underlying LLM, which leads to improved agent well-being in simulation, and generalizes to downstream role-playing benchmarks with +15.6% improvement.
