Rethinking Memory as Continuously Evolving Connectivity

TL;DR

FluxMem models memory as a dynamically evolving heterogeneous graph with three stages, achieving state-of-the-art results in complex reasoning and web navigation tasks.

cs.CL | Advanced | 2026-05-28

Jizhan Fang Buqiang Xu Zhixian Wang Haoliang Cao Xinle Deng Baohua Dong Hangcheng Zhu Ruohui Huang Gang Yu Ying Wei Guozhou Zheng Feiyu Xiong Haofen Wang Huajun Chen Ningyu Zhang


memory augmentation graph structure self-evolution large language models dynamic connectivity

Key Findings

Methodology

FluxMem employs a three-stage memory evolution framework, representing memory as a heterogeneous graph comprising semantic, episodic, and procedural layers. The first stage establishes initial connections by fusing relevance scores from dense embeddings, lexical matching, and LLM verification. The second stage leverages environmental feedback to dynamically refine the graph, adding missing links, pruning interference, and adjusting abstraction levels. The third stage clusters successful trajectories to induce reusable skills, monitored by the Procedure Evolution Maturity Score (PEMS). This online real-time process is complemented by offline long-term consolidation, forming a self-optimizing memory substrate. Key algorithms include relevance-based connection retrieval, feedback-driven link correction, trajectory clustering, and skill induction, all guided by the PEMS metric.
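As a rough illustration of the data structure described above, the sketch below stores typed nodes in three layers and typed edges between them. It is not the authors' implementation: the class names `Layer` and `MemoryGraph`, the node id format, and the edge labels are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Layer(Enum):
    SEMANTIC = "semantic"      # stored facts
    EPISODIC = "episodic"      # past experiences
    PROCEDURAL = "procedural"  # reusable skills


@dataclass
class MemoryGraph:
    """Hypothetical container: node id -> (layer, content), (src, dst) -> edge label."""
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def add_node(self, node_id: str, layer: Layer, content: str) -> None:
        self.nodes[node_id] = (layer, content)

    def link(self, src: str, dst: str, kind: str) -> None:
        # Both endpoints must already be stored before an edge is created.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges[(src, dst)] = kind

    def unlink(self, src: str, dst: str) -> None:
        # Removing an absent edge is a no-op.
        self.edges.pop((src, dst), None)

    def neighbors(self, node_id: str, layer: Optional[Layer] = None) -> list:
        """Outgoing neighbours, optionally restricted to one layer, sorted by id."""
        out = [dst for (src, dst) in self.edges if src == node_id]
        if layer is not None:
            out = [dst for dst in out if self.nodes[dst][0] is layer]
        return sorted(out)
```

With this layout, adding a missing link or pruning an interfering one is a single `link` or `unlink` call, and a skill induced from an episode can be stored as a procedural node connected to that episode.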

Key Results

On the LoCoMo long-context reasoning benchmark, FluxMem achieves an LMJ score of 95.06, surpassing the baseline of 81.23 and all comparison models, demonstrating superior reasoning and memory adaptation.
In the Mind2Web web navigation task, success rate improves from 52.12 to 73.6 in a realistic setting without manual filtering, outperforming AWM (56.10) and MemoryOS (59.81), indicating strong generalization in real-world scenarios.
On the GAIA general assistant benchmark, success rate increases from 52.12 to 73.6 across multi-task, multi-website, and multi-domain settings, outperforming MemEvolve and Flash-Searcher, showcasing excellent cross-task transferability.
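To make the size of the reported gaps concrete, the short script below computes absolute gains in points from the LoCoMo and Mind2Web figures quoted above. The helper name `gains` is hypothetical, and the only inputs are the scores listed in this section.

```python
# Scores copied from the results listed above.
locomo = {"baseline": 81.23, "FluxMem": 95.06}
mind2web = {"baseline": 52.12, "AWM": 56.10, "MemoryOS": 59.81, "FluxMem": 73.6}


def gains(scores, system="FluxMem"):
    """Absolute gain, in points, of `system` over every other entry."""
    return {
        name: round(scores[system] - value, 2)
        for name, value in scores.items()
        if name != system
    }


print(gains(locomo))
print(gains(mind2web))
```

On these numbers the gain over the baseline is 13.83 points on LoCoMo and 21.48 points on Mind2Web, while the margins over AWM and MemoryOS are 17.5 and 13.79 points.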

Significance

This work advances the field by transforming static memory storage into a continuously evolving connectivity network, significantly enhancing the adaptability and autonomous learning of large language models in dynamic environments. It provides a theoretical and practical foundation for long-term knowledge accumulation, reasoning, and self-organization, paving the way for intelligent agents capable of lifelong learning and complex decision-making.

Technical Contribution

The paper introduces FluxMem's three-stage memory evolution mechanism, innovatively modeling memory as a heterogeneous graph with feedback-driven connection refinement and skill induction. The Procedure Evolution Maturity Score (PEMS) offers a dynamic measure of memory stability and maturity, enabling self-optimization. This approach differs fundamentally from static or semi-dynamic methods by enabling continuous structural adaptation, leading to improved generalization and reasoning capabilities. The framework integrates relevance-based retrieval, feedback correction, trajectory clustering, and skill induction into a unified, scalable system, opening new avenues for self-evolving AI systems.

Novelty

This research is the first to model memory as a continuously evolving heterogeneous graph with a three-stage process involving initial connection formation, feedback-driven refinement, and long-term consolidation. Unlike prior static or semi-dynamic memory systems, FluxMem emphasizes dynamic connection adjustments guided by environmental feedback, enabling persistent structural evolution. The integration of trajectory clustering and skill induction, monitored by PEMS, provides a novel mechanism for long-term memory stabilization and reuse, representing a significant leap forward in autonomous, self-organizing memory architectures.

Limitations

The current system incurs high computational costs due to iterative feedback and connection updates, which may limit real-time deployment in resource-constrained environments.
Experiments are primarily conducted on static datasets; the performance in continuous, streaming environments with active memory decay remains to be validated.
Parameter sensitivity (e.g., number of refinement rounds T, convergence threshold ϵ) requires careful tuning for different tasks, lacking a fully adaptive mechanism.

Future Work

Future research will focus on optimizing online update efficiency, reducing computational overhead, and enabling real-time adaptive parameter tuning. Additionally, integrating multi-modal data (visual, auditory) and reinforcement learning could further enhance the system’s autonomous learning and robustness. Extending the framework to handle streaming data and active memory decay mechanisms will be crucial for deploying lifelong learning agents in real-world, dynamic environments.

AI Executive Summary

In the rapidly evolving field of artificial intelligence, the ability for models to remember, adapt, and learn continuously remains a fundamental challenge. Traditional memory-augmented models rely heavily on static storage structures, with fixed representations and rigid retrieval pipelines. While effective in controlled settings, these approaches struggle in real-world environments characterized by feedback loops, task variability, and heterogeneous signals. Static memory systems often suffer from brittle connections, irrelevant information retrieval, and inflexibility in updating stored knowledge, limiting their capacity for long-term autonomous learning.

Addressing these limitations, this paper introduces FluxMem, a novel memory framework that models memory as a dynamically evolving heterogeneous graph. This graph comprises three layers: semantic knowledge, episodic experiences, and procedural skills. The core innovation lies in a three-stage evolution process—initial connection formation, feedback-driven refinement, and long-term consolidation—that continuously adapts the memory topology based on environmental feedback and task success. During execution, the system repairs missing links, prunes irrelevant associations, and aligns abstraction levels, effectively transforming static storage into a self-organizing, self-optimizing memory network.

The first stage establishes initial connections by integrating relevance scores from dense embeddings, lexical matching, and LLM-based verification, ensuring a comprehensive initial memory structure. The second stage employs a feedback loop, where environmental signals guide the addition or removal of links, refining the memory graph to better support current tasks. This dynamic process allows the system to correct inaccuracies, reduce noise, and improve retrieval precision. The third stage involves offline clustering of successful trajectories, extracting common patterns into reusable procedural skills, which are then monitored by the Procedure Evolution Maturity Score (PEMS). This score provides a quantitative measure of the memory’s structural maturity, guiding further evolution and stabilization.
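The first-stage fusion can be pictured with a small sketch. The summary does not say how the three signals are combined, so the weighted sum, the weights, the Jaccard token overlap used for lexical matching, and the `verify` callback standing in for LLM verification are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(query, text):
    """Token-overlap score used here as a stand-in for lexical matching."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def hybrid_relevance(query, query_vec, text, text_vec, verify,
                     weights=(0.5, 0.3, 0.2)):
    """Assumed fusion rule: weighted sum of dense, lexical, and verifier scores.

    `verify(query, text)` is a hypothetical callback that returns a value in
    [0, 1]; in a real system it would wrap an LLM call.
    """
    w_dense, w_lex, w_llm = weights
    return (w_dense * cosine(query_vec, text_vec)
            + w_lex * jaccard(query, text)
            + w_llm * verify(query, text))
```

An edge between the current observation and a stored fact would then be created when this score passes some cutoff, which is another design choice the summary leaves open.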

Extensive experiments across three benchmarks (LoCoMo, Mind2Web, and GAIA) demonstrate the effectiveness of FluxMem. In long-context reasoning on LoCoMo, it achieves an LMJ score of 95.06 against a baseline of 81.23, outperforming all compared systems. In web navigation on Mind2Web, the success rate rises from 52.12 to 73.6, supporting its robustness in realistic settings. In general assistant tasks on GAIA, it surpasses MemEvolve and Flash-Searcher, indicating strong transferability and adaptability. Ablation studies show that feedback-driven refinement (Stage II) is crucial for accuracy, while long-term consolidation (Stage III) improves performance on complex multi-step tasks.

This work offers a new paradigm for memory in AI systems—moving away from static repositories towards a self-evolving, structurally adaptive network. Its implications extend to autonomous lifelong learning, knowledge management, and complex reasoning, potentially transforming how intelligent agents acquire, organize, and utilize knowledge over extended periods. Despite current limitations related to computational costs and parameter sensitivity, the framework sets a solid foundation for future research into scalable, resource-efficient, and truly autonomous self-evolving AI systems. Overall, FluxMem represents a significant step toward realizing intelligent agents capable of continuous, self-directed growth in complex, dynamic environments.

Deep Analysis

Background

The evolution of memory systems in AI has transitioned from simple static storage to more structured and dynamic architectures. Early models like Memory Networks (Weston et al., 2014) and Neural Turing Machines (Graves et al., 2014) attempted to mimic human-like long-term memory but faced limitations in flexibility and scalability. Recent advances introduced hierarchical and graph-based memory structures (Han et al., 2022; Long et al., 2023), aiming to improve connection richness and adaptability. Despite these efforts, most existing systems rely on fixed, hand-crafted pipelines, which hinder their ability to dynamically reconfigure based on environmental feedback. The need for a memory mechanism that can self-organize, evolve, and consolidate knowledge over time remains unmet. This paper builds upon these foundations, proposing a self-evolving, graph-based memory system that continuously refines its topology through feedback, inspired by cognitive science principles of structural and functional plasticity.

Core Problem

Current memory-augmented models often treat memory as a static repository, which leads to brittle performance in dynamic environments. The fixed representations and retrieval pipelines cannot adapt to feedback, task variations, or heterogeneous signals, resulting in issues like under-connection, irrelevant retrieval, and inflexibility in updating memory units. These problems severely limit the ability of models to perform long-term, autonomous learning and reasoning. Moreover, existing methods lack mechanisms for long-term consolidation, preventing the formation of stable, structural memory regions. Addressing these issues requires a fundamentally new approach that models memory as a flexible, self-organizing network capable of continuous evolution based on environmental cues.

Innovation

FluxMem introduces a three-stage memory evolution framework that models memory as a heterogeneous graph with semantic, episodic, and procedural layers. The first stage establishes initial connections by integrating relevance scores from dense embeddings, lexical matching, and LLM verification. The second stage employs a feedback loop, where environmental signals guide the addition or removal of links, refining the graph structure to better support current tasks. The third stage clusters successful trajectories to induce reusable skills, monitored by the PEMS metric, which quantifies the structural maturity of memory nodes. This approach enables continuous, self-guided adaptation of memory topology, overcoming the rigidity of static systems. The integration of relevance-based retrieval, feedback correction, trajectory clustering, and skill induction into a unified framework represents a significant innovation, providing a scalable and flexible solution for lifelong learning in AI agents.
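The third-stage grouping of successful trajectories could look like the sketch below. The summary only says that trajectories are clustered by semantic similarity, so the greedy single-pass algorithm, the running-mean centroid, and the 0.8 threshold are assumptions, and the embeddings are taken as given.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_trajectories(embeddings, threshold=0.8):
    """Greedy clustering: each trajectory joins the first cluster whose
    centroid is at least `threshold` similar, otherwise it starts a new one.

    Returns a list of clusters, each a list of trajectory indices.
    """
    clusters = []  # each entry: {"members": [...], "centroid": [...]}
    for idx, vec in enumerate(embeddings):
        for cluster in clusters:
            if cosine(vec, cluster["centroid"]) >= threshold:
                cluster["members"].append(idx)
                n = len(cluster["members"])
                # Update the centroid as a running mean of member vectors.
                cluster["centroid"] = [
                    (c * (n - 1) + v) / n
                    for c, v in zip(cluster["centroid"], vec)
                ]
                break
        else:
            clusters.append({"members": [idx], "centroid": list(vec)})
    return [cluster["members"] for cluster in clusters]
```

Each resulting cluster would then be handed to a skill-induction step, which in the described system is an LLM call and is not modelled here.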

Methodology

Initial Connection Formation: At each task step, the system retrieves relevant facts from the semantic layer by calculating a hybrid relevance score combining dense embedding similarity, lexical matching, and LLM verification. This establishes initial edges between current observations and stored knowledge. Simultaneously, episodic experiences are retrieved via embedding similarity, and applicable skills are inherited from related episodes through existing distillation links, forming a local subgraph.
Feedback-Driven Refinement: During task execution, environmental feedback signals (success, failure, errors) are used to dynamically modify the graph. Missing links are added by identifying semantically proximate but unactivated nodes; irrelevant links are pruned to reduce noise. Internal node content is reshaped to align abstraction levels, either by expanding details or abstracting redundant information. These edits are iteratively applied until task success or a maximum refinement round is reached.
Long-Term Consolidation: After task completion, trajectories are clustered based on semantic similarity. Skills are extracted via LLM induction and refined through iterative evaluation guided by PEMS. The stability and maturity of skills are monitored, and validated skills are stored as procedural nodes, enriching the memory for future tasks. Offline consolidation ensures the memory graph evolves into a stable, self-organized structure capable of supporting complex reasoning and multi-task learning.
Connection and skill updates are guided by relevance scores, feedback signals, and PEMS, ensuring continuous adaptation and structural stability. The entire process is integrated with online real-time updates during task execution and offline long-term consolidation, forming a cohesive, self-improving memory system.
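The refinement loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: `run_task` is a hypothetical callback returning a success flag plus the set of links the environment flagged as interfering, `candidates` maps inactive nodes to a similarity score, and the 0.7 cutoff is arbitrary. Node content reshaping is left out.

```python
def refine_links(active, candidates, run_task, max_rounds=3, add_threshold=0.7):
    """Prune flagged links and add the closest inactive node until the task
    succeeds or `max_rounds` refinement rounds have been applied.

    Returns (final active set, success flag, number of refinement rounds used).
    """
    active = set(active)
    pruned = set()
    for round_no in range(max_rounds + 1):
        success, interfering = run_task(active)
        if success or round_no == max_rounds:
            return active, success, round_no
        # Prune: drop links that the feedback marked as interference.
        active -= interfering
        pruned |= interfering
        # Repair: activate the most similar node that is not yet linked
        # and was not pruned earlier.
        pool = {
            node: score
            for node, score in candidates.items()
            if node not in active and node not in pruned and score >= add_threshold
        }
        if pool:
            active.add(max(pool, key=pool.get))
```

Keeping a `pruned` set prevents a link that was just removed from being added back in a later round, which is one simple way to avoid oscillation.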

Experiments

The experimental setup involves three benchmark datasets: LoCoMo for long-context reasoning, Mind2Web for web navigation, and GAIA for general assistant tasks. Baselines include static memory models like MemoryOS and Nemori, as well as evolving memory systems such as MemEvolve and Flash-Searcher. Evaluation metrics encompass LMJ scores, success rates, Action F1, and Element Accuracy. Hyperparameters like the number of refinement rounds T and convergence threshold ϵ are tuned to analyze their impact on performance. Ablation studies systematically remove each stage to assess their contribution, revealing that feedback-driven refinement (Stage II) is critical for accuracy, while long-term consolidation (Stage III) enhances performance in multi-step tasks. Results consistently show that FluxMem outperforms baselines, with significant improvements in reasoning accuracy, navigation success, and multi-task transferability. The experiments demonstrate the framework’s robustness, scalability, and adaptability across diverse scenarios.

Results

On the LoCoMo benchmark, FluxMem achieves an LMJ score of 95.06, 13.83 points above the baseline of 81.23, illustrating its long-context reasoning ability. On Mind2Web, the success rate improves from 52.12 to 73.6 in realistic settings, ahead of AWM (56.10) by 17.5 points and MemoryOS (59.81) by 13.79 points. In GAIA, success rate increases from 52.12 to 73.6, surpassing MemEvolve and Flash-Searcher, indicating excellent cross-task transfer. Ablation results highlight that removing feedback refinement reduces LMJ scores by over 10%, confirming its importance. The number of refinement rounds T correlates positively with performance, with diminishing returns beyond T=3. The Procedure Evolution Maturity Score (PEMS) stabilizes after several rounds, indicating convergence of memory structure. Overall, these findings validate the effectiveness of the three-stage evolution process in building adaptive, self-organizing memory networks.
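One way to read "PEMS stabilizes after several rounds" together with the convergence threshold ϵ is as a stopping rule on successive scores. The sketch below treats PEMS as an opaque number per round, since this summary does not give its formula; the function name, the rule itself, and the example values are made up for illustration.

```python
def first_stable_round(scores, eps=0.01):
    """Return the first round index t >= 1 where |scores[t] - scores[t-1]| < eps,
    or None if the sequence never settles within that tolerance."""
    for t in range(1, len(scores)):
        if abs(scores[t] - scores[t - 1]) < eps:
            return t
    return None


# Made-up PEMS values for five consolidation rounds (not from the paper).
history = [0.40, 0.62, 0.71, 0.715, 0.716]
print(first_stable_round(history))
```

A smaller ϵ delays the stop and spends more rounds on consolidation, which is the kind of trade-off the parameter-sensitivity limitation refers to.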

Applications

FluxMem can be directly applied to intelligent assistants, autonomous agents, and knowledge management systems requiring lifelong learning. Its ability to dynamically refine memory connections makes it suitable for complex reasoning, multi-step planning, and real-time decision-making in industries like healthcare, finance, and customer service. The framework supports continuous knowledge updates, enabling systems to adapt to new information without retraining from scratch. Long-term, it paves the way for fully autonomous AI agents capable of self-organizing knowledge bases, improving scalability and robustness in real-world applications. Additionally, its modular design allows integration with multi-modal data sources, broadening its applicability to robotics, multimedia analysis, and interactive AI systems.

Limitations & Outlook

Despite its promising results, FluxMem faces challenges such as high computational overhead due to iterative feedback and connection updates, which may hinder deployment in resource-constrained environments. Its reliance on static datasets limits validation in real-time streaming scenarios, where memory decay and continuous data flow are critical. Parameter sensitivity (e.g., T, ϵ) requires careful tuning, lacking adaptive mechanisms for diverse tasks. Moreover, the offline consolidation process may introduce latency, affecting real-time responsiveness. Future work should focus on optimizing efficiency, developing adaptive parameter tuning, and extending the framework to handle active memory decay and streaming data for truly lifelong learning systems.

Plain Language (accessible to non-experts)

Imagine you’re managing a big school where students learn different subjects every day. To keep everything organized, you create a network of folders and notes that connect related topics—like math formulas linked to problem-solving strategies, or historical events connected by themes. At first, you set up some basic links based on what you think is important. But as students work on projects and get feedback, you notice some links are missing, some are irrelevant, and some need to be more detailed or more abstract. So, you keep adjusting this network: adding new links, removing unnecessary ones, and summarizing common patterns into easy-to-use guides.

Over time, this network becomes smarter and more organized. When a student faces a new problem, they can quickly find the right notes and guides, because the connections are constantly refined based on real feedback. Sometimes, you gather similar projects and create a new summary or skill that everyone can reuse. This process makes the whole school run more smoothly, with everyone learning from past experiences and improving their methods.

In AI, this is similar to how FluxMem works. Instead of storing information in fixed files, it builds a flexible, evolving map of knowledge that keeps changing as it learns from new data and feedback. This way, the AI can adapt to new tasks, fix mistakes, and become better over time, just like a well-managed school that keeps improving its teaching methods based on student feedback and new challenges.

ELI14 (explained like you're 14)

Imagine you have a big box of LEGO bricks, and every day you build different things—cars, castles, robots. Sometimes, you realize some pieces don’t fit well, or you forget where you put certain special bricks. So, you start reorganizing your LEGO collection: you add new bricks where needed, remove broken or useless pieces, and group similar parts together to make building easier.

Now, every time you want to build something new, you look at your reorganized collection. Because you kept fixing and improving it, you can build faster and better, making cool new creations without starting from scratch each time.

This is how FluxMem works. Instead of storing all knowledge in one big, unchangeable box, it creates a flexible map of information that keeps updating itself. When the AI makes a mistake or learns something new, it adjusts its connections—adding links, removing irrelevant info, or summarizing patterns into simple guides. Over time, this map becomes really smart and organized, helping the AI understand and solve new problems more quickly.

Just like your LEGO collection gets better with each rebuild, FluxMem’s memory keeps evolving, making the AI more clever, adaptable, and ready for anything. It’s like having a super-organized, self-improving LEGO set that learns from every build and gets better every day!

Glossary

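Heterogeneous Graph

A graph structure composed of different types of nodes and edges, used to express complex multi-level relationships and to support dynamic connections across multiple levels of information. The paper uses it to model the multi-level, multi-type connections of memory.

FluxMem represents memory as a heterogeneous graph with semantic, episodic, and procedural layers, and dynamically adjusts its connections to achieve self-evolution.

Procedure Evolution Maturity Score (PEMS)

A metric for the evolutionary maturity of skill or procedure nodes. It combines success rate, complexity, and change differences, and is used to monitor the structural stability and optimization state of memory.

It guides long-term consolidation and skill induction, keeping the memory structure stable and continuously improving.

Feedback-Driven Refinement

Uses environmental feedback signals to dynamically adjust memory connections, including adding missing links and pruning interfering connections, to optimize information flow and structure.

Implemented in the second stage; it ensures the relevance and accuracy of memory connections.

Trajectory Clustering

Groups similar task trajectories into clusters in order to extract shared skills or patterns, supporting skill induction and long-term consolidation.

In the third stage, trajectory clustering is used to abstract experience and induce skills.

Abstraction Granularity

The level of detail of a piece of information or a node. Granularity is adjusted to match task requirements, optimizing how memory content is expressed and used.

In the second stage, the content of memory units is optimized by adjusting abstraction granularity.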

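Open Questions (unanswered questions from this research)

1. Future work needs to validate FluxMem's performance in continual online learning. In particular, the system's stability and self-regulation under drastic environmental change and information explosion still require in-depth study.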

Applications

Immediate Applications

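Intelligent question answering and assistant systems

FluxMem's dynamic connection adjustment can improve question-answering systems in multi-turn dialogue and complex reasoning, strengthening retrieval accuracy and contextual understanding.

Automated decision support

In industries such as finance and healthcare, a continuously evolving memory network can make systems more sensitive to environmental change and keep decisions consistent over time, enabling more intelligent automated operation.

Knowledge base management and long-term learning

Enterprises and research institutions can use FluxMem to continuously update and optimize their knowledge bases, supporting long-term, multi-task knowledge accumulation and application.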

Long-term Vision

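Autonomously learning, self-evolving agents

Future systems could learn continuously without human intervention, automatically adjusting their memory structure to adapt to new environments and tasks, and driving broad adoption of autonomous agents.

Cross-modal, multi-task self-optimization

Combining visual, auditory, and other modalities to build a multi-level, multi-type self-evolving memory system that supports complex multi-task collaboration, opening a new era for intelligent systems.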

Abstract

Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelines, which is brittle in dynamic agentic environments where feedback, task variation, and heterogeneous signals continuously reshape what should be remembered and how it should be connected. To address this, we propose FluxMem, a connectivity-evolving memory framework that models memory as a heterogeneous graph and progressively refines its topology through three stages: initial connection formation, feedback-driven refinement, and long-term consolidation. During execution, FluxMem repairs missing links, prunes interference, aligns abstraction granularity, and distills recurrent successful trajectories into reusable procedural circuits, guided by one metric for memory generalizability and evolutionary maturity. Across three fundamentally distinct benchmarks including LoCoMo, Mind2Web, and GAIA, FluxMem achieves consistent state-of-the-art performance, demonstrating strong adaptation and generalization in complex agentic environments. The code will be open-sourced in https://github.com/zjunlp/LightMem.

cs.CL cs.AI cs.LG cs.MA cs.MM

References (20)

MemEvolve: Meta-Evolution of Agent Memory Systems. Guibin Zhang, Haotian Ren, Chong Zhan et al. 2025.
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution. Tianrui Qin, Qianben Chen, Sinuo Wang et al. 2025.
StructMem: Structured Memory for Long-Horizon Behavior in LLMs. Buqiang Xu, Yijun Chen, Jizhan Fang et al. 2026.
LightMem: Lightweight and Efficient Memory-Augmented Generation. Jizhan Fang, Xinle Deng, Haoming Xu et al. 2025.
PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents. Ke Yang, Zixiang Chen, Xuan He et al. 2026.
Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents. Chongrui Ye, Yuxiang Liu, Yu Wang et al. 2026.
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization. Yuchen Shi, Yuzheng Cai, Siqi Cai et al. 2025.
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization. Ali Behrouz, Meisam Razaviyayn, Peilin Zhong et al. 2025.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory. Tianxin Wei, Noveen Sachdeva, Benjamin Coleman et al. 2025.
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution. Jiahao Qiu, Xuan Qi, Tongcheng Zhang et al. 2025.
MIRIX: Multi-Agent Memory System for LLM-Based Agents. Yu Wang, Xi Chen. 2025.
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning. Peng Xia, Kaide Zeng et al. 2025.
GAIA: a benchmark for General AI Assistants. G. Mialon, Clémentine Fourrier, Craig Swift et al. 2023.
A Survey on the Memory Mechanism of Large Language Model-based Agents. Zeyu Zhang, Quanyu Dai, Xiaohe Bo et al. 2024.
The organization of behavior: A neuropsychological theory. J. Knott. 1951.
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents. Yining Chen, Jihao Zhao, Bo Tang et al. 2026.
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving. Xiangru Tang, Tianrui Qin, Tianhao Peng et al. 2025.
Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution. Zouying Cao, Jiaji Deng, Li Yu et al. 2025.
Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks. Adam Fourney, Gagan Bansal, Hussein Mozannar et al. 2024.
Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation. A. AadharshAadhithya, S. SachinKumar, Soman K.P. 2024.
