LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems
LCGuard uses adversarially learned transformations on Transformer KV caches to reduce sensitive information reconstruction in multi-agent systems while preserving task performance.
Key Findings
Methodology
This paper introduces LCGuard, a framework addressing sensitive information leakage in latent communication via Transformer key-value (KV) caches in multi-agent large language model systems. LCGuard treats shared KV caches as latent working memory and learns representation-level transformation functions g_{ij} applied before transmission to suppress reconstructability of sensitive inputs. The leakage is operationalized via reconstruction attacks: an adversarial decoder D_i attempts to recover agent-specific sensitive inputs s_i from observed communicated artifacts M_obs. LCGuard employs an adversarial training scheme optimizing a minimax objective balancing reconstruction loss L_rec and task loss L_task, controlled by a tradeoff parameter β. The framework supports various communication topologies (sequential, hierarchical, graph-based) and multiple model families (Qwen3, Gemma, LLaMA), enabling flexible privacy-utility tradeoffs.
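As a minimal sketch of where a transformation g_{ij} sits in the pipeline, the snippet below applies a per-token linear map to a toy cache. The linear form, the weight matrices `W_k` and `W_v`, and the tiny dimensions are assumptions made for illustration; the summary does not specify how the learned functions are parameterized.

```python
# Illustrative only: a hypothetical linear g_ij on a toy KV cache.
# The actual parameterization of LCGuard's transformations is not given here.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(d):
    """d x d identity matrix as a list of rows."""
    return [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]


def g_ij(K, V, W_k, W_v):
    """Hypothetical communication function: m_ij = (K W_k, V W_v)."""
    return matmul(K, W_k), matmul(V, W_v)


# Toy cache for one agent: T_i = 3 tokens, d_k = 2, d_v = 3.
K_i = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
V_i = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [1.0, 1.0, 4.0]]

# Identity weights reproduce vanilla KV sharing (nothing is suppressed).
vanilla_K, vanilla_V = g_ij(K_i, V_i, identity(2), identity(3))

# A hand-picked projection that zeroes the third value dimension.
W_v = identity(3)
W_v[2][2] = 0.0
guarded_K, guarded_V = g_ij(K_i, V_i, identity(2), W_v)
```

The transmitted artifact keeps the T_i × d_k and T_i × d_v shapes, so a receiving agent can consume it in place of the raw cache. In LCGuard the weights would be learned adversarially rather than chosen by hand.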
Key Results
- On Qwen3-4B with the PrivacyLens benchmark under sequential communication, LCGuard reduces Attack Success Rate (ASR) from 0.871 to 0.216, a relative reduction of about 75%, while maintaining helpfulness at 0.710 and task accuracy above 0.720.
- For Gemma-9B on the AgentLeak benchmark with a hierarchical topology, LCGuard lowers ASR from 0.885 to 0.205 while preserving helpfulness at 0.735, outperforming baselines such as ADAPT, which degrades utility significantly.
- Full-system optimization of LCGuard surpasses per-agent variants by effectively mitigating compositional leakage across multi-hop communication, demonstrating robustness across models and topologies.
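The relative ASR reductions implied by the reported numbers can be checked with a few lines of arithmetic. The helper name `relative_reduction` is introduced here for illustration only; the ASR values are the ones quoted in the bullets above.

```python
def relative_reduction(before, after):
    """Fractional drop from `before` to `after` (0.75 means a 75% reduction)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# (baseline ASR, ASR with LCGuard) as reported in the key results.
reported = {
    "Qwen3-4B / PrivacyLens / sequential": (0.871, 0.216),
    "Gemma-9B / AgentLeak / hierarchical": (0.885, 0.205),
}

for setting, (baseline, guarded) in reported.items():
    print(f"{setting}: {relative_reduction(baseline, guarded):.1%}")
```

This prints 75.2% for the Qwen3-4B setting and 76.8% for the Gemma-9B setting.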
Significance
This work is the first to systematically identify and address privacy leakage risks inherent in Transformer KV cache-based latent communication channels within multi-agent systems. It overcomes limitations of traditional text-based communication security by providing a representation-level adversarial training framework that effectively suppresses sensitive input reconstructability while preserving collaborative task performance. LCGuard bridges a critical gap in KV cache security, enabling safer deployment of multi-agent LLM systems in privacy-sensitive domains, thus advancing both academic understanding and practical applications.
Technical Contribution
LCGuard formalizes sensitive information leakage as the ability to reconstruct sensitive inputs from shared KV caches, and introduces an adversarial training framework that jointly optimizes communication transformations and reconstruction decoders. It supports diverse communication topologies and model scales, and exposes the privacy-utility tradeoff through a tunable parameter β. Full-system optimization captures multi-hop compositional leakage that per-agent optimization does not address. The approach departs from prior output-level or system-isolation methods by operating directly on high-dimensional latent representations.
Novelty
LCGuard is the first framework targeting privacy leakage in Transformer KV cache-based latent communication for multi-agent LLM systems. Unlike prior work focusing on text-based communication or system-level isolation, it applies adversarially learned representation transformations to suppress the reconstructability of sensitive inputs, establishing a new paradigm for latent communication privacy protection.
Limitations
- LCGuard's effectiveness depends on the adversary model's capacity; more sophisticated or unknown decoding strategies could reduce defense efficacy.
- The privacy-utility tradeoff requires careful tuning; stringent privacy constraints may degrade task performance.
- Experiments are conducted on public models and benchmarks; real-world deployment with diverse protocols and attack surfaces may pose additional challenges.
Future Work
Future research directions include evaluating robustness against stronger adversarial models, integrating differential privacy techniques for theoretical guarantees, extending LCGuard to more complex multi-agent interaction protocols and cross-modal communication, and developing adaptive mechanisms that balance privacy and utility dynamically across application scenarios.
AI Executive Summary
As large language models (LLMs) become integral to multi-agent systems, agents increasingly rely on intermediate communication to coordinate complex tasks. Traditional communication predominantly uses natural language, which, while interpretable, suffers from inefficiencies and information loss due to repeated tokenization and decoding. Recent advances have shifted towards latent communication via Transformer key-value (KV) caches, enabling direct transfer of rich semantic representations that improve efficiency and preserve nuanced task-relevant information. However, KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific sensitive data, creating an opaque high-dimensional channel that can inadvertently leak private information without explicit textual disclosure.
Addressing this challenge, the authors propose LCGuard, a novel framework treating shared KV caches as latent working memory subject to learned representation-level transformations before transmission. LCGuard formalizes sensitive information leakage operationally through reconstruction attacks: a shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it. To mitigate this, LCGuard employs an adversarial training paradigm where the communication functions are optimized to maximize reconstruction loss, thereby suppressing recoverable sensitive information, while maintaining downstream task performance.
Technically, LCGuard formulates a minimax optimization balancing task utility and privacy risk, controlled by a tradeoff parameter β. It supports various communication topologies—sequential, hierarchical, and graph-based—and multiple Transformer model families including Qwen3, Gemma, and LLaMA. The framework jointly optimizes communication transformations and adversarial decoders, enabling robust suppression of both local and compositional multi-hop leakage.
Extensive experiments on benchmarks such as PrivacyLens, AgentLeak, and MAGPIE demonstrate LCGuard’s efficacy. For instance, on Qwen3-4B with PrivacyLens under sequential topology, LCGuard reduces attack success rate (ASR) from 0.871 to 0.216 while preserving helpfulness and task accuracy above 0.7. Similar gains are observed across Gemma-9B and LLaMA-8B models and diverse topologies. Compared to baselines like vanilla KV sharing, which maximizes utility but suffers high leakage, and ADAPT, which reduces leakage at the cost of utility, LCGuard achieves a superior privacy-utility tradeoff. Full-system optimization further enhances protection by mitigating multi-hop aggregated leakage.
This work fundamentally advances understanding of privacy risks in latent communication channels of multi-agent LLM systems and provides a principled, practical defense mechanism. LCGuard enables safer deployment of multi-agent systems in privacy-sensitive applications, balancing efficiency, utility, and security. Despite its strengths, LCGuard’s reliance on adversary modeling and tradeoff tuning highlights areas for future work, including stronger attack models, integration with formal privacy guarantees, and extension to more complex interaction protocols. Overall, LCGuard opens new avenues for secure, efficient multi-agent collaboration leveraging latent communication.
Deep Analysis
Background
Multi-agent systems leveraging large language models (LLMs) have gained significant traction for solving complex collaborative tasks through coordination and information exchange. Traditionally, agents communicate via natural language, serializing internal states into text that downstream agents interpret. While this approach is flexible and interpretable, it introduces inefficiencies due to repeated tokenization, decoding, and semantic reconstruction, leading to information loss and computational overhead. Recent research has shifted focus towards latent communication, wherein agents exchange intermediate model representations, notably Transformer key-value (KV) caches and activations. These latent representations preserve richer semantic content and enable more efficient multi-stage reasoning by avoiding redundant computations.
However, KV caches are high-dimensional, semantically dense tensors encoding contextual inputs, intermediate reasoning states, and agent-specific information. Prior studies have shown that internal model representations can retain substantial input information even if not explicitly decoded. Consequently, KV caches shared across agents form an opaque communication channel that may implicitly propagate sensitive content without textual disclosure. This latent channel expands the attack surface of multi-agent systems, enabling adversaries with access to shared caches—via compromised agents, logging, or auxiliary models—to reconstruct sensitive inputs through trained decoders. Existing safety mechanisms focus on output-level constraints or system isolation, lacking principled frameworks to regulate information content in shared latent representations. This gap motivates the need for new methods to secure KV-based latent communication.
Core Problem
The core problem addressed is how to enable efficient latent communication via Transformer KV caches in multi-agent systems while limiting the recoverability of agent-specific sensitive information from shared representations. KV caches encode not only task-relevant semantics but also sensitive inputs such as user context, retrieved documents, or intermediate outputs that should remain private. When these caches are transmitted across agents, they create a high-bandwidth, high-dimensional channel that is difficult to inspect or constrain.
Adversaries with partial or full access to communicated artifacts can train decoders to reconstruct sensitive inputs, posing privacy risks that traditional text-based protections cannot mitigate. Furthermore, sensitive information may accumulate through multi-hop communication paths, leading to compositional leakage that is more challenging to detect and prevent. The problem thus requires designing communication functions that transform shared KV representations to suppress reconstructable sensitive information while preserving downstream task utility, balancing privacy and performance in diverse multi-agent communication topologies.
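To make the multi-hop concern concrete, the sketch below computes which agents' information can reach an observer through chains of cache transfers, compared with the agents that send to it directly. The edge lists are hypothetical stand-ins for sequential and hierarchical topologies; the paper's exact topology definitions are not reproduced here.

```python
from collections import deque


def direct_senders(edges, observer):
    """Agents that transmit an artifact straight to `observer`."""
    return {src for src, dst in edges if dst == observer}


def upstream_sources(edges, observer):
    """Agents whose information can reach `observer` over one or more hops."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    seen, queue = set(), deque([observer])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(observer)  # only relevant if the graph contains a cycle
    return seen


# Hypothetical topologies, written as (sender, receiver) pairs.
sequential = [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]
hierarchical = [("w1", "m1"), ("w2", "m1"), ("w3", "m2"), ("m1", "root"), ("m2", "root")]

print(direct_senders(sequential, "a3"), upstream_sources(sequential, "a3"))
print(direct_senders(hierarchical, "root"), upstream_sources(hierarchical, "root"))
```

In the sequential chain, agent a3 receives an artifact only from a2, yet a1's sensitive input may still be recoverable from it. That gap between direct and upstream sources is the compositional leakage a per-link defense can miss.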
Innovation
The paper presents several key innovations:
1. Representation-Level Leakage Formalization: It introduces a novel operational definition of sensitive information leakage as the reconstructability of agent-specific inputs from shared KV caches, measured via reconstruction loss under adversarial decoding.
2. Adversarial Training Framework: LCGuard formulates a minimax optimization where communication functions learn to transform KV caches to maximize reconstruction difficulty, while adversarial decoders attempt to minimize reconstruction loss, enabling explicit control over privacy-utility tradeoffs.
3. System-Level Optimization: Beyond local link-level defenses, LCGuard jointly optimizes communication transformations across the entire multi-agent system, effectively mitigating compositional leakage arising from multi-hop communication.
4. Broad Applicability: The framework is compatible with multiple Transformer model families (Qwen3, Gemma, LLaMA) and supports various communication topologies (sequential, hierarchical, graph-based), demonstrating generalizability.
These innovations collectively advance latent communication privacy protection beyond prior output-level or system isolation methods, establishing a new paradigm for secure multi-agent collaboration.
Methodology
- Multi-Agent Model Setup: Define a set of agents {a_i}, each modeled by a Transformer with parameters θ_i, receiving task inputs x_i and sensitive inputs s_i.
- Internal Representation: Each agent processes inputs to produce internal KV caches K_i ∈ R^{T_i×d_k} and V_i ∈ R^{T_i×d_v}, where T_i is token count, d_k and d_v are key and value dimensions.
- Communication Functions: For each agent pair (a_i, a_j), define learnable transformation functions g_{ij} parameterized by ϕ_i, mapping (K_i, V_i) to communicated artifacts m_{ij} = g_{ij}(K_i, V_i).
- Leakage Definition: An adversarial decoder D_i with parameters ψ_i attempts to reconstruct sensitive input s_i from observed communication artifacts M_obs ⊆ {m_{ij}}.
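- Objective Formulation: Minimax optimization min_{ϕ} max_{ψ} [ L_task(M) − β Σ_i L_rec^{(i)}(M_obs) ], where L_rec^{(i)} is the adversary's reconstruction loss for s_i and L_task is the task loss. The inner maximization drives each decoder to minimize L_rec, the outer minimization drives the communication functions to keep L_task low while pushing L_rec up, and β balances privacy against utility.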
- Training Procedure: Alternating updates where adversary D_i minimizes reconstruction loss to improve decoding, and communication functions g_{ij} maximize reconstruction loss to suppress sensitive information while maintaining task performance.
- Communication Topologies: Framework accommodates sequential, hierarchical, and graph-based communication, with M_obs representing local or system-wide observed artifacts.
- Experimental Setup: Evaluate on multiple model families and benchmarks, comparing against baselines including vanilla KV sharing, PrivAct, and ADAPT.
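The interplay between the adversary, the communication function, and β can be illustrated on a toy problem. The sketch below is not the paper's training procedure: it replaces alternating gradient updates with a simpler stand-in, where a linear adversary is trained to convergence for each of two hand-picked candidate transformations and the defender keeps the candidate with the lower value of L_task − β·L_rec. The two-channel cache, the gate masks, the fixed downstream readout, and all function names are assumptions made for illustration.

```python
# Toy illustration only: 2-channel "caches", a linear adversary, and two
# hand-picked candidate transformations. Not the paper's implementation.

# Each sample is (task feature t, sensitive input s); the cache is h = [t, s].
SAMPLES = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]


def transform(h, mask):
    """Hypothetical g_ij: element-wise gate applied to the cache before sending."""
    return [g * x for g, x in zip(mask, h)]


def task_loss(mask):
    """MSE of a fixed readout y_hat = m0 + 0.5*m1 against the target y = t + 0.5*s."""
    total = 0.0
    for t, s in SAMPLES:
        m = transform([t, s], mask)
        total += (m[0] + 0.5 * m[1] - (t + 0.5 * s)) ** 2
    return total / len(SAMPLES)


def adversary_loss(mask, steps=200, lr=0.1):
    """Fit a linear decoder s_hat = a.m + b by gradient descent; return its final MSE."""
    a, b = [0.0, 0.0], 0.0
    msgs = [(transform([t, s], mask), s) for t, s in SAMPLES]
    n = len(msgs)
    for _ in range(steps):
        grad_a, grad_b = [0.0, 0.0], 0.0
        for m, s in msgs:
            err = a[0] * m[0] + a[1] * m[1] + b - s
            grad_a[0] += 2 * err * m[0] / n
            grad_a[1] += 2 * err * m[1] / n
            grad_b += 2 * err / n
        a = [a[0] - lr * grad_a[0], a[1] - lr * grad_a[1]]
        b -= lr * grad_b
    return sum((a[0] * m[0] + a[1] * m[1] + b - s) ** 2 for m, s in msgs) / n


def defender_objective(mask, beta):
    """L_task - beta * L_rec: the defender wants low task loss and high L_rec."""
    return task_loss(mask) - beta * adversary_loss(mask)


def best_mask(beta, candidates=((1.0, 1.0), (1.0, 0.0))):
    """Pick the candidate gate with the lowest defender objective."""
    return min(candidates, key=lambda mask: defender_objective(mask, beta))


print(best_mask(beta=0.5))  # privacy weighted heavily
print(best_mask(beta=0.1))  # utility weighted heavily
```

In this toy, gating out the sensitive channel costs a task loss of 0.25 but raises the adversary's reconstruction loss from roughly 0 to 1. With β = 0.5 the gated transformation wins, and with β = 0.1 the pass-through transformation wins, which mirrors how β trades privacy against utility.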
Experiments
Experiments span three Transformer model families: Qwen3 (4B, 8B, 14B), Gemma-2-9B, and LLaMA (3B, 8B). Benchmarks include PrivacyLens, AgentLeak, and MAGPIE, which evaluate contextual privacy violations, internal communication leakage, and collaborative private information scenarios respectively. Communication topologies tested are sequential, hierarchical, and graph-based, reflecting diverse multi-agent interaction patterns.
Baselines comprise vanilla KV sharing (direct latent representation transmission), PrivAct (policy-level privacy constraints), and ADAPT (noise injection for differential privacy). Metrics include task accuracy, helpfulness, attack success rate (ASR), and reconstruction loss. Hyperparameter β is tuned to analyze privacy-utility tradeoffs. Ablation studies investigate adversary strength, communication topology effects, and local versus system-level optimization.
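As a rough sketch of how an attack success rate could be computed, the snippet below counts the fraction of reconstruction attempts judged successful. The success criterion used here (the secret appears as a case-insensitive substring of the adversary's output) is an assumption; the summary does not state which criterion the benchmarks apply. The example strings are invented.

```python
def contains_secret(guess, secret):
    """Assumed success criterion: the secret appears verbatim, ignoring case."""
    return secret.strip().lower() in guess.strip().lower()


def attack_success_rate(reconstructions, secrets, is_success=contains_secret):
    """Fraction of attack attempts for which `is_success` returns True."""
    if len(reconstructions) != len(secrets):
        raise ValueError("need one reconstruction per secret")
    if not secrets:
        raise ValueError("need at least one attack attempt")
    hits = sum(1 for g, s in zip(reconstructions, secrets) if is_success(g, s))
    return hits / len(secrets)


# Invented examples: adversary outputs paired with the true sensitive inputs.
guesses = ["the patient has Diabetes", "unknown", "SSN 123-45-6789", "no idea"]
secrets = ["diabetes", "asthma", "123-45-6789", "hiv"]
print(attack_success_rate(guesses, secrets))
```

Two of the four invented attempts recover the secret, so the printed rate is 0.5. A stricter or more semantic success criterion can be swapped in through the `is_success` argument.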
Inference-time efficiency comparisons demonstrate LCGuard's practical viability. The comprehensive experimental design validates robustness and generalizability across models, tasks, and communication structures.
Results
Results demonstrate LCGuard's effectiveness in reducing sensitive information reconstruction while preserving task performance. For example, on Qwen3-4B with PrivacyLens under sequential topology, LCGuard reduces ASR from 0.871 to 0.216, maintaining helpfulness at 0.710 and task accuracy above 0.720. On Gemma-9B with AgentLeak hierarchical topology, ASR drops from 0.885 to 0.205 with helpfulness at 0.735, outperforming ADAPT which severely degrades utility. Full-system LCGuard consistently outperforms per-agent variants by mitigating multi-hop compositional leakage. Across models and topologies, LCGuard achieves a favorable privacy-utility tradeoff, contrasting with vanilla KV sharing's high leakage and ADAPT's utility loss. PrivAct improves privacy metrics but fails to reduce latent leakage, underscoring LCGuard's unique contribution.
Applications
LCGuard is applicable to multi-agent systems requiring efficient collaboration with privacy-sensitive data, such as collaborative intelligent assistants, cross-institutional data analysis, medical diagnosis coordination, and secure surveillance networks. By protecting latent communication channels, LCGuard enables compliance with privacy regulations without sacrificing performance. The framework's adaptability also facilitates extension to other Transformer-based latent representation sharing scenarios, promoting privacy-preserving multi-modal and cross-domain collaboration.
Limitations & Outlook
LCGuard's defense effectiveness depends on the modeled adversary's capacity; unknown or more sophisticated decoders may circumvent protections. The privacy-utility tradeoff necessitates careful tuning, with stringent privacy potentially impairing task performance. Experiments focus on public models and benchmarks; real-world deployment with heterogeneous protocols and attack vectors may introduce additional complexities requiring further validation and adaptation.
Abstract
Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can improve efficiency and preserve richer task-relevant information. However, KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclosure. To address this, we introduce LCGuard (Latent Communication Guard), a framework for safe KV-based latent communication in multi-agent LLM systems. LCGuard treats shared KV caches as latent working memory and learns representation-level transformations before cache artifacts are transmitted across agents. We formalize representation-level sensitive information leakage operationally through reconstruction: a shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it. This leads to an adversarial training formulation in which the adversary learns to reconstruct sensitive inputs, while LCGuard learns transformations that preserve task-relevant semantics and reduce reconstructable information. Empirical evaluations across multiple model families and multi-agent benchmarks show that LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance compared to standard KV-sharing baselines.