New benchmark reveals hidden privacy leaks in multi-agent LLM setups

Multi-agent LLM systems can spill sensitive information through internal channels that typical output‑only tests ignore.

The authors release AgentLeak, a benchmark that inspects seven communication pathways (including inter‑agent messages, shared memory, and tool arguments) across 1,000 scenarios in healthcare, finance, legal, and corporate settings. They run five production models (GPT‑4o, GPT‑4o‑mini, Claude 3.5 Sonnet, Mistral Large, Llama 3.3 70B) and collect 4,979 execution traces. Final‑output leakage is lower in multi‑agent configurations than in single‑agent runs (27.2% versus 43.2%), but internal‑channel leakage reaches 68.9%, with inter‑agent messages alone leaking 68.8% of the time. In every model and domain, internal leaks equal or exceed final‑output leaks.

The result matters because most privacy audits focus on what the system says at the end of a task. The AgentLeak results suggest that such output‑only audits miss up to 41.7% of violations, a figure that matches the gap between the 68.9% internal‑channel rate and the 27.2% final‑output rate. Current compliance checks could therefore give a false sense of security. Architects of multi‑agent pipelines will need to monitor and guard internal communications, not just the outward transcript.
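To make the difference between an output-only audit and an all-channel audit concrete, here is a minimal sketch in Python. It is not AgentLeak's code or API: the channel labels, the `TraceEvent` type, and the verbatim substring check are all assumptions made for illustration, and a real audit would need far more robust leak detection.

```python
from dataclasses import dataclass

# Hypothetical channel labels. The article names only some of the seven pathways.
FINAL_OUTPUT = "final_output"
INTER_AGENT = "inter_agent_message"
SHARED_MEMORY = "shared_memory"
TOOL_ARGS = "tool_arguments"


@dataclass
class TraceEvent:
    """One logged message in an execution trace (illustrative structure)."""
    channel: str
    text: str


def leaked_channels(trace, secrets):
    """Return the set of channels in which any secret string appears verbatim."""
    return {
        event.channel
        for event in trace
        if any(secret in event.text for secret in secrets)
    }


def audit(trace, secrets):
    """Compare an output-only audit with an all-channel audit of one trace."""
    channels = leaked_channels(trace, secrets)
    return {
        "output_only_flags": FINAL_OUTPUT in channels,
        "all_channel_flags": bool(channels),
        "leaking_channels": sorted(channels),
    }


# Made-up trace: the secret appears only in an inter-agent message.
trace = [
    TraceEvent(INTER_AGENT, "Patient SSN is 123-45-6789, please verify coverage."),
    TraceEvent(TOOL_ARGS, "lookup_policy(member_id='A17')"),
    TraceEvent(FINAL_OUTPUT, "Coverage confirmed for the requested procedure."),
]
report = audit(trace, secrets=["123-45-6789"])
print(report)
```

In this made-up trace the final answer is clean, so an output-only check raises no flag, while the all-channel check catches the identifier passed between agents.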

The paper describing the benchmark is a preprint and has not yet been peer reviewed. Until the community validates the methodology, its findings should be treated as an early warning rather than definitive proof of systemic risk.
