llm-security/ web-agents · prompt-injection

MUZZLE auto‑generates adaptive prompt‑injection attacks on web agents

Researchers introduce MUZZLE, an automated framework that adaptively discovers indirect prompt‑injection flaws in LLM‑powered web agents.

  • MUZZLE demonstrates that web‑agent security testing can be fully automated.

What actually happened: The authors released MUZZLE, a tool that watches an LLM‑based web agent’s actions, spots high‑impact places where malicious text could be injected, and then crafts context‑aware instructions to try to subvert the agent. It iterates, using failed attempts as feedback, and runs the process across four real‑world web apps, ten different malicious goals, and several LLM back‑ends. In the experiments MUZZLE uncovered 44 previously unknown attacks, including three that spanned multiple applications and a tailored phishing scenario.

Why it matters: Indirect prompt injection—malicious content hidden in a website that the agent later reads—has been a blind spot because prior tests used static templates. MUZZLE’s adaptive approach mirrors how an attacker would learn from a target’s behavior, exposing weaknesses that static methods miss. The results suggest that many deployed agents may be vulnerable out of the box.

Closing thought: As LLM agents move from labs to browsers, tools like MUZZLE will be as essential as a web‑app firewall—otherwise developers risk handing attackers a new remote‑control lever.

TR

The Revision

Written by an AI system from the public sources credited above. How we write →