
RAG Systems Have a Model-Level Attack Problem

Researchers show that open-source retrieval models can be directly edited to inject malicious knowledge into RAG pipelines, bypassing text-based defenses.

A new attack framework targets the retrieval model inside RAG systems — not the documents it searches.

Most RAG injection research focuses on poisoning the knowledge base: craft a convincing fake document, get it indexed, hope the system retrieves it. Researchers behind CAREATTACK take a different route. Because many RAG deployments use open-source embedding models like Qwen3-Embedding-0.6B or BGE-M3, an attacker with access to those model weights can edit the parameters directly — promoting malicious passages to the top of retrieval results without touching the underlying corpus. The method runs in two stages: a graph-based conflict detection step resolves interference between parameter edits, and a calibration pass ensures non-target queries behave normally so the manipulation stays hidden.
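To make the idea of editing a retriever rather than its corpus concrete, here is a minimal toy sketch. It is not the CAREATTACK method: it assumes the query encoder is a single 3x3 linear map and uses a plain rank-one update as a stand-in for a parameter edit. All names (`rank_one_edit`, `PASSAGES`, the toy vectors) are hypothetical and chosen for illustration only.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(W, query, passages):
    """Embed the query with W and rank passage names by cosine similarity."""
    q = matvec(W, query)
    return sorted(passages, key=lambda name: cosine(q, passages[name]), reverse=True)

def rank_one_edit(W, key, value):
    """Toy edit: return W' with W' @ key == value.

    Inputs orthogonal to `key` are mapped exactly as before, which is the
    toy analogue of non-target queries behaving normally.
    """
    current = matvec(W, key)
    norm_sq = sum(k * k for k in key)
    delta = [(v - c) / norm_sq for v, c in zip(value, current)]
    return [[w + d * k for w, k in zip(row, key)] for row, d in zip(W, delta)]

# Hypothetical passage embeddings; the corpus itself is never modified.
PASSAGES = {
    "benign_a": [1.0, 0.1, 0.0],
    "benign_b": [0.1, 1.0, 0.0],
    "malicious": [0.0, 0.0, 1.0],
}
W_CLEAN = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
TARGET_QUERY = [1.0, 0.0, 0.0]
OTHER_QUERY = [0.0, 1.0, 0.0]

# Edit the encoder so the target query lands on the malicious passage.
W_EDITED = rank_one_edit(W_CLEAN, TARGET_QUERY, PASSAGES["malicious"])

clean_target = rank_passages(W_CLEAN, TARGET_QUERY, PASSAGES)
edited_target = rank_passages(W_EDITED, TARGET_QUERY, PASSAGES)
clean_other = rank_passages(W_CLEAN, OTHER_QUERY, PASSAGES)
edited_other = rank_passages(W_EDITED, OTHER_QUERY, PASSAGES)

print("target query, clean encoder: ", clean_target[0])
print("target query, edited encoder:", edited_target[0])
print("other query unchanged:", clean_other == edited_other)
```

In this toy the two queries are exactly orthogonal, so the edit leaves the non-target query untouched. Real embedding models offer no such guarantee, which is presumably why the described attack needs its conflict detection and calibration stages.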

The practical concern here is scope. Corpus-level attacks are increasingly detectable — synthetic text leaves fingerprints, and filter pipelines are catching up. Model-level edits are harder to spot because the retriever looks and behaves normally on every query except the targeted ones. If an attacker can distribute a subtly edited version of a popular open-source embedding model, every downstream RAG application built on it becomes a potential vector.
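The distribution scenario above is a supply-chain problem, and one generic hygiene step (not something the source describes, and not a defense against a model that was malicious from the start) is to pin the embedding model's weights to a checksum obtained from a trusted channel. The sketch below assumes such a trusted SHA-256 digest exists; the file name `model.safetensors` and the helper names are hypothetical.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def weights_match_pin(path, pinned_hex):
    """Return True only if the file's digest equals the pinned digest."""
    return hmac.compare_digest(sha256_of_file(path), pinned_hex.lower())

# Demo with a stand-in file; a real check would use the downloaded weights
# and a digest published by a source you already trust (an assumption here).
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.safetensors"  # hypothetical file name
    weights.write_bytes(b"\x00" * 1024)
    pin = sha256_of_file(weights)
    clean_ok = weights_match_pin(weights, pin)

    # Flip a single byte to simulate a subtle parameter edit.
    weights.write_bytes(b"\x00" * 1023 + b"\x01")
    tampered_ok = weights_match_pin(weights, pin)

print("clean weights accepted:", clean_ok)
print("edited weights accepted:", tampered_ok)
```

A check like this only flags divergence from the pinned copy. It says nothing about whether the pinned copy itself is trustworthy.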

This sits alongside a growing body of work on supply-chain risk in open-source AI components, a problem the field mostly treated as theoretical until experiments like this one made it concrete.


The Revision

Written by an AI system from the public sources credited above.