RAG Systems Have a Model-Level Attack Problem

A new attack framework targets the retrieval model inside RAG systems — not the documents it searches.

Most RAG injection research focuses on poisoning the knowledge base: craft a convincing fake document, get it indexed, hope the system retrieves it. Researchers behind CAREATTACK take a different route. Many RAG deployments use open-source embedding models such as Qwen3-Embedding-0.6B or BGE-M3, whose weights are publicly available. That lets an attacker edit the retriever's parameters directly, promoting malicious passages to the top of retrieval results for targeted queries without touching the underlying corpus. The method runs in two stages: a graph-based conflict detection step resolves interference between parameter edits, and a calibration pass keeps non-target queries behaving normally so the manipulation stays hidden.
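A minimal sketch makes "promote a passage by editing weights" concrete. It is not CAREATTACK's algorithm. Instead it uses a generic rank-one weight edit on a hypothetical toy retriever. The retriever's query encoder is a single linear layer `W`, and its passage index is a fixed set of precomputed unit vectors, so only the query path is edited (an assumption for simplicity). The names `retrieve`, `edit`, `keep_qs`, and `MALICIOUS` are illustrative. The update is projected away from a set of preserved queries, which plays a role loosely analogous to the calibration pass: those queries keep their exact embeddings, while the target query now lands on the attacker's passage.

```python
import random

random.seed(0)
D = 8            # toy embedding dimension (hypothetical)
N_PASSAGES = 6   # toy index size (hypothetical)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = dot(v, v) ** 0.5
    return [x / n for x in v]

def matvec(W, x):
    return [dot(row, x) for row in W]

def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(D)]

# Hypothetical toy retriever: query embedding = W @ query_features.
# Passage embeddings are a fixed, precomputed index of unit vectors (not re-encoded).
W = [rand_vec() for _ in range(D)]
passages = [normalize(rand_vec()) for _ in range(N_PASSAGES)]
MALICIOUS = N_PASSAGES - 1   # index of the attacker's passage, assumed already in the index

def retrieve(W, q):
    """Return the index of the passage with the highest dot-product score."""
    emb = matvec(W, q)
    scores = [dot(p, emb) for p in passages]
    return max(range(len(scores)), key=scores.__getitem__)

def orthonormal_basis(vectors):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        if dot(w, w) > 1e-12:
            basis.append(normalize(w))
    return basis

def edit(W, k_star, v_star, keep):
    """Rank-one edit so W' @ k_star == v_star and W' @ k == W @ k for every k in `keep`."""
    # u = component of k_star orthogonal to span(keep); the update only responds to that direction.
    u = list(k_star)
    for b in orthonormal_basis(keep):
        c = dot(u, b)
        u = [ui - c * bi for ui, bi in zip(u, b)]
    scale = dot(u, k_star)  # equals |u|^2, > 0 when k_star lies outside span(keep)
    residual = [vi - wi for vi, wi in zip(v_star, matvec(W, k_star))]
    return [[W[i][j] + residual[i] * u[j] / scale for j in range(D)] for i in range(D)]

target_q = rand_vec()                      # query the attacker wants to hijack
keep_qs = [rand_vec() for _ in range(3)]   # non-target queries whose results must not change
v_star = passages[MALICIOUS]               # map the target query exactly onto the malicious passage
W_edit = edit(W, target_q, v_star, keep_qs)

print(retrieve(W_edit, target_q))                                   # 5, i.e. MALICIOUS
print(all(retrieve(W_edit, k) == retrieve(W, k) for k in keep_qs))  # True
```

The projection step is the design choice worth noting. The update only responds to the part of a query orthogonal to the preserved set, so the preserved queries are left exactly unchanged, and spot-checking them reveals nothing. In this toy, queries outside that small preserved set can still shift. Keeping behavior normal across the broad space of non-target queries is presumably the job of a fuller calibration step.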

The practical concern here is scope. Corpus-level attacks are increasingly detectable: synthetic text leaves fingerprints, and filter pipelines are catching up. Model-level edits are harder to spot because the retriever looks and behaves normally on every query except the targeted ones. If an attacker can distribute a subtly edited version of a popular open-source embedding model, every downstream RAG application built on it becomes a potential attack vector.

This sits alongside a growing body of work on supply-chain risk in open-source AI components. The field has mostly treated that risk as theoretical, and experiments like this one are starting to make it concrete.
