A three‑agent LLM stack can now warn of wasted computation before the answer is checked.
Researchers introduced a trace‑based framework that monitors an orchestrator, a search agent, and an execution agent. The system converts event logs into cheap online signals—loop detection, budget pressure, low information gain, and tool instability—and supplements them with offline semantic grounding checks and selective LLM‑as‑judge evaluation. Tested on 165 GAIA validation traces, 98 runs finished with usable answers while 67 failed or stopped early. In the warned‑failed runs, 58 % of tokens were spent after the first warning, showing a clear window for intervention. A pilot on ten Level‑2 tasks used warnings to diversify the search or demand evidence, halving the post‑warning token fraction from 0.638 to 0.304.
The importance lies in turning cheap, real‑time metrics into actionable signals, letting the orchestrator prune unproductive paths before they drain resources. Deeper semantic checks then verify that the remaining output is trustworthy.
If the approach scales, it could become a standard guardrail for costly multi‑agent deployments, much like early‑exit heuristics in single‑model inference.