Formal solvers give correct answers — but the AI layer explaining those answers is another story.
Researchers studying so-called LLM-solver loops — pipelines where a language model hands off a logic question to a SAT or SMT solver, then narrates the result back to the user — have identified a vulnerability at the final step that prior work largely ignored. The paper, posted to arXiv, models the full loop as a verified decision procedure and tests five open-source models against prompt injection attacks targeting the narration stage. The finding: certificate gating can preserve the solver's soundness guarantee up to that point, but an adversary can still flip the communicated conclusion by manipulating phrasing or delivery channel. A hardened prompt reduces the attack surface significantly but cannot close it, and adaptive attacks continue to get through.
This matters because safety-critical applications — think automated contract review, formal verification tooling, or security policy checks — often route through exactly this kind of hybrid pipeline on the assumption that offloading to a solver buys formal guarantees. If the narration layer can be coerced into contradicting a verified verdict, the guarantee evaporates at the moment it most needs to hold. The gap the researchers name is narrow but precise: robustness of the solver does not transfer to robustness of the user-facing answer.
The broader pattern here is familiar. Security properties earned in one layer of a system tend to erode at the interface to the next. Formal methods researchers spent decades hardening the solvers; now the sprint is to harden the translators that sit on top of them — and the translators are large neural networks trained to be agreeable.