A research team has built a way to stop AI customer-service agents from quietly forgetting the rules mid-conversation.
Current tool-calling agents stuff everything — user messages, tool responses, policy instructions — into a single prompt and rely on the model to reconstruct what it needs each time it acts. That works until it doesn't: the model picks the right fact from three turns ago but acts on a stale version, or it makes a syntactically clean tool call that still violates a policy it technically "saw" earlier. LedgerAgent, described in a new paper, sidesteps this by maintaining a separate structured ledger of observed task state — facts, identifiers, constraints, conditions — and rendering that ledger into the prompt at each step. Before any environment-changing tool call fires, the ledger is also checked against state-dependent policy constraints, blocking violations before they happen. The system is inference-time only, meaning it wraps existing models rather than retraining them.
The gap it closes matters because customer-service agents are exactly the setting where a forgotten constraint has real consequences — a refund issued twice, a cancellation that shouldn't have gone through. Testing across four customer-service domains with a mix of open- and closed-weight models showed improved average pass-k scores, with the biggest gains on stricter multi-trial consistency metrics — the tests that punish an agent for getting it right once but wrong the next.
Explicit state registers are a well-worn idea in software engineering; it's telling that LLM agent research is only now rediscovering them as a reliability fix rather than treating prompt length as the only knob worth turning.