ToolPro Cuts Agent API Latency by Up to Half With Executable Programs

AI agents are drowning in API round-trips, and a new research system called ToolPro proposes a way out.

Most AI agents today interact with web services the same way a junior developer might: one call at a time, waiting for each response before deciding what to do next. ToolPro, detailed in a new paper, replaces that pattern with something it calls an "executable tool program" — a compact representation of an entire multi-step workflow, including loops, conditionals, and retries, that gets sent to the service all at once. The system uses WebAssembly sandboxing for safety and includes a mechanism it calls "effect-aware replay" to ensure that state-changing calls — think placing an order or sending a message — happen exactly once even if something goes wrong. A built-in policy layer decides when to use the program approach versus the traditional step-by-step method.
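To make the "exactly once" idea concrete, here is a minimal Python sketch of how a replay log could keep a state-changing step from running twice when a program is retried. It is not ToolPro's implementation: the names `Step`, `EffectLog`, and `run_program` are hypothetical, and the sketch assumes each step carries a stable key and a flag marking whether it changes state.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    key: str                 # hypothetical stable identifier for the step
    fn: Callable[[], Any]    # the service call to perform
    has_effect: bool         # True for state-changing calls (orders, messages)


@dataclass
class EffectLog:
    # Results of state-changing steps that have already completed.
    results: Dict[str, Any] = field(default_factory=dict)


def run_program(steps: List[Step], log: EffectLog) -> List[Any]:
    """Run steps in order; on a retry, reuse logged results for effectful steps."""
    outputs = []
    for step in steps:
        if step.has_effect and step.key in log.results:
            # Already performed in an earlier attempt: replay the recorded result.
            outputs.append(log.results[step.key])
            continue
        result = step.fn()
        if step.has_effect:
            log.results[step.key] = result
        outputs.append(result)
    return outputs


# Demo: the order is placed, then a read-only status check fails once.
orders: List[str] = []
attempts = {"n": 0}


def place_order() -> str:
    orders.append("order-1")
    return "order-1"


def flaky_status() -> str:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("simulated transient failure")
    return "shipped"


steps = [Step("place", place_order, True), Step("status", flaky_status, False)]
log = EffectLog()
try:
    run_program(steps, log)          # first attempt fails at the status check
except ConnectionError:
    pass
outputs = run_program(steps, log)    # retry: the order is replayed, not re-placed
```

After the retry, `orders` holds a single entry even though the program ran twice. A real system would also need to record each effect atomically with the call itself, which this sketch ignores.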

The numbers are striking: the researchers report reductions of up to 53.4% in end-to-end latency and up to 96.1% in client-side traffic, with the biggest gains appearing when network latency is high and workflows are complex. Those conditions matter because the agents people are actually deploying in production tend to fit that profile: complex, multi-step, and increasingly dependent on chains of external service calls.
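A back-of-the-envelope model shows why the gains would grow with network latency and workflow length. The Python sketch below is an illustration, not the paper's methodology: it assumes every step costs one network round trip plus a fixed service time when called one at a time, and that a program pays a single round trip for the whole workflow. The function names and the example figures are invented for this illustration, and the model ignores the time the agent spends deciding between steps.

```python
def sequential_latency(steps: int, rtt_ms: float, service_ms: float) -> float:
    """One round trip per step, each followed by the service's own work."""
    return steps * (rtt_ms + service_ms)


def program_latency(steps: int, rtt_ms: float, service_ms: float) -> float:
    """A single round trip; all steps then run on the service side."""
    return rtt_ms + steps * service_ms


def reduction(steps: int, rtt_ms: float, service_ms: float) -> float:
    """Fraction of end-to-end latency saved by sending a program."""
    seq = sequential_latency(steps, rtt_ms, service_ms)
    prog = program_latency(steps, rtt_ms, service_ms)
    return (seq - prog) / seq


# Illustrative inputs only (not figures from the paper).
short_fast = reduction(steps=2, rtt_ms=10, service_ms=50)    # short workflow, fast network
long_slow = reduction(steps=10, rtt_ms=100, service_ms=50)   # long workflow, slow network
```

Under these assumptions, the short workflow on a fast network saves under a tenth of its latency, while the long workflow on a slow network saves well over half, because the avoided round trips dominate the total.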

ToolPro is built on top of MCP-style service interfaces, which puts it squarely in the middle of the current agent infrastructure conversation. Whether the research translates into production tooling is another question — academic benchmarks on "diverse real-world workflows" have a habit of looking less impressive once they meet actual enterprise messiness.
