A research team has built a routing algorithm that sends AI queries to the cheapest model that can still satisfy a quality contract — without needing a mountain of labeled training data to do it.
SLARouter is an online algorithm, meaning it adapts in real time rather than relying on a model trained offline before deployment. It works with the sparse, one-sided feedback that production systems actually generate (a thumbs up here, a retry there) rather than demanding a complete signal on every response. The system provides theoretical guarantees for both cost efficiency and strict compliance with Service Level Agreements (SLAs), the formal quality commitments companies make to customers. Across a range of benchmarks, it was up to 2.2x cheaper to operate than existing routing baselines, with no per-benchmark tuning.
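To make the general idea concrete, here is a minimal Python sketch of cost-aware routing under a quality target. It is not SLARouter's algorithm, which is not described in detail here. The class names, the Hoeffding-style lower confidence bound, the `sla_target`, `delta` and `explore_prob` parameters, and the assumption that the most expensive model is a safe fallback are all hypothetical. The sketch sends each query to the cheapest model whose pessimistic success estimate meets the target and updates its estimates only when a feedback signal arrives. It treats each signal as a plain good/bad label, which is simpler than one-sided feedback, and it carries none of SLARouter's guarantees.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    cost: float          # hypothetical per-query cost
    successes: int = 0   # feedback events labeled "good"
    trials: int = 0      # feedback events received (not queries served)

    def lower_bound(self, delta: float) -> float:
        """Hoeffding lower confidence bound on the success rate."""
        if self.trials == 0:
            return 0.0  # no evidence yet: assume the worst
        mean = self.successes / self.trials
        return mean - math.sqrt(math.log(1 / delta) / (2 * self.trials))


class CheapestCompliantRouter:
    """Hypothetical router: cheapest model whose pessimistic quality meets the SLA."""

    def __init__(self, models, sla_target, delta=0.05, explore_prob=0.0, seed=0):
        # Cheapest first; the last (most expensive) model is the assumed-safe fallback.
        self.models = sorted(models, key=lambda m: m.cost)
        self.sla_target = sla_target
        self.delta = delta
        self.explore_prob = explore_prob
        self.rng = random.Random(seed)

    def route(self) -> ModelStats:
        # Models whose lower confidence bound already clears the SLA target.
        certified = [m for m in self.models
                     if m.lower_bound(self.delta) >= self.sla_target]
        choice = certified[0] if certified else self.models[-1]
        # Occasionally try a cheaper, not-yet-certified model to gather feedback.
        cheaper = [m for m in self.models if m.cost < choice.cost]
        if cheaper and self.rng.random() < self.explore_prob:
            return self.rng.choice(cheaper)
        return choice

    def record_feedback(self, model: ModelStats, good: bool) -> None:
        # Called only when a user signal arrives; most queries produce none.
        model.trials += 1
        model.successes += int(good)


if __name__ == "__main__":
    small = ModelStats("small", cost=1.0)
    large = ModelStats("large", cost=10.0)
    router = CheapestCompliantRouter([large, small], sla_target=0.9)
    print(router.route().name)  # large: nothing is certified yet
    for _ in range(200):
        router.record_feedback(small, good=True)
    print(router.route().name)  # small: its lower bound now clears 0.9
```

Until the cheap model accumulates enough positive feedback for its lower bound to clear 0.9, traffic stays on the expensive fallback; with `delta=0.05`, 200 consecutive positive signals are enough. The pessimistic bound is what keeps the sketch conservative about quality, while `explore_prob` trades a little quality risk for the feedback needed to certify cheaper models.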
This matters because inference costs are becoming a genuine line-item problem for anyone running LLMs at scale. Most existing routers assume rich feedback or offline training data that real deployments rarely have; SLARouter is designed around what operators actually get. Its SLA guarantee also closes a gap that most cost-aware routing research quietly ignores: cheaper routing is only useful if quality stays above the level promised to customers.
Routing simpler queries to cheaper models is not a new idea; startups like Martian and Unify have been pitching versions of it for a couple of years. But bolting a formal compliance guarantee onto an online learner that works from minimal feedback is a meaningful step forward. The open question is how the algorithm performs when user feedback is not just sparse but systematically biased, which is almost always the case in real products.