Embedding bandits match or beat LLMs on reward at far lower cost

Numeric bandits running on text embeddings match or outperform an LLM-driven bandit on finance-style decision tasks, at a fraction of the compute.

The authors introduce LLMP-UCB, a bandit algorithm that estimates uncertainty from repeated LLM inference. In experiments on finance-style contextual bandits, plain numeric bandits that operate on dense or Matryoshka text embeddings achieve equal or higher reward than LLMP-UCB, with far lower compute. The authors also show that changing the embedding dimension alone shifts the exploration-exploitation balance, giving practitioners a cheap knob to tune performance. Finally, they propose a geometric diagnostic that uses the spread of the arm embeddings to judge whether an LLM is worth the cost.
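To make the "embedding dimension as a knob" idea concrete, here is a minimal sketch, not the authors' code. It assumes a standard LinUCB bandit as the "plain numeric bandit" and assumes Matryoshka-style truncation means keeping the first `dim` coordinates and re-normalising. The names `truncate_embedding`, `LinUCB`, and `run`, the noise level, and the default parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def truncate_embedding(emb, dim):
    """Keep the first `dim` coordinates and re-normalise to unit length.

    Assumption: this mimics how Matryoshka embeddings are usually shortened.
    """
    head = np.asarray(emb, dtype=float)[..., :dim]
    norms = np.linalg.norm(head, axis=-1, keepdims=True)
    return head / np.clip(norms, 1e-12, None)


class LinUCB:
    """Linear UCB with one shared weight vector over arm features."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha                 # width of the exploration bonus
        self.A = ridge * np.eye(dim)       # regularised design matrix
        self.b = np.zeros(dim)             # reward-weighted feature sum

    def select(self, arm_features):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        mean = arm_features @ theta
        # Per-arm quadratic form x^T A^{-1} x, computed for all arms at once.
        var = np.einsum("ij,jk,ik->i", arm_features, A_inv, arm_features)
        return int(np.argmax(mean + self.alpha * np.sqrt(var)))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x


def run(full_embeddings, true_rewards, dim, rounds=200, seed=0):
    """Average reward of LinUCB on arm embeddings truncated to `dim`."""
    rng = np.random.default_rng(seed)
    feats = truncate_embedding(full_embeddings, dim)
    bandit = LinUCB(dim)
    total = 0.0
    for _ in range(rounds):
        arm = bandit.select(feats)
        # Synthetic noisy reward; a real study would use logged outcomes.
        reward = true_rewards[arm] + 0.1 * rng.standard_normal()
        bandit.update(feats[arm], reward)
        total += reward
    return float(total / rounds)
```

Calling `run` with the same embeddings and several values of `dim` gives a cheap way to see how truncation changes the bandit's behaviour. In this sketch, a smaller `dim` leaves fewer directions to explore, so the uncertainty bonus shrinks faster. Whether that helps or hurts depends on the data.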

The finding matters because LLMs have become the default “smart” component in many pipelines, even when the problem is essentially a contextual bandit. If a cheap embedding model can deliver the same signal, firms can slash latency and cloud bills without sacrificing decision quality. The diagnostic also gives a practical rule‑of‑thumb, letting engineers avoid over‑engineering solutions that add little value.

In short, the paper nudges the community toward a more pragmatic stack: use an LLM only when the embedding geometry signals a clear advantage, and otherwise stick with a lightweight numeric bandit.
