LLM agents stumble on complex finance spreadsheets, study finds

A spreadsheet‑focused benchmark that mimics real finance workflows has put LLM agents to the test.

The MBABench paper evaluates how well leading agents, notably Claude, can build complete financial models from scratch. Researchers measured three quality dimensions (Accuracy, Formula, and Format) using tasks such as forecasting and scenario analysis. The top‑performing Claude model produced readable sheets, but even its outputs broke down once a task required more than a handful of linked calculations.

This matters because enterprises expect AI to automate the very spreadsheets that drive budgeting, risk assessment, and investment decisions. The gap between prototype performance and professional‑grade deliverables means companies cannot yet rely on agents in place of human analysts for complex modeling, and they risk errors if they try.

In short, the study highlights that while LLM agents can draft simple sheets, the field remains far from delivering the spreadsheet competence needed in high‑stakes finance.
