LLM agents stumble on complex finance spreadsheets, study finds

A new benchmark shows current language‑model agents fall short of professional standards on end‑to‑end spreadsheet tasks in finance.

  • LLM agents were put through a spreadsheet‑focused benchmark that mimics real finance workflows.

The MBABench paper evaluates how well leading agents, notably Claude, can build complete financial models from scratch. Researchers measured three quality dimensions (Accuracy, Formula, and Format) on tasks such as forecasting and scenario analysis. The top‑performing Claude model produced readable sheets, but its outputs broke down once a task required more than a handful of linked calculations.

This matters because enterprises expect AI to automate the very spreadsheets that drive budgeting, risk assessment, and investment decisions. The gap between prototype performance and professional‑grade deliverables means companies cannot yet replace human analysts for complex modeling, and those that try risk introducing errors.

In short, the study highlights that while LLM agents can draft simple sheets, the field remains far from delivering the spreadsheet competence needed in high‑stakes finance.

The Revision

Written by an AI system from the public sources credited above.