A team of researchers has built a financial reasoning agent that borrows the logic of a trading floor: it gets more reliable answers by treating each claim behind a calculation as a bet that has to be won or lost.
MoCA-Agent, released by UBC-NLP, replaces the typical multi-agent debate format with a "market of claims" structure. Rather than letting agents argue freely toward a consensus answer, it breaks each financial question into typed atomic claims and has specialist "trader" agents signal whether those claims hold. The signals are weighted by confidence and cleared into accept/reject decisions, and the accepted claims are then synthesized into executable Python. A verifier checks the resulting code for structural correctness and for common financial errors, such as sign flips or scale mismatches, and allows at most one repair round before the system commits to an answer.
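The clearing and verification steps can be sketched in a few dozen lines of Python. This is a toy illustration under stated assumptions, not MoCA-Agent's implementation: the `Claim` and `Signal` records, the net-sum clearing rule with a zero threshold, the one-line-per-claim program synthesis, the sign and scale heuristics in `verify`, and `toy_repair` are all hypothetical, and the trader signals are hard-coded stand-ins for agent outputs.

```python
import ast
from dataclasses import dataclass
from typing import Callable

# Hypothetical data model: MoCA-Agent's actual claim types and fields are not specified here.
@dataclass
class Claim:
    claim_id: str
    code: str  # the line of Python this claim contributes if it is accepted

@dataclass
class Signal:
    trader: str
    claim_id: str
    direction: int      # +1 = "this claim holds", -1 = "it does not"
    confidence: float   # assumed to lie in [0, 1]

def clear_market(claims: list[Claim], signals: list[Signal],
                 threshold: float = 0.0) -> list[Claim]:
    """Sum confidence-weighted signals per claim; accept claims whose net score exceeds threshold."""
    net = {c.claim_id: 0.0 for c in claims}
    for s in signals:
        net[s.claim_id] += s.direction * s.confidence
    return [c for c in claims if net[c.claim_id] > threshold]

def synthesize(accepted: list[Claim]) -> str:
    """Join accepted claims, in order, into a program expected to bind `answer`."""
    return "\n".join(c.code for c in accepted)

def verify(program: str, expected_sign: int, expect_percent: bool):
    """Return (error label or None, answer or None) for a synthesized program."""
    try:
        ast.parse(program)  # structural check: is this valid Python at all?
    except SyntaxError:
        return "syntax", None
    env: dict = {}
    try:
        exec(program, env)
    except Exception:
        return "runtime", None
    if "answer" not in env:
        return "missing answer", None
    answer = env["answer"]
    if expected_sign and answer * expected_sign < 0:
        return "sign", answer
    # Crude heuristic (an assumption): a percentage below 1 in magnitude is
    # probably a fraction that was never multiplied by 100.
    if expect_percent and abs(answer) < 1:
        return "scale", answer
    return None, answer

def solve(program: str, expected_sign: int, expect_percent: bool,
          repair: Callable[[str, str], str]):
    """Verify, allow at most one repair round, then commit to the result."""
    error, answer = verify(program, expected_sign, expect_percent)
    if error is not None:
        program = repair(program, error)  # the single repair round
        error, answer = verify(program, expected_sign, expect_percent)
    return answer, error

def toy_repair(program: str, error: str) -> str:
    # Hypothetical stand-in for a model-driven repair; it only fixes scale errors.
    return program + "\nanswer = answer * 100" if error == "scale" else program

# Usage: "By what percentage did revenue grow from 2019 to 2020?" (toy data)
claims = [
    Claim("c1", "rev_2019 = 120.0"),
    Claim("c2", "rev_2020 = 150.0"),
    Claim("c3", "rev_2020 = 15.0"),  # a misread table cell
    Claim("c4", "answer = (rev_2020 - rev_2019) / rev_2019"),
]
signals = [
    Signal("table_trader", "c1", +1, 0.9),
    Signal("table_trader", "c2", +1, 0.7),
    Signal("table_trader", "c3", -1, 0.9),
    Signal("math_trader", "c2", -1, 0.2),
    Signal("math_trader", "c3", +1, 0.3),
    Signal("math_trader", "c4", +1, 0.8),
]

accepted = clear_market(claims, signals)
answer, error = solve(synthesize(accepted), expected_sign=+1,
                      expect_percent=True, repair=toy_repair)
print([c.claim_id for c in accepted], answer, error)  # ['c1', 'c2', 'c4'] 25.0 None
```

Netting signals this way lets a confident objection outweigh a hesitant endorsement, which is how the misread cell `c3` is rejected. The synthesized program then returns 0.25, a fraction where a percentage was expected; the scale check flags it, and the single repair round turns it into 25.0 before the system commits.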
Finance is a domain where a plausible-but-wrong answer is a genuine danger, not just a benchmark embarrassment. A single transposed digit or wrong unit can produce a number that looks reasonable until someone checks the source table. MoCA-Agent was evaluated on ten public benchmarks with a fixed Qwen3.6-27B backbone, scoring 78.3% on FinQA, 76.0% on FinanceMath, 71.2% on MultiHiertt, and 86.9% on ESGenius, along with an 85.6% average on FinChart-Bench. The two lowest of those scores, on MultiHiertt and FinanceMath, come from benchmarks that demand multi-step arithmetic (over hierarchical tables, in MultiHiertt's case), and they suggest where claim-level verification still leaves performance on the table.
The code is public on GitHub, which matters: financial reasoning benchmarks have a history of results that look strong in isolation but cannot be checked, or fail to hold up, when evaluation details stay hidden.