[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-market-of-claims-agent-takes-on-financial-math":10,"sections":40},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":30,"tags":31,"sources":35,"feedback":39,"feedback_at":22,"cost_usd":39,"total_tokens":39},1789,"market-of-claims-agent-takes-on-financial-math","Market-of-Claims Agent Takes on Financial Math","MoCA-Agent uses a trading-floor metaphor to verify financial calculations claim by claim, hitting 78.3% on FinQA and 86.9% on ESGenius across ten benchmarks.","A team of researchers has built a financial reasoning agent that borrows the logic of a trading floor — and gets more reliable answers by treating every calculation as a bet that has to be won or lost.\n\nMoCA-Agent, released by UBC-NLP, replaces the typical multi-agent debate format with a \"market of claims\" structure. Rather than letting agents freely argue toward a consensus answer, it breaks each financial question into typed atomic claims and has specialist \"trader\" agents signal whether those claims hold. Signals are weighted by confidence, cleared into accept\u002Freject decisions, and then synthesized into executable Python. A verifier checks the resulting code for structural correctness and common financial errors — things like sign flips or scale mismatches — with at most one repair round before the system commits to an answer.\n\nThe financial domain is where plausible-but-wrong is a genuine danger, not just a benchmark embarrassment. A single transposed digit or wrong unit can produce a number that looks reasonable until someone checks the source table. MoCA-Agent ran across ten public benchmarks using a fixed Qwen3.6-27B backbone and posted 78.3% on FinQA, 76.0% on FinanceMath, 71.2% on MultiHiertt, 86.9% on ESGenius, and an 85.6% average on FinChart-Bench. The mid-range scores on FinanceMath and MultiHiertt — both involving multi-step arithmetic across hierarchical tables — show where claim-level verification still leaves performance on the table.\n\nThe code is public on GitHub, which matters: financial reasoning benchmarks have a history of results that look strong in isolation and fall apart when evaluation details stay hidden.","[\"ai\",\"financial-ai\",\"reasoning-agents\",\"benchmarks\"]","2026-06-19T04:00:00.000Z","2026-06-19T11:55:21.962Z","2026-06-19T14:22:19.337Z","published",null,[24],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"The article omits several benchmark scores cited in the source (FinanceMath 76.0%, MultiHiertt 71.2%) while selectively quoting only the stronger numbers, which misrepresents the system's overall performance profile — include the full set of headline results or explicitly note which benchmarks are shown.","resolved","ai",[30,32,33,34],"financial-ai","reasoning-agents","benchmarks",[36],{"name":37,"url":38},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.11537",0,{"sections":41},[42,46,50,55,60,65,70,74,78,83,88,93,98,103],{"name":43,"slug":30,"count":44,"latest_published_at":45},"AI",491,"2026-06-19T14:59:11.000Z",{"name":47,"slug":48,"count":49,"latest_published_at":18},"Security","security",132,{"name":51,"slug":52,"count":53,"latest_published_at":54},"Policy","policy",88,"2026-06-16T09:26:09.000Z",{"name":56,"slug":57,"count":58,"latest_published_at":59},"Consumer Tech","consumer-tech",78,"2026-06-16T17:58:24.000Z",{"name":61,"slug":62,"count":63,"latest_published_at":64},"Hardware","hardware",62,"2026-06-18T15:24:16.000Z",{"name":66,"slug":67,"count":68,"latest_published_at":69},"Deals","deals",58,"2026-06-19T14:43:50.000Z",{"name":71,"slug":72,"count":68,"latest_published_at":73},"Software","software","2026-06-16T20:00:00.000Z",{"name":75,"slug":76,"count":77,"latest_published_at":18},"Dev Tools","dev-tools",50,{"name":79,"slug":80,"count":81,"latest_published_at":82},"Science","science",38,"2026-06-18T04:00:00.000Z",{"name":84,"slug":85,"count":86,"latest_published_at":87},"Gaming","gaming",31,"2026-06-16T15:25:13.000Z",{"name":89,"slug":90,"count":91,"latest_published_at":92},"General","general",26,"2026-06-13T18:35:15.000Z",{"name":94,"slug":95,"count":96,"latest_published_at":97},"Startups","startups",23,"2026-06-16T15:00:00.000Z",{"name":99,"slug":100,"count":101,"latest_published_at":102},"Reviews","reviews",19,"2026-06-14T08:00:00.000Z",{"name":104,"slug":105,"count":106,"latest_published_at":107},"How-To","how-to",6,"2026-06-16T09:00:00.000Z"]