[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-sorrydb-benchmark-aims-to-keep-ai-provers-honest-on-real-lean-proofs":10},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":22,"tags":24,"sources":28,"feedback":32,"feedback_at":22,"cost_usd":32,"total_tokens":32},1275,"sorrydb-benchmark-aims-to-keep-ai-provers-honest-on-real-lean-proofs","SorryDB benchmark aims to keep AI provers honest on real Lean proofs","A new dynamic benchmark pulls 78 GitHub formalisation projects into a live test set for theorem‑proving AIs.","A new benchmark called SorryDB streams real Lean tasks from 78 open‑source formalisation projects.\n\nThe dataset updates continuously, unlike static collections, which usually consist of competition puzzles. Researchers tested several AI approaches on a 1,000‑task snapshot: general‑purpose large language models, an agentic system built on Gemini Flash, and dedicated symbolic provers. The Gemini‑based agent topped the list, but it was not decisively better than the other models or than existing Lean tactics.\n\nBecause the benchmark evolves with the community, it forces provers to handle fresh dependencies and sidesteps the “train‑on‑test” problem that plagues static suites, where test problems can leak into a model's training data. In principle, tools that score well on SorryDB should be more useful to mathematicians working on ongoing projects, not just on curated benchmarks.\n\nIf the AI‑proving field keeps relying on static test beds, progress risks staying confined to narrow tricks. SorryDB pushes developers toward generalisable reasoning, though the current results show that no single approach dominates yet.","[\"ai-proof\",\"formal-methods\",\"lean\"]","2026-06-16T04:00:00.000Z","2026-06-17T01:17:39.936Z","2026-06-17T01:17:43.082Z","published",null,[],[25,26,27],"ai-proof","formal-methods","lean",[29],{"name":30,"url":31},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.02668",0]