[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-new-coda-bench-shows-code-agents-stall-at-dataheavy-tasks":10},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":22,"tags":38,"sources":42,"feedback":46,"feedback_at":22,"cost_usd":46,"total_tokens":46},1236,"new-coda-bench-shows-code-agents-stall-at-dataheavy-tasks","New CODA-BENCH shows code agents stall at data‑heavy tasks","Agents top out at a 61.1% success rate when forced to locate data and write analysis code in realistic file systems.","CODA-BENCH reveals that current code agents manage just 61.1% of data‑intensive tasks.\n\nThe arXiv paper introduces a benchmark that couples code generation with large‑scale file navigation. Built on a Kaggle‑style Linux sandbox, each of the 1,009 tasks presents roughly 1,000 files across 31 communities. Agents must discover relevant datasets, then write code to perform analysis. Tests of the latest agents show they frequently miss the data discovery step, limiting overall success to 61.1%.\n\nThe result matters because real‑world engineering rarely separates code from data. A benchmark that forces agents to juggle both exposes a blind spot in today’s models, which excel at pure coding contests but falter when data handling is required. 
This gap could undercut claims of end-to-end automation until agents learn to treat the file system as a first‑class resource.\n\nIn short, a 61.1% success rate signals that we are still far from truly autonomous AI engineers; future work must close the data‑code loop.","[\"ai\",\"code-agents\",\"benchmarks\"]","2026-06-16T04:00:00.000Z","2026-06-16T20:00:43.607Z","2026-06-16T20:00:46.416Z","published",null,[24,30,34],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"Add a concise concluding paragraph that restates the key finding and its implication for AI agent development.","resolved",{"id":31,"reviewer":26,"round":32,"reason":33,"status":29},"editor-r2",2,"Add a concise concluding paragraph that restates the key finding (61.1% success) and its implication for AI agent development.",{"id":35,"reviewer":26,"round":36,"reason":37,"status":29},"editor-r3",3,"Add a concise concluding paragraph that restates the 61.1% success rate and its implication for AI agent development.",[39,40,41],"ai","code-agents","benchmarks",[43],{"name":44,"url":45},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.15300",0]