[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-a-new-vlm-framework-tries-to-cite-its-work":10,"sections":40},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":30,"tags":31,"sources":35,"feedback":39,"feedback_at":22,"cost_usd":39,"total_tokens":39},1630,"a-new-vlm-framework-tries-to-cite-its-work","A New VLM Framework Tries to Cite Its Work","CaVe-VLM-CoT uses a five-stage closed-loop pipeline to catch hallucinations in vision-language models before they reach the output.","A research framework called CaVe-VLM-CoT aims to make vision-language models stop making things up — by forcing them to cite their sources at every reasoning step.\n\nVision-language models are notoriously prone to hallucinations: they produce fluent, confident outputs that simply don't match the images they're analyzing. Existing fixes, like chain-of-thought prompting or retrieval-augmented generation, patch parts of the problem but don't close the loop — a model can still reason from an unverified claim without being sent back to re-check. CaVe-VLM-CoT addresses this with five sequential stages: Extractor, Retriever, Solver, Citation Injector, and Verifier. When the Verifier catches an ungrounded claim, it routes structured feedback back to the Extractor for targeted re-retrieval rather than letting the error propagate. 
The researchers also propose 23 component-wise evaluation metrics, anchored by a composite score called CaVeScore, which weights accuracy, citation precision and recall, attribution, and evidence grounding together.\n\nThe benchmark numbers give a clearer picture of where the system stands: on ScienceQA, the framework hits 87.1% accuracy and a CaVeScore of 56.6%; on the harder MMMU benchmark across 30 subjects, accuracy drops to 55.2% with a CaVeScore of 35.7%. The drop from ScienceQA to MMMU matters: MMMU's breadth exposes how much grounding quality degrades when the subject domain widens, which is exactly the condition real deployments face.\n\nThe framework requires no architectural changes or prompt modifications, which lowers the barrier to adoption, but a CaVeScore of 35.7% on MMMU is a reminder that \"interpretable\" and \"reliable\" are not yet the same thing.","[\"ai\",\"vision-language models\",\"hallucination\",\"research\"]","2026-06-18T04:00:00.000Z","2026-06-19T08:39:54.481Z","2026-06-19T14:20:57.388Z","published",null,[24],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"The MMMU CaVeScore figure is wrong: the article omits it entirely and, more critically, the ScienceQA CaVeScore (56.6%) is missing from the draft — fix by including both CaVeScore results — but the blocking error is that the draft states '55.2% accuracy' for MMMU without noting the source reports '56.6% CaVeScore on ScienceQA and 35.7% CaVeScore on MMMU', leaving the numeric picture incomplete and inconsistent with the source; reconcile all four reported figures before publishing.","resolved","ai",[30,32,33,34],"vision-language models","hallucination","research",[36],{"name":37,"url":38},"arXiv 
cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.18385",0,{"sections":41},[42,46,50,55,60,65,70,74,78,82,87,92,97,102],{"name":43,"slug":30,"count":44,"latest_published_at":45},"AI",490,"2026-06-19T04:00:00.000Z",{"name":47,"slug":48,"count":49,"latest_published_at":45},"Security","security",132,{"name":51,"slug":52,"count":53,"latest_published_at":54},"Policy","policy",88,"2026-06-16T09:26:09.000Z",{"name":56,"slug":57,"count":58,"latest_published_at":59},"Consumer Tech","consumer-tech",78,"2026-06-16T17:58:24.000Z",{"name":61,"slug":62,"count":63,"latest_published_at":64},"Hardware","hardware",62,"2026-06-18T15:24:16.000Z",{"name":66,"slug":67,"count":68,"latest_published_at":69},"Deals","deals",58,"2026-06-19T14:43:50.000Z",{"name":71,"slug":72,"count":68,"latest_published_at":73},"Software","software","2026-06-16T20:00:00.000Z",{"name":75,"slug":76,"count":77,"latest_published_at":45},"Dev Tools","dev-tools",50,{"name":79,"slug":80,"count":81,"latest_published_at":18},"Science","science",38,{"name":83,"slug":84,"count":85,"latest_published_at":86},"Gaming","gaming",31,"2026-06-16T15:25:13.000Z",{"name":88,"slug":89,"count":90,"latest_published_at":91},"General","general",26,"2026-06-13T18:35:15.000Z",{"name":93,"slug":94,"count":95,"latest_published_at":96},"Startups","startups",23,"2026-06-16T15:00:00.000Z",{"name":98,"slug":99,"count":100,"latest_published_at":101},"Reviews","reviews",19,"2026-06-14T08:00:00.000Z",{"name":103,"slug":104,"count":105,"latest_published_at":106},"How-To","how-to",6,"2026-06-16T09:00:00.000Z"]