[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-small-confidence-scales-boost-llm-selfassessment-accuracy":10},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":22,"tags":30,"sources":34,"feedback":38,"feedback_at":22,"cost_usd":38,"total_tokens":38},1276,"small-confidence-scales-boost-llm-selfassessment-accuracy","Small confidence scales boost LLM self‑assessment accuracy","A study finds that a 0‑20 confidence range lets language models report uncertainty more reliably than the conventional 0‑100 scale.","LLMs now report confidence scores, but the scale they use matters.\n\nResearchers tested six models on three datasets, varying how confidence was encoded. Across all setups, the models clustered their answers around three round numbers (typically 0, 50 and 100), leaving the rest of the scale unused. The team reshaped the scale to 0‑20, tightened or loosened its boundaries, and made the range irregular, measuring metacognitive sensitivity under each variant with meta‑d', a signal-detection measure of how well confidence tracks correctness. The 0‑20 scale consistently raised meta‑d' scores, indicating sharper self‑knowledge. Shrinking the range’s edges hurt performance, and the preference for round numbers survived even when the scale was lopsided.\n\nThe takeaway is practical: confidence scales are not a neutral overlay. 
A compact scale such as 0‑20 can extract more nuanced uncertainty signals from LLMs, which matters for any downstream task that relies on model confidence, such as risk assessment, active learning, or human‑in‑the‑loop pipelines.\n\nIn short, the study shows that a simple design tweak can improve LLM metacognition, and that confidence scales deserve to be treated as experimental parameters rather than an afterthought.","[\"large-language-models\",\"confidence-scaling\",\"evaluation\"]","2026-06-16T04:00:00.000Z","2026-06-17T01:18:41.741Z","2026-06-17T01:18:45.004Z","published",null,[24],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"Add a clear concluding paragraph that succinctly summarizes the finding and its implications for readers.","resolved",[31,32,33],"large-language-models","confidence-scaling","evaluation",[35],{"name":36,"url":37},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.09309",0]