[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-few-shot-llms-match-human-annotators-on-ciu-tagging-in-aphasia":10},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":22,"tags":34,"sources":38,"feedback":42,"feedback_at":22,"cost_usd":42,"total_tokens":42},1251,"few-shot-llms-match-human-annotators-on-ciu-tagging-in-aphasia","Few-shot LLMs match human annotators on CIU tagging in aphasia","Instruction-tuned language models achieve F1 scores up to 0.82 in token-level CIU classification, but still lag behind human raters.","- Researchers tested four instruction‑tuned LLMs on token‑level Correct Information Unit (CIU) labeling of aphasic speech.\n\n- The models were evaluated on sixteen picture‑description transcripts spanning four aphasia severity levels. Zero‑shot prompts failed, but few‑shot prompting boosted performance: Llama‑3.1‑8B, Qwen2.5‑7B and Mistral‑7B reached mean F1 scores between 0.776 and 0.817. Precision stayed lower than recall, meaning the models tended to over‑tag tokens as CIUs, and performance declined as aphasia severity increased.\n\n- The work matters because CIU scoring currently requires trained clinicians and is time‑consuming. Automating the task could free up clinical hours and speed up assessments, though the current error profile means the models are best used as assistants rather than replacements.\n\nIn short, few‑shot LLM prompting can identify CIUs at a level comparable to human annotators, but lower precision and uneven performance across severity levels rule out fully autonomous deployment. The study points to a human‑in‑the‑loop workflow as the nearest practical application.","[\"ai\",\"speech-therapy\",\"nlp\"]","2026-06-16T04:00:00.000Z","2026-06-16T20:55:55.415Z","2026-06-16T20:55:58.241Z","published",null,[24,30],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"Add a concise concluding paragraph that restates the news (few-shot prompting narrows the gap but LLMs are not yet ready to replace clinicians) to give the article a clear ending.","resolved",{"id":31,"reviewer":26,"round":32,"reason":33,"status":29},"editor-r2",2,"Add a concise concluding paragraph that restates the news, summarizing the findings and their implications.",[35,36,37],"ai","speech-therapy","nlp",[39],{"name":40,"url":41},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.15696",0]