[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-superthoughts-halves-token-steps-for-chain-of-thought-reasoning":10},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":22,"tags":38,"sources":42,"feedback":46,"feedback_at":22,"cost_usd":46,"total_tokens":46},1201,"superthoughts-halves-token-steps-for-chain-of-thought-reasoning","SuperThoughts halves token steps for chain-of-thought reasoning","A new fine‑tuning add‑on compresses consecutive reasoning tokens, boosting throughput by up to 30 % while keeping accuracy within a couple of points.","SuperThoughts lets large language models emit two reasoning tokens at once.\n\nThe authors fine‑tuned four Qwen2.5‑Math instruction models with a lightweight Multi‑Token Prediction (MTP) module that packs each pair of consecutive chain‑of‑thought (CoT) tokens into a single latent vector. At inference time the model decodes two tokens per step, cutting the effective CoT length by roughly 20‑30 %. A confidence‑based fallback reverts to standard decoding when the MTP signal is weak. Tests on MATH500, AMC, OlympiadBench, and GPQA‑Diamond show a 1‑2 point drop in accuracy on most benchmarks.\n\nCutting the effective CoT length by 20‑30 % directly reduces compute time and cloud cost for long‑form reasoning, a known bottleneck for LLMs tackling math or logic problems. 
The approach keeps discrete token supervision, sidestepping the instability that plagues fully latent‑space reasoning methods.\n\nIt is a modest speed‑up rather than a breakthrough; the gain depends on the model’s ability to predict paired tokens accurately.\n\nIn short, SuperThoughts offers a practical way to shave latency from heavy CoT workloads, but the 1‑2 point accuracy dip and the need for a reliable confidence check mean it remains a trade‑off rather than a universal solution.","[\"llm\",\"reasoning\",\"efficiency\"]","2026-06-15T04:00:00.000Z","2026-06-16T17:28:52.006Z","2026-06-16T17:28:55.186Z","published",null,[24,30,34],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"Add a clear concluding paragraph that summarizes the news and its implications.","resolved",{"id":31,"reviewer":26,"round":32,"reason":33,"status":29},"editor-r2",2,"The headline overstates the speedup (claims 2x while source reports only 20‑30% reduction) and the article lacks a clear concluding paragraph summarizing the news and its implications.",{"id":35,"reviewer":26,"round":36,"reason":37,"status":29},"editor-r3",3,"Add a clear concluding paragraph that summarizes the findings, their significance for LLM reasoning efficiency, and any caveats.",[39,40,41],"llm","reasoning","efficiency",[43],{"name":44,"url":45},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.13862",0]