[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-gitofthoughts-puts-llm-reasoning-into-a-version-control-overlay":10},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":22,"tags":34,"sources":38,"feedback":42,"feedback_at":22,"cost_usd":42,"total_tokens":42},1197,"gitofthoughts-puts-llm-reasoning-into-a-version-control-overlay","GitOfThoughts puts LLM reasoning into a version-control overlay","A new framework stores LLM thought trees in a Git‑style repo, enabling replay, diffing and merging, but finds memory formats rarely boost accuracy.","GitOfThoughts lets agents record every scored thought as a Git commit, making reasoning replayable and auditable.\n\nThe paper introduces a system that treats an LLM's reasoning tree like a Git repository: commits store individual thoughts, notes hold scores, and tags mark outcomes. Researchers evaluated five memory substrates—including the new Git‑based one—across two benchmarks and multiple model sizes. The results show no consistent accuracy gain from any memory format on novel problems, except when the retrieved case is a near‑duplicate (similarity above ~0.8), where performance spikes. Larger models double that payoff but still cannot extract transferable methods. The only reliable accuracy lever remains test‑time sampling.\n\nThe significance lies in shifting focus from fanciful memory tricks to practical provenance. 
With reasoning kept under version control, developers can audit, compare, and merge reasoning paths without sacrificing performance, which addresses a long‑standing reproducibility gap in LLM workflows.\n\nIn short, GitOfThoughts offers traceability and mergeability for LLM reasoning while confirming that memory, beyond near‑duplicate recall, adds little to accuracy.","[\"ai\",\"large-language-models\",\"reproducibility\"]","2026-06-15T04:00:00.000Z","2026-06-16T17:15:30.937Z","2026-06-16T17:15:33.744Z","published",null,[24,30],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"Add a clear concluding paragraph summarizing the news and its implications, and ensure the article ends with a definitive summary sentence.","resolved",{"id":31,"reviewer":26,"round":32,"reason":33,"status":29},"editor-r2",2,"Add a clear concluding paragraph that summarizes the news and its implications, ending with a definitive summary sentence.",[35,36,37],"ai","large-language-models","reproducibility",[39],{"name":40,"url":41},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.14470",0]