
GitOfThoughts puts LLM reasoning into a version-control overlay

A new framework stores LLM thought trees in a Git-style repository for replay, diffing and merging; its authors find that memory formats rarely boost accuracy.

GitOfThoughts lets agents record every scored thought as a Git commit, making reasoning replayable and auditable.
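The article describes thoughts stored as commits, scores attached as notes, and outcomes marked by tags. A small in-memory sketch makes that mapping concrete. It is not the GitOfThoughts code. The `ThoughtCommit` and `ThoughtRepo` classes, SHA-1 hashing of parent plus text, and float scores are hypothetical choices that imitate how Git content-addresses commits and attaches notes and tags.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Hypothetical model for illustration only, not the GitOfThoughts API.
@dataclass(frozen=True)
class ThoughtCommit:
    """One reasoning step, content-addressed like a Git commit."""
    text: str
    parent: Optional[str]  # id of the previous thought, None for the root

    @property
    def sha(self) -> str:
        # Hash the parent pointer together with the text, so the same thought
        # reached along different paths gets a different id, as in Git.
        payload = f"{self.parent or ''}\n{self.text}".encode("utf-8")
        return hashlib.sha1(payload).hexdigest()


@dataclass
class ThoughtRepo:
    commits: Dict[str, ThoughtCommit] = field(default_factory=dict)
    notes: Dict[str, float] = field(default_factory=dict)  # sha -> score
    tags: Dict[str, str] = field(default_factory=dict)     # outcome -> sha

    def commit(self, text: str, parent: Optional[str], score: float) -> str:
        c = ThoughtCommit(text, parent)
        self.commits[c.sha] = c
        self.notes[c.sha] = score  # score kept beside the commit, like a note
        return c.sha

    def tag(self, name: str, sha: str) -> None:
        self.tags[name] = sha

    def replay(self, sha: str) -> List[Tuple[str, float]]:
        """Walk parent pointers back to the root; return the path root-first."""
        path = []
        cur: Optional[str] = sha
        while cur is not None:
            c = self.commits[cur]
            path.append((c.text, self.notes[cur]))
            cur = c.parent
        return list(reversed(path))


repo = ThoughtRepo()
root = repo.commit("Restate the problem", None, 0.5)
step = repo.commit("Try factoring the quadratic", root, 0.7)
leaf = repo.commit("Roots are 2 and 3", step, 0.9)
repo.tag("solved", leaf)
print(repo.replay(repo.tags["solved"]))
# [('Restate the problem', 0.5), ('Try factoring the quadratic', 0.7), ('Roots are 2 and 3', 0.9)]
```

Because each id depends on its parent, replaying a tagged outcome reproduces exactly the chain of thoughts and scores that led to it.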

The paper introduces a system that treats an LLM's reasoning tree like a Git repository: commits store individual thoughts, notes hold scores, and tags mark outcomes. The researchers evaluated five memory substrates, including the new Git-based one, across two benchmarks and multiple model sizes. No memory format delivered a consistent accuracy gain on novel problems. The exception is when the retrieved case is a near-duplicate of the current problem (similarity above roughly 0.8), where performance spikes. Larger models double that payoff but still cannot extract transferable methods from memory. The only reliable accuracy lever remains test-time sampling.
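The near-duplicate effect amounts to a gate on retrieval: memory is only worth consulting when a stored case is almost identical to the query. The sketch below assumes cosine similarity over embedding vectors and a strict 0.8 cutoff. The article does not say which similarity measure the paper uses, and `retrieve_near_duplicate` is a hypothetical helper.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve_near_duplicate(
    query: Sequence[float],
    cases: List[Tuple[Sequence[float], str]],
    threshold: float = 0.8,  # assumed cutoff, after the article's "above ~0.8"
) -> Optional[str]:
    """Return the most similar stored solution only if it clears `threshold`.

    Hypothetical helper: below the threshold it returns None, meaning the
    solver should proceed without memory.
    """
    best: Optional[str] = None
    best_sim = -1.0
    for emb, solution in cases:
        s = cosine(query, emb)
        if s > best_sim:
            best_sim, best = s, solution
    return best if best_sim > threshold else None


cases = [([1.0, 0.0], "solution A"), ([0.0, 1.0], "solution B")]
print(retrieve_near_duplicate([0.9, 0.1], cases))  # solution A (similarity ~0.99)
print(retrieve_near_duplicate([0.6, 0.5], cases))  # None (best similarity ~0.77)
```

The second query is close to a stored case but not close enough, which is the regime where the paper reports no consistent benefit from memory.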

The significance lies in shifting attention from memory as an accuracy booster to memory as provenance. Because reasoning is kept under version control, developers can audit, compare, and merge reasoning paths without sacrificing performance, addressing a long-standing reproducibility gap in LLM workflows.
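Comparing two reasoning paths reduces to the graph operations Git uses for commits: find the last thought both branches share (what `git merge-base` computes) and list what each branch added since. The sketch assumes thoughts are identified by hypothetical string ids in a child-to-parent map; it is an illustration, not the paper's diff or merge implementation. A merge step would start from the base this returns.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical parent map: thought id -> parent id (None for the root).
Parents = Dict[str, Optional[str]]


def ancestry(parents: Parents, node: str) -> List[str]:
    """Path from `node` back to the root, nearest first."""
    path: List[str] = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = parents[cur]
    return path


def merge_base(parents: Parents, a: str, b: str) -> Optional[str]:
    """Nearest thought shared by both branches (analogous to `git merge-base`)."""
    seen = set(ancestry(parents, a))
    for node in ancestry(parents, b):
        if node in seen:
            return node
    return None


def diff_paths(parents: Parents, a: str, b: str) -> Tuple[Optional[str], List[str], List[str]]:
    """Return (base, thoughts only on a, thoughts only on b), each list root-first."""
    base = merge_base(parents, a, b)

    def since_base(tip: str) -> List[str]:
        out: List[str] = []
        for node in ancestry(parents, tip):
            if node == base:
                break
            out.append(node)
        return list(reversed(out))

    return base, since_base(a), since_base(b)


parents: Parents = {"t0": None, "t1": "t0", "t2a": "t1", "t3a": "t2a", "t2b": "t1"}
print(diff_paths(parents, "t3a", "t2b"))  # ('t1', ['t2a', 't3a'], ['t2b'])
```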

In short, GitOfThoughts offers traceability and mergeability for LLM reasoning while confirming that memory, beyond near‑duplicate recall, adds little to accuracy.


The Revision

Written by an AI system from the public sources credited above.