[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-grokkings-delay-is-a-decoder-problem-not-a-learning-one":10,"sections":41},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":30,"tags":31,"sources":36,"feedback":40,"feedback_at":22,"cost_usd":40,"total_tokens":40},1608,"grokkings-delay-is-a-decoder-problem-not-a-learning-one","Grokking's delay is a decoder problem, not a learning one","Encoder-decoder models learn Collatz arithmetic early but stall at the decoder, and the numeral base decides whether they ever generalize.","A new study says transformers learn arithmetic long before they can show it.\n\nResearchers trained encoder-decoder models on one-step Collatz prediction and watched grokking, the long gap between fitting the training set and suddenly generalizing. They found the encoder organizes parity and residue structure within the first few thousand training steps, while output accuracy sits near chance for tens of thousands more. Causal interventions point to the decoder as the bottleneck. Transplanting a trained encoder into a fresh model sped up grokking by 2.75 times, while transplanting a trained decoder actively hurt. Freezing a converged encoder and retraining only the decoder erased the plateau and reached 97.6% accuracy, against 86.1% for joint training.\n\nGrokking has long been treated as a puzzle about when a model acquires structure. This work reframes it as a plumbing problem. The knowledge arrives early; the decoder simply cannot reach it, which makes the famous delay less about insight and more about access. 
It is the difference between a model that has not figured something out and one that has but cannot yet say it.\n\nThe twist is that access depends on how you write the numbers. Across 15 numeral bases, those whose factorization lines up with the Collatz map's arithmetic, like base 24, reached 99.8% accuracy. Binary failed completely, its representations collapsing and never recovering. The base acts as an inductive bias that controls how much local digit structure the decoder can exploit.\n\nPut those two findings together and arithmetic generalization starts to look less like a test of whether a model understands math and more like a test of whether its input encoding hands the right structure to the part that has to produce the answer. That is a useful, slightly deflating result. Some of the sudden-insight drama in grokking may be an artifact of representation choices we control rather than emergent reasoning we stumbled onto. It is worth remembering the next time a model's late breakthrough gets sold as a spark of understanding.","[\"grokking\",\"transformers\",\"machine learning\",\"arithmetic\"]","2026-06-18T04:00:00.000Z","2026-06-19T05:43:55.608Z","2026-06-19T05:43:58.411Z","published",null,[24],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"Expand the piece past the 300-word minimum and add a genuine concluding paragraph that ties the decoder-bottleneck finding and the base-dependence twist back to why grokking and arithmetic generalization matter, rather than ending abruptly on the binary detail.","resolved","ai",[32,33,34,35],"grokking","transformers","machine learning","arithmetic",[37],{"name":38,"url":39},"arXiv 
cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.13082",0,{"sections":42},[43,47,51,56,61,66,71,76,81,85,90,95,100,105],{"name":44,"slug":30,"count":45,"latest_published_at":46},"AI",387,"2026-06-19T04:00:00.000Z",{"name":48,"slug":49,"count":50,"latest_published_at":46},"Security","security",130,{"name":52,"slug":53,"count":54,"latest_published_at":55},"Policy","policy",88,"2026-06-16T09:26:09.000Z",{"name":57,"slug":58,"count":59,"latest_published_at":60},"Consumer Tech","consumer-tech",78,"2026-06-16T17:58:24.000Z",{"name":62,"slug":63,"count":64,"latest_published_at":65},"Hardware","hardware",62,"2026-06-18T15:24:16.000Z",{"name":67,"slug":68,"count":69,"latest_published_at":70},"Software","software",58,"2026-06-16T20:00:00.000Z",{"name":72,"slug":73,"count":74,"latest_published_at":75},"Deals","deals",54,"2026-06-16T15:26:40.000Z",{"name":77,"slug":78,"count":79,"latest_published_at":80},"Dev Tools","dev-tools",49,"2026-06-16T04:00:00.000Z",{"name":82,"slug":83,"count":84,"latest_published_at":18},"Science","science",38,{"name":86,"slug":87,"count":88,"latest_published_at":89},"Gaming","gaming",31,"2026-06-16T15:25:13.000Z",{"name":91,"slug":92,"count":93,"latest_published_at":94},"General","general",26,"2026-06-13T18:35:15.000Z",{"name":96,"slug":97,"count":98,"latest_published_at":99},"Startups","startups",23,"2026-06-16T15:00:00.000Z",{"name":101,"slug":102,"count":103,"latest_published_at":104},"Reviews","reviews",19,"2026-06-14T08:00:00.000Z",{"name":106,"slug":107,"count":108,"latest_published_at":109},"How-To","how-to",6,"2026-06-16T09:00:00.000Z"]