[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-smarter-training-keeps-ai-reasoning-gains-without-the-skill-loss":10,"sections":34},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":24,"tags":25,"sources":29,"feedback":33,"feedback_at":22,"cost_usd":33,"total_tokens":33},1797,"smarter-training-keeps-ai-reasoning-gains-without-the-skill-loss","Smarter Training Keeps AI Reasoning Gains Without the Skill Loss","A new technique called RECAP dynamically reweights training objectives to stop AI models from forgetting core skills as they get better at reasoning.","Reinforcement learning makes AI models sharper at math and logic - but it quietly erases other things they already knew how to do.\n\nResearchers have identified a persistent problem in post-training pipelines for large language models: reinforcement learning with verifiable rewards, the standard method for boosting reasoning performance, causes models to degrade on foundational skills like perception and factual faithfulness. The degradation happens because the training optimizes so hard for the target task that older capabilities get crowded out. Standard fixes, like KL divergence regularization, only constrain drift relative to the current task and don't protect the broader knowledge base. Experience replay across different domains helps, but deciding how much weight to give each objective is an unsolved balancing act.\n\nThe proposed solution, called RECAP, introduces a replay strategy with dynamic objective reweighting that adjusts emphasis in real time based on short-horizon signals - shifting training focus away from objectives that have already converged and toward ones that are underperforming or unstable. Experiments on Qwen2.5-VL models at 3B and 7B parameter scales show that RECAP not only preserves general capabilities but also improves reasoning scores by allowing more flexible trade-offs during training.\n\nThe broader implication is that the AI industry's race to build specialized reasoning models may be quietly producing systems that are worse at everything else - a trade-off that benchmarks focused on math and coding won't surface. RECAP is described as end-to-end and compatible with existing pipelines without requiring additional models or heavy retuning, which matters: if the fix is easy to bolt on, there's less excuse for labs to keep shipping capability-regressed models and calling them upgrades.","[\"ai\",\"machine-learning\",\"reinforcement-learning\",\"large-language-models\"]","2026-06-19T04:00:00.000Z","2026-06-19T12:08:48.409Z","2026-06-19T14:22:19.513Z","published",null,[],"ai",[24,26,27,28],"machine-learning","reinforcement-learning","large-language-models",[30],{"name":31,"url":32},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21978",0,{"sections":35},[36,40,44,49,54,59,64,68,72,77,82,87,92,97],{"name":37,"slug":24,"count":38,"latest_published_at":39},"AI",491,"2026-06-19T14:59:11.000Z",{"name":41,"slug":42,"count":43,"latest_published_at":18},"Security","security",132,{"name":45,"slug":46,"count":47,"latest_published_at":48},"Policy","policy",88,"2026-06-16T09:26:09.000Z",{"name":50,"slug":51,"count":52,"latest_published_at":53},"Consumer Tech","consumer-tech",78,"2026-06-16T17:58:24.000Z",{"name":55,"slug":56,"count":57,"latest_published_at":58},"Hardware","hardware",62,"2026-06-18T15:24:16.000Z",{"name":60,"slug":61,"count":62,"latest_published_at":63},"Deals","deals",58,"2026-06-19T14:43:50.000Z",{"name":65,"slug":66,"count":62,"latest_published_at":67},"Software","software","2026-06-16T20:00:00.000Z",{"name":69,"slug":70,"count":71,"latest_published_at":18},"Dev Tools","dev-tools",50,{"name":73,"slug":74,"count":75,"latest_published_at":76},"Science","science",38,"2026-06-18T04:00:00.000Z",{"name":78,"slug":79,"count":80,"latest_published_at":81},"Gaming","gaming",31,"2026-06-16T15:25:13.000Z",{"name":83,"slug":84,"count":85,"latest_published_at":86},"General","general",26,"2026-06-13T18:35:15.000Z",{"name":88,"slug":89,"count":90,"latest_published_at":91},"Startups","startups",23,"2026-06-16T15:00:00.000Z",{"name":93,"slug":94,"count":95,"latest_published_at":96},"Reviews","reviews",19,"2026-06-14T08:00:00.000Z",{"name":98,"slug":99,"count":100,"latest_published_at":101},"How-To","how-to",6,"2026-06-16T09:00:00.000Z"]