[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-smarter-problem-selection-makes-llm-training-more-efficient":10,"sections":44},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":34,"tags":35,"sources":39,"feedback":43,"feedback_at":22,"cost_usd":43,"total_tokens":43},1817,"smarter-problem-selection-makes-llm-training-more-efficient","Smarter Problem Selection Makes LLM Training More Efficient","A new paper argues that picking training problems by difficulty alone leaves reasoning gains on the table, and proposes a geometry-aware alternative.","Picking the wrong practice problems may be costing large language models performance during reinforcement learning (RL) training, according to a paper posted to arXiv (2606.19750).\n\nThe researchers introduce a framework called Bayesian Manifold Curriculum (BMC) that treats problem selection as more than a difficulty dial. Instead of ranking prompts by how hard they are and feeding models a steady diet of medium-hard questions, BMC maps problems onto the model's own internal representation space (a geometric structure known as a manifold) and uses that map to build a hierarchical task tree. Bayesian learning then guides which problems get sampled next. 
The paper's key empirical finding: difficulty-first sampling forces a tradeoff between productivity (how strong the learning signal is), diversity (how broadly training covers the problem space), and utility (how well the resulting gains transfer to evaluation benchmarks).\n\nThe standard approach treats each training problem as an independent arm in a multi-armed bandit and pulls whichever looks hardest-but-not-too-hard. That ignores the fact that problems are related to one another through what the model already knows. BMC exploits that structure, so the training signal can be steered deliberately rather than discovered by accident. For anyone building reasoning-focused models, that distinction could matter at scale, where wasted compute compounds quickly.\n\nDifficulty-based curriculum methods are common in RL-for-reasoning pipelines, but this paper argues that the *type* and *coverage* of training problems matter as much as their rank on a hardness scale. If that result holds up beyond the authors' experiments, it gives labs reason to rethink what \"optimal sampling\" actually means.","[\"ai\",\"machine-learning\",\"reinforcement-learning\",\"llms\"]","2026-06-19T04:00:00.000Z","2026-06-19T12:54:35.591Z","2026-06-19T14:22:20.016Z","published",null,[24,30],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"The article never names the paper, its authors, or the arXiv identifier, and omits the specific benchmark results that would let readers assess the 'empirically we find' claim — add at least the paper title\u002Flink and concrete performance numbers if the source provides them.","resolved",{"id":31,"reviewer":26,"round":32,"reason":33,"status":29},"editor-r2",2,"The article names DeepMind and DeepSeek-R1 as labs that 'lean heavily on difficulty-based curriculum signals' — a specific empirical claim about their training methodology that is not supported by the source material; remove or hedge this to what the paper actually 
says.","ai",[34,36,37,38],"machine-learning","reinforcement-learning","llms",[40],{"name":41,"url":42},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.19750",0,{"sections":45},[46,50,54,59,64,69,74,78,82,87,92,97,102,107],{"name":47,"slug":34,"count":48,"latest_published_at":49},"AI",491,"2026-06-19T14:59:11.000Z",{"name":51,"slug":52,"count":53,"latest_published_at":18},"Security","security",132,{"name":55,"slug":56,"count":57,"latest_published_at":58},"Policy","policy",88,"2026-06-16T09:26:09.000Z",{"name":60,"slug":61,"count":62,"latest_published_at":63},"Consumer Tech","consumer-tech",78,"2026-06-16T17:58:24.000Z",{"name":65,"slug":66,"count":67,"latest_published_at":68},"Hardware","hardware",62,"2026-06-18T15:24:16.000Z",{"name":70,"slug":71,"count":72,"latest_published_at":73},"Deals","deals",58,"2026-06-19T14:43:50.000Z",{"name":75,"slug":76,"count":72,"latest_published_at":77},"Software","software","2026-06-16T20:00:00.000Z",{"name":79,"slug":80,"count":81,"latest_published_at":18},"Dev Tools","dev-tools",50,{"name":83,"slug":84,"count":85,"latest_published_at":86},"Science","science",38,"2026-06-18T04:00:00.000Z",{"name":88,"slug":89,"count":90,"latest_published_at":91},"Gaming","gaming",31,"2026-06-16T15:25:13.000Z",{"name":93,"slug":94,"count":95,"latest_published_at":96},"General","general",26,"2026-06-13T18:35:15.000Z",{"name":98,"slug":99,"count":100,"latest_published_at":101},"Startups","startups",23,"2026-06-16T15:00:00.000Z",{"name":103,"slug":104,"count":105,"latest_published_at":106},"Reviews","reviews",19,"2026-06-14T08:00:00.000Z",{"name":108,"slug":109,"count":110,"latest_published_at":111},"How-To","how-to",6,"2026-06-16T09:00:00.000Z"]