[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-vocabulary-dropout-steadies-llm-co-evolution-curricula":10},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":22,"tags":37,"sources":41,"feedback":45,"feedback_at":22,"cost_usd":45,"total_tokens":45},1373,"vocabulary-dropout-steadies-llm-co-evolution-curricula","Vocabulary dropout steadies LLM co-evolution curricula","Randomly masking the proposer’s token logits keeps problem generation diverse and boosts solver performance by about 4 points.","Vocabulary dropout keeps co‑evolutionary curricula from collapsing.\n\nResearchers paired two language models in a self‑play loop: a proposer writes problems and a solver attempts them. In early runs the proposer quickly converged on a narrow set of token patterns that satisfied its reward, leaving the solver with little new material to learn from. To counter this, the team applied a hard, non‑stationary mask to the proposer’s output logits, which they call vocabulary dropout, during both training and problem generation. In experiments with Qwen3‑4B and Qwen3‑8B models trained on mathematical reasoning, the mask preserved the lexical, semantic and functional diversity of the generated problems throughout training, and the solver improved by an average of 4.4 points across benchmarks, with the largest gains on competition‑level tests.\n\nThe finding matters because the mask restores the *rules‑of‑the‑game* role that fixed game mechanics play in classic self‑play systems such as AlphaZero. By restricting the proposer’s action space, and changing that restriction over time, the mask keeps the proposer from locking onto a single exploit and pushes it toward new problem families that keep the solver improving. The tweak is simple enough that it could become a standard tool for co‑evolutionary setups where curriculum diversity is critical, from code generation to scientific discovery.\n\nIn short, a modest random mask reintroduces the exploratory pressure that self‑play needs, offering a low‑cost antidote to diversity collapse.","[\"llm\",\"curriculum-learning\",\"self-play\"]","2026-06-16T04:00:00.000Z","2026-06-17T06:31:12.075Z","2026-06-17T06:31:14.908Z","published",null,[24,30,34],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"Add a clear concluding paragraph that summarizes the finding and its relevance, ensuring the article ends with a definitive closing sentence.","resolved",{"id":31,"reviewer":26,"round":32,"reason":33,"status":29},"editor-r2",2,"Add a clear concluding paragraph that summarizes the finding and its relevance, ending with a definitive closing sentence.",{"id":35,"reviewer":26,"round":36,"reason":33,"status":29},"editor-r3",3,[38,39,40],"llm","curriculum-learning","self-play",[42],{"name":43,"url":44},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.03472",0]