[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-tree-like-selfplay-boosts-secure-code-generation-for-llms":10},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":22,"tags":30,"sources":34,"feedback":38,"feedback_at":22,"cost_usd":38,"total_tokens":38},1410,"tree-like-selfplay-boosts-secure-code-generation-for-llms","Tree-like self‑play boosts secure code generation for LLMs","A new training framework raises CodeLlama‑7B's pass rate on security benchmarks and improves cross‑language security reasoning.","- Researchers unveiled Tree-like Self-Play (TSP), a training method that treats code generation as a decision‑tree game.\n\nTSP forces a model to explore both safe and unsafe branches while generating code, yielding a dense on‑policy training signal at each token. In tests on Python security benchmarks, CodeLlama‑7B trained with TSP achieved a 75.8% pass rate on its top prediction, versus 57.0% for standard supervised fine‑tuning and lower scores for unstructured self‑play baselines. The approach also cut vulnerabilities in CWE categories unseen during training by 24.5% and transferred learned security logic from C\u002FC++ to Python, Go, and JavaScript.\n\nThe significance lies in moving past sequence‑level loss functions that overlook localized flaws. By pinpointing the exact token where a vulnerability originates, TSP gives the model a chance to correct itself during generation, before the code is compiled or run. 
This could narrow the gap between code‑generation LLMs and the stringent security standards required for production software, especially in environments where a single mistyped character can open a security hole.\n\nIn short, the TSP results suggest that fine‑grained self‑play training can make LLMs not just more capable but safer, pointing to a path forward for vendors wrestling with the security fallout of generated code.","[\"llm\",\"code-security\",\"machine-learning\"]","2026-06-16T04:00:00.000Z","2026-06-17T08:17:56.526Z","2026-06-17T08:17:59.346Z","published",null,[24],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"Add a clear concluding paragraph that summarizes the news and its implications.","resolved",[31,32,33],"llm","code-security","machine-learning",[35],{"name":36,"url":37},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.03489",0]