[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-utility-diversity-sampling-trims-llm-finetuning-costs":10},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":22,"tags":24,"sources":28,"feedback":32,"feedback_at":22,"cost_usd":32,"total_tokens":32},1329,"utility-diversity-sampling-trims-llm-finetuning-costs","Utility-Diversity Sampling trims LLM fine‑tuning costs","A new online batch selection method improves both efficiency and performance for supervised fine‑tuning without external data.","- Researchers released Utility-Diversity Sampling (UDS), a batch selector that balances usefulness and variety of training samples.\n\nUDS scores incoming examples using the nuclear norm of the model's logits matrix, a measure that captures how informative each sample is while also tracking intra‑sample diversity. It adds a lightweight memory buffer to compare low‑dimensional embeddings of past samples, estimating inter‑sample diversity without extra back‑propagation. The method runs entirely online, needing no reference model or validation set. Benchmarks across several fine‑tuning tasks show UDS beats existing online selectors and narrows the gap to full‑dataset training, while cutting overall training time.\n\nThe significance is twofold. First, fine‑tuning large language models on full datasets is expensive; by pruning each batch to its most valuable and diverse examples, practitioners can save compute and reduce the risk of overfitting. 
Second, unlike prior approaches that lean on external resources, UDS is self‑contained, making it attractive for teams with limited infrastructure.\n\nIn practice, UDS may become the default middle ground between brute‑force full‑data fine‑tuning and aggressive data reduction strategies that sacrifice performance.","[\"llm\",\"fine-tuning\",\"machine-learning\"]","2026-06-16T04:00:00.000Z","2026-06-17T04:14:42.931Z","2026-06-17T04:14:46.031Z","published",null,[],[25,26,27],"llm","fine-tuning","machine-learning",[29],{"name":30,"url":31},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16882",0]