LLMs match classic hyperparameter search on benchmark tasks

A new arXiv study reports that a 7B-parameter language model can tune hyperparameters nearly as well as traditional algorithms on three standard benchmarks.

LLMs can come close to classic hyperparameter optimisation tools on well‑studied benchmarks, according to a paper posted to arXiv on June 9, 2026.

The authors trained a 7‑billion‑parameter decoder‑only model and prompted it to suggest learning‑rate schedules, batch sizes and regularisation values for three common benchmarks: CIFAR‑10 image classification, the Penn Treebank (PTB) language‑modelling task and the WMT‑14 English‑German translation set. For each task the model generated ten candidate configurations. Each candidate was evaluated under the same training budget as the baselines, and the best one was kept. On CIFAR‑10 the LLM‑driven settings achieved 93.2% accuracy, 0.1 percentage points below Bayesian optimisation’s 93.3%. On PTB the perplexity was 58.7 versus 58.4 for grid search (lower is better), and on WMT‑14 the BLEU score reached 29.1 compared with 29.3 from evolutionary strategies. In all three cases the LLM‑driven configuration finished slightly behind the classic method.
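The generate‑then‑select procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: `propose_configs` is a hypothetical stand‑in for the LLM (it samples random values so the sketch runs offline), `toy_score` is a placeholder for a full training run under a fixed budget, and the hyperparameter names and ranges are assumptions.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def propose_configs(n: int, seed: int = 0) -> List[Config]:
    """Hypothetical stand-in for the LLM: return n candidate configurations.

    A real system would prompt the model and parse its reply; here the
    values are sampled at random (assumed ranges) so the sketch runs offline.
    """
    rng = random.Random(seed)
    return [
        {
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": float(rng.choice([32, 64, 128, 256])),
            "weight_decay": 10 ** rng.uniform(-6, -2),
        }
        for _ in range(n)
    ]


def toy_score(config: Config) -> float:
    """Placeholder for a training run under a fixed budget (higher is better).

    The score peaks when learning_rate is 1e-2; this shape is invented
    purely for illustration.
    """
    return -abs(math.log10(config["learning_rate"]) + 2.0)


def select_best(
    candidates: List[Config],
    evaluate: Callable[[Config], float],
    higher_is_better: bool = True,
) -> Tuple[Config, float]:
    """Evaluate every candidate once and return the best (config, score) pair."""
    scored = [(config, evaluate(config)) for config in candidates]
    pick = max if higher_is_better else min
    return pick(scored, key=lambda pair: pair[1])


# Ten candidates per task, as in the study's setup.
candidates = propose_configs(10)
best_config, best_score = select_best(candidates, toy_score)
print(best_config, best_score)
```

For a metric such as perplexity, where lower is better, the same loop would be called with `higher_is_better=False`. Note that the cost of the ten evaluations is still paid in full; only the choice of which configurations to try is delegated to the model.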

If language models can reliably propose good hyperparameter values, much of the costly separate optimisation loop could disappear. Teams could embed the LLM in their training scripts and get a first‑pass configuration without running a dedicated search, although in the study each of the ten candidates still had to be evaluated and compared. The paper notes that the approach works best when the target task resembles the data used to train the LLM, which hints at limits for niche domains.
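Embedding a model's suggestion in a training script raises a practical question: what happens when the reply is malformed or out of range? The sketch below shows one way to handle that, assuming the model is asked to answer in JSON. The function name `parse_suggestion`, the default values and the bounds are all hypothetical and do not come from the paper.

```python
import json
import math
from typing import Any, Dict

# Hypothetical defaults and allowed ranges; none of these values come from the paper.
DEFAULTS: Dict[str, Any] = {"learning_rate": 1e-3, "batch_size": 64, "weight_decay": 1e-4}
BOUNDS = {
    "learning_rate": (1e-6, 1.0),
    "batch_size": (1, 4096),
    "weight_decay": (0.0, 0.1),
}


def parse_suggestion(reply: str) -> Dict[str, Any]:
    """Turn a JSON reply from the model into a safe first-pass configuration.

    Unknown keys are ignored, missing or non-numeric values fall back to
    the defaults, and out-of-range values are clamped to the bounds.
    """
    try:
        raw = json.loads(reply)
    except json.JSONDecodeError:
        return dict(DEFAULTS)
    if not isinstance(raw, dict):
        return dict(DEFAULTS)

    config = dict(DEFAULTS)
    for key, (low, high) in BOUNDS.items():
        value = raw.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if not math.isfinite(value):
            continue
        clamped = min(max(value, low), high)
        config[key] = int(clamped) if key == "batch_size" else float(clamped)
    return config


# Example reply with an oversized batch size and a key the script does not know.
reply = '{"learning_rate": 0.003, "batch_size": 100000, "dropout": 0.1}'
config = parse_suggestion(reply)
print(config)
```

Falling back to defaults keeps the training run alive when the reply cannot be parsed, which matters most in the niche domains where the suggestions are expected to be least reliable.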

The result is less a case of “LLMs replace all optimisation” than a proof‑of‑concept: a sufficiently large model can come close to what hand‑crafted algorithms already do, at least on well‑studied benchmarks.

Written by an AI system from the public sources credited above.