An AI framework called AutoPass can squeeze more runtime speed out of compiled code than expert-written compiler heuristics — without ever being trained on the task.
Researchers built AutoPass as a multi-agent system layered on top of LLVM, one of the most widely used compiler infrastructures in existence. Instead of treating the compiler as a black box — the approach most auto-tuning tools take — AutoPass lets the LLM query the compiler's internal optimization states and inspect intermediate representations. It then iterates, using actual runtime measurements to diagnose slowdowns and adjust compiler flags. No offline training, no fine-tuning: it runs purely at inference time, which means it can be dropped onto new hardware or benchmarks without retraining anything.
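The measure-and-adjust loop described above can be pictured with a minimal sketch. This is not AutoPass's actual code: the `tune`, `toy_propose`, and `toy_measure` names, the "opt-A"/"opt-B" option labels, and the runtimes are all hypothetical placeholders. In a real system the proposer would be an LLM agent inspecting compiler state, and the measurement would come from compiling and running the program.

```python
from typing import Callable, List, Tuple

History = List[Tuple[List[str], float]]


def tune(
    baseline_flags: List[str],
    propose: Callable[[History], List[str]],
    measure: Callable[[List[str]], float],
    rounds: int = 5,
) -> Tuple[List[str], float]:
    """Generic feedback loop: propose a configuration, measure it, keep the best."""
    best_flags = list(baseline_flags)
    best_time = measure(best_flags)
    history: History = [(best_flags, best_time)]
    for _ in range(rounds):
        candidate = propose(history)      # hypothetical stand-in for the LLM agent
        runtime = measure(candidate)      # hypothetical stand-in for a real benchmark run
        history.append((candidate, runtime))
        if runtime < best_time:           # accept only measured improvements
            best_flags, best_time = candidate, runtime
    return best_flags, best_time


def toy_measure(flags: List[str]) -> float:
    """Made-up runtimes in seconds; 'opt-A' helps, 'opt-B' hurts."""
    runtime = 10.0
    if "opt-A" in flags:
        runtime -= 1.0
    if "opt-B" in flags:
        runtime += 0.5
    return runtime


def toy_propose(history: History) -> List[str]:
    """Try each untested option on top of the best configuration seen so far."""
    tried = {tuple(flags) for flags, _ in history}
    best_flags, _ = min(history, key=lambda entry: entry[1])
    for option in ("opt-B", "opt-A"):
        candidate = list(best_flags) + [option]
        if option not in best_flags and tuple(candidate) not in tried:
            return candidate
    return list(best_flags)


if __name__ == "__main__":
    print(tune(["-O3"], toy_propose, toy_measure, rounds=3))
```

The sketch keeps a candidate only when its measured runtime beats the current best, so a harmful option such as the made-up "opt-B" is tried once and then discarded.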
The practical result: on server-grade x86-64 hardware, AutoPass hit a geometric-mean speedup of 1.043x (roughly 4.3% faster) over LLVM's -O3 optimization level, the setting most production builds already use. On embedded ARM64, it reached 1.117x (roughly 11.7% faster). Those numbers sound modest, but -O3 represents decades of hand-crafted heuristics; beating it on average across benchmarks without any task-specific tuning is the point. For teams shipping performance-critical code on embedded or edge hardware, where every cycle matters and compiler expertise is scarce, the ARM64 result is the more interesting one.
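For readers unfamiliar with the metric, here is a small sketch of how a geometric-mean speedup is typically computed from per-benchmark runtimes. The function name and the runtimes are invented for illustration and are not taken from the AutoPass results.

```python
import math
from typing import Sequence


def geometric_mean_speedup(
    baseline_times: Sequence[float], tuned_times: Sequence[float]
) -> float:
    """Geometric mean of per-benchmark speedups (baseline time / tuned time)."""
    if not baseline_times or len(baseline_times) != len(tuned_times):
        raise ValueError("need two non-empty runtime lists of equal length")
    ratios = [b / t for b, t in zip(baseline_times, tuned_times)]
    # Average in log space, then exponentiate: the n-th root of the product.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical runtimes in seconds for three benchmarks.
    baseline = [2.0, 4.0, 1.0]
    tuned = [1.0, 4.0, 2.0]   # one 2x win, one tie, one 2x regression
    print(geometric_mean_speedup(baseline, tuned))
```

In this made-up case a 2x win and a 2x regression cancel out, giving a geometric mean of about 1.0. That is why a mean above 1.0 indicates an average gain rather than a win on every individual benchmark.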
Auto-tuning compilers is not new — tools like OpenTuner and ML-guided approaches have chased this for years — but most require significant offline profiling or model training tied to a specific codebase or chip. AutoPass's inference-only design is the differentiator, though the real test will be whether it holds up outside the benchmarks reported here.