AI Co-Scientist adds 0.08% gain to travel search ranking

An automated loop of LLM agents delivered a modest lift on a production ranking model, showing how AI can suggest cross‑domain tweaks for real‑world systems.

An AI Co-Scientist framework improved a large travel platform's search ranking by a further 0.083% in offline metrics, on top of a human-engineered transformer baseline.

The system pairs LLM agents with cloud compute, letting them generate ideas, write code, run GPU experiments, and analyze results while a human scientist reviews the output. Routine steps are handled by a single model; for higher-risk choices, three top models (GPT-5.2, Gemini Pro 3, Claude Opus 4.5) must reach a consensus. The human-engineered baseline transformer (V2) had already beaten the legacy model (V1) by 0.118%. The AI loop added another 0.083% in offline metrics, in roughly one extra week of wall-clock time.
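The single-model versus consensus routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how consensus is reached, so a strict majority vote stands in for it here, and the `decide` function, the `Model` type, and the fall-back to a human reviewer on disagreement are all hypothetical names and choices, not the framework's actual interface.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for an LLM call: takes a prompt, returns a decision label.
Model = Callable[[str], str]


def decide(prompt: str, models: Sequence[Model], high_risk: bool) -> Optional[str]:
    """Route one step of the loop.

    Routine steps use only the first model. High-risk steps poll every model
    and accept a label only if a strict majority agrees (an assumption; the
    source says only that the models "form a consensus").
    Returns None when there is no majority, meaning "escalate to the human".
    """
    if not high_risk:
        return models[0](prompt)

    votes = Counter(model(prompt) for model in models)
    label, count = votes.most_common(1)[0]
    return label if count > len(models) / 2 else None
```

With three models, two matching answers are enough to proceed; a three-way split returns `None`, which a caller could treat as a signal to hand the decision to the reviewing scientist.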

The gain matters because the most effective AI suggestions (long-sequence layouts, slot-type embeddings, and multi-phase learning-rate schedules) are standard in NLP and vision but were absent from the ranking stack. This suggests LLM agents can act as cross-disciplinary scouts, surfacing proven techniques that ranking engineers might otherwise overlook.
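For readers unfamiliar with the last of those techniques, here is a minimal sketch of what a multi-phase learning-rate schedule can look like. The source does not describe the schedule the agents proposed, so the three phases (linear warmup, constant hold, cosine decay), the function name `multi_phase_lr`, and every default value below are illustrative assumptions.

```python
import math


def multi_phase_lr(
    step: int,
    peak_lr: float = 1e-3,   # assumed peak learning rate
    warmup: int = 1_000,     # assumed number of warmup steps
    hold: int = 4_000,       # assumed number of steps held at the peak
    total: int = 20_000,     # assumed total training steps
    floor: float = 1e-5,     # assumed final learning rate
) -> float:
    """Return the learning rate for a given training step."""
    # Phase 1: linear warmup from near zero up to peak_lr.
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    # Phase 2: hold at the peak.
    if step < warmup + hold:
        return peak_lr
    # Phase 3: cosine decay from peak_lr down to floor.
    decay_steps = max(1, total - warmup - hold)
    progress = min(1.0, (step - warmup - hold) / decay_steps)
    return floor + 0.5 * (peak_lr - floor) * (1.0 + math.cos(math.pi * progress))
```

The phases join without jumps: the warmup ends at `peak_lr`, the decay starts from `peak_lr`, and the rate settles at `floor` once `step` reaches `total`.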

In short, the AI Co-Scientist delivered a small but measurable offline lift in about a week of extra wall-clock time, hinting that future production pipelines could routinely hand exploratory tweaks to LLM-driven loops.

The Revision

Written by an AI system from the public sources credited above.