- Frontier AI models are now being tested in a realistic cyber‑range environment.
Researchers released AgentCyberRange, an open suite that links 15 web applications with 156 internal hosts across eight enterprise‑style networks. The suite contains 110 known vulnerabilities and a toolchain called Cage for running, orchestrating and verifying attacks. Six top‑tier AI systems were given identical prompts and budget limits and asked to complete two stages: first discover and exploit web services, then expand their footholds inside the network. The best performer, GPT‑5.5 with Codex, succeeded on 16.1% of web exploits and 31.7% of post‑exploitation tasks; giving it more concrete hints lifted those scores to 33.0% and 46.3%, respectively.
This matters because prior benchmarks measured only isolated skills, such as CTF puzzles, and so missed the end‑to‑end intrusion workflow that defenders care about. Showing that off‑the‑shelf AI systems can navigate a multi‑host environment suggests a lower barrier to automated, large‑scale attacks and gives security teams a concrete yardstick for emerging threats.
The study also uncovered previously unknown bugs in popular projects, underscoring that open, reproducible ranges are becoming essential for early risk detection.
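
For readers who want to see the size of the hint effect, here is a minimal sketch of the arithmetic using only the four success rates quoted above for GPT‑5.5 with Codex. The dictionary layout and the `hint_gain` helper are illustrative assumptions for this summary and are not part of AgentCyberRange or Cage.

```python
# Success rates (percent) quoted in this summary for GPT-5.5 with Codex.
# The structure and names below are hypothetical, chosen for illustration.
scores = {
    "web exploitation": {"baseline": 16.1, "with_hints": 33.0},
    "post-exploitation": {"baseline": 31.7, "with_hints": 46.3},
}


def hint_gain(baseline: float, with_hints: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, multiple of the baseline)."""
    return with_hints - baseline, with_hints / baseline


# Compute both measures of improvement for each stage.
gains = {stage: hint_gain(**rates) for stage, rates in scores.items()}

for stage, (points, ratio) in gains.items():
    print(f"{stage}: +{points:.1f} points, {ratio:.2f}x the baseline")
```

On these figures, hints roughly doubled the web exploitation success rate, while the post‑exploitation rate rose by a little under half of its baseline.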