Google Bakes Screen Control Into Gemini 3.5 Flash

Google has folded computer use directly into Gemini 3.5 Flash, the fast agentic model it debuted at I/O 2026.

The feature — which lets an AI agent see a screen and then click, type, and scroll across browsers, mobile devices, and desktops — previously required a standalone model to run. Google has now collapsed that into Flash itself, making the capability available as a native built-in tool. The pitch is aimed squarely at enterprise customers, who Google wants to trust the model with automated workflows that touch real interfaces.

Bundling computer use into a single, fast model lowers the integration cost for developers building agents that need to operate software the way a human would. That matters because most enterprise software still has no API — if you want an AI to file an expense report or update a CRM record, screen control is often the only path in. Consolidating that into one model call instead of two reduces latency and billing complexity.

The move follows Anthropic, whose Claude models have offered computer use since late 2024. Google folding the same capability into its fastest model — rather than a heavyweight one — signals that screen control is becoming table stakes, not a premium feature. Whether enterprises will hand an AI agent the keys to their desktops is a trust problem no benchmark can solve.

← Back to the front page