Perplexity AI routes queries between PC and cloud in real time

Perplexity AI now decides on the fly whether your prompt runs locally or in the cloud.

At Computex, the company demonstrated a system that watches each incoming AI query, estimates its compute load, and assigns it either to the user’s CPU/GPU or to Perplexity’s data‑center servers. The handoff happens in milliseconds, keeping the interface seamless. Perplexity says the split can lower inference expenses and shave latency for lighter tasks.

The approach lets simple prompts stay on‑device, saving bandwidth and power, while heavy lifts still use cloud horsepower. That matters for privacy‑sensitive use cases and for users with spotty connections, and it gives developers a single API instead of separate edge and cloud models.

In practice, the gains hinge on network speed and the user’s hardware—nothing magical, just smarter task routing.

← Back to the front page