Microsoft’s latest AI models fall short in real‑world tests

Microsoft’s four new MAI models didn’t impress in independent testing.

The reviewer ran each model on standard benchmark prompts and measured response time, factual correctness, and consistency. Two of the models lagged noticeably, taking up to three seconds per token, while the other two produced errors on simple factual queries. Roughly one‑third of all answers contained hallucinations, and none of the four models matched the quality of existing commercial offerings. The test setup mirrored typical developer workloads, using the API access and hardware described in Microsoft’s documentation.

The results matter because enterprises planning to build products on these models may face higher latency and unreliable output, undermining the promise of a seamless AI stack. Developers are likely to stick with more mature alternatives until Microsoft ships fixes.

For now, the MAI rollout feels more like a preview than a production‑ready suite.

← Back to the front page