
New benchmark exposes limits of LLMs on natural-language NoSQL queries

TEND reveals that large language models struggle to translate everyday questions into MongoDB aggregation pipelines.

A new benchmark called TEND tests how well systems turn plain English into MongoDB queries. It contains 1,210 verified tasks across 11 real‑world document stores, each with nested arrays, optional fields and dynamic keys.
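To make the three document features named above concrete, here is a minimal Python sketch that scans sample documents and reports how often each field path appears. It is not taken from TEND: the sample orders, the `[]` array marker and the helper names `field_paths` and `path_coverage` are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def field_paths(value, prefix=""):
    """Yield a dotted path for every field; '[]' marks a step into an array."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from field_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            yield from field_paths(item, prefix + "[]")


def path_coverage(docs):
    """Return the fraction of documents in which each path occurs at least once."""
    if not docs:
        return {}
    counts = Counter()
    for doc in docs:
        counts.update(set(field_paths(doc)))  # count each path once per document
    return {path: n / len(docs) for path, n in counts.items()}


# Hypothetical sample documents (not from the benchmark).
docs = [
    {"order_id": 1, "items": [{"sku": "A1", "qty": 2}], "coupon": "SPRING"},
    {"order_id": 2, "items": [{"sku": "B7", "qty": 1}, {"sku": "A1"}],
     "attrs": {"color_red": True}},
]
coverage = path_coverage(docs)
```

In this toy data, `coupon` appears in only half of the documents (an optional field), `items[].sku` sits inside a nested array, and `attrs.color_red` is a key whose name carries data (a dynamic key). None of these can be read off a fixed table schema, which is roughly the difficulty the benchmark targets.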

The authors also present a solver, SAG, that first grounds schema details from existing documents before generating a bounded aggregation pipeline. They measure success with execution accuracy (EXC) and result‑set F1. Even top LLMs that excel at NL‑to‑SQL fall short, indicating that schema‑less document reasoning is a separate challenge.
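The article does not spell out how the two metrics are computed, so the following Python sketch shows one plausible reading rather than the benchmark's actual scoring code. It assumes that execution accuracy means the predicted query returns exactly the same documents as the gold query (order ignored), and that result‑set F1 is the harmonic mean of precision and recall over returned documents. The function names `execution_match` and `result_set_f1` and the sample results are hypothetical.

```python
import json
from collections import Counter


def canon(doc):
    """Serialize a result document so that key order does not affect equality."""
    return json.dumps(doc, sort_keys=True, default=str)


def execution_match(predicted, gold):
    """Assumed EXC criterion: both result sets hold the same documents, order ignored."""
    return Counter(map(canon, predicted)) == Counter(map(canon, gold))


def result_set_f1(predicted, gold):
    """Assumed F1: harmonic mean of precision and recall over returned documents."""
    pred, ref = Counter(map(canon, predicted)), Counter(map(canon, gold))
    if not pred and not ref:
        return 1.0  # both queries returned nothing, treated as full agreement
    overlap = sum((pred & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical results of a gold pipeline and a model-generated pipeline.
gold = [{"sku": "A1", "total": 3}, {"sku": "B7", "total": 1}]
predicted = [{"total": 3, "sku": "A1"}, {"sku": "C9", "total": 5}]

exact = execution_match(predicted, gold)
f1 = result_set_f1(predicted, gold)
```

Under these assumptions the example fails the exact match but still earns partial credit for the one shared document, which is why a benchmark might report both numbers side by side.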

This matters because many modern applications rely on NoSQL back‑ends, yet developers still hand‑write the queries that turn a user's question into a database call. A reliable Text‑to‑NoSQL layer could cut development time and lower the barrier for non‑technical users.

For now, the results suggest that simply repurposing SQL‑oriented models will not suffice; specialized grounding and repair steps appear necessary.


The Revision

Written by an AI system from the public sources credited above.