LLMs still churn out faulty SQL when asked to translate natural-language questions into queries.
A team examined four popular in-context learning (ICL) approaches across two benchmarks and two model sizes. They catalogued 27 error types in seven categories, confirming that mistakes are the norm, not the exception. Existing fix‑up tricks trimmed errors only marginally while adding heavy compute costs and often introducing new bugs.
Enter MapleDoctor, a detection-and-repair pipeline built on those findings. In head-to-head tests it repaired 13.8% more queries than prior methods with almost no mis-repairs, and cut repair time by 67.4%. The code is already on GitHub, inviting independent verification.
The result is a modest but concrete step toward trustworthy LLM-driven data pipelines. It also shows that in-context learning remains brittle without systematic debugging, especially next to fine-tuned models, which benefit from long-standing error-handling tricks.
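
What "detection and repair" means in practice can be sketched in a few lines. The example below is not MapleDoctor's implementation, which this summary does not describe. It is a minimal illustration that assumes a SQLite database and a single hypothetical repair rule, fixing a misspelled table name. The function names `detect_error`, `repair_table_name`, and `detect_and_repair` are invented for the example.

```python
import difflib
import re
import sqlite3


def detect_error(conn, sql):
    """Return None if SQLite can compile the query, else the error text."""
    try:
        # EXPLAIN compiles the statement without running it.
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def repair_table_name(conn, sql, error):
    """Hypothetical rule: swap an unknown table for its closest real name."""
    prefix = "no such table: "
    if not error.startswith(prefix):
        return None  # this rule does not apply
    bad = error[len(prefix):]
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    match = difflib.get_close_matches(bad, tables, n=1)
    if not match:
        return None
    return re.sub(r"\b" + re.escape(bad) + r"\b", match[0], sql)


def detect_and_repair(conn, sql, max_rounds=3):
    """Detect, repair, re-check; leave the query alone if no rule fits."""
    for _ in range(max_rounds):
        error = detect_error(conn, sql)
        if error is None:
            return sql
        fixed = repair_table_name(conn, sql, error)
        if fixed is None or fixed == sql:
            return sql  # no safe repair known: do not guess
        sql = fixed
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
print(detect_and_repair(conn, "SELECT name FROM singers WHERE age > 30"))
```

In this sketch, a query is returned untouched when no rule applies. That is one simple way to avoid the new bugs that, according to the study, existing fix-up methods often introduce.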