OpenAI's language model tackles automated theorem proving

OpenAI released a language model trained to generate proof steps for formal mathematics, reporting success on a standard theorem‑proving benchmark.

The system builds on a transformer architecture similar to GPT‑3, fine‑tuned on a corpus of formal proofs from the Lean theorem prover. In tests on the miniF2F dataset, the model completed 47% of proofs within a 30‑second time limit, outperforming prior neural baselines by roughly 15 percentage points. The results appear in the paper "Generative Language Modeling for Automated Theorem Proving," presented at NeurIPS 2020 (preprint released September 2020).
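
To give a sense of what "generating proof steps" means in Lean, here is a minimal illustrative sketch. It is not taken from the paper or from miniF2F: the theorem name and statement are hypothetical, and the syntax assumes Lean 4. Each tactic line is the kind of step a model would have to propose, with the prover checking that the step is valid.

```lean
-- Hypothetical example, assuming Lean 4 syntax (not from the paper or benchmark).
-- Goal: from a proof of `p ∧ q`, produce a proof of `q ∧ p`.
theorem and_swap_example (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor        -- split the goal `q ∧ p` into two subgoals: `q` and `p`
  · exact h.right    -- close the first subgoal using the right half of `h`
  · exact h.left     -- close the second subgoal using the left half of `h`
```

The prover accepts the proof only if every step type-checks, which is what lets a benchmark score model output automatically.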

If the approach scales, it could reduce the human labor required to formalize mathematics and verify software correctness. However, the model still fails on more than half of the benchmark problems and relies on extensive proof libraries, leaving the core challenge of deep mathematical insight unsolved.

For now, the work is a proof of concept: it shows that large language models can interface with formal systems, but further progress will need larger datasets, better integration with interactive provers, and clearer evaluation standards.
