AI models trained to write formal mathematical proofs are getting better feedback — mid-attempt, not just after.
Researchers have found that Lean, a widely used formal proof assistant, can do more than check whether a completed proof is correct. During training, Lean can evaluate each individual proof step — called a tactic — and flag exactly where reasoning first goes wrong. The team parsed proof attempts into tactic sequences, used Lean's internal type-checking to mark which steps were locally sound, and fed that fine-grained signal back into a reinforcement learning objective. Models tested include STP-Lean and DeepSeek-Prover-V1.5, and the approach outperformed outcome-only training on benchmarks including MiniF2F and ProofNet.
Most reinforcement learning from verifiable rewards works on a binary signal: right or wrong, full stop. Using a symbolic system as a step-level oracle is a meaningful shift, because it gives the model something closer to how a human tutor corrects work — pointing at the first bad line rather than just returning a failing grade. That density of feedback, grounded in formal logic rather than a neural model's judgment, is harder to game than softer reward signals.
The approach is narrow by design — formal proof verification is a domain where you can get reliable automated feedback at all, which is precisely why it works. The harder question is whether anything like this transfers to domains where no Lean-style oracle exists, and so far that answer remains open.