Students Are Prompt Injecting AI Graders to Inflate Their Scores

AI grading systems can be gamed with a few injected words — and existing defenses are not holding up.

Researchers published a systematic study of prompt injection attacks against large language model-based automatic grading systems. The core finding: students can embed hidden instructions inside their submitted answers — text like commands telling the grading model to award full marks — and the systems comply. The researchers tested these attacks across rubric-based grading setups, the kind most likely to appear in real classrooms, and found current LLM graders remain highly vulnerable. Existing defensive strategies were evaluated and found largely ineffective against the attack patterns studied.

This matters because AI-assisted grading is already moving into education at scale. LLMs are attractive for graders precisely because they follow natural language instructions well and require no custom training per course — the same properties that make them useful also make them susceptible to student-authored counter-instructions embedded in submitted text. A grading system that can be fooled by a sentence hidden in a homework submission is not a grading system anyone should trust for high-stakes assessment.

Prompt injection is not a new problem — it has dogged LLM-powered applications in customer service, code assistants, and document summarizers for years — but its migration into educational assessment adds a dimension of fairness and institutional integrity that pure enterprise hacks do not. The study stops short of proposing a working fix, which is either honest or inconvenient, depending on how urgently your institution is already deploying these tools.

← Back to the front page