A $28 batch of machine-made labels trained a better hostility detector than $316 of human annotation.
Researchers assembled a dataset of 277,902 German political TikTok comments, labeling 25,974 with LLMs and 5,000 by hand through the crowdsourcing platform Prolific. They asked two things: whether LLM labels can replace human ones inside an active-learning loop, and whether active learning is still worth the trouble when a whole corpus can be labeled for pocket change. Across seven conditions, four encoders, and ten random seeds, classifiers trained on LLM labels beat human-supervised ones at roughly one-tenth the cost: $28 on GPT-5.2's batch API against $316 of human work. The gain held for an open-weight model too, and it came specifically from splitting the judgment into two narrower questions that mirror the human guidelines. A single holistic prompt only matched human supervision, and active learning never reliably beat random sampling.
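For readers who want the "roughly one-tenth" figure spelled out, the arithmetic is short. The sketch below uses only the two dollar totals reported above; the variable names are illustrative, and it deliberately does not compute a per-label price, since that would require assuming how many labels each total paid for.

```python
# Cost comparison using only the two totals quoted in the article.
llm_cost_usd = 28.0     # GPT-5.2 batch API
human_cost_usd = 316.0  # human annotation

ratio = llm_cost_usd / human_cost_usd     # LLM cost as a share of human cost
multiple = human_cost_usd / llm_cost_usd  # how many times more the human labels cost

print(f"LLM cost as a share of human cost: {ratio:.1%}")  # 8.9%
print(f"Human cost as a multiple of LLM cost: {multiple:.1f}x")  # 11.3x
```

So "one-tenth" is a slight rounding in the humans' favor: the LLM labels cost a little under 9 percent of the human budget.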
Hostility is subjective and context-bound, which has always made labeling the slow, costly, error-prone part of building a moderation classifier. If a carefully prompted model can produce labels that are both cheaper and, by these measures, better, the economics tilt away from human annotation. The catch is that cheaper labels are not neutral labels.
That shows up in the error structure. Only GPT-5.2 under the two-question setup produced classifiers with a near-human balance of false positives and false negatives; the other variants over-flagged border-control and economic-competition discourse, the politically loaded topics where a moderation team can least afford a thumb on the scale. For anyone running trust and safety, the lesson is less "fire the annotators" and more "audit what your annotator quietly gets wrong." The team released the dataset and code, so the over-flagging can at least be checked rather than taken on faith. Cheap at scale is easy to measure. Biased at scale is the part you have to go looking for.
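What "going looking" might involve can be sketched in a few lines: compare a classifier's predictions against human reference labels and break the false-positive and false-negative rates out by topic. This is a minimal illustration, not the researchers' evaluation code. The function name, the topic tags, and the toy rows are all made up for the example, and it assumes each comment already carries a topic tag and a binary hostile/not-hostile label.

```python
from collections import defaultdict

def error_rates_by_topic(rows):
    """Per-topic false-positive and false-negative rates.

    rows: iterable of (topic, human_label, model_label), where labels
    are 0 or 1 and 1 means "hostile". Human labels are the reference.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for topic, human, model in rows:
        c = counts[topic]
        if human == 1:
            c["pos"] += 1
            if model == 0:
                c["fn"] += 1  # hostile comment the model missed
        else:
            c["neg"] += 1
            if model == 1:
                c["fp"] += 1  # benign comment the model flagged
    report = {}
    for topic, c in counts.items():
        report[topic] = {
            # None when a topic has no reference examples of that class
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return report

# Made-up toy data, purely to show the shape of the audit.
rows = [
    ("border_control", 0, 1),
    ("border_control", 0, 0),
    ("border_control", 1, 1),
    ("border_control", 1, 1),
    ("other", 0, 0),
    ("other", 0, 0),
    ("other", 1, 0),
    ("other", 1, 1),
]

report = error_rates_by_topic(rows)
for topic in sorted(report):
    print(topic, report[topic])
```

A topic whose false-positive rate sits well above its false-negative rate, while other topics look balanced, is the over-flagging pattern described above. An aggregate accuracy number would hide it.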