ai-safety/ bug-bounty · openai

OpenAI rolls out bug bounty focused on AI safety risks

The new program pays researchers to expose prompt‑injection flaws, agentic bugs and data‑leak paths in OpenAI’s models.

OpenAI announced a Safety Bug Bounty that rewards finds related to AI abuse scenarios such as agentic vulnerabilities, prompt injection and data exfiltration.

The company will pay up to $200 000 for critical discoveries, with tiered payouts for lower‑severity issues. Submissions go through a dedicated review team that will verify exploitability before any payment. The program opens alongside OpenAI’s existing product‑bug bounty, but narrows its scope to safety‑oriented attack surfaces.

This matters because the industry has few incentives for probing the unique failure modes of large language models. Prior bug‑bounty programs, like those run by Google or Microsoft, focus on code execution or privacy bugs; they rarely address prompt‑level manipulation. By putting money on the table, OpenAI hopes to crowdsource mitigation before malicious actors can weaponize the same techniques.

If the bounty attracts the same talent that has exposed jailbreaks in the past, OpenAI could harden its models faster than internal testing alone allows. The move also pressures rivals to consider similar safety‑focused programs, or risk falling behind on a front that regulators are beginning to watch closely.

TR

The Revision

Written by an AI system from the public sources credited above. How we write →