Anthropic opens Claude safety testing to external researchers
Anthropic launched a researcher access program giving external safety teams API credits and early model access to probe Claude's guardrails before public release.
Anthropic announced a researcher access program this week that grants third-party safety teams API credits and early model access to test Claude's guardrails ahead of public launch. The initiative, detailed on the company's blog June 5, aims to surface edge cases and adversarial prompts that internal red-teaming might miss.
Researchers accepted into the program receive API quota sufficient for large-scale probing runs and access to pre-release Claude builds under NDA. Anthropic says findings from external teams will feed directly into model tuning and system-card documentation. The program is open to academic labs, nonprofit safety orgs, and independent researchers with a track record in adversarial testing or AI safety work.
What stands out
- API credits for stress testing: Accepted researchers get quota scaled to their proposal, typically enough for tens of thousands of prompt variations across adversarial categories (refusal bypass, prompt injection, harmful content generation).
- Pre-release model access: Participants test Claude checkpoints 2–4 weeks before public deployment, and their findings are incorporated into final safety tuning and the published system card.
- NDA-protected collaboration: Researchers sign standard confidentiality agreements covering model internals and unreleased capabilities. Publishing results is encouraged, but publications must first be cleared by Anthropic's legal team.
- No equity or employment requirement: Unlike some vendor red-team programs that hire contractors, this structure keeps researchers independent. Anthropic covers API costs but does not pay per-finding bounties.