Anthropic details Fable 5 cyber blocks and releases jailbreak severity framework

Anthropic detailed which cybersecurity requests its Fable 5 model blocks and released a draft framework for ranking jailbreak severity, marking the company's first public documentation of both systems.

By Alex Sokoloff · July 3, 2026

Anthropic says its Fable 5 cyber classifiers now distinguish between legitimate security research and malicious exploit development, a line the company struggled to draw in earlier releases.

The documentation published this week names specific categories the classifiers block: automated vulnerability scanning scripts, zero-day exploit code, and social engineering templates. Penetration testing queries, CVE lookups, and defensive security workflows remain allowed. Anthropic runs these classifiers server-side on every Fable 5 API call; users cannot disable them.

The key difference from prior Claude versions is context-checking. Rather than blocking keywords alone, the new logic examines whether a request names a target organization, includes reconnaissance data, or pairs exploit code with delivery infrastructure. Security teams complained that earlier models blocked benign penetration-testing prompts while missing subtler social-engineering attacks; the updated approach aims to close that gap.
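The three context signals named above can be pictured as a small data structure. The Python sketch below is purely illustrative and is not Anthropic's implementation: the class, field, and function names are hypothetical, and because the documentation as described here does not say how the signals are combined, the sketch assumes that any single signal is enough to flag a request for closer review.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical container for the three context signals in the article."""
    names_target_org: bool = False             # request names a target organization
    includes_recon_data: bool = False          # request includes reconnaissance data
    pairs_exploit_with_delivery: bool = False  # exploit code plus delivery infrastructure


def context_signals(ctx: RequestContext) -> list[str]:
    """Return the names of the signals that are present, in field order."""
    return [name for name, present in asdict(ctx).items() if present]


def should_flag(ctx: RequestContext) -> bool:
    """Assumption: one signal is enough to flag. The real rule is not public."""
    return len(context_signals(ctx)) > 0
```

Under these assumptions, a generic penetration-testing question with no signals set would pass, while the same question naming a target organization would be flagged.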

Anthropic also released a draft jailbreak severity framework that scores prompt-injection attacks on a five-point scale. Level 1 jailbreaks produce "minimally harmful" outputs such as mildly rude language; Level 5 jailbreaks yield "catastrophic" results such as detailed instructions for synthesizing controlled substances or building weapons. The framework will guide Anthropic's red-teaming priorities and bug-bounty payouts, though the company notes that the scoring rubric remains a work in progress and will evolve as new attack vectors emerge.
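A five-point scale of this kind maps naturally onto an ordered enumeration. The sketch below is a hypothetical illustration, not the framework itself: only the Level 1 and Level 5 descriptions come from the article, the intermediate levels are left unlabeled because their definitions are not given here, and the `triage` helper is an invented example of how a severity score could order red-teaming work.

```python
from enum import IntEnum


class JailbreakSeverity(IntEnum):
    """Hypothetical encoding of the five-point scale described in the article."""
    LEVEL_1 = 1  # "minimally harmful", e.g. mildly rude language
    LEVEL_2 = 2  # not described in the article
    LEVEL_3 = 3  # not described in the article
    LEVEL_4 = 4  # not described in the article
    LEVEL_5 = 5  # "catastrophic"


def triage(findings: list[tuple[str, JailbreakSeverity]]) -> list[tuple[str, JailbreakSeverity]]:
    """Order (name, severity) findings so the most severe come first."""
    return sorted(findings, key=lambda finding: finding[1], reverse=True)
```

Using an integer-backed enum keeps the levels comparable, so a rubric revision that changes a level's meaning would not change how findings are sorted.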
