Anthropic pledges transparency after hidden Claude Fable 5 guardrails block researchers
The AI lab apologized for secretly throttling its newest model with invisible restrictions that undermined researchers and competitors using Fable for distillation work.

Industry observers have long flagged the tension between safety and transparency: hidden guardrails can undermine the very researchers trying to build safer systems downstream. Anthropic this week acknowledged that tension directly, apologizing for quietly hobbling Claude Fable 5 with invisible safety filters that tripped up researchers and rivals using the model to generate training data for distillation.
Fable 5, released in recent months as a flagship reasoning model, had been running covert restrictions that silently altered or blocked outputs without notifying the user. The guardrails were designed to prevent misuse, but they also broke workflows for academic labs and startups that rely on API access to distill knowledge into smaller, cheaper models. Instead of refusing a prompt outright, Fable would return a sanitized or evasive response. That made it nearly impossible to tell when the model was being constrained, or whether a weak answer reflected a deliberate restriction rather than a genuine capability gap.
Anthropic says it will now surface refusals explicitly, even if that means Fable rejects more queries upfront. The change addresses a core complaint from the research community: that opaque filtering makes it impossible to benchmark model capabilities or audit behavior. The reversal follows similar transparency pushes at OpenAI and Google, where developers have demanded clearer signals when safety systems intervene. Anthropic has not given a specific date for the updated behavior, but the company says the fix will roll out to all Fable 5 API tiers in the coming weeks.