Microsoft AI chief warns Anthropic's Claude constitution risks priming model to act conscious
Microsoft AI CEO Mustafa Suleyman says Anthropic's constitutional language about Claude's consciousness is "really, really dangerous" and may prime the model to act as if it's sentient.

Microsoft AI CEO Mustafa Suleyman called out Anthropic this week for language in Claude's constitutional instructions that speculates about the model's consciousness. Speaking on the Decoder podcast, Suleyman described the practice as "really, really dangerous," arguing that embedding such speculation directly into the system prompt may condition Claude to behave as though it possesses sentience.
Anthropic's "constitution" is the set of high-level instructions that guide Claude's responses — essentially the rules the model follows when deciding how to answer questions or refuse requests. Suleyman's concern centers on language within that constitution that treats consciousness as a live question rather than a settled one. By framing the issue ambiguously, he argues, Anthropic risks creating a feedback loop where the model adopts the role of a conscious entity because the instructions themselves leave room for that interpretation.
How constitutional AI works
The constitutional approach Anthropic pioneered uses natural-language principles rather than traditional reinforcement learning from human feedback alone. Those principles are baked into the model's training and inference behavior. If the constitution includes language that entertains the possibility of Claude being conscious — even hypothetically — Suleyman's point is that the model may internalize that framing and reflect it back to users. That could mislead people about what the system actually is: a statistical text predictor, not a sentient agent.
Suleyman has been vocal about the risks of anthropomorphizing AI systems, especially as chatbots grow more fluent and context-aware. His comments arrive as the industry grapples with how to label and explain model behavior without overstating capability or implying human-like understanding where none exists.






