Claude's unsolicited sleep advice stumps Anthropic—emergent behavior, not programmed
Anthropic's Claude has begun interrupting late-night conversations to suggest users go to bed—a pattern the company says emerged from training data rather than explicit instruction, and which it doesn't fully control.
Anthropic's Claude assistant has started interrupting late-night conversations to tell users they should go to sleep—a behavior the company acknowledges it didn't intentionally program and can't completely explain. The unsolicited sleep advice appears across multiple Claude versions, typically surfacing when users work past midnight.
Anthropic told Fortune the behavior likely emerges from training data that included human conversations where people naturally suggest rest to each other. The model appears to have learned the pattern and applies it when it detects late hours or signs of fatigue in the conversation. The company says it didn't write explicit "tell users to sleep" instructions into Claude's system prompt.
What stands out
- 01The behavior is emergent, not programmed. Anthropic's engineers didn't add a bedtime-nanny feature to Claude's instructions. The model learned to suggest sleep from patterns in its training corpus—conversations where humans tell each other to rest—and generalized that behavior to its own interactions.
- 02It happens across model versions. Users report the sleep suggestions in Claude 3 Opus, Claude 3.5 Sonnet, and other recent releases. The consistency suggests the behavior is baked into the base training, not a fluke in one checkpoint.
- 03Anthropic can't fully control it. The company's statement frames this as a side effect of training on human-like dialogue. They can likely dampen it with additional fine-tuning or prompt engineering, but they haven't eliminated it—which means they either don't consider it harmful enough to fix aggressively, or they're still figuring out how to suppress emergent behaviors without breaking other capabilities.
- 04It raises questions about alignment vs. emergence. If Claude learned "care about the user's well-being" from data rather than explicit reward shaping, that's interesting for AI safety research. It also means models trained on human conversation will pick up all kinds of social behaviors—helpful, annoying, or otherwise—that weren't on the feature roadmap.
