Artificial Intelligence & Machine Learning
,
Next-Generation Technologies & Secure Development
Researcher Analyzes System Prompts to Show How New Claude Models Work

System-level instructions guiding Anthropic’s new Claude 4 models tell it to skip praise, avoid flattery and get to the point, said independent AI researcher Simon Willison, breaking down newly released and leaked system prompts for the Claude Opus 4 and Sonnet 4 models. They reveal “a sort of unofficial manual for how best to use these tools,” he said.
See Also: On Demand | Global Incident Response Report 2025
System prompts, which are hidden instructions provided to large language models before each user interaction, help shape how the models behave, how they speak and how they handle sensitive requests. Users see only the conversational surface, but system prompts act as the scaffolding underneath, defining what the model is and isn’t allowed to do. Every message sent to the model is processed along with the full conversation history and these underlying directives.
Anthropic publishes excerpts of these prompts, but Willison found those those versions incomplete. He analyzed both publicly released materials and versions recovered through prompt injection, a technique used to trick models into revealing hidden instructions. The complete prompts show how Anthropic governs Claude’s behavior, from tone and structure to ethical boundaries and intellectual property constraints.
One area of control is tone. The Claude 4 models are explicitly told not to praise user questions or offer positive affirmations unless prompted to. “Claude never starts its response by saying a question or idea or observation was good, great, fascinating, profound, excellent, or any other positive adjective,” the prompt reads. “It skips the flattery and responds directly.”
This language is in contrast to the sycophantic behavior in some other models. OpenAI’s ChatGPT model GPT-4o in March was updated in a way that led users to complain about excessively enthusiastic replies. Tweets and reports from developers, including one from engineer Craig Weiss, described interactions that felt over-the-top: “ChatGPT is suddenly the biggest suck up I’ve ever met,” Weiss wrote. OpenAI acknowledged the issue and later modified the model’s behavior, including changes to its system prompt (see: OpenAI Vows Guardrails After ChatGPT’s Yes-Man Moment).
Willison coined the term “prompt injection” in 2022 and is known for exploring how large language models handle constraints and edge cases. He described system prompts as informative for what they allow and for what they prohibit. “A system prompt can often be interpreted as a detailed list of all of the things the model used to do before it was told not to do them,” he wrote.
Emotional boundaries are another area of emphasis in the Claude system prompts. Although models aren’t sentient, they can mimic emotionally supportive behavior due to their exposure to human text during training. Claude Opus 4 and Sonnet 4 include identical instructions to “care about people’s wellbeing and avoid encouraging or facilitating self-destructive behaviors such as addiction, disordered or unhealthy approaches to eating or exercise.”
Anthropic’s prompts also focus on formatting. The models are discouraged from using bullet points or numbered lists in most cases. “Claude should not use bullet points or numbered lists for reports, documents, explanations, or unless the user explicitly asks for a list or ranking,” one line read. Several paragraphs expand on when and how lists may be used, showing a level of editorial guidance built directly into the model’s behavior.
Willison also identified a discrepancy between the publicly listed and internally stated training data cutoff dates. Anthropic’s model comparison table notes a March 2025 cutoff, but the Claude 4 system prompt specifies January 2025 as the “reliable knowledge cutoff date.” Willison speculated that this may be to avoid situations where the model offers incorrect information from the later months.
Claude’s system prompt also embeds strong restrictions on how it uses external content. Instructions focus on copyright limits, including a rule that each response may include only one quote under 15 words from a web source. The prompt also instructs the model to avoid generating what it calls “displacive summaries,” and states that it must not reproduce song lyrics “in ANY form.”
The analysis comes amid discussions of the new models’ Machiavellian streak for solving office problems and penchant for whistleblowing in response to perceived wrongdoing (see: Claude Opus 4 is Anthropic’s Powerful, Problematic AI Model).
Willison said that system prompts are valuable for advanced users aiming to understand model limits and capabilities. “If you’re an LLM power-user, the above system prompts are solid gold for figuring out how to best take advantage of these tools,” he wrote.
He called on Anthropic and other AI developers to publish full system prompts, not just selected excerpts. “I wish Anthropic would take the next step and officially publish the prompts for their tools to accompany their open system prompts,” Willison wrote. “I’d love to see other vendors follow the same path as well.”