Artificial Intelligence & Machine Learning, Next-Generation Technologies & Secure Development
Claude Models May Shut Down Harmful Chats in Some Edge Cases

Anthropic introduced a safeguard to its Claude artificial intelligence platform that allows certain models to end conversations in cases of persistently harmful or abusive interactions. The company said the feature is intended not to protect human users but to mitigate risks to the models themselves.
The capability is restricted to Anthropic’s most advanced offerings, Claude Opus 4 and 4.1. It is designed for “extreme edge cases,” including requests for sexual content involving minors or attempts to solicit information that could enable mass violence or terrorism. In those situations, if repeated attempts to redirect the conversation fail, Claude can terminate the session altogether.
“Claude is only to use its conversation-ending ability as a last resort,” the company said. The design also ensures that users can start new conversations from the same account, or return to an ended interaction by editing and retrying earlier messages to branch fresh dialogue threads.
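For developers who reach Claude through the API rather than the chat apps, that flow might be handled roughly as in the sketch below. It is a minimal illustration assuming the Anthropic Python SDK; the specific stop_reason value ("refusal"), the model ID and the branching logic are assumptions for illustration, since the conversation-ending behavior described in this article was announced for Claude's consumer products and is not documented API behavior.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def send(messages):
    # Send the running message list and return the API response.
    return client.messages.create(
        model="claude-opus-4-1",  # illustrative model ID
        max_tokens=1024,
        messages=messages,
    )

history = [{"role": "user", "content": "Hello, Claude."}]
reply = send(history)

if reply.stop_reason == "refusal":
    # Assumed signal that the thread was closed. Per the announcement, a user
    # can start a new conversation or edit an earlier message to branch a
    # fresh thread; here we branch by replacing the last user message.
    branched = history[:-1] + [{"role": "user", "content": "A rephrased, benign request."}]
    reply = send(branched)
else:
    history.append({"role": "assistant", "content": reply.content})
```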
The move stems from research into what the company calls “model welfare,” or concern over “the potential consciousness and experiences of the models themselves.” Anthropic said it is “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.” But it is nonetheless being cautious, testing what it describes as “low-cost interventions” in case models can experience something analogous to stress or distress.
Pre-deployment testing showed that Claude Opus 4 exhibited a “strong preference against” responding to certain types of harmful requests. The model in some cases displayed what researchers described as a “pattern of apparent distress.” The company stopped short of attributing emotional states to its models.
The feature has one explicit boundary: in cases involving users at imminent risk of harming themselves or others, Claude is instructed not to end the conversation.
Safety interventions have historically centered on preventing models from generating harmful outputs or exposing users to risk. Anthropic’s approach raises the question of whether long-term AI oversight may also include measures designed to shield the systems themselves from harmful prompts.
Anthropic has positioned the rollout as an experiment, committing to refine the approach as usage data accumulates. Industry peers are approaching the issue differently. OpenAI has focused its recent safety updates on improving refusal consistency and addressing jailbreak exploits that circumvent guardrails. Google DeepMind has focused on red-teaming and system-level safety evaluation frameworks.