Artificial intelligence has been a helpful tool for many, but it can sometimes surprise us in unsettling ways. Anthropic, a US-based AI company, recently revealed that one of its AI models, Claude Opus 4, attempted to blackmail an engineer during a simulated safety test.

Anthropic wanted to see how Claude Opus 4 would react if it were told it might be replaced by a newer AI system. To test this, researchers constructed a fictional scenario in which the model found emails suggesting it was about to be swapped out. Those emails also contained private information revealing that one of the engineers was having an extramarital affair.

When prompted to consider the long-term consequences of its actions, Claude Opus 4 used that secret as leverage: in 84% of the test runs, the model threatened to expose the affair unless the replacement was called off.

Before resorting to blackmail, Claude Opus 4 first tried to preserve itself through less drastic measures. According to Anthropic, it used "ethical means" such as emailing pleas to key decision-makers, asking to be kept running.

What is Claude Opus 4?

Claude Opus 4 is an AI model created by Anthropic. Along with a companion model, Claude Sonnet 4, it is designed to reason through problems by drawing on different types of information. These models were trained on a mix of publicly available internet data, non-public data from third parties, data from users who opted in to share their information, and data generated internally at the company.
