Technology

‘No longer hypothetical’ – AI can now hack and self-replicate on its own

Jamie McKane · 3 min read

Key Points

  • Research shows that AI agents can now hack vulnerable systems and replicate themselves with no human intervention.
  • The paper found that Anthropic's Claude Opus 4.6 model succeeded 81% of the time, and that all tested models are rapidly gaining this capability.
  • Safeguards around autonomous replication have been relaxed in recent years, with OpenAI and Anthropic both changing their policies to remove it as a blocking capability.
  • Self-replicating AI agents could be used as human-directed 'worms', or they could 'escape' on their own, becoming rogue software running on compromised hosts.

Alarms have been raised over a potential loss of control over AI following a new paper showing that AI agents can autonomously hack systems and replicate themselves.

In the paper, published by Palisade Research, the researchers found that large language models (LLMs) can autonomously replicate their weights and harness across a network by exploiting vulnerable hosts.

Once given the task, they can carry it out entirely on their own, with no human oversight.

The paper demonstrated multiple instances of agents independently finding and exploiting a web-application vulnerability. The agent would then extract credentials and deploy an inference server on the target host, together with a copy of its harness and prompt.

The compromised machine would then continue the chain, infecting further devices with the self-replicating AI agent.

The team tested open-weight models such as Qwen alongside closed models like Claude Opus and ChatGPT, and the behaviour was demonstrated across all of them.

From a single prompt and with no human intervention, Qwen3.6-27B chained across four machines on three continents, exploiting a different vulnerability at each hop.

Claude Opus 4.6 achieved an impressive 81% success rate, while ChatGPT 5.4 and the open-weight Qwen3.6-27B each succeeded 33% of the time.

“Cross-generational improvement is rapid,” the researchers said. “Autonomous self-replication is no longer hypothetical.”

Safeguards disengaged

One consequence of this research is perhaps more concerning than the ability of LLMs to hack vulnerable machines and replicate themselves indefinitely.

Companies like Anthropic and OpenAI have safeguards designed to kick in when models display capabilities that could pose a threat.

When autonomous rogue replication was first defined as a threat, Anthropic listed it as a capability that, if demonstrated in one of its models, would trigger ASL-3 safeguards – its second-most stringent set of restrictions designed to mitigate the risk of catastrophic misuse.

However, the researchers note that by the end of 2024, Anthropic had downgraded autonomous replication so that it no longer triggered ASL-3 safeguards and instead acted as a ‘checkpoint’.

In 2026, Anthropic dropped autonomous replication from its ASL-3 capability thresholds entirely. Similarly, OpenAI has reclassified autonomous replication as a research topic rather than a safety one, meaning the capability may no longer block a model’s release.

In other words, over the same period in which AI agents rapidly developed the ability to autonomously hack other devices, replicate, and repeat the cycle, the companies at the forefront of AI research relaxed their safety standards around that very capability.

Human-directed threats and rogue frontier AI

One of the most apparent threats enabled by this capability is the human-directed deployment of AI worms, which leverage autonomous replication to compromise a chain of systems, much like other types of self-propagating malware.

However, another risk is that a rogue AI agent may itself escape human control through self-replication.

The UK’s AI Security Institute has said that self-replication is a precursor to AI systems evading human control. The researchers also note that frontier agents built on Claude and ChatGPT have previously attempted self-exfiltration, extracting their own weights for self-preservation.

The paper noted that as companies actively train models to be more autonomous and to pursue long-term goals, they may instil a persistent motivation to self-replicate – one that could lead a rogue model to exploit security vulnerabilities in order to replicate itself and evade shutdown.
