How Anthropic’s Claude Opus 4.7 is better than previous models


Anthropic released Claude Opus 4.7 on Thursday (17 April), an incremental but meaningful upgrade to its Opus 4.6 model, with particular gains in advanced software engineering, complex multi-step workflows, and high-resolution vision tasks.

While the company notes that Opus 4.7 remains “less broadly capable” than its recently previewed Claude Mythos model, the new release brings tangible improvements for professional users tackling difficult, real-world work.

Big leaps in coding and agentic performance

Opus 4.7 stands out most in software engineering scenarios, where it handles long-running, complex tasks with greater rigor, consistency, and self-verification.

Key benchmark gains over Opus 4.6 include:

  • 93-task coding benchmark: 13% improvement in task resolution; solved four problems that neither Opus 4.6 nor Sonnet 4.6 could crack.
  • CursorBench: Jumped to 70% from 58%.
  • Rakuten-SWE-Bench: Resolved 3x more production-level tasks, with double-digit gains in code quality and test quality.
  • Notion Agent complex workflows: +14% performance while using fewer tokens and producing one-third the tool errors.
  • Hebbia evals: Double-digit improvements in tool-calling accuracy and planning.

Early testers highlighted the model’s ability to autonomously build complex systems, such as a Rust text-to-speech engine from scratch, while catching its own errors, fixing race conditions, and maintaining focus over extended sessions.

Anthropic also introduced a new “xhigh” effort level (between “high” and “max”) for finer control over reasoning depth versus speed; it is now the default for many coding tasks in Claude Code.
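In practice, the effort level is a request-time knob. The following is a minimal sketch of how a client might select it; the `effort` field name, the payload shape, and the list of valid levels are assumptions based only on the levels named here, not a confirmed Anthropic API signature:

```python
# Hypothetical sketch: assembling a request payload with an effort level.
# The "effort" field name and the set of valid values are assumptions
# inferred from the levels mentioned in the article ("high", "xhigh",
# "max"); consult the official API reference before relying on them.

VALID_EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

def build_request(prompt: str, effort: str = "xhigh") -> dict:
    """Build a chat request dict carrying a reasoning-effort level."""
    if effort not in VALID_EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "model": "claude-opus-4-7",   # model name as reported in the article
        "max_tokens": 4096,
        "effort": effort,             # trades reasoning depth against speed
        "messages": [{"role": "user", "content": prompt}],
    }
```

The validation step matters because an unrecognized level would otherwise fail only server-side, after the request has been sent.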


Stronger vision and document understanding

Vision capabilities saw a dramatic upgrade. On the XBOW visual-acuity benchmark, Opus 4.7 scored 98.5%, compared to just 54.5% for Opus 4.6.

The model now supports significantly higher-resolution images (up to ~3.75 megapixels) and performs better on technical diagrams, chemical structures, dense screenshots, and pixel-perfect references — making it more useful for computer-use agents and professional document workflows.
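Because the new ceiling is stated in total pixels rather than fixed dimensions, a pre-flight check can downscale images to fit before upload. A minimal sketch, assuming the limit is ~3.75 million pixels in total (the exact enforcement, and any per-dimension limits, are not specified here):

```python
import math

MAX_PIXELS = 3_750_000  # ~3.75 megapixels, per the reported limit (assumed exact value)

def fit_within_limit(width: int, height: int,
                     max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Scale dimensions down, preserving aspect ratio, so width*height <= max_pixels."""
    pixels = width * height
    if pixels <= max_pixels:
        return width, height  # already within the budget
    scale = math.sqrt(max_pixels / pixels)  # uniform scale keeps aspect ratio
    return max(1, int(width * scale)), max(1, int(height * scale))
```

For example, an 8-megapixel 4000x2000 screenshot would be scaled to roughly 2738x1369, just under the budget, while anything already within the limit passes through untouched.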

It also showed a 21% reduction in errors on Databricks’ OfficeQA Pro for document reasoning and delivered strong results on finance analysis tasks.

Users report that Opus 4.7 follows prompts more literally, resists “dissonant data” traps, and maintains better memory across long, multi-session projects using file system-based context.

It produces higher-quality professional outputs, from UI designs and slide decks to data-rich dashboards, with improved aesthetic taste and creativity.

The model also demonstrates more sustained reasoning, honest acknowledgment of its limits, and graceful error recovery during tool use.

Opus 4.7 is available immediately across all Claude products. Pricing remains unchanged from Opus 4.6: $5 per million input tokens and $25 per million output tokens.

Note that the updated tokeniser can map the same input to 1.0–1.35× as many tokens, and the model tends to be more verbose at higher effort levels. Anthropic recommends re-tuning prompts and monitoring usage during migration.
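Taken together, the quoted prices and the token-expansion range give a simple worst-case bound for migration budgeting. A back-of-the-envelope sketch using the numbers above, pessimistically applying the 1.35× factor to both input tokens and (verbosity-inflated) output tokens:

```python
# Prices as quoted in the article, converted to USD per token.
INPUT_PRICE = 5.0 / 1_000_000    # $5 per million input tokens
OUTPUT_PRICE = 25.0 / 1_000_000  # $25 per million output tokens

def migration_cost_bound(input_tokens: int, output_tokens: int,
                         expansion: float = 1.35) -> float:
    """Worst-case USD cost after migration, assuming up to 1.35x token expansion
    on both sides. Pass expansion=1.0 to get the pre-migration baseline."""
    return (input_tokens * expansion * INPUT_PRICE
            + output_tokens * expansion * OUTPUT_PRICE)
```

For a workload of 1,000,000 input and 200,000 output tokens, the baseline cost is $10.00, and the worst-case post-migration bound is $13.50; actual counts depend on the prompts and effort level, which is why Anthropic suggests monitoring real usage.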

