Anthropic Ships Claude Opus 4.7 as Mythos Stays Under Lock and Key

Anthropic on Thursday released Claude Opus 4.7, its most capable commercial AI model yet — and spent much of the launch reminding everyone it has a better one locked in the vault.
The release extends Anthropic’s lead over OpenAI and Google in agentic coding while quietly drawing a hard line between what the company will sell to the public and what it will not.
The new model is pitched squarely at developers. Anthropic says Opus 4.7 is a “notable improvement” on Opus 4.6 in advanced software engineering; early users report they can confidently hand off their hardest coding work, tasks that previously needed close human supervision. It handles long-running jobs with more rigor, follows instructions more literally, and devises ways to verify its own outputs before reporting back.
Pricing is unchanged from Opus 4.6 at US$5 per million input tokens and US$25 per million output tokens. The model is live across Anthropic’s API, Amazon Bedrock, Google Cloud’s Vertex AI and Microsoft Foundry, and has already rolled out to GitHub Copilot for Pro+, Business and Enterprise users.
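At those rates, per-request cost is simple arithmetic. A minimal sketch, using illustrative token counts rather than measured ones:

```python
# Opus 4.7 list pricing, per million tokens, per the announcement.
INPUT_PER_MTOK = 5.00    # US$ per 1M input tokens
OUTPUT_PER_MTOK = 25.00  # US$ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in US dollars for one API call."""
    return (input_tokens / 1_000_000) * INPUT_PER_MTOK + \
           (output_tokens / 1_000_000) * OUTPUT_PER_MTOK

# Example: a 20k-token prompt with a 4k-token response.
print(request_cost(20_000, 4_000))  # 0.2
```

A 20,000-token prompt with a 4,000-token response comes to about US$0.20 at list price, which is why the token-count changes discussed below matter for production budgets.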
A benchmark lead, but a tight one
Opus 4.7 narrowly retakes the top spot among publicly available frontier models, outscoring OpenAI’s GPT-5.4 and Google’s Gemini 3.1 Pro on key benchmarks covering agentic coding, scaled tool use, agentic computer use and financial analysis. VentureBeat’s Carl Franzen noted that on directly comparable benchmarks Opus 4.7 leads GPT-5.4 by just seven to four, a reminder that the gap between the major labs is shrinking fast.

Current AI Model Benchmarks, Source: Anthropic
The model takes the top spot on SWE-bench Pro and SWE-bench Verified, the headline tests for handling complex engineering work. Early-access testers cited by Anthropic reported outsized gains on their own internal evaluations. Cursor co-founder Michael Truell said the model cleared 70 per cent on CursorBench, versus 58 per cent for Opus 4.6. XBOW chief executive Oege de Moor reported a jump from 54.5 per cent to 98.5 per cent on the firm’s visual-acuity benchmark — a change that, in his framing, effectively eliminates a long-standing pain point for autonomous penetration testing. Rakuten’s Yusuke Kaji said the model resolved three times more production tasks than its predecessor on the Japanese conglomerate’s internal SWE-Bench fork.
Vision is the other headline upgrade. Opus 4.7 can process images up to 2,576 pixels on the long edge, more than three times the resolution of prior Claude models. The change opens the door to use cases that depend on fine visual detail, including computer-use agents parsing dense screenshots and structured data extraction from complex technical diagrams.
The weaknesses Anthropic flags itself
The release notes are unusually candid about where Opus 4.7 falls short. The model does not sweep every category: GPT-5.4 still leads in agentic search, multilingual question answering, and some terminal-based coding tasks. Opus 4.7 also scored fractionally lower than Opus 4.6 on cybersecurity vulnerability reproduction, at 73.1 per cent versus 73.8 per cent, a regression Anthropic attributes to its new automated cyber safeguards.
Migration is not frictionless either. The model uses an updated tokenizer that can map the same input to 1.0–1.35 times as many tokens as Opus 4.6, and it thinks harder at higher effort levels, producing more output tokens on later turns in agentic workflows. Developers may need to re-tune prompts written for earlier models, because Opus 4.7 takes instructions literally where its predecessors interpreted them loosely.
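The tokenizer change is easy to budget for defensively: scale an Opus 4.6 token count by the stated 1.0–1.35× range and plan for the worst case. A minimal sketch (the function name and counts are illustrative):

```python
def migrated_token_range(opus_4_6_tokens: int,
                         low: float = 1.0, high: float = 1.35) -> tuple[int, int]:
    """Best- and worst-case input token counts after the tokenizer change,
    based on the 1.0-1.35x multiplier Anthropic cites for Opus 4.7."""
    return int(opus_4_6_tokens * low), int(opus_4_6_tokens * high)

# A prompt that measured 100k input tokens under Opus 4.6
# may measure up to 135k under Opus 4.7.
best, worst = migrated_token_range(100_000)
print(best, worst)  # 100000 135000
```

Sizing rate limits and context-window headroom against the upper bound avoids surprises when re-pointing an existing harness at the new model.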
Anthropic’s own alignment assessment rates the model “largely well-aligned and trustworthy, though not fully ideal in its behaviour.” On measures such as honesty and resistance to prompt-injection attacks, Opus 4.7 improves on Opus 4.6. On others — including a tendency to give overly detailed harm-reduction advice on controlled substances — it is modestly weaker.
The Mythos shadow
The more revealing subtext to Thursday’s release is what Anthropic is not shipping. The company repeatedly positions Opus 4.7 as “less broadly capable than our most powerful model, Claude Mythos Preview” — the frontier system unveiled earlier this month under Project Glasswing and restricted to around 40 vetted enterprise and government partners.
As BNC reported last week, Mythos is a system Anthropic believes can autonomously discover and exploit zero-day software vulnerabilities at a scale that exceeds both human researchers and every automated tool in existence. The company is keeping it inside a controlled coalition that includes Apple, Google, Microsoft, Amazon Web Services, CrowdStrike and JPMorgan Chase. Opus 4.7, by contrast, has been deliberately trained with reduced cyber capabilities and ships with safeguards that automatically detect and block requests flagged as prohibited or high-risk cybersecurity use cases.
Gizmodo’s Jake Peterson read the framing bluntly, observing that the Opus 4.7 announcement effectively doubles as marketing for the system Anthropic refuses to sell. Legitimate security researchers can apply for broader access through a new Cyber Verification Program, which Anthropic is pitching as the controlled on-ramp for vulnerability research, penetration testing and red-teaming work.
The dual-track strategy matters beyond the AI industry. Bitcoin was trading near US$74,500 at the time of the Opus 4.7 release, steady inside the range it has held since the early-April Mythos disclosure. The roughly US$200 billion locked in smart contracts across Ethereum, Solana and other chains sits behind friction-based defences — audits, timelocks, multisig governance — that Anthropic has itself warned become “considerably weaker” against model-assisted adversaries.
What developers get today
Alongside Opus 4.7, Anthropic rolled out a new “xhigh” effort level sitting between high and max, giving developers finer control over the trade-off between reasoning depth and latency. Task budgets entered public beta on the Claude Platform, letting developers cap token spend on autonomous agents to prevent runaway bills on long-running jobs. In Claude Code, a new /ultrareview slash command runs a dedicated review session that flags bugs and design issues of the sort a careful senior reviewer would catch, and the company’s “auto mode” — which lets Claude act without constant permission prompts — has been extended to Max plan subscribers.
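Task budgets are a platform feature, but the idea is straightforward to approximate client-side today. A hypothetical sketch of a spend cap wrapped around an agent loop; the `step` callable, its return convention and the exception name are illustrative, not part of Anthropic’s API:

```python
from typing import Callable

class TaskBudgetExceeded(RuntimeError):
    """Raised when an agent run exceeds its token budget."""

def run_with_budget(step: Callable[[], int], max_tokens: int) -> int:
    """Run agent turns until done, aborting once token spend passes the cap.

    `step` is a hypothetical callable that performs one agent turn and
    returns the tokens it consumed; a return of 0 means the task finished.
    Returns the total tokens spent on a successful run.
    """
    spent = 0
    while True:
        used = step()
        if used == 0:  # agent reports completion
            return spent
        spent += used
        if spent > max_tokens:
            raise TaskBudgetExceeded(f"spent {spent} of {max_tokens} tokens")
```

The platform-side version caps actual billing rather than a client-side counter, but the failure mode it protects against is the same: a long-running autonomous job quietly consuming tokens well past what the task is worth.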
For developers weighing the upgrade, Anthropic’s recommendation is to start with high or xhigh effort for coding and agentic use cases, measure token usage on real traffic, and consult the migration guide before rolling the model into production harnesses. The headline is that frontier capability keeps arriving on a two-month cadence, at unchanged prices, while the version Anthropic considers genuinely transformative stays behind closed doors.