Claude Opus 4.6 for software engineering

Anthropic's latest SOTA model, Opus 4.6, raises the bar again for LLM-written code. Here's how it changes the day-to-day for engineering teams.

Anthropic just dropped its latest SOTA model, Opus 4.6, a mere couple of months after its previous flagship, Opus 4.5, raised the bar for LLM-written code. We tried it out to see how it changes the day-to-day for engineering teams.

The biggest improvement: better task focus

Opus 4.6 can work longer and more continuously without dropping context. This is a huge improvement for Claude Code and other coding agents, since they can now work more independently with less supervision and correction.

Much of this can be attributed to 4.6’s 1 million token context window (API users only; subscribers still get the 200k token window). Longer tasks are far less likely to hit that limit, so Opus 4.6 can maintain higher quality output for more complex problems.
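To get a feel for the difference between the two window sizes, here is a minimal sketch that estimates whether a batch of files would fit. It assumes roughly 4 characters per token, which is a common rule of thumb and not the model's actual tokenizer; the function names and the sample file sizes are made up for illustration.

```python
# Rough heuristic only: ~4 characters per token is a rule of thumb,
# not the tokenizer the model actually uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string using ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_window(texts, window_tokens, reserve_for_output=0):
    """Return (fits, estimated_total) for a list of strings.

    reserve_for_output is headroom kept free for the model's reply.
    """
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_output <= window_tokens, total


if __name__ == "__main__":
    # Hypothetical codebase slice: about 2 MB of source text.
    files = ["x" * 1_200_000, "y" * 800_000]
    for window in (200_000, 1_000_000):
        ok, total = fits_in_window(files, window)
        print(f"window={window:>9,} estimated={total:,} fits={ok}")
```

Under this estimate, the same 2 MB slice overflows a 200k window but fits comfortably in a 1M window, which is why long tasks in large codebases benefit most.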

Token usage + efficiency

Opus models are notorious for their token consumption, and many users have to carefully budget what they use these models for. Users on Pro plans (the entry-level paid tier) might only get a few prompts before they hit their usage limits.

Claude already offers the ability to toggle “effort”, i.e. how hard the model will work to solve a given prompt. Opus 4.6 automatically decides when to opt into “extended thinking” based on the context of a prompt. It also has a smarter approach to context compaction, which summarizes the history of a conversation so that carrying it forward uses fewer tokens.
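The general idea behind context compaction can be sketched in a few lines. This is a toy illustration, not how Opus 4.6 does it internally: the `compact_history` function, the whitespace-based token counter, and the stand-in summarizer are all assumptions made for the example.

```python
from typing import Callable, List


def compact_history(
    messages: List[str],
    budget_tokens: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Replace older messages with one summary when the history is over budget.

    The most recent `keep_recent` messages are kept verbatim.
    """
    total = sum(count_tokens(m) for m in messages)
    split = len(messages) - keep_recent
    if total <= budget_tokens or split <= 0:
        return list(messages)  # nothing to compact
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


# Stand-ins for illustration only: a real system would call a model to
# summarize and would use a real tokenizer to count.
def fake_summarize(msgs: List[str]) -> str:
    return f"[summary of {len(msgs)} earlier messages]"


def word_count(text: str) -> int:
    return len(text.split())


if __name__ == "__main__":
    history = ["a b c d"] * 5  # 5 messages, 4 "tokens" each
    print(compact_history(history, 10, 2, fake_summarize, word_count))
```

The trade-off is that a summary loses detail, so keeping the most recent turns verbatim preserves whatever the model is actively working on.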

These improvements help mitigate the already-high token demand that a model like Opus 4.6 requires. However, it is still demanding, even more so than Opus 4.5 (likely due to longer task follow-through). To really take advantage of its coding capacity, you may need to upgrade to one of the higher tiers. Here’s how to keep track of your Claude Code token usage.
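If you are calling the model through an API and want your own running tally, a small tracker like the one below is one way to do it. It is a sketch under assumptions: the `UsageTracker` class and its fields are hypothetical, and you would feed it whatever input and output token counts your client reports for each request.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running tally of tokens spent against a self-imposed budget."""

    budget_tokens: int
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one request's token counts to the tally."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def remaining(self) -> int:
        # Clamp at zero so callers never see a negative allowance.
        return max(self.budget_tokens - self.total, 0)

    def over_budget(self) -> bool:
        return self.total > self.budget_tokens


if __name__ == "__main__":
    tracker = UsageTracker(budget_tokens=1_000)
    tracker.record(input_tokens=300, output_tokens=200)
    tracker.record(input_tokens=400, output_tokens=250)
    print(tracker.total, tracker.remaining, tracker.over_budget())
```

Checking `over_budget()` before kicking off another long agent run is a cheap way to avoid burning through an allotment mid-task.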

Software engineering task performance

With the optimizations above, Opus 4.6 is now better suited for long-running engineering tasks, especially refactoring and test writing, provided you have the token budget. Previously, Gemini 3 Pro was the model of choice for massive-scale tasks in large codebases, thanks to its 1M token context window, but Opus might now overtake it (for API users, at least).

It benchmarks well (highest marks on Terminal-Bench, BrowseComp, and OSWorld), and engineering leads have spoken very highly of its beta. However, Anthropic might still need to rebalance token spend against the token allotment for its plans so subscribers can use these models to their capacity.

Environments for Claude Code

Claude Code works best when you give it access to on-demand preview environments so it can validate the code it writes. With Opus 4.6’s ability to get into a “flow state”, you can let it work longer and more continuously by pushing code to environments, viewing the changes live, pulling logs, and running tests.

Shipyard makes these workflows easy, for you and for Claude. Try it free today for 30 days, and watch the quality of your agent-written code improve.

About Shipyard

Shipyard manages the lifecycle of ephemeral environments for developers and their agents.

Get full-stack review environments on every pull request for dev, product, agentic, and QA workflows.
