Cursor Composer 2.5: Frontier Coding at a Fraction of the Cost

Cursor's Composer 2.5 is the in-house model that now sits behind the default agent in the editor. The headline from independent testing: it matches Opus 4.7 and GPT-5.5 on the coding benchmarks that matter, while costing somewhere between 10x and 60x less per task. For anyone watching a usage-based bill climb, that gap is the whole story.

The Benchmark Numbers, Plainly

Composer 2.5 scores 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1 — both in the same band as the frontier models from Anthropic and OpenAI. Against the previous Composer 2, the biggest jump is on the hardest tier of SWE-Bench Pro, up 35 points. Translation: it got meaningfully better at the long, multi-file changes that used to need a top-tier model.

Pricing tells you why teams are switching the default. The standard variant runs $0.50 per million input tokens and $2.50 per million output tokens. The Fast variant, which is the default inside Cursor, runs $3.00 and $15.00 respectively. Same model, different serving speed and price: both Fast rates are exactly 6x the standard ones.
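To see what those rates mean for a single task, the arithmetic can be sketched roughly as follows. The per-million-token prices are the ones quoted above; the token counts in the usage lines are made-up placeholders rather than measurements, and the `Pricing` and `task_cost` names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


# Rates quoted in the article.
STANDARD = Pricing(input_per_million=0.50, output_per_million=2.50)
FAST = Pricing(input_per_million=3.00, output_per_million=15.00)


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one task at the given token counts."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


# Hypothetical task: 2M input tokens and 100k output tokens (illustrative only).
standard_cost = task_cost(STANDARD, 2_000_000, 100_000)
fast_cost = task_cost(FAST, 2_000_000, 100_000)
print(f"standard: ${standard_cost:.2f}, fast: ${fast_cost:.2f}, "
      f"ratio: {fast_cost / standard_cost:.1f}x")
```

Because the input and output rates are scaled by the same factor, the ratio stays at 6x whatever the mix of input and output tokens, provided both variants consume the same number of tokens on the task.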

Standard vs. Fast: Which to Pick

The Fast variant finishes a task about 30% quicker (roughly 6.7 minutes versus 9.3) but costs about 6x more per task. That math points to a simple rule: use Fast when you're in a tight edit-test loop and waiting on the agent is the bottleneck. Switch to standard for background or batch work where three extra minutes don't matter and the cost saving compounds across dozens of runs.
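One way to make that rule concrete is to ask what each saved minute costs. The sketch below uses the durations and the 6x ratio quoted above; the $1.25 standard per-task cost is an assumed placeholder, so substitute a figure from your own usage data.

```python
STANDARD_MINUTES = 9.3   # approximate task duration, standard variant (from the article)
FAST_MINUTES = 6.7       # approximate task duration, Fast variant (from the article)
COST_RATIO = 6.0         # Fast per-task cost relative to standard (from the article)


def speedup_fraction(slow_minutes: float, fast_minutes: float) -> float:
    """Fraction of wall-clock time saved by the faster variant."""
    return (slow_minutes - fast_minutes) / slow_minutes


def premium_per_minute_saved(standard_task_cost: float) -> float:
    """Extra dollars paid for each minute saved by choosing Fast."""
    extra_cost = standard_task_cost * (COST_RATIO - 1)
    minutes_saved = STANDARD_MINUTES - FAST_MINUTES
    return extra_cost / minutes_saved


# Assumed standard per-task cost of $1.25 (hypothetical; use your own number).
print(f"time saved: {speedup_fraction(STANDARD_MINUTES, FAST_MINUTES):.0%}")
print(f"premium: ${premium_per_minute_saved(1.25):.2f} per minute saved")
```

If a minute of your waiting time is worth more than that premium, Fast pays for itself. For unattended runs nobody is waiting, so the premium buys nothing.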

Pro Tip

Cursor defaults you to Fast. If most of your agent work is fire-and-forget (overnight migrations, bulk refactors, test generation), flip to the standard variant in model settings and watch your per-task cost drop to roughly one-sixth, with only a few minutes added per task.

How to Switch and Test It Yourself

Open the model picker in the agent panel, choose Composer 2.5, and pick the variant. Then run the same real task you'd normally hand a frontier model and compare. Don't trust the benchmark in the abstract — give it one of your actual gnarly tickets.

Example prompt:

Refactor the checkout flow so payment, tax, and shipping each live in
their own module with a shared interface. Keep every existing test
green, add tests for the new module boundaries, and don't touch the
public API. Show me the file list before you start editing.

If the result holds up on a task like that, the cost difference makes Composer 2.5 the obvious default for most work, with a frontier model held in reserve for the rare problem that genuinely stumps it.

When to Still Reach for a Frontier Model

Matching on benchmarks is not the same as matching on every task. For deep architectural reasoning, ambiguous specs, or problems where one wrong assumption cascades, a top-tier model still earns its price. The right setup is a default of Composer 2.5 with a one-click escalation to Opus 4.7 or GPT-5.5 when a task fights back.
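For scripted or batch workflows, the same default-then-escalate idea can be expressed as a small policy. This is only a sketch under stated assumptions: the model labels are plain strings rather than real API identifiers, and `fake_attempt` is a hypothetical stand-in for "run the agent, then run the test suite". It does not call Cursor or any real service.

```python
from typing import Callable, Optional, Sequence

# Plain labels for illustration, not real API identifiers.
DEFAULT_MODEL = "composer-2.5"
FRONTIER_MODEL = "frontier-model"


def run_with_escalation(
    task: str,
    models: Sequence[str],
    attempt: Callable[[str, str], bool],
) -> Optional[str]:
    """Try each model in order; return the first that succeeds, else None."""
    for model in models:
        if attempt(model, task):
            return model
    return None


def fake_attempt(model: str, task: str) -> bool:
    """Hypothetical stand-in for running the agent and checking the tests.

    Pretends the default model fails only on tasks tagged '[hard]'.
    """
    return not (model == DEFAULT_MODEL and task.startswith("[hard]"))


ladder = [DEFAULT_MODEL, FRONTIER_MODEL]
print(run_with_escalation("rename a helper", ladder, fake_attempt))
print(run_with_escalation("[hard] redesign the sync protocol", ladder, fake_attempt))
```

Putting the cheaper model first means the expensive one is paid for only when the first attempt fails its check, which is the point of keeping a frontier model in reserve.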

Frequently Asked Questions

Is Composer 2.5 as good as Opus 4.7 or GPT-5.5?

On coding benchmarks like SWE-Bench Multilingual (79.8%) and CursorBench v3.1 (63.2%), it lands in the same band. For deep architectural reasoning or ambiguous specs, a frontier model can still pull ahead — keep one available for the hardest tasks.

What's the difference between the standard and Fast variants?

Same model, different serving speed and price. Fast finishes about 30% quicker but costs roughly 6x more per task. Use Fast for tight interactive loops and standard for background or batch work.

Which variant does Cursor use by default?

The Fast variant is the default inside Cursor. If most of your agent work is fire-and-forget, switch to the standard variant in model settings to cut per-task cost to about one-sixth.