Windsurf SWE-1.5: 950 Tokens Per Second and Free for 3 Months

On April 14, 2026, Windsurf (made by Codeium) launched SWE-1.5 — a frontier-sized coding model that runs on Cerebras hardware and tops out at 950 tokens per second. For context: Claude Sonnet runs at roughly 70 tok/s. Haiku runs at around 160 tok/s. SWE-1.5 is 13x faster than Sonnet and nearly 6x faster than Haiku. And it's free for all Windsurf users for the next three months.

What 950 Tokens Per Second Feels Like

At normal model speeds, you write a prompt, wait 20-30 seconds for a response, read it, adjust, and wait again. That rhythm shapes how you use AI — you write careful, detailed prompts because iteration is expensive.

At 950 tok/s, a 500-token response arrives in under a second. The wait effectively disappears. This changes behavior: you're more likely to try an imprecise prompt and refine it based on what comes back. The tool starts feeling less like an assistant you brief and more like a fast autocomplete that can think. It's a qualitative shift, not just a quantitative one.

The Performance Numbers

Speed without quality would be useless. SWE-1.5 scores 40.08% on SWE-Bench Pro — the benchmark used to measure real-world software engineering task completion. That puts it roughly equal to Claude Sonnet 3.5's performance on the same benchmark, which was considered strong when it launched. You're not trading capability for speed.

Windsurf also used the SWE-1.5 launch to rewrite the lint checking and command execution internals. Those are used by every model in Windsurf, not just SWE-1.5. So even if you keep using Claude or GPT-4o inside Windsurf, per-step overhead dropped by up to 2 seconds across the board.

How to Enable SWE-1.5

plaintext

SWE-1.5 in Windsurf:
1. Update Windsurf to the latest version
2. SWE-1.5 is now the default model — it activates automatically
3. To verify: open a new Cascade (chat) session
   The model name displayed should show SWE-1.5
4. To switch models: click the model name in the Cascade header
   Available options now include SWE-1.5, Claude Sonnet 4.6, GPT-4o, and others
5. New 'Adaptive' mode (added April 15): automatically routes your request
   to the most efficient model to conserve your monthly quota
   — use this if you hit limits during heavy sessions

Pro Tip

Use the new Adaptive mode during exploratory sessions where you're asking lots of quick questions about unfamiliar code. Save SWE-1.5 directly for agentic tasks where you want raw speed — like long refactors where the agent iterates through many files.

Where Windsurf's Own Team Uses It

Windsurf published internal use cases alongside the launch. Their engineers use SWE-1.5 for three main scenarios: rapidly orienting in unfamiliar codebases (paired with the Codemaps feature that shows dependency graphs), building full-stack features end-to-end without switching tools, and writing Kubernetes manifests and Terraform configs where the AI needs to hold a lot of context at once. These translate well to most development teams.

The Competitive Context

SWE-1.5 landed the same week Anthropic shipped Claude Code Routines and the desktop redesign. Cursor 3.0 dropped two weeks earlier with parallel agents and /best-of-n. This is the fastest-moving week in AI coding tools in recent memory. Windsurf's free three-month offer is a direct competitive move — if you've been putting off evaluating Windsurf, this is the right moment to do it with no cost.

Frequently Asked Questions

Does SWE-1.5 work for languages other than Python?

Yes. SWE-1.5 was trained on real-world software engineering tasks across multiple languages. Windsurf's internal teams use it for TypeScript, Go, Terraform, and Kubernetes YAML, among others. Like all coding models, performance varies by language — it's strongest on languages with broad training data (Python, TypeScript, Go, Java).

How long is the free period?

Windsurf announced SWE-1.5 as free for all users for 3 months from the April 14, 2026 launch date — so through approximately mid-July 2026. After that, it will be included in paid plans. Check windsurf.com for the current pricing structure.

What is Cerebras hardware and why does it matter?

Cerebras builds chips designed specifically for AI inference — they're optimized for the kind of matrix math large models do, and they process tokens much faster than GPU clusters. Most AI models run on NVIDIA GPUs. Running on Cerebras is what allows SWE-1.5 to hit 950 tok/s, which isn't achievable on standard GPU infrastructure at this model size.