
Subagent Token Costs: When 7x Is Actually Worth It

Subagent-heavy workflows in Claude Code can use 7x the tokens of a single thread. Here's how to tell when that cost pays off and when to skip them.

If you have read any 2026 Claude Code guide, you know the pitch: subagents handle specialized tasks in their own context window, and the parent agent only sees the output. What most guides skip is the cost. Subagent-heavy workflows can use roughly 7 times the tokens of an equivalent single-thread session, because each subagent maintains its own context: system prompt, tool definitions, file reads, and the work itself.

That 7x is not a typo and it is not always worth it. After three months of measuring subagent runs across a backend team, four patterns emerged that justify the cost, and three patterns where subagents were a clear loss. Knowing the difference is what separates a $4 task from a $28 task that produces the same code.

Where the 7x Comes From

Each subagent invocation copies the parent's relevant findings into a new context window, then loads its own system prompt, tool list, and any files it needs to read. When the subagent finishes, its output goes back to the parent — but the parent does not get a token rebate for the subagent's work. The subagent paid for it; the parent now reads it on top of everything else it had.

Run six subagents in one session and you have paid for six full context windows of overhead, plus the parent's growing context that now includes all six summaries. Compared to one thread doing the same work in series, the token total can land at 7x, sometimes higher.
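The accounting above can be sketched as a small model. This is a minimal illustration, not Claude Code's actual billing logic: the `SubagentRun` fields, the function name, and every number in the usage lines are assumptions chosen for readability. It counts each context once and ignores anything that varies per turn, so treat it as intuition rather than a prediction.

```python
from dataclasses import dataclass


@dataclass
class SubagentRun:
    startup_tokens: int  # system prompt + tool definitions (assumed figure)
    work_tokens: int     # file reads and reasoning inside the subagent
    summary_tokens: int  # output handed back to the parent


def total_tokens(parent_tokens: int, runs: list[SubagentRun]) -> int:
    """Parent context, plus each subagent's own context, plus the summaries the parent reads."""
    subagent_cost = sum(r.startup_tokens + r.work_tokens for r in runs)
    summaries_read_by_parent = sum(r.summary_tokens for r in runs)
    return parent_tokens + subagent_cost + summaries_read_by_parent


# Hypothetical session: six identical subagents under one parent.
runs = [SubagentRun(startup_tokens=3_000, work_tokens=10_000, summary_tokens=1_000)
        for _ in range(6)]
with_subagents = total_tokens(parent_tokens=20_000, runs=runs)
```

The point of the model is the shape of the sum: the per-subagent terms scale with the number of subagents, and nothing is subtracted from the parent when a subagent finishes.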

Four Cases Where Subagents Win

Case 1: Independent Parallel Searches

If you need to find references to a function across five microservices, sending five subagents in parallel finishes in the time of the slowest single search. A single-thread version would take five times as long in wall-clock time even if it used fewer tokens. When wall-clock matters more than token cost — interactive work, blocking on a result — subagents win.

Single thread: search service A → service B → service C → service D → service E
Wall clock: 5 minutes. Tokens: 1x.

Five subagents in parallel: all five run at once.
Wall clock: 1 minute. Tokens: 4-5x.
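The fan-out pattern can be sketched with `asyncio`. This is only an illustration of the concurrency shape: `search_service` is a hypothetical stand-in for dispatching one subagent, the service names and the `process_order` symbol are made up, and the short sleep simulates the time a real search would take.

```python
import asyncio

SERVICES = ["service-a", "service-b", "service-c", "service-d", "service-e"]


async def search_service(service: str, symbol: str) -> str:
    # Hypothetical stand-in for one subagent search; the sleep simulates its latency.
    await asyncio.sleep(0.01)
    return f"{service}: references to {symbol} summarized"


async def search_all(symbol: str) -> list[str]:
    # All five searches run concurrently, so total time tracks the slowest one.
    return await asyncio.gather(*(search_service(s, symbol) for s in SERVICES))


results = asyncio.run(search_all("process_order"))
```

`asyncio.gather` returns results in the order the coroutines were passed, so the parent can match each summary back to its service.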

Case 2: Protecting the Parent's Context Window

When the parent agent is doing long-horizon planning work, dropping 30,000 tokens of search results into its context can derail it. Sending the search to a subagent and getting back a summary keeps the parent focused. The token cost is real, but so is the cost of a confused parent that has to be re-prompted.

This is the most common justified use of subagents in our data. Long planning tasks, multi-step refactors, and any session where you expect the parent to run for more than 20 turns benefit from offloading high-context lookups.
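The effect on the parent's context can be shown with simple arithmetic. In this sketch the 30,000-token search result comes from the example above, while the parent's starting size and the summary size are assumed values, and the function name is hypothetical.

```python
def parent_context_after_lookup(parent_tokens: int, raw_result_tokens: int,
                                summary_tokens: int, offload: bool) -> int:
    """Tokens in the parent's context once the lookup result has been added."""
    added = summary_tokens if offload else raw_result_tokens
    return parent_tokens + added


# Assumed: the parent already holds 40,000 tokens and a summary is about 1,500 tokens.
inline = parent_context_after_lookup(40_000, 30_000, 1_500, offload=False)
offloaded = parent_context_after_lookup(40_000, 30_000, 1_500, offload=True)
```

The subagent still pays for the full 30,000 tokens in its own window. What changes is how much of that lands in the context the parent has to keep reasoning over.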

Case 3: Specialized Tool Permissions

Subagents can have different tool permissions than the parent. If you want the parent to plan but only a specific subagent to have write access to the codebase, the boundary makes the workflow safer. You pay tokens for the isolation, but you get a cleaner blast radius.

This pattern shows up a lot in production setups where a planning agent dispatches an executor agent. The planning agent does not need write tools, and the executor agent does not need to read planning context. Each context is smaller because of the split, even if the total across both is higher.
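The planner and executor split can be modeled as two roles with disjoint write access. This sketch is not Claude Code's configuration format; `AgentRole`, the role names, and the tool lists are illustrative assumptions meant only to show the boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    name: str
    allowed_tools: frozenset[str]

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools


# Illustrative tool lists: the planner can only read, the executor can modify files.
PLANNER = AgentRole("planner", frozenset({"Read", "Grep", "Glob"}))
EXECUTOR = AgentRole("executor", frozenset({"Read", "Edit", "Write", "Bash"}))

planner_can_write = PLANNER.can_use("Write")
executor_can_write = EXECUTOR.can_use("Write")
```

Keeping the write-capable tool set on a single role is what narrows the blast radius: a mistake in planning cannot touch the codebase directly.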

Case 4: Independent Code Review Passes

Spawning three subagents to review the same diff for security, performance, and readability gives you independent opinions that did not influence each other. A single-thread version would have to choose one frame at a time, and the second pass is biased by the first.

This is expensive — three full context windows for a code review — but it catches bugs a single pass misses. Worth the cost on production code; not worth it on a one-off script.

Three Cases Where Subagents Lose

Anti-Pattern 1: Sequential Subagents That Do Not Need Isolation

If you have five tasks that depend on each other in order, sending each to a subagent is pure waste. Every subagent reloads its system prompt and tool definitions. The dependency means they cannot run in parallel. You pay 5x the token overhead for zero wall-clock benefit.

The fix is to keep the work in the parent thread. Subagents are an isolation tool. If you do not need isolation, you do not need subagents.
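A rough tally makes the waste concrete. This sketch assumes a fixed startup cost per subagent and ignores the handoff prompts between steps; the function name and the per-task numbers are hypothetical.

```python
def serial_chain_cost(work_tokens: list[int], startup_tokens: int) -> dict[str, int]:
    """Compare a dependent chain run in the parent against one subagent per step."""
    in_parent = sum(work_tokens)
    # Each subagent reloads its system prompt and tool definitions before working.
    via_subagents = in_parent + startup_tokens * len(work_tokens)
    return {
        "in_parent": in_parent,
        "via_subagents": via_subagents,
        "wasted": via_subagents - in_parent,
    }


# Assumed: five dependent tasks of 4,000 tokens each, 3,000 tokens of startup per subagent.
cost = serial_chain_cost([4_000] * 5, startup_tokens=3_000)
```

Because the tasks depend on each other, wall-clock time is the same either way, so the `wasted` figure buys nothing.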

Anti-Pattern 2: Subagents for Tiny Tasks

Spawning a subagent to run one bash command is a clear loss. The subagent's startup context cost dwarfs the work it actually does. The parent could have run the command directly with no overhead beyond the tool call itself.

Pro Tip

Rule of thumb: if the work the subagent does is smaller than its startup context (~3,000 tokens for a typical setup), use the parent's tools instead. Subagents pay off when the work is bigger than the overhead.
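The rule of thumb reduces to a single comparison. A minimal sketch, assuming you can estimate the size of the work up front; the function name is hypothetical and the default reuses the rough 3,000-token figure from the tip.

```python
TYPICAL_STARTUP_TOKENS = 3_000  # rough figure from the tip above; measure your own setup


def worth_spawning(estimated_work_tokens: int,
                   startup_tokens: int = TYPICAL_STARTUP_TOKENS) -> bool:
    """True when the work is larger than the subagent's startup overhead."""
    return estimated_work_tokens > startup_tokens


one_bash_command = worth_spawning(200)        # tiny task: keep it in the parent
multi_file_search = worth_spawning(12_000)    # larger task: overhead is a small share
```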

Anti-Pattern 3: Recursive Subagent Trees

If a subagent spawns its own subagents and those spawn their own subagents, you can hit token costs in the high tens of dollars for a single task. Each level adds a multiplier. We have seen sessions where a four-level tree of subagents burned 1.5 million tokens to do work a flat single-thread session could have done in 200,000.

Cursor 3 and Claude Code both allow nested subagent calls. Both will let you build the tree. Neither will warn you that it is going to cost 30x what you expect.
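To see how quickly a tree grows, count the agents in it. This sketch assumes a full tree where every agent spawns the same number of children, which real sessions rarely do; the branching factor of 3 is an assumption, while the two token totals are the ones quoted above.

```python
def agents_in_tree(branching: int, levels: int) -> int:
    """Total agents in a full tree, counting the root as level one."""
    return sum(branching ** depth for depth in range(levels))


def token_ratio(tree_tokens: int, flat_tokens: int) -> float:
    return tree_tokens / flat_tokens


agents = agents_in_tree(branching=3, levels=4)  # 1 + 3 + 9 + 27
ratio = token_ratio(1_500_000, 200_000)         # the measured session quoted above
```

Each of those agents carries its own startup context, which is why every added level multiplies the overhead rather than adding to it.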

How to Decide in 30 Seconds

Before you spawn a subagent, ask three questions. First, does this work need to run in parallel with other work? If not, that counts against the subagent. Second, will the subagent's output exceed 5,000 tokens, or will the work take more than 10 turns? If not, the overhead probably isn't worth it. Third, does the parent have a context-budget reason to offload? If so, a subagent can be justified even for serial work.

Two yeses, use a subagent. One or zero, keep the work in the parent. This rule has held up across most of the workflows we've measured.
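The tally can be written as a three-flag check. This is a sketch of the rule as stated, with hypothetical parameter names; answering each question honestly is still the hard part.

```python
def should_spawn(needs_parallelism: bool,
                 large_output_or_long_work: bool,
                 parent_needs_context_relief: bool) -> bool:
    """Two or more yeses: use a subagent. One or zero: keep the work in the parent."""
    yeses = sum([needs_parallelism, large_output_or_long_work, parent_needs_context_relief])
    return yeses >= 2


# Serial but large lookup during a long planning session: two yeses.
offload_big_lookup = should_spawn(False, True, True)
# Small parallel task with no context pressure: one yes.
tiny_parallel_task = should_spawn(True, False, False)
```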

Measuring Your Own Cost

Both Claude Code and Cursor surface per-session token totals. Run the same task once with subagents and once without, and look at the difference. You will be surprised how often the subagent version costs 5-10x more for the same output. Once you see your numbers, the decision becomes much easier.
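Once you have the two totals, the comparison is a few lines of arithmetic. In this sketch the function name, the token counts, and the single blended price per million tokens are all assumptions; real pricing typically distinguishes input from output tokens, so substitute your own rates.

```python
def compare_runs(with_subagents: int, without_subagents: int,
                 usd_per_million_tokens: float) -> tuple[float, float]:
    """Return (token multiplier, extra cost in USD) for the subagent run."""
    if without_subagents <= 0:
        raise ValueError("baseline token count must be positive")
    ratio = with_subagents / without_subagents
    extra_usd = (with_subagents - without_subagents) / 1_000_000 * usd_per_million_tokens
    return ratio, extra_usd


# Hypothetical totals read from the two sessions, at an assumed blended price.
ratio, extra_usd = compare_runs(1_400_000, 200_000, usd_per_million_tokens=2.0)
```

Recording the multiplier per task type gives you your own version of the 7x figure instead of relying on someone else's.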

Key Takeaway

Subagent-heavy workflows can cost roughly 7x the tokens of a single thread because each subagent runs its own context window. They pay off for parallel work, parent context protection, tool permission boundaries, and independent review passes. They lose for serial work, tiny tasks, and recursive trees.

Frequently Asked Questions

Is the 7x cost the same on every Claude Code plan?

The token multiplier is the same regardless of plan, since it is a function of how subagents work, not how billing works. What changes is how much you feel it. On Pro plans with rate limits, a 7x session can hit caps fast. On API usage with per-token billing, you see the cost on the invoice. Either way, the underlying token math is identical.

Can I cap how many subagents a parent agent can spawn?

Claude Code does not have a built-in spawn cap as of May 2026. You can set custom rules in your CLAUDE.md to instruct the parent agent to limit subagent use, but enforcement is soft — the agent can still spawn more if it decides the task needs them. The hard limit is the rate limit on your plan.

Are subagents worth it for hobby projects or just for production work?

On hobby projects, subagents usually lose. The work is rarely big enough to justify the overhead, and the cost matters relative to a smaller monthly budget. The 7x multiplier shows up most painfully on small tasks. Save subagents for the genuinely large or genuinely parallel work, regardless of whether the project is hobby or production.
