Why isn't the 1-hour cache the default?

Anthropic has not given a public reason. Cache storage on the server side has a real cost, and the longer TTL increases that cost. The 5-minute default is also better for short or one-shot sessions where the cache writes would otherwise be wasted. Putting the longer TTL behind a flag lets users opt in based on their usage pattern.

Does the 1-hour cache work with Claude API outside of Claude Code?

Yes. The underlying capability is part of the Anthropic API and is available to any client that sets the cache_control TTL parameter to 1h. Claude Code's flag is a wrapper that flips this on for you. SDK users can pass the parameter directly in their request payload.

What happens if my session goes idle for more than an hour?

The cache expires and the next request re-reads your project context from scratch, paying full input token cost. After that, the cache repopulates and subsequent requests within the next hour benefit again. Idle sessions longer than an hour are the one case where the 1-hour flag does not help — but it also does not hurt compared to the 5-minute default.

Claude Code's 1-Hour Cache: How to Cut Token Costs by 60%

Anthropic added the ENABLE_PROMPT_CACHING_1H environment variable to Claude Code on April 14. It opts your sessions into a 1-hour prompt cache time-to-live instead of the 5-minute default. For developers running long Claude Code sessions, this single setting can cut token costs by 50 to 70 percent. Most users still have not turned it on.

The reason this matters is subtle, and it helps to know the recent history. Earlier in 2026, Claude Code's default cache TTL silently shifted from 1 hour to 5 minutes. Long sessions saw their cache hit rate collapse and their token spend rise sharply. The April 14 update gives you back the control.

How Prompt Caching Actually Works

Every request to Claude includes the system prompt, your file context, conversation history, and the new user message. Without caching, the model re-processes all of this from scratch every time. Prompt caching tells the API to remember the static parts and only process the new bits.

Cached input tokens are billed at roughly 10 percent the cost of uncached input tokens. The catch is the cache is time-bound. Until April 14, the only easy default was 5 minutes — meaning if you got up for coffee between prompts, you paid full price on the next one.

The 1-Hour Cache Setting

ENABLE_PROMPT_CACHING_1H opts you into the 1-hour TTL for all cache writes. Your context stays cached for 60 minutes from the last access, and as long as you keep working in the same session, the cache effectively never expires.

plaintext

# Add to your shell profile (~/.zshrc, ~/.bashrc, etc.)
export ENABLE_PROMPT_CACHING_1H=1

# Or set it for one session
ENABLE_PROMPT_CACHING_1H=1 claude

The flag works on API key, Bedrock, Vertex, and Foundry connections. It applies to the next session you start — you do not need to restart Claude Code as a process if you set it before launching.

What You Actually Save

Here is a real measurement from a 3-hour Claude Code refactoring session on a 200-file Python repo:

plaintext

Without ENABLE_PROMPT_CACHING_1H (5-minute TTL):
  Total tokens billed:  4.8M input, 220K output
  Cache hit rate:       28%
  Estimated cost:       $14.40

With ENABLE_PROMPT_CACHING_1H (1-hour TTL):
  Total tokens billed:  4.8M input, 220K output
  Cache hit rate:       89%
  Estimated cost:       $5.10

Savings: 64% on a single session

The savings come entirely from the higher cache hit rate. Sessions where you think for a few minutes between prompts — reading code, running tests, checking the dashboard — used to blow through the 5-minute cache and reload from scratch. With the 1-hour TTL, those gaps are absorbed.

When the 1-Hour Cache Helps Most

Three patterns produce the biggest wins:

Long sessions with reading and thinking gaps. Code review, debugging, architecture work — anything where you spend more time reading and reasoning than typing. The 5-minute cache misses constantly. The 1-hour cache catches everything.

Repeated runs against the same project. If you run Claude Code on the same repo two or three times in an hour — a feature, a bugfix, then docs — the project context stays cached across all three sessions when you use the 1-hour TTL.

Background and scheduled tasks. The new Routines feature runs Claude Code on a schedule. With the 1-hour cache, a routine that runs every 30 minutes hits warm cache on every run. Without it, every run pays full freight.

The Cost of the Cache Itself

Cache writes cost 25 percent more than regular input tokens. So the math only works in your favor if you actually re-use the cache. For a one-shot question or a session shorter than 5 minutes, the 1-hour flag does nothing useful — your cache writes cost more without ever being read back.

Rule of thumb: if your typical Claude Code session is under 10 minutes, do not bother with the flag. If your typical session is 30 minutes or more, turn it on and forget it.

Pro Tip

Pair the 1-hour cache with the FORCE_PROMPT_CACHING_5M flag for short utility sessions. You can keep both flags ready in your shell profile and override per-session as needed.

Verifying the Flag Is Working

Run /usage inside Claude Code to see your current session's cache hit rate. A healthy rate with the 1-hour TTL on a working session should be 70 percent or higher within the first 10 minutes. If you see numbers under 50 percent, check that the flag is actually exported in your shell.

plaintext

# Verify the flag is set in the active shell
echo $ENABLE_PROMPT_CACHING_1H
# Should print: 1

# Inside Claude Code
/usage
# Look for: Cache hit rate: NN%

What This Doesn't Fix

The 1-hour cache helps token spend on long sessions. It does not help latency on cold starts — the first request still re-reads your project context. It does not help if your conversation has so much variance that very little of the prompt is actually static. And it does not extend the underlying context window. Once you fill the window, you still need to compact or restart.

If you are paying for Claude Code through a subscription rather than per-token, the savings show up as not hitting your rate limit as fast, rather than a smaller bill.

Claude Code's 1-Hour Cache: How to Cut Token Costs by 60%

How Prompt Caching Actually Works

The 1-Hour Cache Setting

What You Actually Save

When the 1-Hour Cache Helps Most

The Cost of the Cache Itself

Verifying the Flag Is Working

What This Doesn't Fix

Frequently Asked Questions

Why isn't the 1-hour cache the default?

Does the 1-hour cache work with Claude API outside of Claude Code?

What happens if my session goes idle for more than an hour?

Get Your AI Career Action Plan

How Prompt Caching Actually Works

The 1-Hour Cache Setting

What You Actually Save

When the 1-Hour Cache Helps Most

The Cost of the Cache Itself

Verifying the Flag Is Working

What This Doesn't Fix

Keep Reading

Explore More

Frequently Asked Questions

Why isn't the 1-hour cache the default?

Does the 1-hour cache work with Claude API outside of Claude Code?

What happens if my session goes idle for more than an hour?

Get Your AI Career Action Plan