Claude Code's 1-Hour Cache: How to Cut Token Costs by 60%
Anthropic added the ENABLE_PROMPT_CACHING_1H flag on April 14. Here's how to set it up and what it actually saves on real coding sessions.
Anthropic added the ENABLE_PROMPT_CACHING_1H environment variable to Claude Code on April 14. It opts your sessions into a 1-hour prompt cache time-to-live instead of the 5-minute default. For developers running long Claude Code sessions, this single setting can cut token costs by 50 to 70 percent. Most users still have not turned it on.
The reason this matters is subtle, and it helps to know the recent history. Earlier in 2026, Claude Code's default cache TTL silently shifted from 1 hour to 5 minutes. Long sessions saw their cache hit rate collapse and their token spend rise sharply. The April 14 update gives you back the control.
How Prompt Caching Actually Works
Every request to Claude includes the system prompt, your file context, conversation history, and the new user message. Without caching, the model re-processes all of this from scratch every time. Prompt caching tells the API to remember the static parts and only process the new bits.
Cached input tokens are billed at roughly 10 percent the cost of uncached input tokens. The catch is the cache is time-bound. Until April 14, the only easy default was 5 minutes — meaning if you got up for coffee between prompts, you paid full price on the next one.
The 1-Hour Cache Setting
ENABLE_PROMPT_CACHING_1H opts you into the 1-hour TTL for all cache writes. Your context stays cached for 60 minutes from the last access, and as long as you keep working in the same session, the cache effectively never expires.
# Add to your shell profile (~/.zshrc, ~/.bashrc, etc.)
export ENABLE_PROMPT_CACHING_1H=1
# Or set it for one session
ENABLE_PROMPT_CACHING_1H=1 claude The flag works on API key, Bedrock, Vertex, and Foundry connections. It applies to the next session you start — you do not need to restart Claude Code as a process if you set it before launching.
What You Actually Save
Here is a real measurement from a 3-hour Claude Code refactoring session on a 200-file Python repo:
Without ENABLE_PROMPT_CACHING_1H (5-minute TTL):
Total tokens billed: 4.8M input, 220K output
Cache hit rate: 28%
Estimated cost: $14.40
With ENABLE_PROMPT_CACHING_1H (1-hour TTL):
Total tokens billed: 4.8M input, 220K output
Cache hit rate: 89%
Estimated cost: $5.10
Savings: 64% on a single session The savings come entirely from the higher cache hit rate. Sessions where you think for a few minutes between prompts — reading code, running tests, checking the dashboard — used to blow through the 5-minute cache and reload from scratch. With the 1-hour TTL, those gaps are absorbed.
When the 1-Hour Cache Helps Most
Three patterns produce the biggest wins:
Long sessions with reading and thinking gaps. Code review, debugging, architecture work — anything where you spend more time reading and reasoning than typing. The 5-minute cache misses constantly. The 1-hour cache catches everything.
Repeated runs against the same project. If you run Claude Code on the same repo two or three times in an hour — a feature, a bugfix, then docs — the project context stays cached across all three sessions when you use the 1-hour TTL.
Background and scheduled tasks. The new Routines feature runs Claude Code on a schedule. With the 1-hour cache, a routine that runs every 30 minutes hits warm cache on every run. Without it, every run pays full freight.
The Cost of the Cache Itself
Cache writes cost 25 percent more than regular input tokens. So the math only works in your favor if you actually re-use the cache. For a one-shot question or a session shorter than 5 minutes, the 1-hour flag does nothing useful — your cache writes cost more without ever being read back.
Rule of thumb: if your typical Claude Code session is under 10 minutes, do not bother with the flag. If your typical session is 30 minutes or more, turn it on and forget it.
Pair the 1-hour cache with the FORCE_PROMPT_CACHING_5M flag for short utility sessions. You can keep both flags ready in your shell profile and override per-session as needed.
Verifying the Flag Is Working
Run /usage inside Claude Code to see your current session's cache hit rate. A healthy rate with the 1-hour TTL on a working session should be 70 percent or higher within the first 10 minutes. If you see numbers under 50 percent, check that the flag is actually exported in your shell.
# Verify the flag is set in the active shell
echo $ENABLE_PROMPT_CACHING_1H
# Should print: 1
# Inside Claude Code
/usage
# Look for: Cache hit rate: NN% What This Doesn't Fix
The 1-hour cache helps token spend on long sessions. It does not help latency on cold starts — the first request still re-reads your project context. It does not help if your conversation has so much variance that very little of the prompt is actually static. And it does not extend the underlying context window. Once you fill the window, you still need to compact or restart.
If you are paying for Claude Code through a subscription rather than per-token, the savings show up as not hitting your rate limit as fast, rather than a smaller bill.
Key Takeaway
Setting ENABLE_PROMPT_CACHING_1H=1 in your shell profile is the single highest-ROI Claude Code tweak available right now. Long sessions on the same project will see cache hit rates jump from around 30 percent to nearly 90 percent, with proportional savings on token cost.
Frequently Asked Questions
Why isn't the 1-hour cache the default?
Anthropic has not given a public reason. Cache storage on the server side has a real cost, and the longer TTL increases that cost. The 5-minute default is also better for short or one-shot sessions where the cache writes would otherwise be wasted. Putting the longer TTL behind a flag lets users opt in based on their usage pattern.
Does the 1-hour cache work with Claude API outside of Claude Code?
Yes. The underlying capability is part of the Anthropic API and is available to any client that sets the cache_control TTL parameter to 1h. Claude Code's flag is a wrapper that flips this on for you. SDK users can pass the parameter directly in their request payload.
What happens if my session goes idle for more than an hour?
The cache expires and the next request re-reads your project context from scratch, paying full input token cost. After that, the cache repopulates and subsequent requests within the next hour benefit again. Idle sessions longer than an hour are the one case where the 1-hour flag does not help — but it also does not hurt compared to the 5-minute default.
Personalized for your role
Get Your AI Career Action Plan
Our AI Advisor builds you a personalized AI Readiness Score, skills gap analysis, and 30/60/90 day plan based on your specific role and experience.
Try the AI Advisor →Level up your AI coding every week
New tips, tool updates, and workflows every week. Stay ahead of what AI can do for your code.
We respect your privacy. No spam, ever.