
How Claude Code uses prompt caching

Claude Code manages prompt caching automatically. See why a model switch triggers a slow uncached turn, what /compact costs, and how to check your cache hit rate.

Prompt caching makes Claude Code faster and more cost-efficient. Without caching, the API would reprocess your full history on every turn. With caching, it reuses what it already processed and only does new work for what changed.

Claude Code handles prompt caching for you, unless you disable it. Some actions invalidate the cache and make the next response slower and more expensive while it rebuilds.

How the cache is organized

Each time you send a message, Claude Code makes a new API request. The model doesn't remember anything between requests, so Claude Code re-sends the full context. Prompt caching avoids reprocessing the part that didn't change.

The API caches by matching the start of each request (the prefix) against content it recently processed. The match is exact: a change anywhere in the prefix forces everything from that point onward to be recomputed. There is no per-file or per-segment caching.
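The following minimal Python sketch makes the prefix rule concrete. It models a request as an ordered list of content blocks and counts how many leading blocks match the previous request. The block lists and the `split_cached` helper are hypothetical simplifications, not how the API represents prompts internally.

```python
from typing import Sequence


def split_cached(previous: Sequence[str], current: Sequence[str]) -> tuple[int, int]:
    """Return (reused, recomputed) block counts for `current`.

    Hypothetical model: blocks are compared in order from the start, and the
    first mismatch ends reuse, even if later blocks are identical.
    """
    reused = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        reused += 1
    return reused, len(current) - reused


turn_1 = ["system", "CLAUDE.md", "user: hi", "assistant: hello"]
turn_2 = turn_1 + ["user: next question"]
# Appending to the end keeps the whole earlier prefix reusable.
print(split_cached(turn_1, turn_2))  # (4, 1)

# Changing the first block forces every block after it to be recomputed,
# even though the rest of the content is unchanged.
edited = ["system v2"] + turn_2[1:]
print(split_cached(turn_2, edited))  # (0, 5)
```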

Claude Code orders each request so content that rarely changes between turns comes first:

Layer | Content | Changes when
System prompt | Core instructions, tool definitions, output style | An MCP server connects or disconnects, or Claude Code is upgraded
Project context | CLAUDE.md, auto memory, unscoped rules | Session starts, or after /clear or /compact
Conversation | Your messages, Claude's responses, tool results | Every turn

A change to the conversation layer leaves the system prompt and project context cached. A change to the system prompt invalidates everything, because every other layer comes after it in the prefix.
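Here is a minimal sketch of the layer ordering. It assumes the three layers from the table are sent in that order. The hypothetical `layers_at_risk` helper returns the changed layer plus every layer after it, which is the part of the request that may no longer be served from cache.

```python
# Request layers in the order described in the table (least to most volatile).
LAYERS = ["system prompt", "project context", "conversation"]


def layers_at_risk(changed: str) -> list[str]:
    """Return the changed layer and every layer after it.

    These are the layers that may no longer be served from cache. A normal new
    turn only appends to the conversation layer, so most of that layer still
    hits the cache; treat the result as an upper bound, not an exact count.
    """
    start = LAYERS.index(changed)  # raises ValueError for an unknown layer
    return LAYERS[start:]


print(layers_at_risk("conversation"))   # ['conversation']
print(layers_at_risk("system prompt"))  # ['system prompt', 'project context', 'conversation']
```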

Two settings are part of the cache key but not the prompt text:

  • Model — each model has its own cache. Switching models recomputes the entire request.
  • Effort level — each effort level has its own cache. Changing it mid-session recomputes the entire request.
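You can picture this as a cache key that combines the model and effort level with a hash of the prompt prefix. The `CacheKey` type, the SHA-256 hash, and the placeholder model and effort names are illustrative assumptions. This page does not document the API's actual key format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheKey:
    """Illustrative key: two settings that aren't prompt text, plus the prefix."""
    model: str
    effort: str
    prefix_sha256: str


def make_key(model: str, effort: str, prefix: str) -> CacheKey:
    digest = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
    return CacheKey(model, effort, digest)


prefix = "system prompt + project context + conversation so far"
before = make_key("model-a", "high", prefix)

# Identical prompt text, but a different model or effort level gives a new key,
# so nothing cached under the old key is reused.
print(make_key("model-b", "high", prefix) == before)  # False
print(make_key("model-a", "low", prefix) == before)   # False
print(make_key("model-a", "high", prefix) == before)  # True
```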

Pick your model, effort level, and MCP servers at the start of a session, and save /compact for natural breaks between tasks.

Where the cache lives

Caching happens server-side. With an API key or Claude subscription, the cache lives in Anthropic's infrastructure. On Bedrock or Vertex AI, the cache lives in your cloud provider's infrastructure. With a custom ANTHROPIC_BASE_URL or LLM gateway, the cache lives wherever your requests are forwarded.

Cached prefixes expire after a period of inactivity. Each request that hits the cache resets the timer.
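The sliding expiry can be sketched as an entry whose deadline moves forward on every hit. `PrefixEntry` is hypothetical. The 300-second TTL corresponds to the five-minute option described under Cache lifetime. Timestamps are passed in explicitly so the example is deterministic.

```python
class PrefixEntry:
    """Hypothetical cached-prefix entry whose expiry slides forward on each hit."""

    def __init__(self, ttl_seconds: float, now: float) -> None:
        self.ttl = ttl_seconds
        self.expires_at = now + ttl_seconds

    def lookup(self, now: float) -> bool:
        """Return True on a hit and restart the timer; False once expired."""
        if now >= self.expires_at:
            return False
        self.expires_at = now + self.ttl  # each hit restarts the inactivity window
        return True


entry = PrefixEntry(ttl_seconds=300, now=0)
print(entry.lookup(now=240))  # True: within 5 minutes, expiry moves to 540
print(entry.lookup(now=500))  # True: still inside the extended window, expiry moves to 800
print(entry.lookup(now=900))  # False: 400 s of inactivity exceeds the 300 s TTL
```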

Actions that invalidate the cache

These actions cause the next request to miss part or all of the cache:

  • Switching models
  • Changing effort level
  • Connecting or disconnecting an MCP server
  • Denying an entire tool
  • Compacting the conversation
  • Upgrading Claude Code

Switching models

Each model has its own cache. After you switch with /model, the next request processes the entire conversation history with no cache hits.

Changing effort level

The cache is keyed by effort level as well as model. Claude Code may show a confirmation dialog before applying an effort change that would invalidate the cache.

Connecting or disconnecting an MCP server

Tool definitions sit in the system prompt layer, so the cache invalidates when the set of MCP tools changes. Editing your MCP config does not by itself change the cache; the new config takes effect after a restart.

Denying an entire tool

Adding a bare tool name like Bash or WebFetch as a deny rule removes that tool from Claude's context entirely and invalidates the cache the same way an MCP change does.

Only a bare tool name, or the equivalent Bash(*) form, has this effect. Scoped deny rules like Bash(rm *) do not change which tools Claude sees.
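The difference between bare and scoped deny rules can be expressed as a small classifier. This sketch makes two assumptions for illustration: the rule grammar is `Tool` or `Tool(specifier)`, and the `Bash(*)` equivalence extends to any `Tool(*)`. It is not Claude Code's own rule parser.

```python
import re

# Hypothetical grammar: "Tool" or "Tool(specifier)".
RULE = re.compile(r"^(?P<tool>[A-Za-z][\w-]*)(?:\((?P<spec>.*)\))?$")


def removes_whole_tool(deny_rule: str) -> bool:
    """True if the rule hides the tool entirely (bare name or Tool(*))."""
    match = RULE.match(deny_rule.strip())
    if match is None:
        raise ValueError(f"unrecognized rule: {deny_rule!r}")
    spec = match.group("spec")
    # No specifier, or a match-everything specifier, removes the tool from context.
    return spec is None or spec.strip() == "*"


for rule in ["Bash", "WebFetch", "Bash(*)", "Bash(rm *)"]:
    print(f"{rule}: {removes_whole_tool(rule)}")
# Bash: True
# WebFetch: True
# Bash(*): True
# Bash(rm *): False
```

Only the first three rules would change the tool definitions and therefore the system prompt layer.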

Compacting the conversation

Compaction replaces your message history with a summary and invalidates the conversation layer. Claude Code reuses the system prompt layer and reloads project context from disk.

Run /compact at a natural break between tasks. If you want to abandon a path entirely, use /rewind instead: it truncates the conversation back to a prefix that is already cached.

Upgrading Claude Code

A new version typically updates the system prompt or tool definitions, so the first request after an upgrade rebuilds the cache from the top. Auto-update applies on the next launch, not mid-session.

Actions that keep the cache

  • Editing files in your repository
  • Editing CLAUDE.md mid-session (the edit does not take effect until /clear, /compact, or a restart)
  • Changing output style mid-session (the change likewise takes effect on /clear or a restart)
  • Changing permission mode (except with opusplan, where toggling plan mode switches models)
  • Invoking skills and commands
  • Running /recap
  • Rewinding the conversation

Editing a file Claude previously read does not retroactively change the earlier read, so the cached history stays intact. Claude Code appends a note that the file changed and re-reads it if needed.

Cache lifetime

Cached prefixes expire after inactivity. The API offers a five-minute TTL and a one-hour TTL.

  • With a Claude subscription, Claude Code requests the one-hour TTL automatically.
  • With an API key or a third-party provider, the TTL stays at five minutes by default. Set ENABLE_PROMPT_CACHING_1H=1 to use the one-hour TTL.
  • Set FORCE_PROMPT_CACHING_5M=1 to force five minutes regardless of authentication.
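The precedence in the list above can be written as a function. The `auth` labels are hypothetical. Treating a variable as set only when it equals "1" is also an assumption: the section gives `=1` as the example value but doesn't say how other values are handled.

```python
import os
from typing import Mapping


def cache_ttl_seconds(auth: str, env: Mapping[str, str] = os.environ) -> int:
    """Pick the prompt-cache TTL following the Cache lifetime rules.

    auth is "subscription", "api_key", or "third_party" (hypothetical labels).
    Assumes a flag counts as set only when its value is exactly "1".
    """
    if env.get("FORCE_PROMPT_CACHING_5M") == "1":
        return 5 * 60  # forced to five minutes regardless of authentication
    if auth == "subscription":
        return 60 * 60  # subscriptions request the one-hour TTL automatically
    if env.get("ENABLE_PROMPT_CACHING_1H") == "1":
        return 60 * 60  # opt-in for API keys and third-party providers
    return 5 * 60  # default for API keys and third-party providers


print(cache_ttl_seconds("api_key", {}))                                 # 300
print(cache_ttl_seconds("api_key", {"ENABLE_PROMPT_CACHING_1H": "1"}))  # 3600
print(cache_ttl_seconds("subscription", {"FORCE_PROMPT_CACHING_5M": "1"}))  # 300
```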

Cache scope

In Claude Code, the cache is effectively scoped to one machine and directory. The system prompt embeds the working directory, platform, shell, and git status snapshot taken at startup, so a session started in another directory or on another machine produces a different prefix.

Parallel sessions in the same directory can read each other's cache. Sequential sessions share the prefix only when their git status snapshots at startup match.
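The following sketch shows why sequential sessions share a prefix only when their startup details match. It hashes the working directory, platform, shell, and git status snapshot into a fingerprint. `startup_fingerprint` is a hypothetical stand-in. Claude Code embeds these details as text in the system prompt, but any difference in that text has the same effect: a different prefix.

```python
import hashlib


def startup_fingerprint(cwd: str, platform_name: str, shell: str, git_status: str) -> str:
    """Hypothetical stand-in for the startup details embedded in the system prompt."""
    parts = [cwd, platform_name, shell, git_status]
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


a = startup_fingerprint("/repo", "linux", "bash", "M src/app.py")
b = startup_fingerprint("/repo", "linux", "bash", "M src/app.py")
c = startup_fingerprint("/repo", "linux", "bash", "M src/app.py\n?? notes.txt")
d = startup_fingerprint("/other", "linux", "bash", "M src/app.py")

print(a == b)  # True: same directory and same git status snapshot
print(a == c)  # False: git status changed between sessions
print(a == d)  # False: different working directory
```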

Check cache performance

The API reports cache usage for each request in two fields:

Field | Meaning
cache_creation_input_tokens | Tokens written to the cache on this turn
cache_read_input_tokens | Tokens served from cache on this turn

A high read-to-creation ratio means caching is working well.
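To turn per-turn usage into that ratio, sum both fields across turns and divide. `read_to_creation_ratio` is a hypothetical helper. It assumes each turn's usage is available as a dictionary keyed by the field names in the table. The token counts in the example are made-up numbers.

```python
from typing import Iterable, Mapping


def read_to_creation_ratio(usages: Iterable[Mapping[str, int]]) -> float:
    """Total cache reads divided by total cache writes across turns.

    Returns float("inf") if tokens were read but none were written, and 0.0
    if neither field reports any tokens.
    """
    read = created = 0
    for usage in usages:
        read += usage.get("cache_read_input_tokens", 0)
        created += usage.get("cache_creation_input_tokens", 0)
    if created == 0:
        return float("inf") if read else 0.0
    return read / created


turns = [
    {"cache_creation_input_tokens": 20_000, "cache_read_input_tokens": 0},   # first turn writes the prefix
    {"cache_creation_input_tokens": 500, "cache_read_input_tokens": 20_000},  # later turns mostly read
    {"cache_creation_input_tokens": 700, "cache_read_input_tokens": 20_500},
]
print(round(read_to_creation_ratio(turns), 2))  # 1.91
```

The ratio climbs as a session goes on, because the one-time write on the first turn is amortized over later reads.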

Subagents and the cache

A subagent starts its own conversation with its own system prompt and tool set, so it builds its own cache and leaves the parent's cache unaffected. A fork, by contrast, inherits the parent's prefix, so its first request reads from the parent's cache.

Disable prompt caching

Variable | Effect
DISABLE_PROMPT_CACHING | Disable for all models
DISABLE_PROMPT_CACHING_HAIKU | Disable for Haiku only
DISABLE_PROMPT_CACHING_SONNET | Disable for Sonnet only
DISABLE_PROMPT_CACHING_OPUS | Disable for Opus only

For normal use, leave caching enabled.
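If you script around these variables, a check might look like the sketch below. This is not Claude Code's implementation. Two rules here are assumptions: a variable counts as set when its value is anything other than empty, "0", or "false", and the model family comes from a substring match on the model name. The model names in the usage lines are placeholders.

```python
import os
from typing import Mapping


def prompt_caching_disabled(model: str, env: Mapping[str, str] = os.environ) -> bool:
    """Check the DISABLE_PROMPT_CACHING* variables for a given model name."""

    def is_set(name: str) -> bool:
        # Assumed truthiness rule; the page does not specify accepted values.
        return env.get(name, "").strip().lower() not in ("", "0", "false")

    if is_set("DISABLE_PROMPT_CACHING"):
        return True  # applies to every model
    for family in ("haiku", "sonnet", "opus"):
        if family in model.lower() and is_set(f"DISABLE_PROMPT_CACHING_{family.upper()}"):
            return True
    return False


print(prompt_caching_disabled("example-sonnet", {"DISABLE_PROMPT_CACHING_SONNET": "1"}))  # True
print(prompt_caching_disabled("example-opus", {"DISABLE_PROMPT_CACHING_SONNET": "1"}))    # False
```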
