Prompt Caching Breakpoint Planner Tool - Plan cache_control Placement and Prefix Reuse

First Published: 2026-06-24
Last Updated: 2026-06-24

Plan where to place Anthropic prompt caching breakpoints (cache_control) so the stable prefix of your prompt is reused across requests. Arrange your tools, system, and message blocks, mark which parts are stable vs. changing, and see exactly which tokens get reused and whether each breakpoint meets the per-model minimum cacheable length. This planner illustrates the structure of cache reuse only - it makes no API calls, measures no real cache hits, and shows no cost.

All processing is performed entirely in your browser using client-side JavaScript. No data is sent to any server, no API is called, and nothing you type leaves your device.

⚠️ IMPORTANT DISCLAIMER:

This tool is provided "AS IS" without any warranties of any kind.
It models the documented structural rules of prompt caching; it does not call any API and does not measure real cache hits.
Token counts are approximate. Minimum cacheable lengths and limits differ by model and platform and change over time.
The author accepts no responsibility for outcomes arising from use of this tool. Always confirm against current official documentation.
By using this tool, you accept full responsibility for any outcomes.

This tool uses client-side JavaScript for all processing. No data is transmitted to servers, no files are uploaded online, all processing happens locally in your browser. Once loaded, this tool continues to work even without an internet connection. For more details, please refer to our Web Tools Disclaimer.

Breakpoint Planner

Model (sets minimum cacheable length)

Minimum cacheable length (tokens)

Cache TTL

The minimum is prefilled from the selected model (verified against official documentation as of 2026-06-23). You can override it for a model or platform not listed.

Examples:

Prompt blocks

Add the pieces of your prompt. Each block has a kind (tools, system, static-context, dynamic-history), an approximate token count, a stability flag (static vs. dynamic), and an optional cache_control breakpoint. Caching always renders in the order tools → system → messages, so the visualization below groups blocks by that order regardless of editor order.

Cache prefix visualization

Green = the stable prefix that is reused (cache read) when the breakpoint is valid. Amber = content re-sent and re-processed on every request. A breakpoint only caches when its prefix meets the model minimum and no dynamic content precedes it.

Structural checks

Findings are graded Error (the API rejects or it cannot work), Warning (the API silently won't cache), and Info (guidance).

Messages API request skeleton

A structural skeleton showing where cache_control lands. All values are placeholders - this tool never sends a request. Replace the placeholders with your real content.

Token estimator (approximate)

A rough helper to turn pasted text into an approximate token count (characters ÷ 4) you can type into a block. This is an estimate, not a substitute for the official count_tokens endpoint.

Paste text to estimate

About This Tool

The Prompt Caching Breakpoint Planner helps you reason about Anthropic prompt caching: a feature of the Messages API where a stable leading portion of your prompt (the prefix) is stored after the first request and reused on later requests, so those tokens are processed as a fast cache read instead of from scratch. Caching is governed by a few structural rules that are easy to get wrong, and this tool makes them visible: it shows the cache order, where to place cache_control breakpoints, which tokens are reused, and whether each breakpoint meets the per-model minimum cacheable length.

It is a learning and design aid, not a measurement tool. It makes no API calls, never sees your real prompts unless you paste them locally, measures no real cache hits, and deliberately shows no pricing or savings - only the structure of token reuse. For exact token counts use the official count_tokens endpoint, and for cache-hit numbers inspect usage.cache_read_input_tokens on real responses.

The rules below were verified against the official Anthropic documentation as of 2026-06-23 for Claude Fable 5, Claude Opus 4.8 / 4.7 / 4.6 / 4.5, Claude Sonnet 4.6 / 4.5, and Claude Haiku 4.5. Limits can differ by model and platform and change over time.

Prompt Caching Rules at a Glance

Rule	Value
Matching	Prefix match. Any byte change anywhere in the prefix invalidates the cache for everything after that point.
Cache order	`tools` → `system` → `messages`. A change at one level invalidates that level and all later levels.
Maximum breakpoints	4 `cache_control` breakpoints per request (a 5th explicit breakpoint returns a 400 error).
Where `cache_control` can go	On tool definitions, `system` text blocks, and message content blocks (text, image, document, tool_use, tool_result). A top-level `cache_control` auto-targets the last cacheable block.
TTL options	5 minutes (default) `{"type": "ephemeral"}`, or 1 hour `{"type": "ephemeral", "ttl": "1h"}`.
Lookback window	Each breakpoint looks back at most 20 content blocks for a matching prior cache entry.
Cache-hit signal	`usage.cache_read_input_tokens` and `usage.cache_creation_input_tokens` on the response.
Not cacheable	Thinking blocks on their own and empty text blocks. Concurrent identical requests do not hit until the first response begins.

Minimum cacheable prompt length on the Claude API (model-specific; varies by platform and can change over time):

Model	Minimum cacheable length
Claude Fable 5	512 tokens (1,024 tokens on Amazon Bedrock)
Claude Opus 4.8	1,024 tokens
Claude Opus 4.7	2,048 tokens
Claude Opus 4.6 / 4.5	4,096 tokens
Claude Sonnet 4.6 / 4.5	1,024 tokens
Claude Haiku 4.5	4,096 tokens

Source: Anthropic - Prompt caching (verified 2026-06-23). A prefix shorter than the minimum is processed without caching, and no error is returned.

How to Use

Pick your model (it prefills the minimum cacheable length), or override the minimum for a model or platform not listed.
Load an example or click Add block to lay out your prompt: choose each block's kind, approximate token count, and whether it is static (stable) or dynamic (changes per request).
Click Suggest breakpoints to place cache_control at the end of the stable prefix and at layer boundaries, or toggle the cache_control checkbox on any block yourself.
Read the visualization to see which tokens are reused and which are re-sent, and the structural checks for errors and warnings.
Copy the generated Messages API request skeleton and replace the placeholders with your real content.

Features

Block composition editor: add tools / system / static-context / dynamic-history blocks with approximate tokens, a static/dynamic flag, and reordering.
Breakpoint suggestion: places cache_control at the static/dynamic boundary and layer boundaries, only where the prefix meets the model minimum, and never exceeding 4.
Prefix visualization: colors the reused cached prefix vs. the re-sent tail, in true cache order (tools → system → messages), with per-breakpoint validity.
Structural checks: flags too many breakpoints, dynamic content inside the cached prefix, prefixes below the minimum, and stranded stable blocks - graded Error / Warning / Info.
Request skeleton: generates a copyable Messages API JSON body with cache_control in the right places and a selectable TTL. Placeholder values only.
Approximate token estimator and per-model minimum reference, with the confirmation date shown.
Privacy first: fully client-side, no API calls, no cost figures, nothing leaves your browser.

FAQ

How many cache breakpoints can I set?
Up to 4 cache_control breakpoints per request. They let you cache sections that change at different frequencies. A 5th explicit breakpoint returns a 400 error, which this tool flags as an Error.

Why didn't my cache hit (cache_read_input_tokens stays 0)?
Something in the prefix changes between requests, so the prefix hash never matches. Common causes: a timestamp, UUID, or per-request ID near the start; non-deterministic JSON key order; a changing tool set; switching models (caches are per-model); or dynamic content placed before a breakpoint. Order matters too - caching renders tools → system → messages, so a change in tools invalidates system and messages as well. Keep everything stable before the last breakpoint.

What is the minimum cacheable length?
It is model-specific (see the table above) - for example 1,024 tokens for Claude Opus 4.8 and Claude Sonnet 4.6, and 4,096 tokens for Claude Opus 4.6. A prefix shorter than the minimum is processed without caching and no error is returned, so a too-short prefix is a silent miss. This tool flags it as a Warning.

Where should I place cache_control?
At the end of the longest stable leading prefix. Everything before it must be byte-stable across requests (frozen system prompt, deterministic tool list, fixed context); put volatile content (the user's latest question, timestamps, per-request IDs) after the last breakpoint. For multi-turn conversations you can add a breakpoint near the end of each turn so cache reads accrue as the conversation grows.

Does this tool show cost or savings?
No. It illustrates only the structure of token reuse - which tokens are cached vs. re-sent. It makes no API calls, measures no real cache hits, and shows no pricing or savings figures. For pricing, see the official Anthropic pricing page; for real usage, read usage.cache_read_input_tokens on actual responses.

Related Tools

JSON Formatter Validator Tool - Online JSON Beautifier Minifier and Tree Viewer - format and validate the request skeleton this planner generates.
LLM Token Counter and Context Budget Planner - Estimate Tokens and Visualize Context Window Usage - estimate token counts and check minimum cacheable lengths per model.
Anthropic Tool Use Schema Builder and Validator Tool - Build and Validate Claude Function Calling Definitions - build the tool definitions you place at the start of a cached prompt.

Important Notes

This planner illustrates the structural rules of prompt caching (prefix matching, ordering, breakpoint limits, minimum cacheable length) verified against official documentation as of 2026-06-23 for Claude Fable 5, Claude Opus 4.8 / 4.7 / 4.6 / 4.5, Claude Sonnet 4.6 / 4.5, and Claude Haiku 4.5. It does not call any API, does not measure real cache hits, and does not estimate cost. Limits can differ by model and change over time - confirm against current documentation.
Token counts are approximate. Use the official count_tokens endpoint for exact counts before relying on a minimum-length decision.
The breakpoint suggestions are a starting point. Real prompts can have multiple sections that change at different frequencies; place breakpoints at those stability boundaries within the 4-breakpoint limit.
Caching is per-model and per-prefix. Changing the model, the tool set, or any earlier byte invalidates the cache from that point onward.

References:
Tech Blog with curated related content
Web Tools Collection

Written by Hidekazu Konishi