LLM Token Counter and Context Budget Planner - Estimate Tokens and Visualize Context Window Usage

Estimate how many tokens your prompt, conversation history, and tool definitions use, then see at a glance whether the input plus your reserved output fits inside a model's context window.

All processing is performed entirely in your browser using client-side JavaScript. No data is transmitted to any server, and no API calls are made. Your text never leaves your device.

⚠️ IMPORTANT DISCLAIMER:

  • This tool is provided "AS IS" without any warranties of any kind.
  • Token counts shown are estimates produced in your browser and may differ from a model's actual tokenizer.
  • For exact counts, use the provider's official token counting API.
  • The author accepts no responsibility for decisions based on these estimates.
  • By using this tool, you accept full responsibility for any outcomes.

Once loaded, this tool continues to work even without an internet connection. For more details, please refer to our Web Tools Disclaimer.

Load from File

Drop a .txt file (plain text) or a .json messages array here, or click to browse.
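
As a rough sketch of how a loaded file could be sorted into the two input modes, the snippet below assumes that a messages array is a JSON array of objects with string `role` and `content` fields. That shape and the helper name `parseLoadedFile` are illustrative assumptions, not the tool's actual code.

```javascript
// Hypothetical helper: decide whether loaded file text is a messages array
// or plain text. The { role, content } shape is an assumption.
function parseLoadedFile(text) {
  try {
    const data = JSON.parse(text);
    const isMessages =
      Array.isArray(data) &&
      data.length > 0 &&
      data.every(
        (m) =>
          m !== null &&
          typeof m === "object" &&
          typeof m.role === "string" &&
          typeof m.content === "string"
      );
    if (isMessages) {
      return { mode: "messages", messages: data };
    }
  } catch (err) {
    // Not valid JSON: fall through and treat the file as plain text.
  }
  return { mode: "text", text };
}
```

Anything that does not parse as such an array, including valid JSON of another shape, falls back to plain text in this sketch.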

About This Tool

A token is the unit a language model actually processes - a chunk of text the tokenizer treats as one piece, not a word or a character. The context window is the total token budget a model can consider at once, and it is shared by everything you send (system prompt, tool definitions, conversation history) and the space reserved for the model's response. When the input plus the reserved output approaches the window limit, requests fail or history has to be trimmed. This planner estimates the token cost of each part and shows how it all fits.

How the estimate works (and why it is an estimate): An exact Claude token count can only be produced by the provider's server-side token counting API; it cannot be reproduced in the browser, and this tool never sends your text anywhere. Instead, it approximates tokens from the character count using ratios derived from each model's officially published "words / unicode characters per token" figures, grouped by tokenizer family. This is deliberately not a GPT-style byte-pair tokenizer such as tiktoken or gpt-tokenizer: those are built for OpenAI models and meaningfully miscount Claude tokens, which would give a false sense of precision. The estimate also does not model the small per-message and per-tool structural overhead the real API adds, so treat the result as a planning approximation with a margin.
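
The character-ratio approach described above can be sketched roughly as follows. The ratio values, the family keys, and the default margin are placeholders chosen for illustration; they are not the published figures and not the tool's actual code.

```javascript
// Hypothetical characters-per-token ratios, one per tokenizer family.
// Placeholder values for illustration only, not published figures.
const CHARS_PER_TOKEN = new Map([
  ["pre-4.7", 3.5],
  ["opus-4.7+", 2.7],
]);

// Count Unicode code points so that characters outside the Basic
// Multilingual Plane (emoji, rare CJK) count once rather than twice.
function countCodePoints(text) {
  let n = 0;
  for (const _ of text) n++;
  return n;
}

// Approximate tokens as code points divided by the family's ratio.
function estimateTokens(text, family) {
  const ratio = CHARS_PER_TOKEN.get(family);
  if (ratio === undefined) {
    throw new Error(`Unknown tokenizer family: ${family}`);
  }
  return Math.ceil(countCodePoints(text) / ratio);
}

// Add a planning margin for the structural overhead the estimate does
// not model. The 10% default is an assumption, not a measured value.
function withMargin(tokens, marginPercent = 10) {
  return Math.ceil((tokens * (100 + marginPercent)) / 100);
}
```

For example, `withMargin(estimateTokens(prompt, "pre-4.7"))` gives a deliberately pessimistic figure to plan against, where `prompt` is any string.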

Context window and max output limits below were taken from the official Anthropic models documentation as of 2026-06-23 and can change. Max output values are for the synchronous Messages API.

Model                        | Model ID          | Context window | Max output | Tokenizer
Claude Fable 5               | claude-fable-5    | 1,000,000      | 128,000    | Opus 4.7+
Claude Opus 4.8              | claude-opus-4-8   | 1,000,000      | 128,000    | Opus 4.7+
Claude Opus 4.7              | claude-opus-4-7   | 1,000,000      | 128,000    | Opus 4.7+
Claude Opus 4.6              | claude-opus-4-6   | 1,000,000      | 128,000    | Pre-4.7
Claude Sonnet 4.6            | claude-sonnet-4-6 | 1,000,000      | 64,000     | Pre-4.7
Claude Sonnet 4.5            | claude-sonnet-4-5 | 200,000        | 64,000     | Pre-4.7
Claude Opus 4.5              | claude-opus-4-5   | 200,000        | 64,000     | Pre-4.7
Claude Haiku 4.5             | claude-haiku-4-5  | 200,000        | 64,000     | Pre-4.7
Claude Opus 4.1 (deprecated) | claude-opus-4-1   | 200,000        | 32,000     | Pre-4.7

Source: Anthropic - Models overview. Some platforms differ (for example, Claude Opus 4.8 has a 200,000-token context window on Microsoft Foundry). For programmatic, always-current limits, query the Models API; for exact counts, use the token counting API.
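
For planning scripts, the limits listed above could be held in a small lookup structure. The sketch below copies the values from the table as of 2026-06-23, so it goes stale whenever the limits change; the names `MODEL_LIMITS` and `clampReserve` are hypothetical.

```javascript
// Limits copied from the table above (as of 2026-06-23; they can change).
const MODEL_LIMITS = new Map([
  ["claude-fable-5", { contextWindow: 1000000, maxOutput: 128000, tokenizer: "Opus 4.7+" }],
  ["claude-opus-4-8", { contextWindow: 1000000, maxOutput: 128000, tokenizer: "Opus 4.7+" }],
  ["claude-opus-4-7", { contextWindow: 1000000, maxOutput: 128000, tokenizer: "Opus 4.7+" }],
  ["claude-opus-4-6", { contextWindow: 1000000, maxOutput: 128000, tokenizer: "Pre-4.7" }],
  ["claude-sonnet-4-6", { contextWindow: 1000000, maxOutput: 64000, tokenizer: "Pre-4.7" }],
  ["claude-sonnet-4-5", { contextWindow: 200000, maxOutput: 64000, tokenizer: "Pre-4.7" }],
  ["claude-opus-4-5", { contextWindow: 200000, maxOutput: 64000, tokenizer: "Pre-4.7" }],
  ["claude-haiku-4-5", { contextWindow: 200000, maxOutput: 64000, tokenizer: "Pre-4.7" }],
  ["claude-opus-4-1", { contextWindow: 200000, maxOutput: 32000, tokenizer: "Pre-4.7" }],
]);

// Keep a requested output reserve between 0 and the model's max output.
function clampReserve(modelId, requestedReserve) {
  const limits = MODEL_LIMITS.get(modelId);
  if (limits === undefined) {
    throw new Error(`Unknown model ID: ${modelId}`);
  }
  return Math.min(Math.max(0, Math.floor(requestedReserve)), limits.maxOutput);
}
```

A hard-coded table like this is only a convenience for offline estimates; platform-specific differences such as the Microsoft Foundry case are not represented.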

How to Use

  1. Pick a model to load its context window and max output limit.
  2. Set the output reserve - how many tokens you want to keep free for the model's response (capped at the model's max output).
  3. Enter your input. Use Plain Text for a single prompt, or Messages (roles) to break it into system prompt, tool definitions, and conversation turns. Counts update as you type.
  4. Read the budget bar. The stacked bar and the numbers show input plus reserved output against the window, with a warning if you exceed it.
  5. Run a what-if. In Messages mode, drop the last N turns or shorten the system prompt to see how much room that frees.
  6. Load a file or an example, then Copy Summary to export the breakdown as text.
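
The arithmetic behind the budget bar and the what-if step can be sketched as below. The function names are hypothetical and the numbers in the usage note are made up for illustration; only the relationship (input plus reserve measured against the window) comes from the description above.

```javascript
// Compare input plus reserved output against the context window.
function planBudget(inputTokens, reserve, contextWindow) {
  const used = inputTokens + reserve;
  return {
    used,
    remaining: contextWindow - used,
    // Percentage rounded to one decimal place.
    utilizationPercent: Math.round((used * 1000) / contextWindow) / 10,
    overLimit: used > contextWindow,
  };
}

// What-if: drop the last n turns from a list of per-turn token estimates.
function dropLastTurns(turnTokens, n) {
  const keep = Math.max(0, turnTokens.length - n);
  return turnTokens.slice(0, keep);
}

// Sum a list of token estimates.
function sumTokens(values) {
  return values.reduce((total, value) => total + value, 0);
}
```

Suppose a system prompt and tools estimated at 20,000 tokens, three turns of 60,000, 70,000, and 50,000 tokens, a 64,000-token reserve, and a 200,000-token window. All three turns give 264,000 used, which is over the limit. Dropping the last turn still leaves 214,000 (107%). Dropping the last two brings it to 144,000 (72%), which fits.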

FAQ

Is this an exact token count?
No. These are browser-side estimates derived from character counts and per-model ratios, so they will not exactly match a model's actual tokenizer. For an exact number, use the provider's official token counting API (it runs server-side and reports the real input_tokens).

What counts toward the context window?
Everything the model has to read or produce in a single request: the system prompt, all tool / function definitions, the entire conversation history (every prior user and assistant turn, including tool calls and results), and the space reserved for the response. The input and the reserved output share one budget, so they must fit together inside the window.
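
As an illustration of how those parts add up, the sketch below estimates each part separately and totals them with the reserve. The request shape (`system`, `tools`, `messages` with string `content`), the JSON serialization of tool definitions, and the 4-characters-per-token stand-in estimator are all assumptions made for the example.

```javascript
// Hypothetical stand-in estimator: about 4 characters per token.
// The real ratio depends on the model's tokenizer family.
function roughEstimate(text) {
  return Math.ceil(Array.from(text).length / 4);
}

// Estimate each part that shares the context window, then total them.
function contextBreakdown(request, reserve, estimate = roughEstimate) {
  const system = estimate(request.system || "");
  // Tool definitions are serialized to JSON here; how the real API
  // renders them into tokens is not modeled.
  const hasTools = Array.isArray(request.tools) && request.tools.length > 0;
  const tools = estimate(hasTools ? JSON.stringify(request.tools) : "");
  // Every prior turn counts, user and assistant alike.
  const history = (request.messages || []).reduce(
    (total, message) => total + estimate(message.content),
    0
  );
  const input = system + tools + history;
  return { system, tools, history, reserve, input, total: input + reserve };
}
```

The `total` field is the figure that has to stay within the model's context window.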

Why does my message use more tokens than it has words?
Tokens are sub-word units. Common words may be a single token, but rare words, code, punctuation, whitespace, and non-English text often split into several tokens each. As a rough guide, English text averages well under one token per character, while CJK text and code use more tokens per character, so a single word can cost more than one token.

Why do different models show different token counts for the same text?
Models use different tokenizers. The tokenizer introduced with Claude Opus 4.7 produces roughly 30% more tokens for the same text than earlier Claude tokenizers, so this tool groups models into tokenizer families and applies different ratios accordingly. Always re-check counts when you switch model families.
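
A quick way to see the effect of switching families is to rescale an existing estimate by the rough 30% figure mentioned above. This is a back-of-the-envelope sketch with a hypothetical function name; the real difference varies with the text.

```javascript
// "Roughly 30% more tokens" for the newer tokenizer, per the answer above.
// This is an approximate figure, not an exact conversion factor.
const NEWER_TOKENIZER_EXTRA_PERCENT = 30;

// Rescale an estimate made for the earlier tokenizer family.
// Integer arithmetic avoids floating-point surprises before rounding up.
function rescaleForNewerTokenizer(olderFamilyTokens) {
  return Math.ceil(
    (olderFamilyTokens * (100 + NEWER_TOKENIZER_EXTRA_PERCENT)) / 100
  );
}
```

Under this assumption, a history estimated at 150,000 tokens on an earlier model would come out near 195,000 tokens on a model in the newer family.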

Does my text get sent anywhere?
No. The tool makes no network requests and no API calls. All estimation runs locally in your browser, and nothing is uploaded or stored on a server. You can use it offline once the page has loaded.

Important Notes

  • Token counts shown here are estimates produced in your browser and may differ from a model's actual tokenizer. For exact counts, use the provider's official token counting API. Context window and output limits were taken from official documentation as of 2026-06-23 and can change.
  • The estimate does not include the small structural overhead the real API adds per message and per tool definition, so actual counts are usually slightly higher than shown.
  • This tool does not display pricing or cost figures; for current pricing, consult the provider's official pricing page.
  • No data is transmitted, no API calls are made, and nothing is stored. All processing is client-side.

Written by Hidekazu Konishi