Claude Code on Pay-As-You-Go API Billing - Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI
First Published:
Last Updated:
This article is a decision page for choosing how to pay for Claude Code. It answers three questions in order. First, what is the difference between a subscription (the per-seat Pro / Max / Team / Enterprise plans you log into) and metered, pay-as-you-go API billing (you pay for the tokens you consume), and which one fits which kind of team. Second, how do you actually configure each of the three metered routes — Anthropic API directly, Amazon Bedrock, and Google Cloud Vertex AI — at the level of environment variables, credentials, and model identifiers. Third, given your own constraints — evaluation versus steady-state, even versus uneven usage, existing cloud contract, data-residency rules, the need to charge the cost back to a project — which route should you actually pick.
Throughout, I keep to one rule that the rest of this site follows: no prices, no rates, no per-token figures. Those numbers move, and a page that quotes them is wrong within a quarter. Everything here is about the *mechanism* — how each route is wired, how spend is capped and observed, and how to reason about the trade-offs — so it stays correct as the numbers underneath it change. Where you need a figure, I point you at the official pricing page that owns it.
If you have never installed Claude Code or run it against your own files, start with Claude Code Getting Started — Why Knowing About Local AI Agents Changes Everything first; this article assumes you already have a working CLI or editor-extension install and are now deciding how to pay for it at a team or organization scale. For the full settings surface that these environment variables live inside, see Claude Code Features and Settings Reference.
1. Introduction — Same Tool, Different Ceilings
The reason this article exists is that the billing question is almost always asked too late. A team installs Claude Code, runs it on a subscription login because that is the path of least resistance, gets value from it, and then six months later finance asks why the spend is on a personal card, security asks why the traffic is leaving through Anthropic's endpoint instead of the company's existing Bedrock contract, and a second team that barely uses the tool is paying the same per-seat price as the team that lives in it. None of those are agent problems. They are billing-route problems, and every one of them is decided the moment you choose how Claude Code authenticates.
There are two top-level billing models. A subscription is seat-based: you buy a plan — Pro, Max, Team, or Enterprise — and each user logs in with an OAuth flow that ties their Claude Code session to a Claude.ai account. The price is per person per month, flat, regardless of how many tokens that person burns. Metered API billing is consumption-based: Claude Code authenticates with an API credential, every request is metered in tokens, and you pay for exactly what you consume. The metered model then forks into three routes depending on *whose* API you meter against — Anthropic's own API, Amazon Bedrock, or Google Cloud Vertex AI — and that fork is where data residency, existing cloud spend, and IAM-based control get decided.
The mental model worth carrying through the rest of this article is that the subscription optimizes for predictable per-person cost and zero infrastructure, while metered billing optimizes for paying only for what you use and for routing the spend and the data through infrastructure you already control. Neither is universally cheaper or better. A team of ten engineers who all use Claude Code heavily, every day, will likely find the flat per-seat subscription both cheaper and simpler. A team where two people live in the tool and eight touch it occasionally is subsidizing the eight under a seat model and would meter more fairly. A regulated enterprise that already runs everything through Amazon Bedrock under a negotiated agreement will choose Bedrock not because it is cheaper per token but because it keeps Claude Code inside the same compliance, billing, and IAM boundary as the rest of its workloads.
The sections that follow take each of those routes in turn. Sections 2 and 3 set up the decision; sections 4 through 6 are the concrete setup for each metered route; section 7 maps the model identifiers across routes (they differ, and the difference causes real outages); section 8 covers spend control and observability without quoting a single price; section 9 is the decision framework itself; and sections 10 through 12 cover team rollout, pitfalls, and the questions people actually search for.
2. Two Billing Models: Subscription versus Metered API
2.1 How the Subscription Model Works
The subscription model is what you get by default. You run claude, it opens a browser, you log in to your Claude.ai account, and from then on Claude Code draws against your plan. The supported account types are a Claude Pro or Max subscription, Claude for Teams or Enterprise, or a Claude Console account. The defining property is that the cost is attached to a *person*, not to consumption: a seat costs the same whether the holder runs one prompt a week or saturates the tool all day. Authentication is an OAuth login, refreshed transparently; there is no API key to rotate, no cloud account to wire up, and no per-request metering for you to monitor. To switch accounts you run /logout and log back in.
This is the right model for a large and important class of teams: groups where most members use the tool regularly, where you want a flat and predictable monthly line item, and where you do not need the traffic to flow through a specific cloud account for compliance or billing reasons. For an individual developer or a small, high-usage team, the subscription is usually both the cheapest and the lowest-friction option, and you should not over-engineer your way out of it.
2.2 How Metered API Billing Works
Metered billing replaces the seat with a credential. Instead of logging in to a Claude.ai account, Claude Code authenticates with an API key (or a cloud provider's credentials), and every request it makes is metered in input and output tokens. You are billed for the tokens you consume — nothing for idle seats, more for heavy days, less for light ones. The spend shows up on an API bill (Anthropic Console, your AWS bill, or your Google Cloud bill) rather than as a per-person subscription charge.
The properties that make metering attractive are the mirror image of the subscription's. Cost tracks usage, so a team with uneven usage stops subsidizing its light users. The spend can be attributed to a workspace, an AWS account, or a GCP project, which makes chargeback to a cost center possible. And because the credential can be a cloud provider's, the traffic and the billing can be kept inside an existing Amazon Bedrock or Google Cloud Vertex AI contract — which is frequently the deciding factor for an enterprise, independent of any per-token comparison.
2.3 Choosing Between the Two
The honest decision rule is about the shape of your usage and your constraints, not about a headline price. Reach for the subscription when usage is high and roughly even across the team, when you want one predictable number, and when you have no requirement to route through a particular cloud. Reach for metered API billing when you are still evaluating the tool and do not want to commit to seats; when usage is uneven and seat pricing would over-charge your light users; when data residency or an existing cloud contract requires Amazon Bedrock or Google Cloud Vertex AI; or when you need to charge the cost back to specific teams, projects, or accounts. Many organizations use both over time: meter during the evaluation, then move heavy steady-state users onto seats once the value is proven and usage has flattened. Section 9 turns this into an explicit decision tree; section 10 covers running the two models side by side.
A practical note before the setup sections: the subscription login and an API key can both be present in your environment at once, and they do not politely coexist. If ANTHROPIC_API_KEY is set, Claude Code prefers the API key over your subscription login. That is a feature when you mean it and a trap when you do not — covered in detail in section 4 and again in the pitfalls in section 11.
2.4 What the Seat Buys and What the Meter Bills
It helps to be precise about what each model actually charges for, because the two answers explain when each one is the better deal — and they do so without any reference to a price. A seat charges for *availability*: you are paying, per person per month, for that person to be able to reach Claude through the tool whenever they want, up to the plan's limits, at a flat rate that does not move with how much they actually use it. The cost is a function of headcount, not of work done. That is exactly why a seat is efficient for a heavy daily user — the fixed price is spread across a large amount of work — and inefficient for an occasional user, whose fixed price is spread across very little.
A meter charges for *work done*: every request is priced in input and output tokens, so the bill is a function of consumption and nothing else. There is no charge for a user who did nothing this month, and no flat ceiling that a heavy user stays under "for free." The cost scales smoothly with usage, which is what makes it fair across a team whose members use the tool by wildly different amounts, and what makes it the honest choice during an evaluation when you genuinely do not yet know how much anyone will use it.
The corollary is that the crossover between the two is governed by usage concentration, not by headcount alone. A ten-person team where everyone is a heavy user concentrates a lot of work onto each seat and tends to favor seats; a ten-person team where the same total work is concentrated in two people leaves eight seats mostly idle and tends to favor metering. You do not need a price to see which side of that line you are on — you need an honest read on how evenly the work is spread, which is the judgment the decision framework in section 9 turns into a route.
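If you can get per-user token totals for a trial period from whatever reporting you already have, a few lines of shell turn "how evenly is the work spread" into a number: the share of all tokens consumed by the heaviest K users. This is a minimal sketch; the function name is hypothetical, and the input format (one per-user total per line) is an assumption about how you export the data. It reports a percentage of tokens, not a cost, so it stays within this article's no-prices rule.

```shell
# usage_concentration K
# Hypothetical helper. Reads one per-user token total per line on stdin
# (assumed export format) and prints the integer percentage of all tokens
# consumed by the K heaviest users.
usage_concentration() {
  top_k=$1
  # Sort heaviest first, then sum the top K and the grand total in one pass.
  sort -nr | awk -v k="$top_k" '
    { total += $1; if (NR <= k) top += $1 }
    END {
      if (total == 0) { print 0; exit }
      printf "%d\n", (top * 100) / total
    }'
}
```

For a ten-person team, `printf '%s\n' 900 850 40 30 20 10 5 5 0 0 | usage_concentration 2` reports that two people account for 94 percent of the tokens, the "work concentrated in two people" case above; ten equal totals would report 20. Where the line between "even" and "concentrated" sits is still your call; the number only replaces a guess with a measurement.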
3. The Three Metered Routes at a Glance
Once you have decided to meter, you choose *whose* API you meter against. The three routes reach the same Claude models and run the same Claude Code, but they differ in how you authenticate, how you name the model, what the region concept is, and — most importantly — which existing contract and compliance boundary the traffic and billing sit inside.
The figure below shows the four ways Claude Code can be wired: the subscription path via OAuth, and the three metered routes.

| Dimension | Anthropic API (direct) | Amazon Bedrock | Google Cloud Vertex AI |
|---|---|---|---|
| Enable with | ANTHROPIC_API_KEY set | CLAUDE_CODE_USE_BEDROCK=1 | CLAUDE_CODE_USE_VERTEX=1 |
| Authentication | Anthropic API key (or gateway bearer token) | AWS credential chain — SSO, profile, keys, instance role, or Bedrock API key | Google Application Default Credentials (ADC) |
| Model identifier form | Bare model ID (claude-opus-4-8) | Cross-region inference profile ID (us.anthropic.claude-opus-4-8) or inference profile ARN | Vertex model ID (claude-opus-4-8, dated variants like claude-haiku-4-5@20251001) |
| Region concept | None (Anthropic-managed) | AWS_REGION + cross-region inference profiles (us. / eu. / apac.) | CLOUD_ML_REGION (global, multi-region, or regional endpoints) |
| Existing-cloud integration | Standalone Anthropic account | Sits inside your AWS account, IAM, and bill | Sits inside your GCP project, IAM, and bill |
| Billing surface | Anthropic Console | AWS bill | Google Cloud bill |
| Best fit | Individuals and teams with no cloud-routing requirement | AWS-centric organizations, existing Bedrock contracts | GCP-centric organizations, existing Vertex contracts |
Three points in this table are easy to miss. First, the model identifier is route-specific: claude-opus-4-8 on the Anthropic API, us.anthropic.claude-opus-4-8 as a Bedrock cross-region inference profile, and claude-opus-4-8 on Vertex — and pasting one route's identifier into another route's configuration is a common, avoidable failure (section 7). Second, only the cloud routes give you IAM-based control and in-account billing; if that is why you are here, the direct API route does not deliver it. Third, data residency is a cloud-route property: if your requirement is that inference happens in a specific AWS or GCP region under your own account, that requirement alone selects Bedrock or Vertex and narrows your region choices within them.
4. Route A — Anthropic API (Direct)
The direct route meters against Anthropic's own API. It is the simplest metered setup and the right starting point for an individual developer or a team that has no requirement to route through a specific cloud.
4.1 Get an API Key and Set It
Create an API key in the Anthropic Console, then expose it to Claude Code through the environment:
export ANTHROPIC_API_KEY="sk-ant-..."
claude
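Exporting the key in a shell profile is what later lets it silently outrank a subscription login. One way to avoid that, sketched here under the assumption that the key is kept in a local file only you can read, is to scope the variable to a single command instead of the whole shell. The helper name and the file path in the usage line are hypothetical; a secrets-manager CLI could stand in for the file.

```shell
# run_metered KEYFILE COMMAND [ARGS...]
# Hypothetical helper: runs one command with ANTHROPIC_API_KEY read from KEYFILE,
# leaving the parent shell (and therefore any subscription login) untouched.
run_metered() {
  keyfile=$1
  shift
  if [ ! -r "$keyfile" ]; then
    echo "run_metered: cannot read key file: $keyfile" >&2
    return 1
  fi
  # A prefix assignment applies only to this one command's environment.
  ANTHROPIC_API_KEY=$(cat "$keyfile") "$@"
}
```

Usage would look like `run_metered ~/.config/anthropic/api-key claude`, with the key file path being whatever you chose. A plain `claude` in the same terminal then falls back to the subscription login.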
With the key present, Claude Code authenticates against the Anthropic API and meters every request as token usage on your Console account. In interactive use, Claude Code asks for approval the first time it detects the key; in non-interactive (-p / headless) use it is used whenever present. The key is sent on the X-Api-Key header under the hood — you do not set that header yourself.
4.2 Subscription Login versus API Key, and the Order of Precedence
The single most important thing to understand about Route A is how it interacts with a subscription login, because the two share an environment and Claude Code resolves them by a fixed order of precedence. From highest priority to lowest, Claude Code selects credentials in this order:
1. Cloud provider credentials (CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX, or CLAUDE_CODE_USE_FOUNDRY set)
2. ANTHROPIC_AUTH_TOKEN (a bearer token, for gateways and proxies)
3. ANTHROPIC_API_KEY
4. apiKeyHelper script output (for dynamic or rotating credentials)
5. CLAUDE_CODE_OAUTH_TOKEN (a long-lived subscription OAuth token)
6. Subscription OAuth credentials from /login (the default for Pro / Max / Team / Enterprise users)
The practical consequence: if ANTHROPIC_API_KEY is set and you also have an active subscription, the API key wins. That is correct when you intend to meter, and a trap when you forgot the key was exported in your shell profile and wonder why your subscription is not being used — or worse, why authentication fails because the key belongs to an expired or disabled organization. The fix is to unset ANTHROPIC_API_KEY to fall back to the subscription, or to set it deliberately when you mean to meter.
To confirm which credential is actually active, run /status inside Claude Code; it shows your account information and the active authentication method. For a quick read on session cost, /usage shows token and cost estimates for the current session — but treat the dollar figure as a local estimate; the authoritative number lives in the Anthropic Console.
4.3 Gateways, Proxies, and Dynamic Keys
Two more environment variables matter for Route A in larger setups. ANTHROPIC_AUTH_TOKEN is for routing through an LLM gateway or proxy that authenticates with bearer tokens rather than Anthropic API keys; it is sent as Authorization: Bearer <value> and sits above ANTHROPIC_API_KEY in the precedence order. ANTHROPIC_BASE_URL overrides the API endpoint URL — useful for pointing Claude Code at a proxy — but note that it changes the *endpoint*, not which model answers. For credentials that rotate, the apiKeyHelper setting names a script whose output Claude Code uses as the key, which lets you wire in a secrets manager instead of a static export.
For continuous-integration use where no browser is available to complete an OAuth login, claude setup-token generates a long-lived (one-year) CLAUDE_CODE_OAUTH_TOKEN tied to a Pro / Max / Team / Enterprise plan. That is a subscription credential, not a metered one, but it is the bridge that lets a subscription run unattended; the unattended-execution story in general is the subject of Claude Code in CI/CD and Headless Automation.
4.4 Workspaces and Key Separation
On the Anthropic Console side, the unit of separation and control is the workspace. When you first authenticate Claude Code with a Console account, a workspace named "Claude Code" is created automatically, which gives you centralized cost tracking for the tool. You can issue separate API keys per workspace to isolate teams or projects, and administrators can attach spend limits at the workspace level — both covered in section 8. The point to carry into a rollout is that the workspace is where you put the boundary: one workspace per cost center, separate keys per workspace, and the Console's cost-and-usage reporting reads cleanly along those lines.
4.5 Verifying the Direct Route
Because the direct route shares an environment with any subscription login, the first thing to do after setting the key is confirm that the credential you intend is actually the one in use:
export ANTHROPIC_API_KEY="sk-ant-..."
claude
# inside Claude Code:
# /status -> confirms the active auth method and account
# /usage -> per-session token and cost estimate
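To walk the precedence list from a terminal before launching, the order from section 4.2 can be mirrored in a small shell function. This is a sketch, not Claude Code's actual resolution logic: the function name is hypothetical, it only inspects environment variables plus an optional settings file (crudely grepped for an apiKeyHelper entry), and it cannot see whether a /login session exists, so it reports that as the fallback. Treating any non-empty provider flag as "set" is also an assumption.

```shell
# claude_auth_source [SETTINGS_JSON]
# Hypothetical helper mirroring the documented credential precedence using only
# environment variables and an optional settings.json path.
claude_auth_source() {
  settings=${1:-}
  # Level 1: cloud provider flags. If several are set, this checking order is arbitrary.
  if [ -n "${CLAUDE_CODE_USE_BEDROCK:-}" ]; then echo "cloud: bedrock"; return 0; fi
  if [ -n "${CLAUDE_CODE_USE_VERTEX:-}" ]; then echo "cloud: vertex"; return 0; fi
  if [ -n "${CLAUDE_CODE_USE_FOUNDRY:-}" ]; then echo "cloud: foundry"; return 0; fi
  if [ -n "${ANTHROPIC_AUTH_TOKEN:-}" ]; then echo "ANTHROPIC_AUTH_TOKEN"; return 0; fi
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then echo "ANTHROPIC_API_KEY"; return 0; fi
  # Text match only; a real check would parse the JSON.
  if [ -n "$settings" ] && [ -f "$settings" ] && grep -q '"apiKeyHelper"' "$settings"; then
    echo "apiKeyHelper"; return 0
  fi
  if [ -n "${CLAUDE_CODE_OAUTH_TOKEN:-}" ]; then echo "CLAUDE_CODE_OAUTH_TOKEN"; return 0; fi
  echo "subscription /login (if logged in)"
}
```

Running `claude_auth_source ~/.claude/settings.json` before launching should agree with what /status reports once inside; when the two disagree, /status is the authority and the sketch has missed a source.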
If /status shows a subscription login when you expected the API key — or the reverse — walk the precedence list in section 4.2: a higher-priority credential is winning. The usual culprit is a CLAUDE_CODE_USE_BEDROCK / CLAUDE_CODE_USE_VERTEX flag still exported from an earlier experiment (those sit above the API key), or a stray ANTHROPIC_AUTH_TOKEN. Resolving the direct route is almost always a matter of removing a higher-priority credential that you did not mean to leave set, rather than anything wrong with the key itself.
5. Route B — Amazon Bedrock
The Bedrock route meters against Claude models served through Amazon Bedrock, which means the traffic, the IAM controls, and the bill all sit inside your AWS account. For an organization that already runs on AWS — and especially one with a negotiated Bedrock agreement or a data-residency requirement pinned to AWS regions — this is frequently the deciding route regardless of any per-token comparison. If Bedrock itself is new to you, Amazon Bedrock — Basic Information and API Examples covers the service fundamentals that this section assumes.
5.1 Enable Bedrock and Set the Region
Two environment variables are the minimum: the enabling flag and the region.
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
claude
AWS_REGION is required — Claude Code does not read the region from your ~/.aws/config file for this purpose, so it must be in the environment. Setting CLAUDE_CODE_USE_BEDROCK=1 puts Claude Code on the Bedrock route, which (per the precedence order in section 4) takes priority over any ANTHROPIC_API_KEY or subscription login that also happens to be present.
One capability caveat is specific to this route: Claude Code's built-in WebSearch tool is not available on Bedrock. If a workflow depends on web search, supply an equivalent through an MCP server — a custom search tool registered via the Model Context Protocol, which runs the same way regardless of billing route — rather than relying on the built-in. Everything else about how Claude Code behaves is unchanged across routes.
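A missing AWS_REGION and a leftover direct-route credential are the two environment mistakes this section warns about, and both can be caught before launch. A minimal sketch with a hypothetical function name; it only reads environment variables and does not contact AWS, so it says nothing about credentials, model access, or IAM.

```shell
# bedrock_preflight: hypothetical pre-launch check for the Bedrock route.
# Returns non-zero if a required variable is missing; prints notes to stderr.
bedrock_preflight() {
  status=0
  if [ "${CLAUDE_CODE_USE_BEDROCK:-}" != "1" ]; then
    echo "CLAUDE_CODE_USE_BEDROCK is not set to 1" >&2
    status=1
  fi
  # The region must be in the environment; ~/.aws/config is not consulted for it.
  if [ -z "${AWS_REGION:-}" ]; then
    echo "AWS_REGION is not set" >&2
    status=1
  fi
  # Not an error: the Bedrock flag outranks these, but they are easy to forget.
  if [ -n "${ANTHROPIC_API_KEY:-}" ] || [ -n "${ANTHROPIC_AUTH_TOKEN:-}" ]; then
    echo "note: a direct-route credential is also set and is outranked on Bedrock" >&2
  fi
  return "$status"
}
```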
5.2 AWS Credentials
Claude Code uses the standard AWS credential chain, so any of the usual mechanisms work:
# Option A — an SSO profile
aws sso login --profile my-bedrock-profile
export AWS_PROFILE=my-bedrock-profile
# Option B — explicit access keys (e.g. on a build host)
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_SESSION_TOKEN="..." # if using temporary credentials
# Option C — a Bedrock API key, for simpler auth without full AWS credentials
export AWS_BEARER_TOKEN_BEDROCK="..."
On an EC2 instance or container with an attached instance role, the chain resolves the role automatically and you set nothing beyond AWS_REGION and the enabling flag. For credentials that expire mid-session, the awsAuthRefresh setting names a command Claude Code runs when it detects expiry; for cross-account setups where the Bedrock credentials differ from your default chain, awsCredentialExport runs at session start to supply them. Both live in settings.json.
Two more Bedrock-specific knobs are worth knowing.
ANTHROPIC_BEDROCK_BASE_URL overrides the Bedrock endpoint URL, for custom endpoints or a gateway in front of Bedrock. ANTHROPIC_BEDROCK_SERVICE_TIER selects the Bedrock service tier — default, flex, or priority — which lets you trade latency and capacity characteristics; leave it unset unless you have a specific reason to change the tier. If you run the small/fast (Haiku-class) model in a different region from your primary, ANTHROPIC_SMALL_FAST_MODEL_AWS_REGION overrides its region, though it has no effect unless you have also set ANTHROPIC_DEFAULT_HAIKU_MODEL (or the deprecated ANTHROPIC_SMALL_FAST_MODEL).
5.3 Choosing the Model and the Small/Fast Model
Claude Code uses a primary model for most work and a smaller, faster model for lightweight background tasks. On Bedrock you pin them with environment variables, and there is an important freshness point here: the variable that used to be called ANTHROPIC_SMALL_FAST_MODEL is deprecated in favor of ANTHROPIC_DEFAULT_HAIKU_MODEL. Use the new names:
# Pin the model each alias resolves to (recommended for reproducibility)
export ANTHROPIC_DEFAULT_OPUS_MODEL='us.anthropic.claude-opus-4-8'
export ANTHROPIC_DEFAULT_SONNET_MODEL='us.anthropic.claude-sonnet-4-6'
export ANTHROPIC_DEFAULT_HAIKU_MODEL='us.anthropic.claude-haiku-4-5-20251001-v1:0'
# Or set the session's primary model directly
export ANTHROPIC_MODEL='us.anthropic.claude-opus-4-8'
ANTHROPIC_MODEL sets the primary model for the session. ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, and ANTHROPIC_DEFAULT_HAIKU_MODEL control what the opus, sonnet, and haiku aliases resolve to — and the Haiku-class model is what Claude Code uses for background tasks. The legacy ANTHROPIC_SMALL_FAST_MODEL still functions but is deprecated; if you have it in an old setup, migrate it to ANTHROPIC_DEFAULT_HAIKU_MODEL. If you pin nothing, the route falls back to a default primary model and uses the same model for the small/fast role.
The values above are cross-region inference profile IDs — the us. prefix is the key detail, explained next. You can also point ANTHROPIC_MODEL at an application inference profile ARN if you use those to track or constrain spend:
export ANTHROPIC_MODEL='arn:aws:bedrock:us-east-2:111122223333:application-inference-profile/your-profile-id'
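The geographic prefix on a cross-region inference profile ID has to agree with the geography of AWS_REGION: us. with a US region, eu. with a European one, apac. with Asia-Pacific. A rough consistency check can compare the two before launch. This is a sketch with a hypothetical function name, and its region-to-geography mapping (keyed on the us-, eu-, and ap- region-name prefixes) is a coarse assumption; it does not know which regions a given profile actually covers, and it skips ARNs, whose geography comes from the profile itself.

```shell
# bedrock_prefix_matches MODEL_ID REGION
# Hypothetical check. Returns 0 if the inference-profile prefix (us./eu./apac.)
# matches the region's geography, 1 on a mismatch, 2 if it cannot tell.
bedrock_prefix_matches() {
  model=$1
  region=$2
  case $model in
    arn:*)  return 2 ;;   # inference profile ARN: not checked here
    us.*)   want=us ;;
    eu.*)   want=eu ;;
    apac.*) want=apac ;;
    *)      return 2 ;;   # no recognised geographic prefix
  esac
  # Coarse mapping from AWS region names to geographies (an assumption).
  case $region in
    us-*) have=us ;;
    eu-*) have=eu ;;
    ap-*) have=apac ;;
    *)    return 2 ;;
  esac
  [ "$want" = "$have" ]
}
```

For example, `bedrock_prefix_matches "$ANTHROPIC_DEFAULT_OPUS_MODEL" "$AWS_REGION" || echo "check the inference-profile prefix"` flags a us. profile left behind after moving AWS_REGION to eu-central-1.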
5.4 Cross-Region Inference Profiles
A Bedrock model identifier for Claude Code is not the bare foundation-model ID; it is a cross-region inference profile ID, distinguished by a geographic prefix: us., eu., or apac. The profile lets Bedrock route a request across the regions in that geography for capacity, while keeping the data within the geography. The examples above use us.; if your account and data-residency rules put you in Europe or Asia-Pacific, substitute the matching prefix (eu.anthropic.claude-..., apac.anthropic.claude-...) and set AWS_REGION to a region within that geography. Getting the prefix wrong relative to your region is a common cause of a model-not-available error — covered again in section 11.
5.5 Model Access and IAM Permissions
Two account-level prerequisites gate the Bedrock route. First, model access must be enabled for each Anthropic model you intend to use, once per AWS account, from the Bedrock console's model catalog; in an AWS Organizations setup the management account can enable access centrally. Until that is done, requests fail no matter how the environment is configured. Second, the IAM principal Claude Code runs as needs permission to invoke models and resolve inference profiles. A minimal policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowModelAndInferenceProfileAccess",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
"bedrock:ListInferenceProfiles",
"bedrock:GetInferenceProfile"
],
"Resource": [
"arn:aws:bedrock:*:*:inference-profile/*",
"arn:aws:bedrock:*:*:application-inference-profile/*",
"arn:aws:bedrock:*:*:foundation-model/*"
]
},
{
"Sid": "AllowMarketplaceSubscription",
"Effect": "Allow",
"Action": [
"aws-marketplace:ViewSubscriptions",
"aws-marketplace:Subscribe"
],
"Resource": "*",
"Condition": {
"StringEquals": { "aws:CalledViaLast": "bedrock.amazonaws.com" }
}
}
]
}
bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream are the call itself; bedrock:GetInferenceProfile lets Claude Code resolve an application inference profile ARN to its backing foundation model without an extra round-trip (most relevant when you authenticate with AWS_BEARER_TOKEN_BEDROCK). The marketplace statement covers model subscription. Scope the Resource ARNs down to the specific profiles and regions you actually use in production rather than leaving the wildcards.
5.6 A Note on "Claude in Amazon Bedrock" (Mantle)
There is a second, easily confused way to reach Claude on AWS. Alongside standard Bedrock — which uses the Bedrock Invoke API and the us.anthropic.claude-* inference-profile identifiers above — there is an Anthropic-operated endpoint, sometimes called Mantle or "Claude in Amazon Bedrock," enabled with CLAUDE_CODE_USE_MANTLE=1. It speaks Anthropic's native API shape rather than the Bedrock Invoke API, and its model identifiers are the bare, dateless anthropic.-prefixed form (for example anthropic.claude-haiku-4-5) rather than the regional inference-profile form. Running /status tells you which you are on — it reports Amazon Bedrock (Mantle) for this path. The distinction matters because the two use different model-ID formats; if you mix them up, requests fail. For the purposes of this article, standard Bedrock (sections 5.1–5.5) is the route most AWS organizations mean when they say "run Claude Code on Bedrock," and Mantle is the parity-focused alternative worth knowing exists.
5.7 A Complete Bedrock Configuration
Pulling the pieces together, a typical production Bedrock setup — region pinned, SSO credentials, all three model aliases pinned to cross-region inference profiles, and telemetry on — looks like this:
# Route and region
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
# Credentials (SSO profile in this example)
export AWS_PROFILE=my-bedrock-profile
# Model pins (cross-region inference profiles; substitute eu./apac. as needed)
export ANTHROPIC_DEFAULT_OPUS_MODEL='us.anthropic.claude-opus-4-8'
export ANTHROPIC_DEFAULT_SONNET_MODEL='us.anthropic.claude-sonnet-4-6'
export ANTHROPIC_DEFAULT_HAIKU_MODEL='us.anthropic.claude-haiku-4-5-20251001-v1:0'
# Optional: telemetry to your own collector
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
claude # verify with /status
After launching, run /status to confirm Claude Code reports the Bedrock provider and the expected region, and run a trivial prompt to confirm the IAM principal can actually invoke the model. If the first request fails, the cause is almost always one of the three account-level prerequisites — model access not enabled, missing IAM permission, or a region/inference-profile-prefix mismatch — not the environment block itself.
6. Route C — Google Cloud Vertex AI
The Vertex route meters against Claude models served through Google Cloud Vertex AI, keeping the traffic, IAM, and bill inside your GCP project. For a GCP-centric organization — or one whose data-residency rules pin inference to specific Google Cloud regions — this is the natural choice, on the same logic that makes Bedrock natural for AWS shops.
6.1 Enable Vertex and Set Project and Region
export CLAUDE_CODE_USE_VERTEX=1
export ANTHROPIC_VERTEX_PROJECT_ID=my-gcp-project
export CLOUD_ML_REGION=us-east5
claude
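Because ANTHROPIC_VERTEX_PROJECT_ID can lose to other project sources, a stale GOOGLE_CLOUD_PROJECT can quietly send the traffic and the bill to a project you did not intend. A small sketch (hypothetical function name) that only warns when an overriding variable disagrees; it deliberately does not reproduce the exact resolution order among the overrides, and it does not read the credentials file's contents.

```shell
# vertex_project_warnings: hypothetical check that flags variables which
# override ANTHROPIC_VERTEX_PROJECT_ID. Prints one line per issue, nothing if clean.
vertex_project_warnings() {
  intended=${ANTHROPIC_VERTEX_PROJECT_ID:-}
  for name in GCLOUD_PROJECT GOOGLE_CLOUD_PROJECT; do
    # Indirect lookup of a fixed, known variable name.
    eval "value=\${$name:-}"
    if [ -n "$value" ] && [ "$value" != "$intended" ]; then
      echo "$name=$value overrides ANTHROPIC_VERTEX_PROJECT_ID=${intended:-<unset>}"
    fi
  done
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is set; its project may take precedence"
  fi
}
```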
CLAUDE_CODE_USE_VERTEX=1 puts Claude Code on the Vertex route. ANTHROPIC_VERTEX_PROJECT_ID names the GCP project — though note it is overridden by GCLOUD_PROJECT, GOOGLE_CLOUD_PROJECT, or the credential file pointed at by GOOGLE_APPLICATION_CREDENTIALS, and if none of those are set, the project is resolved from your gcloud configuration or attached service account. CLOUD_ML_REGION is the region or endpoint type — it can be global, a multi-region (eu, us), or a specific region such as us-east5.
6.2 Authentication via Application Default Credentials
Vertex uses standard Google Cloud authentication — Application Default Credentials (ADC). For an interactive developer, that means a one-time:
gcloud auth application-default login
For a service account (a build host, a CI runner), point GOOGLE_APPLICATION_CREDENTIALS at the key file instead. If Claude Code reports "Could not load the default credentials," ADC is not set up — run the command above or set the credentials path. For credentials that expire, the gcpAuthRefresh setting names the command Claude Code runs to refresh them (for example, gcloud auth application-default login). One behavioral note: /logout is unavailable on the Vertex route, because authentication is handled entirely through Google Cloud credentials rather than a Claude login.
6.3 Model Identifiers and Per-Model Region Overrides
Vertex model identifiers use the bare family-and-version form, with a dated variant for some models:
export ANTHROPIC_DEFAULT_OPUS_MODEL='claude-opus-4-8'
export ANTHROPIC_DEFAULT_SONNET_MODEL='claude-sonnet-4-6'
export ANTHROPIC_DEFAULT_HAIKU_MODEL='claude-haiku-4-5@20251001'
If you pin nothing, the route falls back to a default primary model and uses it for the small/fast role as well. Vertex has a wrinkle the other routes do not: model availability varies by region, and the service offers three endpoint types — global, multi-region, and regional. When CLOUD_ML_REGION=global but a particular model is not available on the global endpoint, you override the region for that one model with a VERTEX_REGION_CLAUDE_* variable. The docs show, for example:
export VERTEX_REGION_CLAUDE_HAIKU_4_5=us-east5
export VERTEX_REGION_CLAUDE_4_6_SONNET=europe-west1
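The effect of an override reduces to a one-line rule: a model whose VERTEX_REGION_CLAUDE_* variable is set is served from that region, and every other model from CLOUD_ML_REGION. A sketch of that rule as a hypothetical shell function, useful for printing what a configuration will do; it takes the override variable's name as an argument so that no per-model name is hard-coded.

```shell
# vertex_region_for OVERRIDE_VAR
# Hypothetical helper. Prints the per-model override region if OVERRIDE_VAR is set,
# otherwise CLOUD_ML_REGION. OVERRIDE_VAR is e.g. VERTEX_REGION_CLAUDE_HAIKU_4_5.
vertex_region_for() {
  name=$1
  # Only plain shell variable names are accepted before the indirect lookup.
  case $name in
    ''|[0-9]*|*[!A-Za-z0-9_]*) echo "invalid variable name: $name" >&2; return 1 ;;
  esac
  eval "override=\${$name:-}"
  echo "${override:-${CLOUD_ML_REGION:-<unset>}}"
}
```

With CLOUD_ML_REGION=global and the two exports above, `vertex_region_for VERTEX_REGION_CLAUDE_HAIKU_4_5` prints us-east5, while a model with no override set would print global.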
Most model versions have a corresponding VERTEX_REGION_CLAUDE_* variable; for the complete, current per-model list, consult the official Claude Code environment-variables reference linked in the References at the end of this article rather than hard-coding a variable name from memory.
6.4 Enabling Vertex and Granting IAM
Three prerequisites gate the Vertex route. First, enable the Vertex AI API in your project:
gcloud services enable aiplatform.googleapis.com
Second, request access to the Claude models in the Vertex AI Model Garden — approval can take 24–48 hours, so do this before you need it. Third, grant the principal the right IAM. The roles/aiplatform.user role includes the permission Claude Code needs — aiplatform.endpoints.predict, which covers both model invocation and token counting. For a tighter setup, create a custom role with only aiplatform.endpoints.predict rather than the broader predefined role.
6.5 Choosing an Endpoint Type, and a Complete Vertex Configuration
The three endpoint types Vertex offers are a real choice, not a formality, and they map directly onto the data-residency reasoning that brought many teams to Vertex in the first place. A global endpoint (CLOUD_ML_REGION=global) gives the best availability by letting Google route across regions, but it is the wrong choice when your compliance rule pins inference to a geography. A multi-region endpoint (eu or us) keeps inference within a continent-scale boundary — the usual compromise when "must stay in the EU" or "must stay in the US" is the requirement. A regional endpoint (for example us-east5 or europe-west1) pins inference to a single region, the strictest residency posture and the one to use when the rule names a specific region. Because model availability differs across these endpoint types, the VERTEX_REGION_CLAUDE_* per-model overrides exist precisely so you can run most models on global while pinning the one model that is not yet on the global endpoint to a region that serves it.
A complete production Vertex setup — project and region pinned, ADC for auth, all three aliases pinned, telemetry on — looks like this:
# Route, project, and region
export CLAUDE_CODE_USE_VERTEX=1
export ANTHROPIC_VERTEX_PROJECT_ID=my-gcp-project
export CLOUD_ML_REGION=us-east5
# Auth: run once for an interactive developer
# gcloud auth application-default login
# Or, for a service account host:
# export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
# Model pins
export ANTHROPIC_DEFAULT_OPUS_MODEL='claude-opus-4-8'
export ANTHROPIC_DEFAULT_SONNET_MODEL='claude-sonnet-4-6'
export ANTHROPIC_DEFAULT_HAIKU_MODEL='claude-haiku-4-5@20251001'
# Optional: telemetry to your own collector
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
claude
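The environment block is the only part of this setup you can inspect mechanically, so it can be ruled out before the first request. The sketch below is a hypothetical bash helper. The name vertex_preflight is illustrative and is not part of Claude Code. It checks that the route flag, project, and region are set, and it flags a GCLOUD_PROJECT or GOOGLE_CLOUD_PROJECT value that disagrees with ANTHROPIC_VERTEX_PROJECT_ID, following the precedence described under Common Pitfalls. It also checks that CLOUD_ML_REGION has the shape of one of the three endpoint types, using a loose pattern for regional names. It cannot see API enablement, Model Garden access, or IAM.

```shell
#!/usr/bin/env bash
# Hypothetical preflight for the Vertex environment block (not part of Claude Code).
# It only inspects environment variables: API enablement, Model Garden access,
# and IAM still have to be checked in Google Cloud.
vertex_preflight() {
  local problems=0 v
  if [ "${CLAUDE_CODE_USE_VERTEX:-}" != "1" ]; then
    echo "CLAUDE_CODE_USE_VERTEX is not 1, so the Vertex route is off"
    problems=$((problems + 1))
  fi
  if [ -z "${ANTHROPIC_VERTEX_PROJECT_ID:-}" ]; then
    echo "ANTHROPIC_VERTEX_PROJECT_ID is unset"
    problems=$((problems + 1))
  fi
  # Per the pitfalls section, these can silently override the project ID
  for v in GCLOUD_PROJECT GOOGLE_CLOUD_PROJECT; do
    if [ -n "${!v:-}" ] && [ "${!v}" != "${ANTHROPIC_VERTEX_PROJECT_ID:-}" ]; then
      echo "$v=${!v} differs from ANTHROPIC_VERTEX_PROJECT_ID and may take precedence"
      problems=$((problems + 1))
    fi
  done
  # Endpoint type: global, multi-region (eu/us), or a regional name such as us-east5
  case "${CLOUD_ML_REGION:-}" in
    "") echo "CLOUD_ML_REGION is unset"; problems=$((problems + 1)) ;;
    global|eu|us) ;;
    *-*[0-9]) ;;   # loose heuristic for regional names like europe-west1
    *) echo "CLOUD_ML_REGION='${CLOUD_ML_REGION}' does not look like an endpoint"
       problems=$((problems + 1)) ;;
  esac
  if [ "$problems" -eq 0 ]; then
    echo "vertex env: ok"
    return 0
  fi
  return 1
}
```

Run it from the shell you are about to launch from, for example vertex_preflight && claude. A clean result leaves only the three prerequisites as suspects.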
As with Bedrock, a first-request failure is almost always one of the three prerequisites — the Vertex AI API not enabled, Model Garden access not yet granted, or an ADC/project/region mismatch — rather than the environment block.
7. Model ID Mapping Across Routes
The same Claude generation has a different identifier on each route, and the difference is not cosmetic — paste a Bedrock inference-profile ID into a Vertex configuration, or a bare Anthropic ID into a Bedrock one, and the request fails. This table is the one to keep open while you configure. It uses the cross-region inference profile form for Bedrock (the us. prefix; substitute eu./apac. for your geography).

| Generation | Anthropic API | Amazon Bedrock (cross-region inference profile) | Google Cloud Vertex AI |
|---|---|---|---|
| Claude Opus 4.8 | claude-opus-4-8 | us.anthropic.claude-opus-4-8 | claude-opus-4-8 |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | us.anthropic.claude-sonnet-4-6 | claude-sonnet-4-6 |
| Claude Haiku 4.5 | claude-haiku-4-5 | us.anthropic.claude-haiku-4-5-20251001-v1:0 | claude-haiku-4-5@20251001 |
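Scripts that generate these environment blocks can encode the table directly, so each route always gets its own identifier shape. The following is a hypothetical bash lookup, and model_id is an illustrative name. It copies the IDs from the table above and defaults the Bedrock geography to us. It also assumes that the eu. and apac. substitutions from the table note apply to every row, which you should confirm for each model. Identifiers change between releases, so treat the values as a snapshot and re-check them against the official model pages.

```shell
#!/usr/bin/env bash
# Hypothetical lookup encoding the route-by-generation table above.
# Usage: model_id ROUTE FAMILY [GEO]   ROUTE: anthropic|bedrock|vertex
#        FAMILY: opus|sonnet|haiku     GEO (Bedrock only): us (default), eu, apac
model_id() {
  local route="$1" family="$2" geo="${3:-us}"
  case "$route:$family" in
    anthropic:opus)   echo "claude-opus-4-8" ;;
    anthropic:sonnet) echo "claude-sonnet-4-6" ;;
    anthropic:haiku)  echo "claude-haiku-4-5" ;;
    # Bedrock: geography prefix + anthropic. namespace (+ dated suffix for Haiku)
    bedrock:opus)     echo "${geo}.anthropic.claude-opus-4-8" ;;
    bedrock:sonnet)   echo "${geo}.anthropic.claude-sonnet-4-6" ;;
    bedrock:haiku)    echo "${geo}.anthropic.claude-haiku-4-5-20251001-v1:0" ;;
    # Vertex: bare family-and-version ID, @date variant for Haiku
    vertex:opus)      echo "claude-opus-4-8" ;;
    vertex:sonnet)    echo "claude-sonnet-4-6" ;;
    vertex:haiku)     echo "claude-haiku-4-5@20251001" ;;
    *) echo "unknown route/family: $route/$family" >&2; return 1 ;;
  esac
}
```

For example, export ANTHROPIC_DEFAULT_HAIKU_MODEL="$(model_id bedrock haiku eu)" pins the Haiku-class model for a Bedrock setup in an EU region.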
The pattern behind the table: the Anthropic API takes the bare ID; Bedrock prepends a geography prefix (us./eu./apac.) onto an anthropic.-namespaced ID and, for some models, keeps the dated, versioned suffix (-20251001-v1:0); Vertex uses the bare family-and-version ID, sometimes with an @date variant. Note also that from the Claude 4.6 generation onward, a dateless identifier is still a pinned snapshot, not an evergreen pointer: claude-opus-4-8 names a specific release, not "whatever the latest Opus is." For the full lineage of which generation arrived when, see Anthropic Claude Model Release Timeline.
One more cross-route subtlety that bites teams who rely on the short aliases: the opus and sonnet aliases resolve to different generations depending on the route. On the Anthropic API the opus alias maps to the latest Opus and sonnet to the latest Sonnet, while on Bedrock and Vertex the same aliases can map to an earlier generation, because model availability on the cloud routes trails the first-party API. If you care about running a specific generation, pin it explicitly with ANTHROPIC_MODEL or the ANTHROPIC_DEFAULT_*_MODEL variables rather than trusting an alias to mean the same thing everywhere. The role of the small/fast (Haiku-class) model is the same on every route: Claude Code uses it for cheap background work, so pinning ANTHROPIC_DEFAULT_HAIKU_MODEL is part of a complete configuration, not an afterthought.
8. Spend Control and Observability Without Quoting Prices
Metered billing is only comfortable if you can see the spend and cap it. Every route gives you both, through different machinery — and you can reason about all of it without a single dollar figure, because the mechanisms are about *visibility, limits, and attribution*, not about rates. For the deeper lever — reducing the tokens you spend in the first place through prompt caching and context engineering — see Anthropic Claude API Prompt Caching and Token Efficiency, which is the natural companion to this section.
8.1 Caps and Visibility per Route
On Route A (Anthropic API), the controls live in the Anthropic Console. Administrators can set workspace spend limits that cap total spend for the Claude Code workspace, and the Console provides cost-and-usage reporting. Because keys are issued per workspace, the limit and the reporting both follow the workspace boundary you set up in section 4.4. Inside the tool, /usage gives a per-session estimate and /status confirms which credential is active.
On Route B (Amazon Bedrock), spend control is AWS-native: AWS Budgets for thresholds and alerts, and CloudWatch for usage metrics, both scoped by the AWS account and the IAM principal Claude Code runs as. Because the spend is in your AWS bill, it inherits whatever cost-allocation tags and account structure you already use — which is the whole point of routing through Bedrock for chargeback.
On Route C (Google Cloud Vertex AI), the equivalents are Google Cloud budget alerts and quota monitoring in the Cloud Console, scoped by project. As with Bedrock, the spend sits in your existing GCP billing, so project-level budgets and quotas are the right place to put the guardrails.
8.2 Claude Code's Own Telemetry
Independent of the route, Claude Code can emit its own usage and cost telemetry over OpenTelemetry, which is the cleanest way to get per-user, per-team, per-model breakdowns into a dashboard you control. The gate is one variable, after which you configure standard OTEL exporters:
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
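Before pointing the stream at a real collector, it can help to see the metrics locally and to label them for attribution. The variation below is a sketch that relies on standard OpenTelemetry settings. The console exporter prints to the terminal, OTEL_METRIC_EXPORT_INTERVAL shortens the export interval (in milliseconds), and OTEL_RESOURCE_ATTRIBUTES attaches key=value labels to every data point. The attribute names and values (team.id, cost_center, platform, eng-123) are illustrative placeholders. Check the official monitoring reference for which attributes your backend actually receives.

```shell
# Turn on Claude Code telemetry (same gate as above)
export CLAUDE_CODE_ENABLE_TELEMETRY=1
# Print metrics to the terminal instead of sending them to a collector
export OTEL_METRICS_EXPORTER=console
# Export every 10 seconds (value in milliseconds) so results appear quickly
export OTEL_METRIC_EXPORT_INTERVAL=10000
# Placeholder attribution labels attached to every metric, for chargeback views
export OTEL_RESOURCE_ATTRIBUTES="team.id=platform,cost_center=eng-123"
```

Start claude from the same shell. Once the output looks right, switch OTEL_METRICS_EXPORTER back to otlp and keep the resource attributes.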
Among the metrics it emits are claude_code.cost.usage (the session cost, in USD) and claude_code.token.usage (token counts, which can be broken down by type, such as input and output, and by user, team, model, and more), alongside operational counters like claude_code.session.count, claude_code.commit.count, and claude_code.lines_of_code.count. One caveat matters for the cloud routes. On Bedrock, Vertex, and Foundry, Claude Code does not pull cost figures from your cloud account, so the claude_code.cost.usage value on those routes is a local estimate and the authoritative number is in your cloud provider's bill. Use the telemetry for relative, real-time visibility (who is spending, on which model, doing what), and reconcile against the provider's billing for the ground truth.
8.3 Reducing Consumption at the Source
Caps and dashboards are downstream of the real lever, which is consuming fewer tokens. Three mechanisms are worth naming here, with the detail deferred to the token-efficiency article. Choosing the right model for the job is the biggest one — the Haiku-class small/fast model handles background and lightweight work at a fraction of the cost of running everything on Opus, which is exactly why ANTHROPIC_DEFAULT_HAIKU_MODEL is worth setting deliberately. Bounding max_tokens caps the output side of a request. And prompt caching — reusing a stable context prefix across requests instead of re-sending it — is the single highest-leverage token reduction for agentic workloads, and the subject of the companion article above. The point for this section is only that spend control is a two-layer problem: cap and observe with the route's machinery, and reduce consumption at the source with model choice and caching.
8.4 Where the Authoritative Number Lives
A recurring source of confusion is which surface to trust for the real spend, because each route exposes the number in a different place and Claude Code's own in-tool estimate is never the authoritative one. On Route A, the ground truth is the Anthropic Console's usage and billing views, scoped by workspace; the in-tool /usage figure is a convenience estimate for the current session, not the invoice. On Route B, the authoritative number is your AWS bill — Cost Explorer and the Billing console, filtered to the Bedrock service and tagged by whatever cost-allocation tags you apply — and AWS Budgets is where you set the alerting threshold against it. On Route C, it is the Google Cloud billing reports and budgets for the project, with Vertex AI as the filtered service.
The reason this matters operationally: on the two cloud routes Claude Code does not read cost back from your cloud account, so its telemetry and /usage are estimates that can diverge from the provider's metered figure (rounding, request overhead, tier effects). Use the telemetry for *fast relative signal* — which user, team, or model is driving spend right now — and reconcile against the provider's billing surface for *the number you report upward*. Building a cost dashboard that silently treats the OpenTelemetry estimate as the bill is a mistake that surfaces only when finance compares it to the actual invoice.
9. A Decision Framework: Which Route (and Subscription versus Metered) Fits You
Enough mechanism. Here is how to actually choose. The decision tree below walks the questions in the order that most cleanly narrows the options; the prose after it explains each branch.
If no hard constraint forces a cloud route, ask whether you are evaluating or running steady-state. During an evaluation, metered billing is almost always right: you avoid committing to seats before you know the value, and you pay only for the trial usage. The direct Anthropic API route is the lowest-friction way to meter when you have no cloud-routing requirement.
For steady-state, the deciding question is the shape of usage. If usage is high and even across the team, the per-seat subscription is usually both cheaper and simpler — flat, predictable, no infrastructure. If usage is uneven — a few heavy users and many light ones — metering is fairer, because seats would over-charge the light users; meter on the Anthropic API, or on a cloud route if you also want chargeback.
Finally, ask whether you need to attribute the cost. If you must charge spend back to specific teams, projects, or cost centers, a cloud route (Bedrock or Vertex) gives you in-account billing and cost-allocation tags that the direct API route does not, and the Anthropic API route gives you per-workspace separation as a lighter-weight alternative. If you are on AWS, Bedrock; if on GCP, Vertex; if multi-cloud, pick the one that owns the cost center you are billing.
The axes, summarized: evaluation versus steady-state (meter while evaluating), even versus uneven usage (seats for even, metering for uneven), AWS versus GCP versus neither (Bedrock, Vertex, or direct API), data residency (forces a cloud route and narrows its regions), and chargeback (favors a cloud route or per-workspace keys). Most real decisions are one hard constraint plus one usage-shape judgment; the tree just makes the order explicit.
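When several teams apply the decision order independently, it helps to write it down as a small function. This is a hypothetical bash sketch, not a tool the article ships. The inputs are a team's own answers rather than measurements. The branch order follows the prose: hard constraint first, then evaluation, then usage shape, then chargeback. Chargeback is consulted only on the steady-state, uneven-usage branch, which is one reading of where the prose places it.

```shell
#!/usr/bin/env bash
# Hypothetical encoding of the section 9 decision order; not an official tool.
# Arguments are the team's own answers, not measurements:
#   $1 cloud       aws | gcp | none  (the cloud a contract or residency rule points at)
#   $2 constraint  yes | no          (hard data-residency or cloud-routing requirement?)
#   $3 phase       eval | steady
#   $4 usage       even | uneven     (consulted only in steady state)
#   $5 chargeback  yes | no          (must spend land in a team's own account?)
choose_billing() {
  local cloud="$1" constraint="$2" phase="$3" usage="$4" chargeback="$5"
  local cloud_route=""
  case "$cloud" in
    aws) cloud_route="metered: Amazon Bedrock" ;;
    gcp) cloud_route="metered: Google Cloud Vertex AI" ;;
  esac

  # 1. A hard constraint picks the cloud route before any cost reasoning
  if [ "$constraint" = yes ]; then
    if [ -z "$cloud_route" ]; then
      echo "hard constraint needs a cloud route: pass aws or gcp" >&2
      return 1
    fi
    echo "$cloud_route"
    return 0
  fi

  # 2. Evaluating: meter on the direct Anthropic API, the lowest-friction route
  if [ "$phase" = eval ]; then
    echo "metered: Anthropic API"
    return 0
  fi

  # 3. Steady state with high, even usage: per-seat subscription
  if [ "$usage" = even ]; then
    echo "subscription seats"
    return 0
  fi

  # 4. Uneven usage: meter, leaning on a cloud route (or per-workspace keys) for chargeback
  if [ "$chargeback" = yes ] && [ -n "$cloud_route" ]; then
    echo "$cloud_route"
  elif [ "$chargeback" = yes ]; then
    echo "metered: Anthropic API, per-workspace keys"
  else
    echo "metered: Anthropic API"
  fi
}
```

Applied to the scenarios in section 9.1, choose_billing aws yes steady even yes prints metered: Amazon Bedrock for the regulated AWS enterprise, and choose_billing none no steady even no prints subscription seats for the ten-person team with even usage.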
9.1 Four Worked Scenarios
The tree is easier to trust against concrete cases. Here are four that cover most of the spread.
The solo developer trying it out. No team, no cloud requirement, just wants to see whether the tool earns its keep. There is no hard constraint and no chargeback need, and "evaluation" dominates — so meter on the direct Anthropic API with ANTHROPIC_API_KEY. It is the lowest-friction way to pay only for the trial, with nothing to provision. If usage settles into heavy daily use later, revisit whether a Max subscription is now cheaper.
The ten-person team where everyone uses it all day. No data-residency rule, no requirement to route through a specific cloud, and usage is high and even. The hard-constraint branch is empty, the team is past evaluation, and the usage shape is even — so the per-seat subscription wins on both cost and simplicity. Provisioning ten API keys and a cost dashboard here would be effort spent to make the setup worse.
The product group where two engineers live in the tool and eight dip in occasionally. Same lack of hard constraints, but the usage is sharply uneven. Seats would have eight people paying a flat price for very little work. Meter instead — on the direct Anthropic API if there is no chargeback need, or on a cloud route if the spend should land in the group's AWS or GCP account. A reasonable hybrid is to put the two heavy users on seats and meter the eight occasional ones, which the decision tree supports because it applies per user, not once for the group.
The regulated enterprise on AWS with a data-residency rule. A compliance requirement pins inference to specific AWS regions inside the company's own account. This is a hard constraint, and it overrides every cost consideration: the route is Amazon Bedrock, with the inference-profile geography prefix and AWS_REGION chosen to satisfy the residency rule, the spend landing in the existing AWS bill for chargeback, and access controlled through IAM and managed settings. Whether metering or seats would have been "cheaper" never enters the decision, because the constraint is not negotiable on price. The GCP-shop version of this scenario is identical with Vertex substituted for Bedrock.
10. Enterprise and Team Considerations
Rolling Claude Code out to many users turns the billing choice from a personal preference into a platform decision, and a few patterns recur.
Enforce the route centrally rather than per developer. Claude Code reads its configuration from settings.json, and managed (admin-owned) settings let a platform team fix the provider, inject the right environment, and prevent individuals from accidentally metering against the wrong account or falling back to a personal subscription. The mechanics of managed settings and the precedence between user and admin configuration are covered in Claude Code Features and Settings Reference and, at the boundary level, in Claude Code Harness and Environment Engineering Guide; the billing-relevant point is that the route should be a platform-enforced setting, not something each engineer wires up by hand. For a cloud route, that usually means shipping the CLAUDE_CODE_USE_BEDROCK / CLAUDE_CODE_USE_VERTEX flag, the region, and the model pins in managed settings, and letting the credential come from the host's instance role or SSO rather than a key in a dotfile.
Decide the key-injection policy deliberately. For the Anthropic API route, the cleanest pattern is per-workspace keys delivered through apiKeyHelper (so a secrets manager owns the key and rotation) rather than a static ANTHROPIC_API_KEY in every engineer's shell. For the cloud routes, prefer instance roles and SSO over long-lived keys entirely. This keeps secrets out of source control and lets you revoke access centrally.
Audit and attribute. The OpenTelemetry stream from section 8.2 is what makes a fleet legible — per-user and per-team token and cost breakdowns flowing into a dashboard, reconciled against the provider's bill for the authoritative figure. Combine it with per-workspace keys (Route A) or per-account/project billing (Routes B and C) so that the question "who spent what, on which model" always has an answer.
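To make the platform-enforced route concrete, here is a sketch that renders a managed settings file pinning Claude Code to Bedrock. It uses the env block of settings.json, which is the settings mechanism for injecting environment variables. The helper name render_bedrock_managed_settings is hypothetical. The helper does no JSON escaping, so it assumes the inputs are plain region and model IDs. It writes no credential, which means authentication still comes from the host's instance role or SSO.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a managed-settings.json that pins Claude Code to Bedrock.
# Uses the settings "env" block; no credentials are written, so auth still comes
# from the host's instance role or SSO. No JSON escaping: inputs must be plain IDs.
render_bedrock_managed_settings() {
  local region="$1" sonnet_model="$2" haiku_model="$3"
  cat <<EOF
{
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "1",
    "AWS_REGION": "${region}",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "${sonnet_model}",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "${haiku_model}"
  }
}
EOF
}
```

At the time of writing, the Linux location for the managed file is /etc/claude-code/managed-settings.json, and macOS and Windows use different paths. Confirm the path in the settings reference before deploying, for example with render_bedrock_managed_settings us-east-1 us.anthropic.claude-sonnet-4-6 us.anthropic.claude-haiku-4-5-20251001-v1:0 | sudo tee /etc/claude-code/managed-settings.json.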
On data handling, each provider documents its own data-use and retention policy, and the answer to "is my code used for training" depends on the route and the provider — so the right move in a procurement review is to cite the provider's official policy for the route you chose rather than a general statement. Point the reviewer at Anthropic's, AWS's, or Google Cloud's documentation for the specific commitments.
Run the two billing models side by side when it helps. A common and sensible pattern is to meter during evaluation and adoption, then move heavy, steady-state users onto seats once usage has flattened and the value is proven — keeping the occasional users on metering so they are not paying for idle seats. The two models are not mutually exclusive across an organization, and the decision tree in section 9 applies per team, not once for the whole company. For teams whose unattended pipeline usage runs on metered billing by definition, Claude Code in CI/CD and Headless Automation covers the automation side.
11. Common Pitfalls
The failures on these routes are almost all configuration mismatches, and they recur often enough to enumerate.
Pasting one route's model ID into another route. The most common outage: a us.anthropic.claude-opus-4-8 Bedrock inference-profile ID in a Vertex configuration, or a bare claude-opus-4-8 where Bedrock expects the inference-profile form. Each route has its own identifier shape (section 7); pin the right one per route.
A region that does not serve the model. On Bedrock, a us.-prefixed inference profile with AWS_REGION set to a non-US region — or the reverse — produces a model-not-available error. On Vertex, a model that is not on the global endpoint while CLOUD_ML_REGION=global fails until you set the matching VERTEX_REGION_CLAUDE_* override. Match the region (and the inference-profile geography prefix) to where the model is actually served.
Bedrock model access not enabled. A correctly configured environment still fails if model access has not been enabled for that Anthropic model in the Bedrock console for your account. Enable it once per account (or centrally via AWS Organizations) before you debug anything else.
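The first two pitfalls are mechanical enough to check before launch. The following is a hypothetical bash sketch, and check_model_for_route is an illustrative name. It applies the identifier shapes from the section 7 table. It also pairs a Bedrock us./eu./apac. profile with an AWS_REGION in the matching us-/eu-/ap- family. These rules are heuristics, not an official validator. Edge cases such as GovCloud regions are not modeled, and passing the check says nothing about whether model access has been enabled.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch check: does MODEL have the identifier shape ROUTE expects,
# and (Bedrock only) does the profile's geography prefix match AWS_REGION?
# Heuristics read off the section 7 table; prints "ok" or a reason, returns 0/1.
check_model_for_route() {
  local route="$1" model="$2" region="${3:-}" want
  case "$route" in
    bedrock)
      case "$model" in
        us.anthropic.*)   want="us-" ;;
        eu.anthropic.*)   want="eu-" ;;
        apac.anthropic.*) want="ap-" ;;
        *) echo "bedrock expects a us./eu./apac.anthropic.* inference profile ID"; return 1 ;;
      esac
      case "$region" in
        "$want"*) ;;
        *) echo "${model%%.*}. profile needs an AWS_REGION starting with $want (got '$region')"; return 1 ;;
      esac ;;
    vertex)
      case "$model" in
        *anthropic.*) echo "vertex expects a bare claude-* ID, not a Bedrock profile"; return 1 ;;
        claude-*) ;;
        *) echo "unrecognized model ID '$model'"; return 1 ;;
      esac ;;
    anthropic)
      case "$model" in
        *anthropic.*) echo "the Anthropic API expects a bare claude-* ID, not a Bedrock profile"; return 1 ;;
        *@*) echo "the @date form is Vertex-only"; return 1 ;;
        claude-*) ;;
        *) echo "unrecognized model ID '$model'"; return 1 ;;
      esac ;;
    *) echo "unknown route '$route'"; return 1 ;;
  esac
  echo ok
}
```

For example, run check_model_for_route bedrock "$ANTHROPIC_MODEL" "$AWS_REGION" before starting claude on a Bedrock host.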
Vertex ADC, project, or region mismatch. "Could not load the default credentials" means ADC is not set up — run gcloud auth application-default login or set GOOGLE_APPLICATION_CREDENTIALS. Remember that ANTHROPIC_VERTEX_PROJECT_ID is overridden by GCLOUD_PROJECT / GOOGLE_CLOUD_PROJECT / the credentials file, so a "wrong project" symptom is often one of those quietly taking precedence.
ANTHROPIC_API_KEY and a subscription fighting. If a stray ANTHROPIC_API_KEY is exported in your shell profile, it overrides your subscription login (section 4.2). If that key belongs to an expired organization, you get authentication failures while wondering why your perfectly good subscription is not being used. Run unset ANTHROPIC_API_KEY to fall back, and run /status to see what is actually active.
Forgetting the small/fast model. Leaving ANTHROPIC_DEFAULT_HAIKU_MODEL unset means background tasks run on whatever the default is — often the same model as your primary — which quietly inflates token spend. Pin it. And if you are carrying an old configuration, migrate the deprecated ANTHROPIC_SMALL_FAST_MODEL to ANTHROPIC_DEFAULT_HAIKU_MODEL.
Proxy and base-URL traps. ANTHROPIC_BASE_URL (and the Bedrock/Vertex equivalents ANTHROPIC_BEDROCK_BASE_URL / ANTHROPIC_VERTEX_BASE_URL) change the endpoint, not the model. A misconfigured corporate proxy or a stale base URL produces connection or auth errors that look like credential problems but are not. Check the base-URL variables when authentication "should" work but does not.
12. Frequently Asked Questions
Can I use Claude Code without a subscription? Yes. Set ANTHROPIC_API_KEY to an Anthropic API key and Claude Code runs on pay-as-you-go metered billing with no subscription at all — that is Route A in section 4. You can also meter through Amazon Bedrock or Google Cloud Vertex AI instead.
How do I run Claude Code on Amazon Bedrock? Set CLAUDE_CODE_USE_BEDROCK=1 and AWS_REGION, provide AWS credentials through the standard chain (SSO, profile, keys, instance role, or a Bedrock API key), pin your models with ANTHROPIC_MODEL / ANTHROPIC_DEFAULT_*_MODEL using cross-region inference profile IDs, enable model access for your account, and grant the IAM permissions in section 5.5. Full setup is section 5.
How do I run Claude Code on Google Vertex AI? Set CLAUDE_CODE_USE_VERTEX=1, ANTHROPIC_VERTEX_PROJECT_ID, and CLOUD_ML_REGION, authenticate with Application Default Credentials (gcloud auth application-default login), enable the Vertex AI API, request Claude model access in the Model Garden, and grant roles/aiplatform.user. Full setup is section 6.
Subscription versus pay-as-you-go: which is cheaper for uneven team usage? For uneven usage — a few heavy users and many light ones — metered billing is usually fairer, because a flat per-seat subscription charges the light users as much as the heavy ones. For high, even usage across the team, the subscription is usually cheaper and simpler. This site does not quote prices; section 9 gives the decision framework, and the official pricing pages own the numbers.
Do the three routes use the same models? They reach the same Claude generations, but the model *identifier* differs per route — bare IDs on the Anthropic API, cross-region inference profile IDs on Bedrock, family-and-version IDs on Vertex — and the short opus/sonnet aliases can resolve to different generations on the cloud routes. Pin the model explicitly. The mapping is in section 7.
How do I cap spending? Use the route's native controls — workspace spend limits in the Anthropic Console (Route A), AWS Budgets and CloudWatch (Route B), or Google Cloud budget alerts (Route C) — and reduce consumption at the source with model choice, max_tokens, and prompt caching. Section 8 covers the machinery; the companion token-efficiency article covers reduction.
Is my code used for training? That depends on the route and the provider, and each documents its own data-use and retention commitments. Rather than rely on a general statement, cite the official policy for the route you chose — Anthropic's for the direct API, AWS's for Bedrock, Google Cloud's for Vertex — which is also what a procurement review will want to see.
13. Summary
Claude Code is one tool with four ways to pay for it: a per-seat subscription logged in over OAuth, and three metered routes — the Anthropic API directly, Amazon Bedrock, and Google Cloud Vertex AI. The subscription optimizes for flat, predictable per-person cost and zero infrastructure, and wins for teams with high, even usage and no cloud-routing requirement. Metering optimizes for paying only for what you use and for keeping spend and data inside infrastructure you already control, and wins for evaluation, for uneven usage, for data-residency and existing-cloud constraints, and for chargeback.
The setup differences come down to three things per route: the enabling switch and credential (ANTHROPIC_API_KEY; CLAUDE_CODE_USE_BEDROCK=1 plus the AWS chain; CLAUDE_CODE_USE_VERTEX=1 plus ADC), the region concept (none; cross-region inference profiles; Vertex endpoint types), and the model identifier (bare; us.anthropic.* inference profiles; Vertex family-and-version IDs) — and the identifiers are not portable, so pin the right one per route. Spend is capped and observed with each route's native machinery plus Claude Code's own OpenTelemetry stream, all without quoting a price, and reduced at the source with model choice and caching.
The decision, distilled: let any hard constraint (data residency, an existing cloud contract) pick the cloud route first; otherwise meter while you evaluate, then choose seats for high even usage and metering for uneven usage, leaning on a cloud route when you need chargeback.
From here, the natural next reads are Claude Code Getting Started and the Features and Settings Reference for the surrounding configuration; the Anthropic Claude API Prompt Caching and Token Efficiency guide for spending fewer tokens on whichever route you pick; the Claude Agent SDK Complete Guide for building on the same provider routes from your own code; and Claude Code in CI/CD and Headless Automation for running the agent unattended, which assumes metered billing by definition. This page tracks a fast-moving surface — model identifiers, environment-variable names, and provider availability all change — so treat the official documentation linked below as the final authority and expect periodic updates here.
14. References
Official documentation — Claude Code
- Claude Code documentation
- Claude Code — Authentication
- Claude Code — Amazon Bedrock
- Claude Code — Google Vertex AI
- Claude Code — Model configuration
- Claude Code — Environment variables reference
- Claude Code — Monitoring usage (OpenTelemetry)
- Claude Code — Manage costs
- Anthropic — Models overview
- Anthropic — Pricing
- Amazon Bedrock — Pricing
- Amazon Bedrock — Cross-region inference
- Amazon Bedrock — Model access
- Google Cloud — Claude models on Vertex AI
- Google Cloud Vertex AI — Pricing
- Claude Code Getting Started — Why Knowing About Local AI Agents Changes Everything
- Claude Code Features and Settings Reference
- Claude Code Harness and Environment Engineering Guide
- Claude Code Operator's Handbook
- Anthropic Claude Model Release Timeline
- Amazon Bedrock — Basic Information and API Examples
- MCP Server on AWS Lambda Complete Guide
Written by Hidekazu Konishi