Amazon Bedrock Glossary - A Reference for AI Engineers and Architects

This glossary defines the core terms that AI engineers, cloud architects, and PoC owners encounter when building on Amazon Bedrock. Each entry is short (two to four sentences), names the closest related terms, and links to the canonical AWS documentation page so you can verify the definition or follow up on details. The entries are grouped into eleven categories — foundation models, inference and throughput, the API surface, prompt and token mechanics, Knowledge Bases and RAG, Guardrails, Bedrock Agents, Amazon Bedrock AgentCore, interoperability protocols, workflows and Studio, and customization and evaluation — and a full A–Z index is provided at the top for direct jumps.

The glossary is designed as a single-fetch reference: an AI search agent (or a human reader) can answer a "what is X in Bedrock?" question without having to open every individual AWS docs page. For deep dives, follow the internal links to the related articles on hidekazu-konishi.com referenced at the end of each section. If you are coming from the AWS AI and Machine Learning Glossary, this article is the Bedrock-specific companion focused on the Bedrock surface area.

Out of scope: pricing numbers, model-by-model parameter counts, and step-by-step API request/response examples. Pricing terms (for example, Provisioned Throughput or Application Inference Profile) are explained as architectural concepts only — the cost-control role of each is mentioned, but no dollar figures are quoted, because list prices change frequently and AWS docs are always the authoritative source for current pricing.

How to Use This Glossary

Each term entry has three lines: a short definition that names the concept on its own (no forward references), a Related: line that lists 2–6 nearby terms in this same glossary as in-page anchors, and a Source: line that links to the most relevant Amazon Bedrock documentation page. If the official name has changed (for example, the rename from "Knowledge Bases for Amazon Bedrock" to "Amazon Bedrock Knowledge Bases"), the entry mentions both names and which is current.

If a term is in preview or has an evolving official name at the time of writing, the entry calls that out explicitly so you can re-check the AWS console or the latest docs page for the canonical wording in your region.

A–Z Index

The full list, in alphabetical order, with anchors to each entry:
  1. A2A (Agent-to-Agent) Protocol
  2. Action Group
  3. AgentCore Browser Tool
  4. AgentCore Code Interpreter
  5. AgentCore Gateway
  6. AgentCore Identity
  7. AgentCore Memory
  8. AgentCore Observability
  9. AgentCore Runtime
  10. Amazon Bedrock Data Automation (BDA)
  11. Amazon Bedrock IDE
  12. Amazon Bedrock Knowledge Bases
  13. Amazon Bedrock Marketplace
  14. Amazon Nova
  15. Anthropic Claude (on Bedrock)
  16. Application Inference Profile
  17. Base Model
  18. Batch Inference
  19. Bedrock Agent
  20. Bedrock Flows
  21. Bedrock Prompt Management
  22. Bedrock Studio
  23. Chunking
  24. Collaborator Agent
  25. Content Filter
  26. Context Window
  27. Contextual Grounding Check
  28. Continued Pre-training
  29. Converse API
  30. ConverseStream API
  31. Cross-Region Inference
  32. Custom Model
  33. Custom Model Import
  34. Data Source and Sync Job
  35. Denied Topic (Topic Policy)
  36. Embedding
  37. Embedding Model
  38. Fine-tuning (on Bedrock)
  39. Foundation Model (FM)
  40. Function Calling
  41. Guardrail
  42. Inference Profile
  43. InvokeModel API
  44. InvokeModelWithResponseStream API
  45. Latency-Optimized Inference
  46. LLM-as-a-Judge
  47. MCP (Model Context Protocol)
  48. Meta Llama (on Bedrock)
  49. Metadata Filter (Knowledge Base)
  50. Mistral (on Bedrock)
  51. Model Deprecation
  52. Model Distillation
  53. Model Evaluation
  54. Model Family
  55. Model Lifecycle
  56. Model Provider
  57. Multi-Agent Collaboration
  58. Multimodal Input
  59. On-Demand Inference
  60. OpenAPI Schema (for Agent)
  61. Orchestration (Agent)
  62. Output Token (Max Tokens)
  63. Prompt Attack Filter
  64. Prompt Caching
  65. Prompt Engineering
  66. Provisioned Throughput
  67. RAG (Retrieval Augmented Generation)
  68. Reranker
  69. Retrieval
  70. RPM (Requests Per Minute)
  71. Sensitive Information Filter
  72. Stop Sequence
  73. Streaming Response
  74. Supervisor Agent
  75. System Prompt
  76. Temperature
  77. Token
  78. Tool Use
  79. Top-K
  80. Top-P (Nucleus Sampling)
  81. TPM (Tokens Per Minute)
  82. Vector Store
  83. Watermark
  84. Word Filter

Bedrock Component Map (Layer Overview)

Amazon Bedrock can be read as six conceptual layers that the rest of this glossary expands on in detail. The Foundation Models layer supplies models that the Inference Layer (on-demand, provisioned, profiled, or batch) exposes to your application through the API Surface (InvokeModel and Converse). On top of the API surface sit the higher-level Surfaces — Knowledge Bases for RAG, Guardrails for safety, Agents for orchestration, and Flows / Studio / Prompt Management for low-code authoring. Amazon Bedrock AgentCore is a parallel runtime layer for agents built outside the managed Bedrock Agents surface, and Customization and Evaluation spans the entire stack for adapting and benchmarking models.

Layer | Representative components in this glossary | Role
1. Foundation Models | Amazon Nova, Anthropic Claude, Meta Llama, Mistral, Cohere, AI21 Labs, plus additional models surfaced through Amazon Bedrock Marketplace | Provider-trained model weights exposed through a single Bedrock API regardless of provider.
2. Inference Layer | On-Demand Inference, Provisioned Throughput, Inference Profile (Cross-Region / Application), Batch Inference | Capacity purchasing and cross-region routing options for each invocation.
3. API Surface | InvokeModel / InvokeModelWithResponseStream, Converse / ConverseStream (Tool Use, Multimodal, Prompt Caching) | The runtime entry points your application calls to reach Layer 2 and ultimately Layer 1.
4. Higher-level Surfaces | Knowledge Bases (Embedding / Vector Store / Retrieval / Metadata Filter / Reranker), Bedrock Data Automation (BDA), Guardrails (Content / Topic / Word / PII / Grounding / Prompt Attack), Bedrock Agents (Action Group / OpenAPI / Multi-Agent), Flows / Studio / IDE / Prompt Management | Managed retrieval, safety, orchestration, and authoring features that sit on top of the API surface.
5. Amazon Bedrock AgentCore | AgentCore Runtime, Memory, Gateway (MCP / OAuth), Code Interpreter, Browser Tool, Identity, Observability | Parallel runtime for bring-your-own agents built outside the managed Bedrock Agents surface.
6. Customization and Evaluation | Continued Pre-training, Fine-tuning, Model Distillation, Model Evaluation, LLM-as-a-Judge | Cross-cutting offline tooling that adapts the models in Layer 1 and benchmarks the quality of their output.

In addition, this glossary covers the protocols MCP and A2A (Section I) and Watermark (Section K), which sit across these layers rather than belonging to a single one.

The typical request path is Application → Layer 3 (API Surface) → Layer 2 (Inference) → Layer 1 (Foundation Models), with Layer 4 features attached to the same Converse call (Knowledge Bases for retrieval, Guardrails for safety, Bedrock Agents for tool orchestration). Layer 5 (AgentCore) is reached when a custom agent built outside Bedrock Agents calls back into the same API surface to invoke a model, and Layer 6 (Customization and Evaluation) is offline tooling that produces or scores the artifacts used in Layer 1.
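
As a concrete sketch of that path, the example below sends a single Converse request that touches Layer 3 (the API call), Layer 2 (a system-defined cross-region inference profile), and Layer 1 (the underlying model), with a Layer 4 Guardrail attached to the same invocation. It assumes boto3 and AWS credentials are configured; the inference profile ID and guardrail ID are placeholders to replace with values from your own account.

```python
import boto3

# Layer 3: the API surface is a single Converse call on the bedrock-runtime client.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    # Layer 2: a system-defined cross-region inference profile ID instead of a raw model ID.
    # The ID is illustrative; list the inference profiles available in your Region first.
    modelId="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize Amazon Bedrock in one sentence."}]}],
    inferenceConfig={"maxTokens": 200, "temperature": 0.2},
    # Layer 4: a Guardrail evaluated on the same round trip (placeholder ID and version).
    guardrailConfig={"guardrailIdentifier": "gr-EXAMPLE123", "guardrailVersion": "1"},
)

# Layer 1: the foundation model's reply comes back in the provider-agnostic Converse shape.
print(response["output"]["message"]["content"][0]["text"])
```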

A. Foundation Models and Model Catalog

Foundation Model (FM)

A Foundation Model is a large pre-trained model — typically a large language model (LLM) or large multimodal model — that Amazon Bedrock exposes through a unified API so you can build generative AI applications without managing the underlying GPU infrastructure. In Bedrock terminology, every model accessible through InvokeModel or Converse is a foundation model, regardless of whether it was trained by Amazon, Anthropic, Meta, Mistral AI, AI21 Labs, Cohere, or another supported provider in the current Bedrock catalog.
Related: Base Model · Model Provider · Custom Model · Model Family
Source: What is Amazon Bedrock? — AWS Docs

Base Model

A Base Model is a foundation model in its provider-shipped state — that is, before any customer-specific customization such as fine-tuning, continued pre-training, or distillation. Bedrock contrasts base models with Custom Models to make it explicit which artifact carries provider weights versus your post-trained weights, and base-model IDs are what you pass to InvokeModel or Converse when no customization is needed.
Related: Foundation Model (FM) · Custom Model · Fine-tuning (on Bedrock) · Continued Pre-training
Source: Supported foundation models in Amazon Bedrock — AWS Docs

Custom Model

A Custom Model is a model artifact on Bedrock that carries customer-specific weights, produced through one of two paths: a Bedrock customization job (fine-tuning, continued pre-training, or model distillation), which yields a model invoked through Provisioned Throughput; or Custom Model Import, which registers externally trained weights of a supported open-source architecture and is invoked with InvokeModel / InvokeModelWithResponseStream on a per-token, on-demand basis in selected Regions.
Related: Base Model · Fine-tuning (on Bedrock) · Continued Pre-training · Model Distillation · Custom Model Import · Provisioned Throughput
Source: Custom models in Amazon Bedrock — AWS Docs

Model Provider

A Model Provider is the organization that trains a foundation model and ships it into Amazon Bedrock — for example, Amazon (Nova, Titan), Anthropic (Claude), Meta (Llama), Mistral AI, AI21 Labs (Jamba), and Cohere, in addition to specialized providers exposed via Amazon Bedrock Marketplace. Each provider sets the model's End User License Agreement (EULA), and you must accept that EULA in the Bedrock console before invoking the model; the current provider list in your Region is authoritative because providers join and leave the catalog over time.
Related: Foundation Model (FM) · Model Family · Anthropic Claude (on Bedrock) · Meta Llama (on Bedrock) · Mistral (on Bedrock) · Amazon Nova
Source: Foundation model providers in Amazon Bedrock — AWS Docs

Model Family

A Model Family is a related set of foundation models from the same provider that share architecture and naming but differ in size, tier, or capability — for example, "Claude" (Haiku / Sonnet / Opus), "Llama" (3.x / 4.x), "Nova" (Micro / Lite / Pro / Premier / Canvas / Reel), or "Mistral" (7B / Large). Treating a family as a unit lets you pick a price-performance tier per workload and swap within the family later as new versions ship.
Related: Foundation Model (FM) · Model Provider · Model Lifecycle · Model Deprecation
Source: Foundation models reference — AWS Docs

Amazon Nova

Amazon Nova is Amazon's most recent first-party family of foundation models on Bedrock (alongside the earlier Amazon Titan family), spanning text-only tiers (Micro / Lite / Pro / Premier), image generation (Nova Canvas), and video generation (Nova Reel). Nova models are designed for a balance of cost, multilingual support, and tool use, and they are accessed through the same InvokeModel / Converse APIs as third-party models.
Related: Foundation Model (FM) · Model Family · Model Provider · Multimodal Input
Source: Amazon Nova models in Amazon Bedrock — AWS Docs

Anthropic Claude (on Bedrock)

Anthropic Claude is a family of foundation models from Anthropic, available on Bedrock in tiers such as Haiku (fast and economical), Sonnet (balanced), and Opus (highest capability). Bedrock exposes Claude through the same Converse and InvokeModel APIs as other models, and Claude's distinguishing features on Bedrock include native tool use, prompt caching, vision input, and extended context windows.
Related: Model Provider · Tool Use · Prompt Caching · Multimodal Input · Context Window
Source: Anthropic Claude models in Amazon Bedrock — AWS Docs

Meta Llama (on Bedrock)

Meta Llama is a family of open-weight foundation models from Meta, available on Bedrock with Meta as the model provider. Bedrock supports current-generation Llama tiers (for example, 3.x and 4.x families) for chat, instruction following, and increasingly for multimodal inputs, and you call them with the same Converse API used for other Bedrock models.
Related: Model Provider · Model Family · Tool Use · Foundation Model (FM)
Source: Meta Llama models in Amazon Bedrock — AWS Docs

Mistral (on Bedrock)

Mistral is a family of foundation models from Mistral AI, available on Bedrock in current tiers such as Mistral Large (including the Mistral Large 2 generation) and Mistral Small, with the earlier Mistral 7B / Mixtral lineage now in various deprecation stages. Mistral models on Bedrock are commonly chosen for cost-sensitive workloads, structured output, and European data-residency requirements when invoked from EU regions; consult the model lifecycle page before pinning a specific tier.
Related: Model Provider · Model Family · Foundation Model (FM)
Source: Mistral AI models in Amazon Bedrock — AWS Docs

Model Lifecycle

Model Lifecycle is the set of states a foundation model passes through on Bedrock: typically Active (currently recommended and feature-complete), Legacy (still available but no longer the recommended default), and EOL / End of Life (no longer available for new invocations). Tracking lifecycle is essential because models in Legacy are scheduled for deprecation and applications pinning a specific model ID must plan migrations.
Related: Model Deprecation · Model Family · Foundation Model (FM)
Source: Model lifecycle in Amazon Bedrock — AWS Docs

Model Deprecation

Model Deprecation is the formal AWS announcement that a foundation model will no longer be available on Bedrock after a specific date, after which InvokeModel and Converse calls to that model ID will fail. AWS publishes deprecation dates and recommended successor models so workloads can migrate before the cutoff; ignoring a deprecation notice is a common cause of avoidable production outages on Bedrock.
Related: Model Lifecycle · Model Family · Foundation Model (FM)
Source: Model lifecycle in Amazon Bedrock — AWS Docs

Amazon Bedrock Marketplace

Amazon Bedrock Marketplace is a discovery surface inside Bedrock for accessing additional foundation models — including specialized and industry-vertical models — beyond the core curated catalog of major providers. Models obtained through Marketplace are invoked through Bedrock APIs with familiar inference primitives, while licensing and access are negotiated through AWS Marketplace.
Related: Foundation Model (FM) · Model Provider · Custom Model
Source: Amazon Bedrock Marketplace — AWS Docs

B. Inference and Throughput

On-Demand Inference

On-Demand Inference is the default Bedrock invocation mode in which you pay per input/output token (or per image/second of video) without any pre-purchased capacity. It is appropriate for workloads with spiky or unpredictable traffic, but it is subject to per-model account-level TPM and RPM quotas and is not available for some custom models.
Related: Provisioned Throughput · Inference Profile · TPM (Tokens Per Minute) · RPM (Requests Per Minute)
Source: Run inference with Amazon Bedrock — AWS Docs

Provisioned Throughput

Provisioned Throughput is a Bedrock purchase mode that reserves a fixed amount of model capacity (Model Units) for a chosen base or custom model, in exchange for predictable performance with a commitment term of No commitment (hourly), one month, or six months. The No-commitment option is offered for selected base models at a higher per-Model-Unit rate without a minimum term, while the multi-month terms trade a minimum commitment for a lower per-Model-Unit rate. It is the standard way to invoke customization-job-produced Custom Models, to guarantee headroom during spikes, and to act as a cost-control lever by trading higher idle cost for lower per-token cost at steady high utilization.
Related: On-Demand Inference · Custom Model · Model Family · TPM (Tokens Per Minute)
Source: Provisioned Throughput in Amazon Bedrock — AWS Docs

Inference Profile

An Inference Profile is a Bedrock-managed identifier that abstracts a model so a single invocation can be routed across multiple AWS Regions (for cross-region inference) or tracked as a distinct configuration for usage and cost monitoring. System-defined inference profiles cover popular models out of the box; you address them by inference profile ID instead of a raw model ID when cross-region routing is required.
Related: Application Inference Profile · Cross-Region Inference · InvokeModel API · Converse API
Source: Inference profiles in Amazon Bedrock — AWS Docs

Application Inference Profile

An Application Inference Profile is a customer-defined inference profile that wraps a model (or a system-defined profile) with custom tags so per-application or per-tenant cost can be tracked in Cost Explorer and AWS Budgets. It is also a cost-control lever: tagging traffic at the inference layer is the most reliable way to allocate Bedrock spend to business units without changing application code.
Related: Inference Profile · Cross-Region Inference · On-Demand Inference
Source: Application inference profiles in Amazon Bedrock — AWS Docs
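
As a minimal sketch of the tagging workflow, the example below creates an application inference profile with the boto3 control-plane client (bedrock) and cost-allocation tags. The model ARN, profile name, and tag values are placeholders, so confirm the exact request shape against the current API reference.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Wrap an existing foundation model (or system-defined profile) in a customer-defined
# profile that carries cost-allocation tags. All names, ARNs, and tag values are placeholders.
profile = bedrock.create_inference_profile(
    inferenceProfileName="billing-chatbot-prod",
    modelSource={
        "copyFrom": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
    },
    tags=[
        {"key": "application", "value": "billing-chatbot"},
        {"key": "cost-center", "value": "cc-1234"},
    ],
)

# Invoke through the returned profile ARN so usage is attributed to these tags in Cost Explorer.
print(profile["inferenceProfileArn"])
```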

Cross-Region Inference

Cross-Region Inference is the Bedrock capability of distributing on-demand inference traffic across multiple AWS Regions when called through an inference profile, increasing aggregate throughput and reducing the chance of throttling during traffic spikes. Cross-region inference does not move data outside the geographic boundary defined by the profile (for example, a US profile keeps traffic within US Regions).
Related: Inference Profile · Application Inference Profile · On-Demand Inference
Source: Cross-Region inference in Amazon Bedrock — AWS Docs

Streaming Response

A Streaming Response is a model invocation in which output tokens are returned incrementally as the model generates them, instead of waiting for the entire response to complete. On Bedrock you opt in to streaming through InvokeModelWithResponseStream or ConverseStream, which is the standard way to build chat UIs and reduce perceived latency.
Related: InvokeModelWithResponseStream API · ConverseStream API · Latency-Optimized Inference
Source: Streaming with Amazon Bedrock — AWS Docs

Batch Inference

Batch Inference is a Bedrock invocation mode in which you submit a large number of prompts via Amazon S3 input/output locations as a single asynchronous job, instead of calling InvokeModel interactively. Batch jobs trade real-time latency for a lower per-token cost and are a cost-control lever for nightly summarization, evaluation runs, and offline analytics.
Related: On-Demand Inference · Model Evaluation · Provisioned Throughput
Source: Batch inference in Amazon Bedrock — AWS Docs
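
The sketch below submits a batch job with the boto3 control-plane client; the JSONL input location, output prefix, IAM role ARN, and model ID are placeholders, and each input record must follow the target model's expected request format.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Submit an asynchronous batch job: prompts are read from S3 and results are written back to S3.
job = bedrock.create_model_invocation_job(
    jobName="nightly-summaries-2025-01-15",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",   # role with S3 + Bedrock access
    modelId="anthropic.claude-3-haiku-20240307-v1:0",            # illustrative model ID
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://my-batch-bucket/input/records.jsonl"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-batch-bucket/output/"}
    },
)

# Poll get_model_invocation_job(jobIdentifier=...) until the job completes.
print(job["jobArn"])
```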

TPM (Tokens Per Minute)

TPM is the Bedrock service quota that caps how many input + output tokens an account can process per minute for a given model in a given Region. TPM is the dominant binding constraint for high-traffic LLM workloads and is the primary axis that Provisioned Throughput and Cross-Region Inference are designed to relax.
Related: RPM (Requests Per Minute) · Provisioned Throughput · Cross-Region Inference · Token
Source: Amazon Bedrock quotas — AWS Docs

RPM (Requests Per Minute)

RPM is the Bedrock service quota that caps how many API requests an account can make per minute for a given model in a given Region. RPM matters most when each request is small (for example, classification or embedding generation), where you can hit the RPM ceiling without coming close to the TPM ceiling.
Related: TPM (Tokens Per Minute) · On-Demand Inference · Embedding
Source: Amazon Bedrock quotas — AWS Docs

Latency-Optimized Inference

Latency-Optimized Inference is a Bedrock-supported invocation path that prioritizes time-to-first-token (TTFT) and tokens-per-second for selected models, by running them on hardware tuned for low-latency serving. It is requested at invocation time through a performance configuration and is most useful for interactive chat and voice agent workloads.
Related: Streaming Response · On-Demand Inference · Inference Profile
Source: Latency-optimized inference in Amazon Bedrock — AWS Docs

C. API and Invocation

For concrete request and response examples that pair with the API terms below, see Amazon Bedrock Basic Information and API Examples.

InvokeModel API

InvokeModel is the lowest-level Bedrock runtime API that takes a model-specific request body (varying by provider) and returns the model's raw response. It is the right choice when you need full access to model-specific parameters that the higher-level Converse API does not normalize, or when working with image, video, or embedding models that are not chat-shaped.
Related: Converse API · InvokeModelWithResponseStream API · Inference Profile
Source: InvokeModel — Amazon Bedrock Runtime API Reference
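
The sketch below shows the provider-specific nature of InvokeModel, using the Anthropic Messages request format as an illustration; other providers expect different body schemas, and the model ID is a placeholder.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# InvokeModel takes a raw, provider-specific body. This one follows the Anthropic
# Messages format used by Claude models on Bedrock; Nova, Llama, and Mistral differ.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 200,
    "messages": [{"role": "user", "content": "What is an inference profile?"}],
}

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
    body=json.dumps(body),
    contentType="application/json",
    accept="application/json",
)

# The response body is also provider-specific and must be parsed accordingly.
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```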

InvokeModelWithResponseStream API

InvokeModelWithResponseStream is the streaming counterpart of InvokeModel and returns response chunks as a server-sent event-style stream. It is used in chat and assistant UIs where displaying tokens as they are generated significantly improves perceived latency.
Related: InvokeModel API · Streaming Response · ConverseStream API
Source: InvokeModelWithResponseStream — Amazon Bedrock Runtime API Reference

Converse API

The Converse API is Bedrock's unified, model-agnostic chat API that normalizes request and response shapes across model providers — messages, system prompts, tool use, and multimodal content all use the same JSON schema regardless of model. Converse is the recommended default for chat workloads because it lets you swap models without rewriting client code.
Related: ConverseStream API · InvokeModel API · Tool Use · System Prompt · Multimodal Input
Source: Converse — Amazon Bedrock Runtime API Reference

ConverseStream API

ConverseStream is the streaming variant of Converse, returning unified events for content deltas, tool-use blocks, and stop reasons across all supported model providers. It is the standard streaming primitive when building chat experiences, because the event schema is consistent whether the underlying model is Claude, Llama, Nova, or Mistral.
Related: Converse API · Streaming Response · Tool Use
Source: ConverseStream — Amazon Bedrock Runtime API Reference
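
A minimal streaming loop, assuming boto3 and an illustrative model ID: content arrives as contentBlockDelta events and the loop prints text deltas as they are generated.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse_stream(
    modelId="amazon.nova-lite-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [{"text": "Explain prompt caching in two sentences."}]}],
    inferenceConfig={"maxTokens": 300},
)

# The event schema is the same for every supported model provider.
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"].get("text", ""), end="", flush=True)
    elif "messageStop" in event:
        print()  # generation finished; the stop reason is in event["messageStop"]
```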

Tool Use

Tool Use is the Bedrock pattern in which a model is told about a set of tools (functions) it can call, decides at inference time which tool to invoke with which arguments, and returns a structured toolUse block instead of free-text. The application then executes the tool and feeds the result back to the model in a follow-up turn until the model returns a final answer.
Related: Function Calling · Converse API · Bedrock Agent · MCP (Model Context Protocol)
Source: Tool use (function calling) with the Converse API — AWS Docs
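
The sketch below walks one full tool-use turn with the Converse API: the tool (get_weather) is hypothetical, the model ID is illustrative, and the tool result is hard-coded where a real application would call an external API.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # illustrative model ID

# Declare one hypothetical tool; the model decides whether and how to call it.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            }},
        }
    }]
}

messages = [{"role": "user", "content": [{"text": "What's the weather in Osaka?"}]}]
response = bedrock_runtime.converse(modelId=MODEL_ID, messages=messages, toolConfig=tool_config)

if response["stopReason"] == "tool_use":
    # The model returned a structured toolUse block instead of free text.
    tool_use = next(b["toolUse"] for b in response["output"]["message"]["content"] if "toolUse" in b)
    tool_result = {"temperature_c": 18, "condition": "cloudy"}  # stand-in for a real lookup

    # Feed the tool result back in a follow-up turn so the model can produce the final answer.
    messages.append(response["output"]["message"])
    messages.append({
        "role": "user",
        "content": [{"toolResult": {
            "toolUseId": tool_use["toolUseId"],
            "content": [{"json": tool_result}],
        }}],
    })
    final = bedrock_runtime.converse(modelId=MODEL_ID, messages=messages, toolConfig=tool_config)
    print(final["output"]["message"]["content"][0]["text"])
```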

Function Calling

Function Calling is the widely used industry term for the same mechanism Bedrock calls Tool Use — the model emits a structured intent to call a named function with typed arguments. On Bedrock the two terms refer to the same feature; "Tool Use" is the AWS-native name in the Converse API surface.
Related: Tool Use · Converse API · OpenAPI Schema (for Agent)
Source: Tool use (function calling) with the Converse API — AWS Docs

Multimodal Input

Multimodal Input is the ability to send images, documents, video frames, or audio alongside text in a single request to models that support those modalities — for example, Claude (vision), Amazon Nova (vision and document), and Llama vision variants. On Bedrock you pass multimodal content as content blocks inside a Converse request, with type image, document, or video.
Related: Converse API · Anthropic Claude (on Bedrock) · Amazon Nova · Token
Source: Send multimodal requests with Converse — AWS Docs
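
A short sketch of a vision request over Converse: the local file path and model ID are placeholders, and the chosen model must support image input.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Read a local image and send it as an image content block next to the text prompt.
with open("architecture-diagram.png", "rb") as f:
    image_bytes = f.read()

response = bedrock_runtime.converse(
    modelId="amazon.nova-pro-v1:0",  # illustrative vision-capable model ID
    messages=[{
        "role": "user",
        "content": [
            {"text": "Describe what this diagram shows."},
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
        ],
    }],
)

print(response["output"]["message"]["content"][0]["text"])
```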

D. Prompt and Token Mechanics

Prompt Engineering

Prompt Engineering is the practice of designing input text (system prompt, user prompt, examples, and instructions) so that a foundation model produces the desired output reliably. On Bedrock this is provider-specific: Anthropic prompts differ in best practice from Llama, Nova, or Mistral prompts, and AWS publishes per-provider prompt engineering guidelines you should consult before tuning.
Related: System Prompt · Prompt Caching · Temperature · Stop Sequence
Source: Prompt engineering guidelines — AWS Docs

System Prompt

A System Prompt is a special, separate string passed to a model that sets persistent role, constraints, or formatting rules for the entire conversation — it is not part of the user's turn and is not echoed back. In the Converse API you supply it as the system parameter, and it is the first thing AWS recommends you cache when applying Prompt Caching.
Related: Prompt Engineering · Prompt Caching · Converse API
Source: System prompts in Converse — AWS Docs

Prompt Caching

Prompt Caching is a Bedrock feature (supported on selected models, notably Anthropic Claude) that caches a marked-up prefix of a prompt — typically a long system prompt, a tool schema, or a document — so subsequent requests reusing that prefix incur much lower input-token charges and lower latency. It is one of the most effective cost-control levers on Bedrock for chat workloads where prompts share a large stable prefix.
Related: System Prompt · Anthropic Claude (on Bedrock) · Converse API · Token
Source: Prompt caching in Amazon Bedrock — AWS Docs
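
The sketch below marks a long system prompt as cacheable with a cachePoint block in a Converse request. Treat the cachePoint block and the usage fields as assumptions to verify per model: cache support, minimum cacheable prefix length, and reporting differ across models, and the model ID shown is illustrative.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# A long, stable system prompt is the classic caching candidate; everything before
# the cachePoint marker is eligible to be served from the cache on repeat requests.
long_policy_text = "You are a support assistant for ExampleCorp. " + ("Policy detail... " * 200)

system = [
    {"text": long_policy_text},
    {"cachePoint": {"type": "default"}},  # cache marker (verify per-model support)
]

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
    system=system,
    messages=[{"role": "user", "content": [{"text": "How do I reset my password?"}]}],
)

# On supported models the usage section also reports cache read/write token counts.
print(response["usage"])
```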

Stop Sequence

A Stop Sequence is a string that, when generated by the model, causes generation to terminate immediately at that point. On Bedrock you set them through the stopSequences array on Converse or the equivalent per-model parameter on InvokeModel, and they are commonly used to bound output for structured generation (for example, stop on a closing tag).
Related: Output Token (Max Tokens) · Converse API · Prompt Engineering
Source: Inference parameters for foundation models — AWS Docs

Temperature

Temperature is a generation hyperparameter (typically 0.0–1.0, sometimes higher) that scales the logits before sampling — lower values make the model more deterministic, higher values increase diversity. The right value is task-specific: extractive Q&A and structured output favor low temperature, while creative writing favors higher temperature; on Bedrock the parameter is passed through inferenceConfig in Converse.
Related: Top-P (Nucleus Sampling) · Top-K · Output Token (Max Tokens)
Source: Inference parameters for foundation models — AWS Docs

Top-P (Nucleus Sampling)

Top-P (also called nucleus sampling) restricts the sampling distribution at each step to the smallest set of tokens whose cumulative probability exceeds p, then samples from that set. It is an alternative to (or combined with) Temperature for controlling output diversity; most Bedrock models accept top_p as an inference parameter.
Related: Temperature · Top-K · Token
Source: Inference parameters for foundation models — AWS Docs

Top-K

Top-K limits sampling at each step to the top-K most likely next tokens, regardless of cumulative probability. Not every Bedrock model exposes Top-K (some only accept Top-P), so check the provider's per-model parameter table; when supported, it is set via the model-specific request body on InvokeModel or the additionalModelRequestFields on Converse.
Related: Temperature · Top-P (Nucleus Sampling) · Token
Source: Inference parameters for foundation models — AWS Docs

Token

A Token is the unit of text (or image patch, or audio frame, depending on the model) that a foundation model consumes and emits. Bedrock charges by tokens for most text models, and quotas (TPM), context windows, and Prompt Caching savings are all measured in tokens — making token accounting the most important per-request observable in a Bedrock workload.
Related: Context Window · Output Token (Max Tokens) · TPM (Tokens Per Minute) · Prompt Caching
Source: Tokens and characters in Amazon Bedrock — AWS Docs

Context Window

The Context Window is the maximum number of tokens (input + output combined, depending on model) that a foundation model can attend to in a single inference call. Each model on Bedrock advertises a maximum context window; exceeding it returns an error, and choosing a model is often a tradeoff between context window size, latency, and cost.
Related: Token · Output Token (Max Tokens) · Model Family · Prompt Caching
Source: Supported foundation models in Amazon Bedrock — AWS Docs

Output Token (Max Tokens)

The Output Token cap (often passed as maxTokens in Converse) bounds how many tokens the model is allowed to generate in a single response. It interacts with the model's overall context window — if input + maxTokens exceeds the window, the request fails — and it is the simplest hard guard against runaway generation cost.
Related: Token · Context Window · Stop Sequence · Temperature
Source: Inference parameters for foundation models — AWS Docs
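
To tie the parameters in this section together, the sketch below sets maxTokens, temperature, topP, and stopSequences through inferenceConfig, and passes top_k through additionalModelRequestFields because Converse does not normalize it. The model ID is illustrative and the top_k field name is model-specific, so check the provider's parameter table first.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="mistral.mistral-large-2407-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [
        {"text": "List three uses of Amazon Bedrock inside <answer></answer> tags."}
    ]}],
    inferenceConfig={
        "maxTokens": 256,                 # hard cap on output tokens (Max Tokens)
        "temperature": 0.2,               # low temperature for more deterministic output
        "topP": 0.9,                      # nucleus sampling threshold
        "stopSequences": ["</answer>"],   # stop as soon as the closing tag is emitted
    },
    # Parameters Converse does not normalize (for example top_k where supported)
    # go through additionalModelRequestFields; the field name varies by provider.
    additionalModelRequestFields={"top_k": 50},
)

print(response["output"]["message"]["content"][0]["text"])
print(response["usage"])  # inputTokens / outputTokens / totalTokens for this call
```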

E. Knowledge Bases and RAG

Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases is the managed RAG (Retrieval Augmented Generation) service inside Bedrock that handles ingestion (chunking, embedding, storing), retrieval, and grounded answer generation behind a single API. The service was originally named "Knowledge Bases for Amazon Bedrock" at launch and was renamed to "Amazon Bedrock Knowledge Bases" so that "Amazon Bedrock" sits at the front of the canonical service name.
Related: RAG (Retrieval Augmented Generation) · Embedding Model · Vector Store · Chunking · Retrieval · Reranker
Source: Amazon Bedrock Knowledge Bases — AWS Docs

Embedding

An Embedding is a dense numerical vector that represents the semantic content of a piece of text, image, or other input, such that semantically similar inputs map to nearby vectors. Bedrock Knowledge Bases generate embeddings during ingestion (one per chunk) and at query time (one for the user question), then compare them via similarity search in a Vector Store.
Related: Embedding Model · Vector Store · Retrieval · Chunking
Source: Embeddings in Amazon Bedrock Knowledge Bases — AWS Docs

Embedding Model

An Embedding Model is a foundation model trained specifically to produce embeddings rather than generative text — for example, Amazon Titan Embeddings and Cohere Embed available on Bedrock. The choice of embedding model fixes the dimensionality of the vectors stored in your Vector Store, so changing the embedding model requires a full re-embedding of the corpus.
Related: Embedding · Vector Store · Foundation Model (FM) · Amazon Bedrock Knowledge Bases
Source: Supported embeddings models — AWS Docs

Vector Store

A Vector Store is the database that holds the embedding vectors and metadata for retrieval. Bedrock Knowledge Bases supports Amazon OpenSearch Serverless, Amazon OpenSearch Managed Clusters, Amazon S3 Vectors, Amazon Aurora PostgreSQL with pgvector, Amazon Neptune Analytics, Pinecone, Redis Enterprise Cloud, and MongoDB Atlas as backing stores; each has different operational and cost characteristics, and the supported list expands over time so confirm current options in the AWS docs before designing the ingestion path.
Related: Embedding · Retrieval · Amazon Bedrock Knowledge Bases
Source: Vector stores for Amazon Bedrock Knowledge Bases — AWS Docs

Chunking

Chunking is the ingestion-time step that splits a source document into smaller text passages (chunks) before embedding, so that retrieval returns focused snippets rather than entire files. Bedrock Knowledge Bases supports fixed-size, semantic, hierarchical, and no-chunking strategies; choosing a strategy is the single largest determinant of retrieval quality for a given corpus.
Related: Retrieval · Embedding · Data Source and Sync Job · Amazon Bedrock Knowledge Bases
Source: Chunking and parsing in Knowledge Bases — AWS Docs

Retrieval

Retrieval is the query-time step in which the Knowledge Base embeds the user question, performs similarity search (and optionally hybrid keyword search) against the Vector Store, and returns the top-K most relevant chunks. The retrieval API in Bedrock is Retrieve; the chained "retrieve and then generate an answer" API is RetrieveAndGenerate.
Related: Metadata Filter (Knowledge Base) · Reranker · Vector Store · RAG (Retrieval Augmented Generation) · Amazon Bedrock Knowledge Bases
Source: Retrieve from a Knowledge Base — AWS Docs
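
The sketch below shows both calls with the boto3 bedrock-agent-runtime client; the knowledge base ID, question, and model ARN are placeholders.

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Retrieve: return only the top-K chunks and leave answer generation to the caller.
chunks = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KBEXAMPLE01",  # placeholder knowledge base ID
    retrievalQuery={"text": "What is the refund window for annual plans?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
for result in chunks["retrievalResults"]:
    print(result.get("score"), result["content"]["text"][:80])

# RetrieveAndGenerate: retrieval plus grounded answer generation in a single call.
answer = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What is the refund window for annual plans?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBEXAMPLE01",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
print(answer["output"]["text"])
```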

Metadata Filter (Knowledge Base)

A Metadata Filter is a Knowledge Bases retrieval constraint that narrows similarity search to chunks whose attached metadata satisfies a boolean expression (for example, category = "policy" or epoch_modification_time >= 1700000000). Filters are evaluated alongside vector similarity at query time and support comparison operators (equals, greaterThan, in, stringContains, startsWith, and others) combined with andAll / orAll logical operators — making them the standard mechanism for multi-tenant RAG, recency-aware retrieval, and document-class-based access control. Bedrock also supports Implicit Metadata Filtering, in which a Claude model generates the filter automatically from a provided metadata schema.
Related: Retrieval · Vector Store · Amazon Bedrock Knowledge Bases · Data Source and Sync Job
Source: Metadata filtering in Knowledge Bases — AWS Docs
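
A short sketch combining vector similarity with the example filter from the definition above; the knowledge base ID and metadata keys are placeholders that must match the metadata attached during ingestion.

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Only chunks whose metadata marks them as "policy" documents modified after the
# given epoch timestamp are eligible for similarity search.
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KBEXAMPLE01",  # placeholder knowledge base ID
    retrievalQuery={"text": "How long is customer data retained?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "filter": {
                "andAll": [
                    {"equals": {"key": "category", "value": "policy"}},
                    {"greaterThanOrEquals": {"key": "epoch_modification_time", "value": 1700000000}},
                ]
            },
        }
    },
)

for result in response["retrievalResults"]:
    print(result.get("metadata", {}).get("category"), "-", result["content"]["text"][:80])
```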

Reranker

A Reranker is an optional second-pass scoring step that takes the top-K retrieval results and re-orders them with a model better-suited to fine-grained relevance judgment than the initial vector similarity. Bedrock Knowledge Bases supports built-in rerankers (for example, Amazon Rerank 1.0 and Cohere Rerank 3.5; confirm the currently available versions in the Bedrock console as reranker model versions are revised over time), and applying one is the most common single intervention for fixing "RAG returns related-but-not-right" failures.
Related: Retrieval · Amazon Bedrock Knowledge Bases · RAG (Retrieval Augmented Generation)
Source: Rerank models in Amazon Bedrock — AWS Docs

RAG (Retrieval Augmented Generation)

RAG is the pattern in which a generative model is grounded on retrieved passages — the model is given the user question plus the top-K retrieved chunks and instructed to answer using only that context. RAG reduces hallucination compared to closed-book generation and is the design that Bedrock Knowledge Bases implements end-to-end.
Related: Amazon Bedrock Knowledge Bases · Retrieval · Reranker · Contextual Grounding Check
Source: Retrieval-augmented generation with Bedrock — AWS Docs

Data Source and Sync Job

A Data Source is the upstream content location (most commonly an Amazon S3 bucket, but also web crawlers, Confluence, SharePoint, and Salesforce) that a Knowledge Base ingests from, and a Sync Job is the asynchronous task that crawls the source, parses, chunks, embeds, and writes vectors into the Vector Store. Sync Jobs are how you refresh a Knowledge Base after the upstream corpus changes.
Related: Chunking · Embedding · Amazon Bedrock Knowledge Bases
Source: Data sources for Amazon Bedrock Knowledge Bases — AWS Docs

Amazon Bedrock Data Automation (BDA)

Amazon Bedrock Data Automation is a managed service that uses generative AI to transform unstructured multimodal content — documents, images, audio, and video — into structured outputs through a single API, with built-in safeguards such as visual grounding and confidence scores. BDA is commonly used as a preprocessing layer in front of Knowledge Bases (turning PDFs, scanned documents, and video transcripts into structured chunks before ingestion), as a standalone IDP (Intelligent Document Processing) surface that returns JSON for downstream business logic, and as a media-analysis engine that classifies scenes and extracts in-video text.
Related: Amazon Bedrock Knowledge Bases · Chunking · Multimodal Input · RAG (Retrieval Augmented Generation)
Source: Amazon Bedrock Data Automation — AWS Docs

F. Guardrails

For a deeper look at how Guardrails compose with network-layer protections, see AWS WAF Patterns for Generative AI and Prompt Injection.

Guardrail

A Guardrail in Bedrock is a named, versioned configuration of safety and policy filters that is applied to a model invocation through Converse, InvokeModel, or a Bedrock Agent. A single Guardrail bundles content filters, denied topics, word filters, sensitive information filters, contextual grounding checks, and prompt attack detection, so a workload can be policy-checked in one round trip instead of stitching filters together.
Related: Content Filter · Denied Topic (Topic Policy) · Word Filter · Sensitive Information Filter · Contextual Grounding Check · Prompt Attack Filter
Source: Amazon Bedrock Guardrails — AWS Docs
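
The sketch below attaches an existing Guardrail to a Converse call; the guardrail ID, version, and model ID are placeholders for values created in your own account.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Both the user input and the model output are evaluated against the same Guardrail.
response = bedrock_runtime.converse(
    modelId="amazon.nova-lite-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [{"text": "Tell me about a competitor's pricing."}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-EXAMPLE123",  # placeholder Guardrail ID
        "guardrailVersion": "1",
        "trace": "enabled",  # include details of which policy intervened in the response
    },
)

print(response["stopReason"])  # "guardrail_intervened" when the Guardrail blocks the turn
print(response["output"]["message"]["content"][0]["text"])
```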

Content Filter

A Content Filter in Guardrails detects and blocks (or flags) text that falls into harm categories such as hate, insults, sexual content, violence, misconduct, or prompt-attack indicators, with adjustable strength per category. Content filters apply to both the user input and the model output, so a single Guardrail can shield in both directions.
Related: Guardrail · Denied Topic (Topic Policy) · Word Filter · Prompt Attack Filter
Source: Content filters in Guardrails — AWS Docs

Denied Topic (Topic Policy)

A Denied Topic is a customer-defined topic, described in plain English with example utterances, that the Guardrail will block on either the input or the output. Topic policies are how you encode application-specific rules ("never discuss competitor pricing", "do not provide medical diagnoses") that no generic content filter would catch.
Related: Guardrail · Content Filter · Word Filter
Source: Denied topics in Guardrails — AWS Docs

Word Filter

A Word Filter is the Guardrails component that blocks specified words or short phrases on the input and output side, including managed lists (for example, profanity) and customer-supplied lists. It is a deterministic complement to the probabilistic Content Filter and is the right tool for exact-match prohibitions such as brand names or product codes.
Related: Guardrail · Content Filter · Sensitive Information Filter
Source: Word filters in Guardrails — AWS Docs

Sensitive Information Filter

A Sensitive Information Filter detects PII and other sensitive entities (for example, email, phone number, credit card number, SSN, name) and either blocks or redacts them on the input and output. The filter supports managed PII categories plus customer-defined regex patterns, and is the default mechanism for keeping personal data out of model logs.
Related: Guardrail · Word Filter · Content Filter
Source: Sensitive information filters in Guardrails — AWS Docs

Contextual Grounding Check

A Contextual Grounding Check is a Guardrails component that scores model output for two properties — "grounding" (is the output supported by the provided source documents?) and "relevance" (does it answer the user's question?) — and blocks or flags low-scoring responses. It is the single most useful check to enable when a RAG application is hallucinating despite having a Knowledge Base.
Related: Guardrail · RAG (Retrieval Augmented Generation) · Amazon Bedrock Knowledge Bases
Source: Contextual grounding check in Guardrails — AWS Docs

Prompt Attack Filter

A Prompt Attack Filter detects user inputs that look like prompt injection or jailbreak attempts (for example, "ignore previous instructions") and blocks them before they reach the model. It is one category inside Content Filters, but is called out separately because it requires different tuning than harm-category filters and is the natural complement to network-layer protections such as AWS WAF.
Related: Guardrail · Content Filter · Sensitive Information Filter
Source: Prompt attacks in Guardrails — AWS Docs

G. Bedrock Agents

Bedrock Agent

A Bedrock Agent is a managed Bedrock construct that combines a foundation model, a set of Action Groups (callable tools), an optional Knowledge Base for grounding, and an orchestration loop that decides when to call which tool. Agents handle multi-step reasoning, tool invocation, and follow-up question handling behind a single InvokeAgent API.
Related: Action Group · Orchestration (Agent) · Amazon Bedrock Knowledge Bases · Multi-Agent Collaboration · Tool Use
Source: Amazon Bedrock Agents — AWS Docs
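
A minimal invocation sketch with the boto3 bedrock-agent-runtime client: the agent ID, alias ID, and session ID are placeholders, and the completion comes back as an event stream that the loop reassembles.

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = bedrock_agent_runtime.invoke_agent(
    agentId="AGENTID1234",          # placeholder agent ID
    agentAliasId="ALIASID1234",     # placeholder alias ID
    sessionId="user-42-session-1",  # reuse the same ID to keep conversational state
    inputText="Cancel order #1234 and tell me when the refund will arrive.",
    enableTrace=True,               # emit orchestration trace events for debugging
)

# The agent plans, calls Action Groups, and streams the final answer back in chunks;
# trace events describe each orchestration step (tool choice, Lambda result, and so on).
answer = ""
for event in response["completion"]:
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")

print(answer)
```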

Action Group

An Action Group is the unit by which you attach callable tools to a Bedrock Agent — each Action Group binds an OpenAPI schema (or a parameter-based function-definition style) to a Lambda function (or Return of Control) that implements the action. The Agent reads the schema to know what tools exist and what arguments they accept, then calls the bound Lambda when the model decides to use the tool.
Related: Bedrock Agent · OpenAPI Schema (for Agent) · Orchestration (Agent)
Source: Action groups in Amazon Bedrock Agents — AWS Docs

OpenAPI Schema (for Agent)

The OpenAPI Schema is the JSON or YAML document that declares an Action Group's endpoints, parameters, request bodies, and response shapes. Bedrock uses the schema both as the type definition the model reasons over (to choose arguments) and as the contract enforced when calling the bound Lambda, so an incorrect schema is the most common cause of an Agent's tool calls failing.
Related: Action Group · Bedrock Agent · Function Calling
Source: OpenAPI schema for Action Groups — AWS Docs

Orchestration (Agent)

Orchestration is the reasoning loop a Bedrock Agent runs at invocation time — interpreting the user request, choosing an Action Group to call, calling the bound Lambda, observing the result, and deciding whether to call another tool or return a final answer. Bedrock offers a default orchestration prompt template you can customize, and the trace it emits is the primary debugging surface.
Related: Bedrock Agent · Action Group · Tool Use
Source: Orchestration in Amazon Bedrock Agents — AWS Docs

Multi-Agent Collaboration

Multi-Agent Collaboration is the Bedrock feature in which one Supervisor Agent coordinates several Collaborator Agents — each Collaborator specializes in a domain (for example, billing, scheduling, returns), and the Supervisor routes turns or composes outputs. It is the managed alternative to building an orchestration layer yourself outside Bedrock; for a deep dive on the multi-agent patterns see AgentCore Implementation Guide — Part 4: Multi-Agent.
Related: Supervisor Agent · Collaborator Agent · Bedrock Agent · A2A (Agent-to-Agent) Protocol
Source: Multi-agent collaboration in Amazon Bedrock — AWS Docs

Supervisor Agent

A Supervisor Agent is the top-level Bedrock Agent in a Multi-Agent Collaboration setup; it decides which Collaborator Agent (or sequence of them) should handle a user turn and aggregates their responses. In Bedrock terminology, only the Supervisor receives the user-facing InvokeAgent call.
Related: Multi-Agent Collaboration · Collaborator Agent · Bedrock Agent
Source: Multi-agent collaboration in Amazon Bedrock — AWS Docs

Collaborator Agent

A Collaborator Agent is a domain-specialized Bedrock Agent that a Supervisor Agent delegates work to in a Multi-Agent Collaboration setup. Each Collaborator has its own instructions, Action Groups, and (optionally) Knowledge Base, and is opaque to the end user — only the Supervisor's response is returned.
Related: Supervisor Agent · Multi-Agent Collaboration · Bedrock Agent
Source: Multi-agent collaboration in Amazon Bedrock — AWS Docs

H. Amazon Bedrock AgentCore

If you are new to AgentCore, start with the AgentCore Beginner's Guide; for a hands-on tour of Runtime, Memory, and Code Interpreter, see Implementation Guide — Part 1: Foundation; for Identity and Gateway authentication patterns, see Part 2: Security and Part 3: Infrastructure; for production operations, see AgentCore Production Guide.

AgentCore Runtime

AgentCore Runtime is the Amazon Bedrock AgentCore component that runs agent code (in any framework — Strands Agents, LangGraph, CrewAI, custom) in managed, isolated microVMs with per-session state. Unlike Bedrock Agents (which is a managed orchestration model), AgentCore Runtime is bring-your-own-agent-code, so you control the planner, framework, and tool wiring while AWS handles compute, scaling, and isolation.
Related: AgentCore Memory · AgentCore Gateway · AgentCore Code Interpreter · AgentCore Identity · AgentCore Browser Tool · AgentCore Observability · Bedrock Agent
Source: Amazon Bedrock AgentCore Runtime — AWS Docs

AgentCore Memory

AgentCore Memory is a managed memory service for agents that supports both short-term session memory (per-conversation context) and long-term memory (cross-session facts, preferences, summaries) with built-in extraction strategies. It removes the need for application code to manage Redis, DynamoDB, or a custom vector store for memory.
Related: AgentCore Runtime · AgentCore Gateway · Amazon Bedrock Knowledge Bases
Source: Amazon Bedrock AgentCore Memory — AWS Docs

AgentCore Gateway

AgentCore Gateway turns existing APIs (REST, Lambda, OpenAPI specs) and managed sources into MCP-compatible tools that agents can call without writing custom adapter code. It also handles OAuth/inbound authentication so an agent built on AgentCore Runtime can consume external tools securely.
Related: MCP (Model Context Protocol) · AgentCore Runtime · AgentCore Identity · Tool Use
Source: Amazon Bedrock AgentCore Gateway — AWS Docs

AgentCore Code Interpreter

AgentCore Code Interpreter is a managed sandbox in which an agent can execute Python (and increasingly other languages) and observe the result, enabling data analysis, plotting, file processing, and arithmetic that LLMs cannot do reliably from token-only reasoning. Sandboxes are session-scoped and isolated, so untrusted code cannot exfiltrate data across sessions.
Related: AgentCore Runtime · AgentCore Browser Tool · Tool Use
Source: Amazon Bedrock AgentCore Code Interpreter — AWS Docs

AgentCore Identity

AgentCore Identity is the AgentCore component that brokers human-on-behalf-of and machine-to-machine credentials for agents — for example, exchanging a Cognito JWT for a downstream OAuth token, or vending temporary AWS credentials with scoped permissions. It is the place to encode "what is this agent allowed to do on whose behalf" without baking secrets into the agent code.
Related: AgentCore Gateway · AgentCore Runtime · Guardrail
Source: Amazon Bedrock AgentCore Identity — AWS Docs

AgentCore Browser Tool

AgentCore Browser Tool is a managed headless browser that an agent can drive (navigate, click, fill forms, screenshot) to interact with websites that have no API. It is the AWS-managed equivalent of "browser use" pattern libraries; in Bedrock documentation the official name is AgentCore Browser Tool, sometimes referred to informally as "Browser Use".
Related: AgentCore Code Interpreter · AgentCore Runtime · Tool Use
Source: Amazon Bedrock AgentCore Browser Tool — AWS Docs

AgentCore Observability

AgentCore Observability emits standardized traces, metrics, and logs for agent sessions running on AgentCore Runtime, including per-step latencies, tool-call inputs/outputs, and model usage. It is integrated with Amazon CloudWatch and OpenTelemetry-compatible exporters, so you do not need to instrument each step manually.
Related: AgentCore Runtime · AgentCore Memory
Source: Amazon Bedrock AgentCore Observability — AWS Docs

I. Protocols and Interoperability

MCP (Model Context Protocol)

MCP is an open protocol for connecting tools, resources, and prompts to LLM-powered applications through a uniform server interface — an MCP server exposes capabilities, and any MCP-compatible client (Claude Desktop, AgentCore, custom agents) can consume them. On Bedrock, AgentCore Gateway is the most direct path to exposing existing APIs as MCP tools without writing a server from scratch.
Related: AgentCore Gateway · Tool Use · A2A (Agent-to-Agent) Protocol
Source: Model Context Protocol — Anthropic

A2A (Agent-to-Agent) Protocol

A2A is an emerging open protocol for direct communication between independent agents — letting one agent discover, authenticate to, and delegate work to another agent built by a different team or vendor. On Bedrock, A2A complements Multi-Agent Collaboration (which is a managed pattern inside Bedrock) by extending agent collaboration across organizational boundaries. Because the AWS naming and supported surfaces for A2A continue to evolve, treat any version-specific behavior in this entry as indicative and confirm the current canonical wording in the latest AWS Bedrock documentation.
Related: Multi-Agent Collaboration · MCP (Model Context Protocol) · AgentCore Runtime
Source: A2A protocol — Project page

J. Workflows and Studio

Bedrock Flows

Bedrock Flows is the visual workflow builder inside Bedrock for orchestrating prompts, Knowledge Bases, Agents, Guardrails, Lambda functions, and conditional logic into a single named flow that you invoke via an API. It is the low-code option for chaining Bedrock primitives without writing orchestration code in a Lambda or AgentCore Runtime.
Related: Bedrock Studio · Bedrock Prompt Management · Bedrock Agent · Guardrail
Source: Amazon Bedrock Flows — AWS Docs

Bedrock Studio

Bedrock Studio is a managed authoring environment in which non-developers (subject matter experts, prompt engineers) can build and test Bedrock applications — combining models, Knowledge Bases, Guardrails, and Functions — without managing AWS infrastructure. It is integrated with IAM Identity Center so identity and access are centrally controlled.
Related: Amazon Bedrock IDE · Bedrock Flows · Bedrock Prompt Management · Amazon Bedrock Knowledge Bases
Source: Amazon Bedrock Studio — AWS Docs

Amazon Bedrock IDE

Amazon Bedrock IDE is the evolved authoring surface AWS has introduced alongside Bedrock Studio, providing an integrated environment for selecting models, designing prompts, attaching Knowledge Bases and Guardrails, and composing Flows in a single interface. Because AWS is actively consolidating Bedrock authoring tooling and the boundary between Bedrock Studio, Bedrock IDE, and the broader SageMaker Unified Studio integration continues to shift, treat this entry as a pointer and confirm the canonical authoring surface for your account and Region in the latest AWS documentation before standardizing internal workflows.
Related: Bedrock Studio · Bedrock Flows · Bedrock Prompt Management
Source: Amazon Bedrock User Guide (authoring surfaces) — AWS Docs

Bedrock Prompt Management

Bedrock Prompt Management is the Bedrock feature for storing, versioning, and parameterizing prompts as first-class resources (with variables) that Flows, Agents, and direct API calls can reference by ID. It is how you avoid hard-coding production prompts in application source code, and how you A/B-test prompt revisions without redeploying.
Related: Prompt Engineering · Bedrock Flows · Bedrock Studio
Source: Prompt management in Amazon Bedrock — AWS Docs

K. Customization and Evaluation

Continued Pre-training

Continued Pre-training is a Bedrock customization mode that takes a base foundation model and continues its unsupervised pre-training on your domain corpus, producing a Custom Model that has internalized domain vocabulary and writing style. It is heavier than fine-tuning, requires more data, and is appropriate when prompting alone cannot bridge a large domain gap.
Related: Fine-tuning (on Bedrock) · Custom Model · Base Model · Model Distillation
Source: Continued pre-training in Amazon Bedrock — AWS Docs

Fine-tuning (on Bedrock)

Fine-tuning on Bedrock is a customization mode that takes a base foundation model and adjusts its weights on a supervised dataset (prompt/completion pairs) to produce a Custom Model that follows your task format more closely. Fine-tuned models are invoked through Provisioned Throughput and are the appropriate customization when you have curated examples of the exact output you want.
Related: Custom Model · Base Model · Continued Pre-training · Model Distillation · Provisioned Throughput
Source: Model customization in Amazon Bedrock — AWS Docs

Model Distillation

Model Distillation is a Bedrock customization mode that trains a smaller "student" model to mimic a larger "teacher" model on your domain data, producing a Custom Model that is cheaper and faster to invoke than the teacher while retaining most of its task quality. It is appropriate when you have validated a high-quality but expensive model in production and need to reduce inference cost.
Related: Custom Model · Fine-tuning (on Bedrock) · Provisioned Throughput
Source: Model distillation in Amazon Bedrock — AWS Docs

Custom Model Import

Custom Model Import is a Bedrock customization path in which you upload externally trained weights in a supported open-source architecture (Llama, Mistral, Mixtral, Flan, GPTBigCode, Qwen2 / Qwen2.5 / Qwen3, GPT-OSS, and Mllama vision variants) and register them as a Custom Model on Bedrock without running a Bedrock-side customization job. Imported models are invoked with on-demand throughput through InvokeModel or InvokeModelWithResponseStream in selected Regions (us-east-1, us-east-2, us-west-2, eu-central-1) — making this the path to take when training was already completed in Amazon SageMaker AI, on-premises, or another platform and you only need Bedrock as the inference surface.
Related: Custom Model · Fine-tuning (on Bedrock) · InvokeModel API · On-Demand Inference
Source: Custom Model Import in Amazon Bedrock — AWS Docs

Model Evaluation

Model Evaluation is a Bedrock feature for benchmarking foundation models on your own datasets and metrics — supporting automatic metrics (accuracy, F1, ROUGE, BERTScore), human evaluation workflows, and LLM-as-a-Judge. It is how you pick between candidate models, validate a customization, and continuously regression-test prompts.
Related: LLM-as-a-Judge · Fine-tuning (on Bedrock) · Batch Inference
Source: Model evaluation in Amazon Bedrock — AWS Docs

LLM-as-a-Judge

LLM-as-a-Judge is an evaluation mode in which one foundation model scores the outputs of another (or its own outputs from a previous run) against a rubric — for example, helpfulness, correctness, or style adherence. Bedrock Model Evaluation supports LLM-as-a-Judge as a built-in metric type, which is cheaper than human evaluation and faster than full automatic-metric pipelines.
Related: Model Evaluation · Foundation Model (FM)
Source: LLM-as-a-Judge in Bedrock Model Evaluation — AWS Docs

Watermark

A Watermark on Bedrock is an invisible signal embedded in generated content (most prominently in Amazon Titan / Nova image and video outputs) that AWS-side detectors can later use to identify that the content was AI-generated. Watermarking is an active research area; the exact mechanism, payload, and which surfaces support detection differ by model, so confirm the current per-model coverage in the AWS Bedrock documentation before relying on a specific detection guarantee.
Related: Amazon Nova · Foundation Model (FM) · Model Evaluation
Source: Watermark detection in Amazon Bedrock — AWS Docs

Frequently Asked Questions

What is the difference between Inference Profile and Application Inference Profile?

A standard Inference Profile is an AWS-defined identifier that abstracts a model — most commonly used to enable cross-region inference — and is referenced by the system-defined profile ID. An Application Inference Profile is a customer-defined profile that wraps either a model or a system-defined inference profile with custom tags, so the same workload's spend can be allocated to a specific application, tenant, or business unit in Cost Explorer.

When should I choose Provisioned Throughput over On-Demand?

Choose Provisioned Throughput when (a) you are invoking a Custom Model (most custom models do not support on-demand), (b) your traffic is steady at high utilization so the per-token unit cost beats on-demand, or (c) you need a predictable performance ceiling and cannot afford on-demand throttling during a spike. Stay on On-Demand Inference for spiky, unpredictable, or low-volume workloads.

When should I use Bedrock Agents versus AgentCore Runtime?

Use Bedrock Agents when AWS-managed orchestration (Action Groups + OpenAPI schemas + an out-of-the-box reasoning loop) covers your use case — it is the fastest path to a working agent. Use AgentCore Runtime when you need to bring your own framework (Strands Agents, LangGraph, CrewAI), your own planner, or stateful long-running agents with custom memory; AgentCore gives you full control of the agent code while still providing managed compute, identity, gateways, and observability.

How does Knowledge Bases reduce hallucinations compared to a raw LLM call?

Amazon Bedrock Knowledge Bases implements RAG end-to-end — at query time it retrieves the top-K most relevant chunks from your Vector Store and instructs the model to answer only from that context, rather than from its parametric knowledge. Pairing it with a Reranker (to improve retrieval precision) and a Contextual Grounding Check in a Guardrail (to reject ungrounded outputs) is the standard recipe for production-grade grounded answers.

How do Guardrails compose with model-level safety like Claude's built-in refusals?

Guardrails apply at the Bedrock service layer regardless of which model you invoke, so the same Content Filter / Denied Topic / Sensitive Information Filter / Prompt Attack Filter configuration shields you across model swaps. Provider-side refusals (for example, Anthropic's HHH training) still apply on top — Guardrails are an independent, customer-controlled policy layer, which is why they are the right place to encode application-specific rules.

How do I choose a chunking strategy in Knowledge Bases?

Start with fixed-size chunking (around 300 tokens with roughly 20% overlap) as the baseline because it works well across most prose corpora and is cheap to iterate on. Move to semantic chunking when your documents have clear semantic boundaries that the fixed-size strategy splits awkwardly, to hierarchical chunking when retrieval needs to span both fine and coarse granularity (for example, long technical manuals), and to no-chunking only when each source document is already short and self-contained. Always pair the choice with an evaluation set; chunking strategy is the single largest determinant of RAG quality.
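
As a sketch of wiring that fixed-size baseline into a Knowledge Base, the example below registers an S3 data source with a chunking configuration and starts a sync (ingestion) job. It assumes the boto3 bedrock-agent control-plane client; the knowledge base ID, data source name, and bucket ARN are placeholders, and the exact configuration keys should be confirmed against the current AWS documentation.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Register an S3 data source with the fixed-size baseline discussed above
# (about 300 tokens per chunk with 20% overlap). All IDs and ARNs are placeholders.
data_source = bedrock_agent.create_data_source(
    knowledgeBaseId="KBEXAMPLE01",
    name="policy-docs",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-policy-docs"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {"maxTokens": 300, "overlapPercentage": 20},
        }
    },
)

# Start a sync job so the new source is crawled, parsed, chunked, embedded, and indexed.
bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KBEXAMPLE01",
    dataSourceId=data_source["dataSource"]["dataSourceId"],
)
```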

Written by Hidekazu Konishi