AWS Service Quotas - A Practical Cheat Sheet for Major AWS Services | hidekazu-konishi.com
This article is a single-page numeric reference of the AWS service quotas (formerly known as service limits) that production engineers, capacity planners, and SREs hit most often in real workloads. It is intentionally not a complete catalog of every quota in every service — that catalog already exists in the AWS Service Quotas Console. Instead, this article curates the 5 to 15 quotas per service that actually matter when you size a workload, design for blast radius, or write a quota-increase request.
All numbers in this article are the default quotas as of 2026-05 and assume a standard commercial AWS region (us-east-1 / us-west-2 / eu-west-1). Quotas that AWS frequently revises, or that vary per account due to age and usage history, are labeled Account-specific (verify in console) so that you re-confirm the live value before designing a change.
This article is intentionally text-and-table only — no diagrams. Numeric references should be skimmable and searchable, not pictorial.
Service Quotas (the public name AWS adopted to replace the older term "service limits") are the per-account, per-region (or per-resource) ceilings that AWS enforces on its APIs and resources. They exist for three overlapping reasons: to protect the underlying control plane from runaway consumption, to give AWS a forecasting signal for capacity provisioning, and to give customers a soft-fail boundary instead of a cliff.
In day-to-day work, three patterns repeat:
A team finishes a load test in dev and is surprised when prod rejects the same call — because the prod account is newer and has lower default On-Demand vCPU and Lambda concurrent execution quotas than the older dev account.
An incident postmortem cites "service limit reached" — Kinesis shard count, EventBridge target count, KMS request rate, and CloudFormation resources-per-stack are the classics here; two of those four are adjustable but were never adjusted, and the other two are hard limits that should have been designed around.
A capacity planning spreadsheet uses the maximum theoretical throughput of a service — but the per-account default is a fraction of that maximum, and the gap is the lead time for a quota-increase request.
This cheat sheet exists so that all three problems can be resolved from a single page, with each quota labeled Adjustable or Non-adjustable so you can immediately answer "can we raise this in time?"
For deeper service-specific design treatments — particularly around concurrency, single-table design, and key/prefix scaling — see the related internal references at the bottom of this article.
2. How to Use This Cheat Sheet
The intended workflow is:
Find the service in section 4 (Service-by-Service Quotas). Services are grouped by category — Compute, Container, Storage, Database, Network, AI/ML, Integration, Security, Observability, Management.
Read the quota row. Each table has five columns: Quota, Default, Scope, Adjustable, Notes. Hover or scan the Notes column for the gotcha — for example, S3's request rate of 3,500 PUT/COPY/POST/DELETE per second is per prefix, not per bucket, which changes the entire layout strategy.
If the workload depends on a value near the default, open the AWS Service Quotas Console (aws-region.console.aws.amazon.com/servicequotas/home) and confirm the live, account-specific value. A cell labeled Account-specific (verify in console) in this article means the value drifts between accounts or has been revised by AWS recently, so this article does not commit to a single default.
If a quota is Adjustable: Yes, plan the request. Most adjustable quotas are processed in hours but some (regional vCPU pools, KMS request rates above 30,000 RPS, SES sending limits) can take 2 to 14 business days, and account managers can prioritize for production launches if you cite a date and a workload.
If a quota is Adjustable: No, it is a hard architectural constraint. Design around it. Examples: DynamoDB item size 400 KB, SQS message size 256 KB, Lambda function timeout 900 seconds, S3 object size 5 TB. These will never be raised by a support ticket.
A short legend on scope:
Account: a single global ceiling per AWS account (e.g. IAM users, S3 buckets — though S3's bucket count is now soft and adjustable).
Per-resource: bounded per individual resource (e.g. items per DynamoDB table partition, parts per S3 multipart upload, rules per security group).
Per-API: a TPS or request-rate ceiling on a specific API operation (e.g. KMS Decrypt requests per second).
3. Quota Categories — Five Axes for Reasoning
Every AWS quota falls along five axes simultaneously. Understanding the axes is the difference between filing the right support case and the wrong one.
3.1 Adjustable vs. Non-adjustable
Adjustable (also called "soft") quotas are ceilings AWS chose for safety, not for technical reasons. They can be raised via a support case, an AWS account team request, or programmatically through the Service Quotas API. Most account-level quotas — vCPUs, Lambda concurrency, EBS storage, IAM roles, EventBridge rules — are adjustable.
Non-adjustable (also called "hard") quotas are technical or architectural ceilings of the underlying system. They will not be raised even for the largest customers. Examples:
DynamoDB item size: 400 KB.
SQS / SNS message body: 256 KB.
Lambda function timeout: 900 seconds (15 min).
S3 object size: 5 TB.
Step Functions Standard execution history: 25,000 events.
When you read a quota row, the Adjustable column is the first thing to look at. If it says No, your design choices are constrained.
3.2 Account-level vs. Region-level vs. Per-resource
Account-level quotas are global to the AWS account. The classic examples are IAM users (5,000), IAM roles (1,000, adjustable), Organizations OUs and accounts, and Service Control Policies.
Region-level quotas are the most common kind. EC2 vCPUs, Lambda concurrency, DynamoDB tables per region, VPC count — all enforced per (account, region) pair.
Per-resource quotas bound a specific resource. Parts in an S3 multipart upload (10,000), rules per security group (60+60 default), GSIs per DynamoDB table (20), subnets per VPC (200), targets per EventBridge rule (5), resources per CloudFormation stack (500).
When you scale horizontally — sharding across accounts via Organizations or across regions via active-active — you are typically buying yourself more copies of a region-level quota pool, not raising an account-level one.
3.3 Per-API and Throttle (Rate) Quotas
A second class of region-level quotas governs API request rate rather than resource count. These are usually expressed as TPS or requests per second:
KMS Decrypt, GenerateDataKey, Encrypt: 5,500 to 30,000 RPS depending on key spec and region.
API Gateway: 10,000 RPS account-level steady-state, 5,000 burst.
EventBridge PutEvents: 10,000 events per second per region (us-east-1, us-west-2, eu-west-1; lower in other regions).
CloudWatch PutMetricData: 150 TPS per region.
Rate quotas are often the first quotas to bite at scale because they multiply with traffic, not with resource count. A workload that pre-creates resources well under the resource-count limits can still hit a per-API rate limit at peak.
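The arithmetic behind this pre-check is trivial but worth encoding. A minimal sketch (a hypothetical helper, not an AWS API) that flags when projected peak traffic consumes most of a rate quota:

```python
# Hypothetical helper (not an AWS API): estimate headroom against a
# per-API rate quota using the peak RPS expected at launch.
def rate_quota_headroom(peak_rps: float, quota_rps: float) -> dict:
    """Return utilization and whether an increase request is warranted."""
    utilization = peak_rps / quota_rps
    return {
        "utilization_pct": round(utilization * 100, 1),
        # A common rule of thumb: file the increase before passing 80%.
        "request_increase": utilization >= 0.8,
    }

# Example: 9,000 PutEvents/s against the 10,000 RPS EventBridge default.
print(rate_quota_headroom(9_000, 10_000))
```

Run this for every rate quota on the critical path, not just the one that bit last time — rate quotas multiply with traffic, so they all move together.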
3.4 Per-Partition / Per-Prefix Quotas
A subtler category exists in storage and serverless services: quotas applied at a sub-resource level that the customer has indirect control over.
S3 request rate: 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix. The bucket has effectively unlimited aggregate throughput, but each prefix has a ceiling, so the bucket's effective throughput is a function of key design. See Amazon S3 Object Key Design Best Practices for the full treatment.
DynamoDB partition throughput: 3,000 RCU / 1,000 WCU per partition. A hot partition exhausts its throughput even when the table-level provisioned throughput is far from its quota. See Amazon DynamoDB Single-Table Design Guide for partition-key design.
Lambda burst concurrency: account-level burst limit of 500 to 3,000 concurrent executions at the moment of a traffic surge (depending on region), independent of the steady-state concurrency quota. After the burst, each function scales at +1,000 concurrent executions every 10 seconds.
These quotas often appear "phantom" because the high-level metric is fine but a sub-bucket of it is saturated.
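The per-prefix ceiling can be turned directly into sizing math and a key layout. A sketch, assuming the 3,500 PUT/s per-prefix figure from above and an illustrative two-digit hashed shard prefix:

```python
import hashlib
import math

# Per-prefix sizing math plus the usual fix: spreading keys across
# hashed prefixes. The shard-prefix scheme here is illustrative.
S3_PUT_RPS_PER_PREFIX = 3_500  # per-prefix PUT/COPY/POST/DELETE ceiling

def prefixes_needed(target_put_rps: int) -> int:
    return math.ceil(target_put_rps / S3_PUT_RPS_PER_PREFIX)

def sharded_key(key: str, shards: int) -> str:
    # Deterministic shard prefix derived from the key itself, so readers
    # can recompute the full key without a lookup table.
    shard = int(hashlib.md5(key.encode()).hexdigest(), 16) % shards
    return f"{shard:02d}/{key}"

# 20,000 PUT/s needs ceil(20000 / 3500) = 6 prefixes.
print(prefixes_needed(20_000))
```

The same shape of math applies to DynamoDB: divide target WCU by 1,000 per partition to estimate how many partition-key shards a hot aggregate needs.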
3.5 Snapshot vs. Aggregate Quotas
Lastly, distinguish between a quota measured at an instant (snapshot) and a quota measured over a window (aggregate).
Snapshot: number of running EC2 instances, count of active S3 multipart uploads, currently allocated Elastic IPs, in-flight messages in an SQS queue.
Aggregate: requests per second to an API, events per second to EventBridge, sent messages per day in SES.
Adjustment lead times differ. Snapshot quotas often increase the moment AWS processes the case; aggregate (rate) quotas may require a 24- to 72-hour warm-up.
4. Service-by-Service Quotas
Forty-five services follow, grouped by category. Each table lists 5 to 15 quotas — the ones that actually constrain production. Service Quotas Console contains hundreds more per service; the values selected here are the ones you reach for in capacity reviews and incident analysis.
4.1 Compute
Amazon EC2
* You can sort the table by clicking on the column name.
Quota
Default
Scope
Adjustable
Notes
Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) vCPUs
5 vCPUs
Region
Yes
New-account default is intentionally low; production accounts typically raised to 64 to several thousand. The number is vCPUs, not instances.
Running On-Demand G and VT vCPUs
Account-specific (verify in console)
Region
Yes
GPU and video-transcode instance families have their own per-family vCPU pool, often 0 in new accounts.
Running On-Demand P vCPUs
Account-specific (verify in console)
Region
Yes
Same pattern — P-family (high-end GPU) has its own pool, requires explicit increase.
All Spot Instance Requests (vCPUs)
5 vCPUs
Region
Yes
Spot has independent vCPU pools per family, mirroring On-Demand.
Elastic IP addresses
5
Region
Yes
Counts both attached and unattached. Unattached EIPs accrue charges.
EBS-backed snapshots
100,000
Region
Yes
Snapshots accumulate over time; revisit lifecycle policies before requesting an increase.
AMIs
50,000
Region
Yes
Pruning policy strongly recommended; old AMIs hold snapshots.
Security groups per VPC
2,500
Per-VPC
Yes
Used to be 500; the cap was raised in 2020. Inbound + outbound rules per SG combine.
Rules per security group
60 inbound + 60 outbound
Per-SG
Yes
Can be raised up to 1,000 each, but rule count × SG-per-ENI is a hard product limit (1,000).
EC2-Classic
Not available
—
—
Fully retired 2022-08-15. Mentioned here because legacy diagrams still reference it.
AWS Lambda
Quota
Default
Scope
Adjustable
Notes
Concurrent executions
1,000
Region
Yes
Account-level pool shared across all functions. Reserved concurrency carves out a guaranteed slice.
Burst concurrency
500 to 3,000 concurrent executions
Region
No
Region-specific instantaneous burst limit. After the burst, each function scales at +1,000 concurrent executions every 10 seconds (changed from the older +500/minute account-level rate in December 2023). See AWS Lambda Cold Start Mitigation Guide.
Function memory
128 MB to 10,240 MB
Per-function
No
Memory determines vCPU allocation linearly. Above 1,769 MB you get more than 1 full vCPU.
Function timeout
900 seconds (15 min)
Per-function
No
Hard ceiling. For longer work, use Step Functions or ECS/Fargate Tasks.
Deployment package size (zipped, direct upload)
50 MB
Per-function
No
S3-deployed zips can be larger but unzipped size still caps at 250 MB.
Deployment package size (unzipped)
250 MB
Per-function
No
Zip deployments only. Container image deployments allow up to 10 GB.
Container image size
10 GB
Per-function
No
Lambda pulls from ECR; first cold start with a 10 GB image is noticeably slower.
Environment variables size
4 KB
Per-function
No
Total of all env vars including keys. Larger config should live in Parameter Store / Secrets Manager.
Layers per function
5
Per-function
No
Layer total unzipped size + function unzipped size must be ≤ 250 MB.
Ephemeral /tmp storage
512 MB to 10,240 MB
Per-function
No
Per-invocation scratch space. Configurable up to 10 GB; reused across warm invocations.
Function URL request timeout
900 seconds
Per-function
No
Matches function timeout. Streaming responses supported.
Payload size (sync)
6 MB
Per-invocation
No
Both request and response. Async invocation has its own limit (256 KB).
Payload size (async)
256 KB
Per-invocation
No
Async payload is queued via internal SQS-like buffer, hence the 256 KB ceiling.
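The burst-then-ramp behavior in the table lends itself to a quick back-of-envelope calculation. A sketch using the defaults stated above (3,000 burst, then +1,000 concurrent executions per function every 10 seconds); verify the burst figure for your region:

```python
import math

# Back-of-envelope sketch of Lambda's scale-up behavior as described in
# the table: an instantaneous regional burst, then +1,000 concurrent
# executions per function every 10 seconds. Defaults are documented
# values; the burst limit is region-dependent.
def seconds_to_reach(target_concurrency: int, burst_limit: int = 3_000,
                     ramp_per_10s: int = 1_000) -> int:
    """Seconds until a single function can reach target concurrency."""
    if target_concurrency <= burst_limit:
        return 0
    steps = math.ceil((target_concurrency - burst_limit) / ramp_per_10s)
    return steps * 10

# From a 3,000 burst to 10,000 concurrent: ceil(7000/1000) = 7 steps = 70 s.
print(seconds_to_reach(10_000))
```

Note that the account-level concurrent executions quota (1,000 by default) caps this long before the ramp does unless it has been raised.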
AWS Fargate
Quota
Default
Scope
Adjustable
Notes
Fargate On-Demand vCPU resource count (ECS)
Account-specific (verify in console)
Region
Yes
Defaults vary by region and account age. Often 6 to 100 in new accounts.
Fargate Spot vCPU resource count (ECS)
Account-specific (verify in console)
Region
Yes
Separate pool from On-Demand.
Task CPU
0.25 to 16 vCPU
Per-task
No
Allowed: 0.25, 0.5, 1, 2, 4, 8, 16 vCPU. Memory is bounded by CPU choice.
Task memory
0.5 to 120 GB
Per-task
No
Memory increments allowed depend on the chosen CPU bucket.
Task ephemeral storage
20 GB to 200 GB
Per-task
No
Configurable per task definition. Default 20 GB.
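Because task CPU and memory must be paired from fixed buckets, a pre-deploy validation step avoids task-definition registration errors. The bucket ranges below are assumptions drawn from commonly documented Fargate combinations — confirm them against the current ECS documentation before relying on them:

```python
# Illustrative validator for the Fargate CPU/memory pairing rule.
# The memory ranges (GB allowed per vCPU size) are assumptions based on
# commonly documented buckets; treat this as a sketch, not a source of truth.
MEMORY_GB_RANGE = {
    0.25: (0.5, 2),
    0.5: (1, 4),
    1: (2, 8),
    2: (4, 16),
    4: (8, 30),
    8: (16, 60),
    16: (32, 120),
}

def valid_task_size(vcpu: float, memory_gb: float) -> bool:
    if vcpu not in MEMORY_GB_RANGE:
        return False
    lo, hi = MEMORY_GB_RANGE[vcpu]
    return lo <= memory_gb <= hi

print(valid_task_size(1, 2))     # within the 1 vCPU bucket
print(valid_task_size(0.25, 8))  # too much memory for 0.25 vCPU
```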
AWS Batch
Quota
Default
Scope
Adjustable
Notes
Compute environments per Region
50
Region
Yes
Plan separate compute environments per workload class.
Job queues per Region
50
Region
Yes
Queues bind to compute environments by priority.
Job definitions per Region
Account-specific (verify in console)
Region
Yes
Versioned; old versions are retained unless explicitly deregistered.
Jobs in SUBMITTED state per queue
1,000,000
Per-queue
Yes
Large pipelines should batch-submit and watch for throttling.
Array job size
10,000 child jobs
Per-array-job
No
Hard cap on parallel array job size.
4.2 Container
Amazon ECS
Quota
Default
Scope
Adjustable
Notes
Clusters per Region
10,000
Region
Yes
Most workloads stay under 50; one cluster per environment is typical.
Services per cluster
5,000
Per-cluster
Yes
Split across multiple clusters if you are approaching this.
Tasks launched per service (ECS Service)
5,000
Per-service
Yes
Tied to desired count.
Tasks per cluster
5,000
Per-cluster
Yes
Across all services and standalone tasks.
Container instances per cluster (EC2 launch type)
5,000
Per-cluster
Yes
Not applicable for Fargate launch type.
Task definition size
64 KB
Per-task-definition
No
JSON document hard cap.
Containers per task definition
10
Per-task
No
Sidecar patterns (logging, service mesh) consume this quickly.
Amazon EKS
Quota
Default
Scope
Adjustable
Notes
Clusters per Region
100
Region
Yes
One cluster per environment is the common pattern, multi-tenant via namespaces.
Managed node groups per cluster
30
Per-cluster
Yes
Separate node groups for taints / instance families.
Nodes per managed node group
450
Per-node-group
Yes
Hard scaling limit per group, not per cluster.
Pods per node
VPC-CNI / IP limits
Per-node
No
Bounded by ENI count and IPs per ENI; varies by instance type. Use prefix delegation to relax.
Fargate profiles per cluster
10
Per-cluster
Yes
Selector limit (5 per profile) often forces multiple profiles.
Amazon ECR
Quota
Default
Scope
Adjustable
Notes
Repositories per Region
10,000
Region
Yes
Pattern: one repo per micro-service; large fleets approach this.
Images per repository
10,000
Per-repository
Yes
Configure lifecycle policy to prune old tags.
Maximum image size
10 GiB
Per-image
No
Same as Lambda container image cap.
Image layer size
10 GiB
Per-layer
No
Single layer hard cap.
Rate of image pull (ECR Private)
Account-specific (verify in console)
Region
Yes
Pulls do throttle at scale; VPC endpoint and image caching mitigate.
4.3 Storage
Amazon S3
Quota
Default
Scope
Adjustable
Notes
Buckets per account
10,000
Account
Yes
Was 100 for many years; raised significantly. Approach with caution — most designs need far fewer.
Object size
5 TB
Per-object
No
Hard ceiling. Single PUT max 5 GB; above that requires multipart upload.
Multipart upload — parts
10,000
Per-upload
No
Part size 5 MB to 5 GB (last part may be smaller).
Multipart upload — part size
5 MB to 5 GB
Per-part
No
Last part exempted from 5 MB minimum.
PUT/COPY/POST/DELETE rate
3,500 RPS per prefix
Per-prefix
No (scales automatically)
Bucket scales horizontally as you add prefixes. See object key design.
GET/HEAD rate
5,500 RPS per prefix
Per-prefix
No (scales automatically)
Aggregate bucket throughput is effectively unlimited if keys are sharded.
Bucket policy size
20 KB
Per-bucket
No
JSON hard cap. Consider IAM policies or Access Points for complex auth.
Access Points per Region per account
10,000
Region
Yes
Per-bucket access patterns can be split out via Access Points.
S3 Object Lambda Access Points per account per Region
1,000
Region
Yes
Used when transforming objects on retrieval.
Lifecycle rules per bucket
1,000
Per-bucket
No
A typical Standard → IA → Glacier tiering pattern consumes 2 to 3 rules.
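The multipart constraints above (10,000 parts, each 5 MB to 5 GB) determine the minimum uniform part size for a given object. A sketch of that arithmetic:

```python
import math

# Sketch of the multipart arithmetic from the table: at most 10,000 parts,
# each 5 MB to 5 GB (the last part is exempt from the minimum).
MAX_PARTS = 10_000
MIN_PART = 5 * 1024**2        # 5 MB
MAX_PART = 5 * 1024**3        # 5 GB

def part_size_for(object_bytes: int) -> int:
    """Smallest uniform part size that fits the object in <= 10,000 parts."""
    size = max(MIN_PART, math.ceil(object_bytes / MAX_PARTS))
    if size > MAX_PART:
        raise ValueError("object exceeds the maximum S3 object size")
    return size

# A 5 TiB object forces parts of at least ceil(5 TiB / 10,000) ≈ 524 MiB.
print(part_size_for(5 * 1024**4) // 1024**2)
```

In practice pick a part size well above this minimum (e.g. 64 to 128 MiB) so retries stay cheap and part count stays far from the cap.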
Amazon EBS
Quota
Default
Scope
Adjustable
Notes
Total storage for gp3 volumes (TiB)
50
Region
Yes
New-account default; production typically raised to several hundred TiB.
Total storage for io2 volumes (TiB)
20
Region
Yes
io2 reserved for critical DB workloads.
Snapshots per Region
100,000
Region
Yes
Lifecycle Manager strongly recommended at scale.
Volume size (gp3 / gp2 / io1 / io2)
16 TiB
Per-volume
No
io2 Block Express supports up to 64 TiB.
IOPS — gp3
3,000 baseline, 16,000 max
Per-volume
No
Provisionable up to 16,000.
IOPS — io2 Block Express
256,000
Per-volume
No
Highest single-volume IOPS in EBS.
Amazon EFS
Quota
Default
Scope
Adjustable
Notes
File systems per Region
1,000
Region
Yes
Most workloads use 1 to 10 file systems.
Mount targets per file system
1 per AZ
Per-file-system
No
One mount target per AZ regardless of subnet count in that AZ.
Access Points per file system
1,000
Per-file-system
Yes
Used for per-tenant POSIX root and UID/GID enforcement.
Amazon API Gateway
Quota
Default
Scope
Adjustable
Notes
Integration timeout
29 seconds (REST) / 30 seconds (HTTP)
Per-API
REST: Yes (up to 300 seconds via Service Quotas) / HTTP: No
Since November 2024, REST API integration timeout is configurable up to 300 seconds (5 minutes) via a Service Quotas request. HTTP APIs remain capped at 30 seconds. For longer work, use async patterns or Step Functions.
Payload size
10 MB
Per-request
No
Both directions.
Resources per REST API
300
Per-API
Yes
—
Routes per HTTP API
300
Per-API
Yes
—
Stages per API
10
Per-API
Yes
—
Usage plans per account
300
Account
Yes
—
AWS Global Accelerator
Quota
Default
Scope
Adjustable
Notes
Accelerators per account
20
Account
Yes
—
Listeners per accelerator
10
Per-accelerator
Yes
—
Endpoint groups per listener
10
Per-listener
Yes
One per Region typically.
Endpoints per endpoint group
10
Per-group
Yes
—
Custom routing accelerators
Account-specific (verify in console)
Account
Yes
—
4.6 AI / ML
Amazon Bedrock
Bedrock quotas are model-, modality-, and region-specific, and AWS revises them more often than most services. Numbers below are illustrative defaults; always confirm against the Service Quotas Console for the exact model ID and region before sizing.
Quota
Default
Scope
Adjustable
Notes
InvokeModel requests per minute (per model)
Account-specific (verify in console)
Region
Yes
Per-model RPM. Anthropic Claude models have separate quotas from Nova, Llama, Cohere, etc.
InvokeModel tokens per minute (per model)
Account-specific (verify in console)
Region
Yes
Cross-region inference profiles aggregate across regions.
Provisioned model units per account
Account-specific (verify in console)
Region
Yes
Required for deterministic throughput.
Custom models per account
Account-specific (verify in console)
Region
Yes
Imported and fine-tuned models share quota in many regions.
Knowledge bases per account
Account-specific (verify in console)
Region
Yes
Bedrock Knowledge Bases.
Agents per account
Account-specific (verify in console)
Region
Yes
Bedrock Agents.
Guardrails per account
Account-specific (verify in console)
Region
Yes
Bedrock Guardrails policies.
Maximum prompt size
Model-dependent
Per-request
No
Claude Sonnet 4.x: 200K context. Nova Pro: 300K context. Confirm per model.
Amazon SageMaker
Quota
Default
Scope
Adjustable
Notes
Endpoints per Region
Account-specific (verify in console)
Region
Yes
Usually around 100 to several hundred; varies by instance type.
Endpoint configurations per Region
Account-specific (verify in console)
Region
Yes
—
Notebook instances per Region
Account-specific (verify in console)
Region
Yes
SageMaker AI Studio domains have a separate ceiling.
Training jobs concurrent per Region
Account-specific (verify in console)
Region
Yes
Per-instance-type concurrent training caps.
Model artifact size
5 GB
Per-model
No
For built-in algorithms; bring-your-own can be larger via ECR.
Maximum batch transform job duration
28 days
Per-job
No
—
Amazon Comprehend
Quota
Default
Scope
Adjustable
Notes
DetectEntities sync TPS
20 TPS
Region
Yes
—
DetectSentiment sync TPS
20 TPS
Region
Yes
—
Custom classification models per Region
10
Region
Yes
—
Custom entity recognition models per Region
10
Region
Yes
—
Document size (single request)
5,000 bytes
Per-request
No
Use async batch jobs for larger documents.
Amazon Textract
Quota
Default
Scope
Adjustable
Notes
DetectDocumentText sync TPS
10 TPS
Region
Yes
—
AnalyzeDocument sync TPS
2 TPS
Region
Yes
Tables / Forms / Queries features have separate sub-quotas.
StartDocumentTextDetection async jobs
600 concurrent
Region
Yes
—
Maximum document size (sync, image)
10 MB / 5 MB (PDF)
Per-request
No
PDFs larger than 5 MB are supported only via the async APIs.
Maximum pages per async document
3,000
Per-document
No
—
4.7 Integration
Amazon SNS
Quota
Default
Scope
Adjustable
Notes
Topics per account
100,000
Region
Yes
—
Subscriptions per topic
12.5 million
Per-topic
No
—
Subscriptions per account
200 million
Region
No
—
Message body size
256 KB
Per-message
No
SNS Extended Library uses S3 to send up to 2 GB by reference.
FIFO topic message throughput
300 TPS without batching, 3,000 with batching
Per-topic
No
—
Standard topic message throughput
~30,000 TPS (US, EU regions)
Region
Yes
Lower defaults in other regions.
Filter policies per subscription
5
Per-subscription
No
—
Amazon SQS
Quota
Default
Scope
Adjustable
Notes
Queues per account per Region
1,000,000
Region
No
Effectively unlimited.
Message body size
256 KB
Per-message
No
Extended Library to S3 for >256 KB up to 2 GB.
Visibility timeout
0 to 12 hours
Per-message
No
Default 30 seconds.
Message retention
1 minute to 14 days
Per-queue
No
Default 4 days.
Delay seconds
0 to 900 (15 min)
Per-message or queue
No
—
Inflight messages (Standard queue)
120,000
Per-queue
No
Inflight = received but not deleted.
Inflight messages (FIFO queue)
20,000
Per-queue
No
—
Throughput (Standard queue)
Effectively unlimited
Per-queue
No
Scales horizontally with traffic.
Throughput (FIFO queue, high-throughput mode)
9,000 TPS (us-east-1, us-west-2, eu-west-1) without batching
Per-queue
No
Other regions: 3,000 TPS. Batching multiplies by 10.
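The 256 KB body cap motivates the claim-check pattern mentioned in the Notes column. A minimal sketch of the decision logic only — the S3 upload itself is elided, and the bucket/key values are placeholders:

```python
import json

# Sketch of the claim-check pattern from the table's Notes: bodies over
# 256 KB go to S3 and the queue carries only a pointer. The actual S3
# upload is omitted; bucket and key names here are placeholders.
SQS_MAX_BODY = 256 * 1024

def prepare_message(body: str) -> dict:
    raw = body.encode("utf-8")
    if len(raw) <= SQS_MAX_BODY:
        return {"kind": "inline", "body": body}
    # Real implementations (e.g. the Amazon SQS Extended Client Library)
    # upload the payload to S3 here and enqueue a reference instead.
    pointer = {"bucket": "example-bucket", "key": "payloads/example"}
    return {"kind": "s3_pointer", "body": json.dumps(pointer)}

print(prepare_message("hello")["kind"])
```

The consumer side mirrors this: on receipt, an `s3_pointer` message triggers a GET before processing, and the S3 object is deleted together with the queue message.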
Amazon EventBridge
Quota
Default
Scope
Adjustable
Notes
Event buses per account
100
Region
Yes
Includes default bus, partner buses, and custom buses.
Rules per event bus
300
Per-bus
Yes
—
Targets per rule
5
Per-rule
No
Hard ceiling.
Event size
256 KB
Per-event
No
JSON document including envelope.
PutEvents requests per second
10,000 RPS (us-east-1, us-west-2, eu-west-1)
Region
Yes
Lower in other regions, often 400 to 2,400.
Throttled events (DLQ)
Configurable
Per-rule
No
Failed deliveries routed to SQS DLQ if configured.
Archive size
Unlimited
Per-archive
—
Pricing applies per GB stored and replayed.
Schemas per registry
1,000
Per-registry
Yes
EventBridge Schemas / Schema Registry.
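The hard 5-targets-per-rule ceiling is usually worked around by attaching the same event pattern to several rules, each carrying a slice of the target list. A sketch of the slicing math (helper names are illustrative, not an AWS API):

```python
# Sketch of working around the hard 5-targets-per-rule ceiling: the same
# event pattern attached to multiple rules, each with a slice of the
# target list. Helper names are illustrative, not an AWS API.
TARGETS_PER_RULE = 5

def rules_for_targets(targets: list) -> list:
    """Split a target list into <=5-target chunks, one chunk per rule."""
    return [targets[i:i + TARGETS_PER_RULE]
            for i in range(0, len(targets), TARGETS_PER_RULE)]

# 12 targets -> 3 rules (5 + 5 + 2).
print([len(chunk) for chunk in rules_for_targets(list(range(12)))])
```

For genuinely wide fan-out, an SNS topic as the single rule target (then subscriptions on the topic) scales further than multiplying rules.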
Amazon Kinesis Data Streams
AWS KMS
Quota
Default
Scope
Adjustable
Notes
API request rate — asymmetric Encrypt / Decrypt / Sign
500 to 1,000 RPS
Region
Yes
RSA / ECC keys are CPU-intensive.
Encrypt input size (symmetric)
4,096 bytes (4 KB)
Per-call
No
For larger data, encrypt with a data key locally.
Encrypt input size (asymmetric)
~190 to 470 bytes
Per-call
No
Depends on key spec (RSA-2048 / RSA-3072 / RSA-4096).
Key policy size
32 KB
Per-CMK
No
—
4.8 Security
AWS WAF
Quota
Default
Scope
Adjustable
Notes
Web ACLs per account per Region
100
Region
Yes
CloudFront Web ACLs sit in us-east-1 (Global).
WCUs per Web ACL
1,500
Per-Web-ACL
Yes
Web ACL Capacity Units; each rule consumes WCUs.
WCUs per rule group
1,500
Per-rule-group
Yes
WAF v2 rule group capacity is WCU-bound. Per-rule WCU cost ranges from 1 to several dozen depending on match type and text transformations. Request a WCU increase via the Service Quotas Console.
Rule groups per Web ACL
20
Per-Web-ACL
No
—
IP sets per account per Region
100
Region
Yes
—
IP addresses per IP set
10,000
Per-IP-set
Yes
—
Maximum body inspection size (CloudFront)
64 KB
Per-request
No
For ALB / API Gateway: 8 KB default, adjustable to 64 KB.
AWS Shield Advanced
Quota
Default
Scope
Adjustable
Notes
Protected resources per account
1,000
Account
Yes
Shield Standard is free and automatic; Shield Advanced is paid.
Health-based detection
Configurable
Per-resource
—
Route 53 health checks integrate.
Maximum custom mitigation
Account-specific (verify in console)
Account
Yes
SRT (Shield Response Team) can apply ad-hoc rules.
AWS Secrets Manager
Quota
Default
Scope
Adjustable
Notes
Secrets per account per Region
500,000
Region
Yes
—
Secret value size
65,536 bytes (64 KB)
Per-secret
No
Both ciphertext and plaintext.
GetSecretValue TPS
10,000 RPS
Region
Yes
Use a local cache for very high read volume.
Versions per secret
~100
Per-secret
No
Old versions deprecated after 24 hours unless labeled.
4.9 Observability
Amazon CloudWatch
Quota
Default
Scope
Adjustable
Notes
Custom metrics per account
Effectively unlimited
Region
No
Cost scales with metric count and resolution.
Metric resolution (high resolution)
1 second
Per-metric
No
1-second metrics retained 3 hours, then aggregated.
Alarms per Region
5,000
Region
Yes
—
Composite alarms per account
500
Account
Yes
—
PutMetricData TPS
150 TPS
Region
Yes
Soft. Use batch (up to 1,000 metrics per call) to reduce TPS.
Log groups per Region
1,000,000
Region
Yes
—
Log group retention
1 day to 10 years (or Never expire)
Per-log-group
No
Default "Never expire" — set retention to control cost.
Log streams per log group
Effectively unlimited
Per-log-group
No
—
PutLogEvents per stream
5 RPS (default)
Per-stream
Yes
—
Maximum log event size
256 KB
Per-event
No
—
Subscription filters per log group
2
Per-log-group
No
Hard limit. For fan-out, route via a Lambda or Firehose subscriber that re-publishes.
Dashboards per account
500
Region
Yes
Each dashboard up to 500 widgets.
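The PutMetricData row implies a batching pattern: at 150 TPS, high-cardinality emitters must pack datapoints into as few calls as possible (up to 1,000 per call, per the table). A sketch of the chunking, with the actual boto3 call left as a comment:

```python
# Sketch of the batching advice from the PutMetricData row: pack
# datapoints into batches instead of calling once per metric.
# Pure chunking only; the boto3 call is shown as a comment.
BATCH_SIZE = 1_000

def batched(metric_data: list, size: int = BATCH_SIZE):
    for i in range(0, len(metric_data), size):
        yield metric_data[i:i + size]

datapoints = [{"MetricName": "Latency", "Value": v} for v in range(2_500)]
calls = list(batched(datapoints))
# for batch in calls:
#     cloudwatch.put_metric_data(Namespace="App", MetricData=batch)
print(len(calls))  # 2,500 datapoints -> 3 API calls
```

Embedded Metric Format (EMF) via CloudWatch Logs avoids the PutMetricData TPS ceiling entirely and is usually the better fit at very high cardinality.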
AWS X-Ray
Quota
Default
Scope
Adjustable
Notes
Traces per second
Default sampling 1 req/sec + 5%
Per-app
Yes
Sampling rules configurable.
Trace document size
500 KB
Per-trace
No
—
Segment document size
64 KB
Per-segment
No
—
Trace retention
30 days
Region
No
Hard.
Groups per account per Region
25
Region
Yes
—
Insights enabled per group
Configurable
Per-group
—
X-Ray Insights detects anomalies in traffic patterns.
4.10 Management
AWS CloudFormation
Quota
Default
Scope
Adjustable
Notes
Stacks per Region
2,000
Region
Yes
—
Resources per stack
500
Per-stack
No
Was 200 for years; raised to 500. Use nested stacks above this.
Parameters per stack
200
Per-stack
No
—
Outputs per stack
200
Per-stack
No
—
Mappings per template
200
Per-template
No
—
Template body size (direct)
51,200 bytes (50 KB)
Per-template
No
API call limit.
Template body size (via S3)
1 MB
Per-template
No
Reference template via TemplateURL.
Number of stack sets per administrator account
100
Account
Yes
StackSets fan out across Organization.
Operations per StackSet
1,000 concurrent
Per-StackSet
Yes
Throttle by region / account.
AWS Organizations
Quota
Default
Scope
Adjustable
Notes
Member accounts per organization
10
Organization
Yes
Adjustable to thousands via support case. Some customers run 5,000+ accounts.
OUs per organization
1,000
Organization
No
—
OU nesting depth
5 levels
Organization
No
Plus Root = 6 total.
Service Control Policies (SCPs) per organization
1,000
Organization
No
—
SCPs attached to a single entity (account / OU / root)
5
Per-entity
No
—
SCP document size
5,120 characters
Per-SCP
No
JSON document hard cap.
Backup policies per organization
10
Organization
No
Organization-level Backup policies.
Tag policies per organization
10
Organization
No
—
5. Quotas Most Frequently Hit in Production
The list below is a ranked digest of quotas that, in my experience and from customer postmortems, are the ones engineering teams most often hit during launches, traffic spikes, and incidents. Mark these and pre-check them before scaling events.
Rank
Service
Quota
Default
Why it bites
1
EC2
Running On-Demand Standard vCPUs
5 vCPUs (new accounts)
Surprise during scale-out tests. New accounts often blocked at <100 vCPUs.
2
Lambda
Concurrent executions
1,000
Burst events (S3 event, EventBridge fan-out) saturate before steady-state warms.
3
DynamoDB
Partition throughput (1,000 WCU / 3,000 RCU)
Per partition
Hot partition exhausts even when table-level is far from cap.
4
S3
PUT/POST/COPY/DELETE rate per prefix
3,500 RPS
Sequential keys serialize onto a single prefix; a key-hashing (shard) strategy fixes it.
5
API Gateway
Integration timeout
29 sec (REST default, up to 300 sec) / 30 sec (HTTP, hard)
Long-running backend. REST adjustable via Service Quotas; HTTP remains hard.
6
API Gateway
Steady-state RPS
10,000 RPS
Account-level, shared across all APIs.
7
KMS
Symmetric Decrypt RPS
30,000 RPS (us-east-1) / 5,500 to 15,000 (others)
Per-region throttle hits at burst (sign-in storms, batch decrypt).
8
EventBridge
PutEvents RPS
10,000 (us-east-1, us-west-2, eu-west-1)
Lower in other regions (400 to 2,400).
9
Step Functions
Execution history events (Standard)
25,000 events
Loops or large Maps blow the cap.
10
CloudWatch
PutMetricData TPS
150 TPS
EMF logs at high cardinality saturate.
11
CloudFormation
Resources per stack
500
Hard; forces nested stacks.
12
EC2
Elastic IPs per Region
5
Multi-AZ NAT + reservations exhaust quickly.
13
VPC
VPCs per Region
5
Multi-environment in one account hits this.
14
SQS
Inflight messages (Standard queue)
120,000
Slow consumer causes inflight to stack.
15
RDS
DB instances per Region
40
Microservice-per-DB pattern hits early.
16
Lambda
Burst concurrency
500 to 3,000 concurrent executions (region-dependent)
Cold start storm at deploy or after dormant period. Post-burst scaling now +1,000 concurrent per 10 seconds, per function.
17
CloudFront
Lambda@Edge function size (viewer)
1 MB
Bundled SDK dependencies push viewer-facing functions past the size cap.
18
IAM
Managed policies attached to a role
10 (adjustable to 20)
Permission-set sprawl hits quickly.
19
Organizations
SCPs attached to a single entity
5
Layered guardrails approach 5 quickly.
20
Bedrock
InvokeModel RPM per model
Varies by model / region
Production launches blocked until per-model quota raised. Plan 7+ days lead time.
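Row 4 above notes that sequential keys serialize onto one S3 prefix; the usual fix is to prepend a short deterministic hash so writes spread across many prefixes, each with its own 3,500 write RPS budget. A minimal sketch (the fanout value and key shapes are illustrative):

```python
import hashlib

def rehash_key(sequential_key: str, fanout: int = 16) -> str:
    """Prepend a short, deterministic hash shard so that sequential keys
    (upload-000001, upload-000002, ...) spread across `fanout` prefixes
    instead of serializing onto a single S3 prefix."""
    digest = hashlib.md5(sequential_key.encode()).hexdigest()
    shard = int(digest[:8], 16) % fanout
    return f"{shard:02x}/{sequential_key}"

# 1,000 sequential keys now land on 16 prefixes instead of 1.
keys = [f"upload-{i:06d}" for i in range(1000)]
prefixes = {rehash_key(k).split("/")[0] for k in keys}
```

Because the shard is derived from the key itself, readers can recompute the full object key without a lookup table; aggregate write throughput scales roughly linearly with the fanout.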
A common pattern across these quotas: the issue is rarely "AWS can't scale here" — it is "AWS does not auto-scale this quota for you and you forgot to ask in advance." For the adjustable rows, create utilization alarms at 80% of quota in the Service Quotas Console.
6. How to Request Quota Increases
The mechanics of getting an adjustable quota raised changed substantially in 2019 with the introduction of the unified Service Quotas service and again in subsequent years with finer-grained per-service quota lists. As of 2026-05, three paths exist.
6.1 Path 1 — Service Quotas Console (most quotas)
The fastest path for most adjustable quotas:
Open the Service Quotas Console for the target Region.
Search for the quota by name or service.
Click Request quota increase, enter the new value, and submit.
Many quotas auto-approve within minutes (especially EC2 vCPU for established accounts and low-multiple increases).
Larger increases (10x or more) escalate to a human reviewer; SLA typically 1 to 5 business days.
You can also do this via API: aws service-quotas request-service-quota-increase --service-code <code> --quota-code <code> --desired-value <n>.
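In account-vending pipelines the same call is typically scripted. A minimal sketch that assembles keyword arguments for boto3's request_service_quota_increase; the quota codes and desired values below are illustrative assumptions, so look up the real codes with list-service-quotas first:

```python
# Baseline quota raises to apply to every newly vended account.
# The quota codes are illustrative assumptions -- confirm them with
# `aws service-quotas list-service-quotas --service-code <svc>`.
BASELINE = [
    ("ec2",    "L-1216C47A", 256),   # Running On-Demand Standard vCPUs (assumed code)
    ("lambda", "L-B99A9384", 3000),  # Concurrent executions (assumed code)
]

def build_increase_requests(baseline):
    """Turn the baseline table into kwargs dicts, one per call to
    service_quotas_client.request_service_quota_increase(**kwargs)."""
    return [
        {"ServiceCode": svc, "QuotaCode": code, "DesiredValue": float(value)}
        for svc, code, value in baseline
    ]

requests = build_increase_requests(BASELINE)
```

In a real pipeline you would loop over `requests` with a `service-quotas` boto3 client assumed into the new account's role; the sketch stops before the API call so it runs anywhere.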
6.2 Path 2 — Support case (legacy quotas, complex changes)
Some service-specific limits still live in legacy support categories rather than Service Quotas. SES sending limit, Route 53 domain registration limit, and certain regional EC2 capacity pools fall in this group.
Open Support Center → Create case.
Choose Service limit increase, then the service and Region.
Provide workload description, expected steady-state, and peak.
SLA depends on support plan tier. Business and Enterprise plans typically respond within hours; Basic / Developer support may take days.
6.3 Path 3 — Account team escalation (large or time-sensitive)
For launches that need quota increases in multiple services simultaneously, large multi-account fleets, or quotas near the regional cap, work with your AWS Technical Account Manager (TAM) or Solutions Architect (SA):
Submit a structured launch plan: services, current quotas, desired quotas, expected go-live date.
Account teams can coordinate parallel approvals across teams (EC2 capacity, KMS, Bedrock, Lambda concurrency).
Lead time: plan 7 to 14 business days for cross-service launches. Bedrock model quotas in particular benefit from this path.
6.4 Strategies before requesting
Before filing a request, exhaust the architectural options:
Shard across accounts (Organizations + Account Vending) to multiply region-level quotas.
Shard across Regions (multi-region active-active) to multiply Region-level quotas.
Use reserved concurrency (Lambda) to carve a guaranteed slice without raising the account ceiling.
Use Provisioned Throughput (Bedrock, DynamoDB) where deterministic capacity is a tighter constraint than per-region quotas.
Use Service-Linked Roles and managed policies instead of large inline policies to avoid per-role IAM policy size caps.
Move config out of environment variables (Lambda 4 KB cap) into Parameter Store or Secrets Manager.
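The account and Region sharding options above amount to a deterministic router: hash a tenant or workload key to one of N quota domains so each domain sees only its share of traffic, and so retries stay sticky to one domain. A sketch (the shard list is illustrative):

```python
import hashlib

# Each entry is an independent quota domain (a Region here, but member
# accounts work the same way); per-Region rate quotas multiply by len(SHARDS).
SHARDS = ["us-east-1", "us-west-2", "eu-west-1"]

def shard_for(tenant_id: str, shards=SHARDS) -> str:
    """Deterministically map a tenant to a shard so that retries and
    follow-up calls land in the same quota domain."""
    h = int(hashlib.sha256(tenant_id.encode()).hexdigest(), 16)
    return shards[h % len(shards)]
```

The hash must be stable across processes (hence sha256, not Python's salted `hash()`), otherwise two fleet members would route the same tenant to different domains.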
When you do file, always include expected steady-state RPS, expected peak RPS, peak duration, and go-live date. AWS reviewers approve faster when the numbers are concrete than when the request reads "we might grow."
6.5 Quotas you can NOT raise
Be realistic: roughly one-third of the quotas in this article are hard. Filing a support case for these wastes everyone's time. The hard quotas you should design around — never raise — include:
DynamoDB item size 400 KB
S3 object size 5 TB
SQS / SNS message size 256 KB
Lambda function timeout 900 seconds
Lambda payload (sync) 6 MB
Step Functions execution history 25,000 events
API Gateway HTTP API integration timeout 30 seconds (REST is now adjustable up to 300 seconds)
CloudFormation resources per stack 500
IAM inline policy size (per role) 10,240 characters
KMS symmetric Encrypt input 4 KB
Aurora cluster readers 15
For these, the design pattern is to compose smaller calls (Step Functions Map, Lambda chaining, multipart upload) or to externalize state to S3.
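The "compose smaller calls" pattern is mechanical. As one example, a claim-check sketch for the SQS/SNS 256 KB cap: oversized payloads are parked in an external store (S3 in practice; `put_object` here is an injected callable, not the real S3 API) and only a pointer is enqueued:

```python
import json
import uuid

SQS_MAX_BYTES = 256 * 1024  # hard quota: 256 KB per SQS/SNS message

def to_message(payload: dict, put_object) -> str:
    """Return the SQS message body. If the serialized payload fits under
    the 256 KB cap, send it inline; otherwise store it via `put_object`
    (an injected callable standing in for S3) and send only a pointer."""
    body = json.dumps(payload)
    if len(body.encode()) <= SQS_MAX_BYTES:
        return body
    key = f"overflow/{uuid.uuid4()}"
    put_object(key, body)                    # park the real payload externally
    return json.dumps({"claim_check": key})  # enqueue only the pointer

# Demo with a dict standing in for the external store.
store = {}
small = to_message({"n": 1}, store.__setitem__)
big = to_message({"blob": "x" * (300 * 1024)}, store.__setitem__)
```

The consumer inverts the check: if the body carries a `claim_check` key, fetch the real payload from the store before processing.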
7. Frequently Asked Questions
7.1 Where are quotas different between Regions?
Compute and rate quotas are usually larger in us-east-1, us-west-2, and eu-west-1 than elsewhere. Examples:
KMS symmetric Decrypt: 30,000 RPS in those three Regions; 5,500 to 15,000 RPS elsewhere.
SNS Publish: ~30,000 TPS in those Regions; lower elsewhere.
EventBridge PutEvents: 10,000 RPS in those Regions; 400 to 2,400 RPS elsewhere.
SQS FIFO high-throughput: 9,000 TPS in those Regions; 3,000 TPS elsewhere.
When you deploy in less-trafficked Regions (e.g. eu-central-1, ap-south-1, and for some quotas even ap-northeast-1), check the Region-specific defaults before sizing.
7.2 Are quotas the same in all accounts of an Organization?
No. Each member account has its own per-Region quotas, defaulted to the new-account values. A newly created account does not inherit increases granted to the management account or to sibling accounts. Bake quota raises into AWS Control Tower account-baseline automation, or call service-quotas request-service-quota-increase in your account-vending pipeline.
7.3 Are quotas the same for AWS Free Tier accounts?
Yes — the quota architecture is identical; Free Tier only affects pricing. New accounts (free or paid) share the same conservative defaults.
7.4 How do I check my account's current quota for a specific service?
Console: Service Quotas Console → choose service → filter by quota name.
Note that some quotas are not yet onboarded to Service Quotas API. For those, use the Trusted Advisor "Service Limits" check (Business / Enterprise support) or the service-specific console.
7.5 How quickly do quota changes take effect?
Auto-approved increases: minutes.
Reviewed increases (Service Quotas): 1 to 5 business days.
Cross-service launch increases (TAM-coordinated): 7 to 14 business days.
Hard quotas: never.
7.6 Can I set alerts when I am approaching a quota?
Yes. Service Quotas integrates with CloudWatch — for each adjustable quota that supports monitoring, you can enable CloudWatch usage metrics and create alarms (e.g. alarm at 80% of quota). Combine with EventBridge to notify an SRE channel.
You can also retrieve the applied quota value from the Service Quotas API (GetServiceQuota) and compare it against GetMetricStatistics for the relevant CloudWatch usage metric (AWS/Usage namespace).
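With the quota value and the usage metric in hand, the 80% check itself is trivial. A minimal sketch with plain functions; a real script would feed them from the Service Quotas API and the AWS/Usage metrics rather than hard-coded numbers:

```python
def utilization(usage: float, quota: float) -> float:
    """Fraction of the applied quota currently consumed."""
    return usage / quota

def should_alarm(usage: float, quota: float, threshold: float = 0.80) -> bool:
    """Mirror of the CloudWatch alarm: fire at or above `threshold` of quota."""
    return utilization(usage, quota) >= threshold
```

Wiring `should_alarm` to an EventBridge notification (or simply creating the equivalent CloudWatch alarm in the console) closes the loop described above.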
7.7 Why is the Bedrock model RPM quota so low?
Bedrock model quotas are intentionally conservative because the underlying inference capacity is shared and pricing is per-token. AWS sizes these quotas based on aggregate region capacity and revises them often. For production workloads, you will almost certainly need to file an increase request 7+ days before launch. Cross-region inference profiles (model IDs prefixed with us. or eu., such as us.anthropic.claude-sonnet-4-6-...-v1:0) aggregate quotas across multiple Regions and are the preferred high-throughput path.
7.8 What is the difference between a quota and a throttle?
A quota is a ceiling enforced over a region / account / resource over a period (snapshot or aggregate). Going over yields LimitExceededException, ServiceQuotaExceededException, or service-specific error codes.
A throttle is a short-term rate limit, often token-bucket based, that returns ThrottlingException / Rate exceeded and is generally retryable with backoff.
Many adjustable quotas are technically throttle policies (KMS request rates, EventBridge RPS, API Gateway RPS). The distinction matters for retry strategy: throttles are transient and should be retried with exponential backoff and jitter; hard quota violations should fail closed and alert.
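That retry split can be encoded directly. A sketch using exponential backoff with full jitter for throttles and fail-closed behavior for everything else; QuotaError and the error-code strings here stand in for the real SDK exceptions:

```python
import random
import time

# Transient throttle codes worth retrying. Hard-quota codes such as
# LimitExceededException / ServiceQuotaExceededException are deliberately
# absent: those should fail closed and alert, not loop.
RETRYABLE = {"ThrottlingException", "Rate exceeded"}

class QuotaError(Exception):
    """Stand-in for an AWS SDK client error carrying an error code."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code

def call_with_backoff(fn, max_attempts=5, base=0.05, cap=2.0, sleep=time.sleep):
    """Retry throttles with exponential backoff + full jitter;
    re-raise anything non-retryable (or a final failure) immediately."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaError as e:
            if e.code not in RETRYABLE or attempt == max_attempts - 1:
                raise  # hard quota or out of retries: fail closed
            # full jitter: sleep a random amount in [0, min(cap, base * 2^attempt)]
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# Demo: a call that is throttled twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise QuotaError("ThrottlingException")
    return "ok"

result = call_with_backoff(flaky, sleep=lambda s: None)
```

The injected `sleep` keeps the demo instant; production code uses the default `time.sleep` (or the SDK's built-in adaptive retry mode, which implements the same idea).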
7.9 Do quotas apply differently to AWS-internal callers (e.g. EventBridge → Lambda)?
Yes, in some cases. EventBridge invocations to Lambda do count toward Lambda's concurrent execution limit but use a separate retry / DLQ mechanism. S3 events to Lambda likewise count. However, AWS services calling each other (e.g. CloudWatch Logs → Lambda for subscription filters) often use service-linked roles and may have their own internal back-pressure independent of customer quotas.
7.10 What changes the most often?
In my observation, in roughly this order:
Bedrock model RPM / TPM (revised multiple times per year as inference capacity scales).
DynamoDB on-demand initial throughput (revised upward over time).
CloudFormation resources per stack (was 200, raised to 500).
DynamoDB GSIs per table (was 5, raised to 20).
S3 buckets per account (was 100, default now 10,000).
Treat any quota value older than 12 months in a third-party blog with suspicion — including this one. The Service Quotas Console is always authoritative.
8. Summary
This article condensed the AWS Service Quotas surface area into a single-page reference for the major AWS services that production engineers touch most. The principles to internalize:
Default ≠ maximum. Most adjustable quotas can be raised; "we hit a limit" usually means "we did not ask in advance."
Per-prefix, per-partition, per-tunnel quotas hide inside per-region quotas. Look one level deeper than the headline number.
Hard quotas are design constraints. DynamoDB 400 KB, Lambda 900 sec, SQS 256 KB, Step Functions 25,000 events, CloudFormation 500 resources — design around them.
Regions are not equal. us-east-1, us-west-2, eu-west-1 carry higher per-Region rate quotas than other Regions.
Lead time matters. 7 to 14 business days for cross-service launches; the time to file the case is the day you start the project, not the day before launch.
The Service Quotas Console is the source of truth. This article is the index.