AWS Service Quotas - A Practical Cheat Sheet for Major AWS Services

First Published: 2026-05-16
Last Updated: 2026-05-16

This article is a single-page numeric reference of the AWS service quotas (formerly known as service limits) that production engineers, capacity planners, and SREs hit most often in real workloads. It is intentionally not a complete catalog of every quota in every service — that catalog already exists in the AWS Service Quotas Console. Instead, this article curates the 5 to 15 quotas per service that actually matter when you size a workload, design for blast radius, or write a quota-increase request.

All numbers in this article are the default quotas as of 2026-05 and assume a standard commercial AWS region (us-east-1 / us-west-2 / eu-west-1). Quotas that AWS frequently revises, or that vary per account due to age and usage history, are labeled Account-specific (verify in console) so that you re-confirm the live value before designing a change.

This article is intentionally text-and-table only — no diagrams. Numeric references should be skimmable and searchable, not pictorial.

1. Overview — Why a Service Quotas Cheat Sheet
2. How to Use This Cheat Sheet
3. Quota Categories — Five Axes for Reasoning
4. Service-by-Service Quotas
5. Quotas Most Frequently Hit in Production
6. How to Request Quota Increases
7. Frequently Asked Questions
8. Summary
9. References

1. Overview — Why a Service Quotas Cheat Sheet

Service Quotas (the public name AWS adopted to replace the older term "service limits") are the per-account, per-region (or per-resource) ceilings that AWS enforces on its APIs and resources. They exist for three overlapping reasons: to protect the underlying control plane from runaway consumption, to give AWS a forecasting signal for capacity provisioning, and to give customers a soft-fail boundary instead of a cliff.

In day-to-day work, three patterns repeat:

A team finishes a load test in dev and is surprised when prod rejects the same call — because the prod account is newer and has lower default On-Demand vCPU and Lambda concurrent execution quotas than the older dev account.
An incident postmortem cites "service limit reached" — Kinesis shard count, EventBridge target count, KMS request rate, and CloudFormation resources-per-stack are the classics here, and three of those four are adjustable but were never adjusted.
A capacity planning spreadsheet uses the maximum theoretical throughput of a service — but the per-account default is a fraction of that maximum, and the gap is the lead time for a quota-increase request.

This cheat sheet exists so that all three problems can be resolved from a single page, with each quota labeled Adjustable or Non-adjustable so you can immediately answer "can we raise this in time?"

For deeper service-specific design treatments — particularly around concurrency, single-table design, and key/prefix scaling — see the related internal references at the bottom of this article.

2. How to Use This Cheat Sheet

The intended workflow is:

Find the service in section 4 (Service-by-Service Quotas). Services are grouped by category — Compute, Container, Storage, Database, Network, AI/ML, Integration, Security, Observability, Management.
Read the quota row. Each table has five columns: Quota, Default, Scope, Adjustable, Notes. Scan the Notes column for the gotcha. For example, S3's request rate of 3,500 PUT/COPY/POST/DELETE per second is per prefix, not per bucket, which changes the entire layout strategy.
If the workload depends on a value near the default, open the AWS Service Quotas Console (aws-region.console.aws.amazon.com/servicequotas/home) and confirm the live, account-specific value. A cell labeled Account-specific (verify in console) in this article means the value drifts between accounts or has been revised by AWS recently, so this article does not commit to a single default.
If a quota is Adjustable: Yes, plan the request. Most adjustable quotas are processed in hours but some (regional vCPU pools, KMS request rates above 30,000 RPS, SES sending limits) can take 2 to 14 business days, and account managers can prioritize for production launches if you cite a date and a workload.
If a quota is Adjustable: No, it is a hard architectural constraint. Design around it. Examples: DynamoDB item size 400 KB, SQS message size 256 KB, Lambda function timeout 900 seconds, S3 object size 5 TB. These will never be raised by a support ticket.
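
The "value near the default" check in the workflow above can be made mechanical. The following is a minimal sketch, assuming you already have the current usage and the live quota value in hand; the function name `classify_headroom` and the 80% warning threshold are illustrative choices, not AWS recommendations.

```python
def classify_headroom(usage: float, quota: float, warn_ratio: float = 0.8) -> str:
    """Classify how close current usage is to a quota.

    warn_ratio is a hypothetical threshold chosen for illustration only.
    """
    if quota <= 0:
        raise ValueError("quota must be positive")
    ratio = usage / quota
    if ratio >= 1.0:
        return "exhausted"
    if ratio >= warn_ratio:
        return "verify-in-console"   # close enough that the live value matters
    return "ok"
```

For example, `classify_headroom(850, 1000)` flags a Lambda account using 850 of 1,000 concurrent executions as worth re-confirming before the next change.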

A short legend on scope:

Account: a single global ceiling per AWS account (e.g. IAM users, S3 buckets — though S3's bucket count is now soft and adjustable).
Region: per-account, per-region (e.g. Lambda concurrent executions, EC2 vCPUs, DynamoDB tables).
Per-resource: bounded per individual resource (e.g. throughput per DynamoDB partition, parts per S3 multipart upload, rules per security group).
Per-API: a TPS or request-rate ceiling on a specific API operation (e.g. KMS Decrypt requests per second).

3. Quota Categories — Five Axes for Reasoning

Every AWS quota falls along five axes simultaneously. Understanding the axes is the difference between filing the right support case and the wrong one.

3.1 Adjustable vs. Non-adjustable

Adjustable (also called "soft") quotas are ceilings AWS chose for safety, not for technical reasons. They can be raised via a support case, an AWS account team request, or programmatically through the Service Quotas API. Most account-level quotas — vCPUs, Lambda concurrency, EBS storage, IAM roles, EventBridge rules — are adjustable.

Non-adjustable (also called "hard") quotas are technical or architectural ceilings of the underlying system. They will not be raised even for the largest customers. Examples:

DynamoDB item size: 400 KB.
SQS / SNS message body: 256 KB.
Lambda function timeout: 900 seconds (15 min).
S3 object size: 5 TB.
Step Functions Standard execution history: 25,000 events.

When you read a quota row, the Adjustable column is the first thing to look at. If it says No, your design choices are constrained.
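
The programmatic path mentioned above can be sketched as follows. This is a hedged illustration, not a definitive implementation: it assumes a client object that behaves like `boto3.client("service-quotas")`, whose `get_service_quota` and `request_service_quota_increase` operations take `ServiceCode`, `QuotaCode`, and (for the increase) `DesiredValue`. The helper name `request_increase_if_needed` is hypothetical, and the client is passed in so the logic can be exercised without AWS credentials.

```python
def request_increase_if_needed(client, service_code: str, quota_code: str, desired: float):
    """File a quota-increase request only when the live value is lower and adjustable.

    `client` is assumed to behave like boto3.client("service-quotas").
    Returns the request status string, or None when no request was needed.
    """
    quota = client.get_service_quota(
        ServiceCode=service_code, QuotaCode=quota_code
    )["Quota"]
    if quota["Value"] >= desired:
        return None  # live quota is already high enough
    if not quota["Adjustable"]:
        raise ValueError(f"{quota_code} is non-adjustable; design around it")
    response = client.request_service_quota_increase(
        ServiceCode=service_code, QuotaCode=quota_code, DesiredValue=desired
    )
    return response["RequestedQuota"]["Status"]
```

The quota code for a given quota is account-independent but not guessable; look it up in the Service Quotas Console rather than hard-coding a value from memory.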

3.2 Account-level vs. Region-level vs. Per-resource

Account-level quotas are global to the AWS account. The classic examples are IAM users (5,000), IAM roles (1,000, adjustable), Organizations OUs and accounts, and Service Control Policies.
Region-level quotas are the most common kind. EC2 vCPUs, Lambda concurrency, DynamoDB tables per region, VPC count — all enforced per (account, region) pair.
Per-resource quotas bound a specific resource. Examples: parts in an S3 multipart upload (10,000), rules per security group (60+60 default), GSIs per DynamoDB table (20), subnets per VPC (200), targets per EventBridge rule (5), resources per CloudFormation stack (500).

When you scale horizontally — sharding across accounts via Organizations or across regions via active-active — you are typically buying yourself more copies of a region-level quota pool, not raising an account-level one.

3.3 Per-API and Throttle (Rate) Quotas

A second class of region-level quotas governs API request rate rather than resource count. These are usually expressed as TPS or requests per second:

KMS Decrypt, GenerateDataKey, Encrypt: 5,500 to 30,000 RPS depending on key spec and region.
API Gateway: 10,000 RPS account-level steady-state, 5,000 burst.
EventBridge PutEvents: 10,000 events per second per region (us-east-1, us-west-2, eu-west-1; lower in other regions).
CloudWatch PutMetricData: 150 TPS per region.

Rate quotas are often the first quotas to bite at scale because they multiply with traffic, not with resource count. A workload that pre-creates resources well under the resource-count limits can still hit a per-API rate limit at peak.
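
Steady-state-plus-burst rate quotas such as the API Gateway figures above are commonly modeled as a token bucket. The sketch below is a generic token bucket with explicit timestamps so it stays deterministic; it is an approximation for capacity reasoning, and the class name `TokenBucket` is illustrative rather than any AWS API.

```python
class TokenBucket:
    """Token bucket with a refill rate (tokens/second) and a burst capacity."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # the bucket starts full
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Consume tokens and return True if the request fits, else False (throttled)."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# API Gateway figures from the list above: 10,000 RPS steady-state, 5,000 burst.
bucket = TokenBucket(rate=10_000, burst=5_000)
accepted_at_t0 = sum(bucket.allow(now=0.0) for _ in range(6_000))
```

Under this model, an instantaneous spike of 6,000 requests sees only the first 5,000 accepted; the remainder are throttled until the bucket refills.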

3.4 Per-Partition / Per-Prefix Quotas

A subtler category exists in storage and serverless services: quotas applied at a sub-resource level that the customer has indirect control over.

S3 request rate: 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix. The bucket has effectively unlimited aggregate throughput, but each prefix has a ceiling, so the bucket's effective throughput is a function of key design. See Amazon S3 Object Key Design Best Practices for the full treatment.
DynamoDB partition throughput: 3,000 RCU / 1,000 WCU per partition. A hot partition exhausts its throughput even when the table-level provisioned throughput is far from its quota. See Amazon DynamoDB Single-Table Design Guide for partition-key design.
Lambda burst concurrency: account-level burst limit of 500 to 3,000 concurrent executions at the moment of a traffic surge (depending on region), independent of the steady-state concurrency quota. After the burst, each function scales at +1,000 concurrent executions every 10 seconds.

These quotas often appear "phantom" because the top-level metric looks healthy while one subdivision of it (a single prefix or partition, for example) is saturated.
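
To make the per-prefix arithmetic concrete, here is a minimal sketch that estimates how many S3 prefixes a workload needs and derives a stable shard prefix for each key. It assumes traffic spreads evenly across prefixes, which real workloads only approximate; the helper names and the `NN/key` layout are hypothetical.

```python
import hashlib
import math

S3_WRITE_RPS_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE, from the text above
S3_READ_RPS_PER_PREFIX = 5_500    # GET/HEAD, from the text above


def prefixes_needed(write_rps: float, read_rps: float) -> int:
    """Smallest prefix count that keeps both request classes under the per-prefix ceiling."""
    return max(
        1,
        math.ceil(write_rps / S3_WRITE_RPS_PER_PREFIX),
        math.ceil(read_rps / S3_READ_RPS_PER_PREFIX),
    )


def sharded_key(key: str, shards: int) -> str:
    """Prepend a stable, hash-derived shard prefix to a key (hypothetical layout)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return f"{shard:02d}/{key}"
```

A workload peaking at 20,000 writes and 40,000 reads per second needs at least 8 prefixes under these assumptions, because reads are the binding constraint.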

3.5 Snapshot vs. Aggregate Quotas

Lastly, distinguish between a quota measured at an instant (snapshot) and a quota measured over a window (aggregate).

Snapshot: number of running EC2 instances, count of active S3 multipart uploads, currently allocated Elastic IPs, in-flight messages in an SQS queue.
Aggregate: requests per second to an API, events per second to EventBridge, sent messages per day in SES.

Adjustment lead times differ. Snapshot quotas often increase the moment AWS processes the case; aggregate (rate) quotas may require a 24- to 72-hour warm-up.

4. Service-by-Service Quotas

Forty-five services follow, grouped by category. Each table lists 5 to 15 quotas: the ones that actually constrain production. The Service Quotas Console lists hundreds more per service; the values selected here are the ones you reach for in capacity reviews and incident analysis.

4.1 Compute

Amazon EC2

Quota	Default	Scope	Adjustable	Notes
Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) vCPUs	5 vCPUs	Region	Yes	New-account default is intentionally low; production accounts typically raised to 64 to several thousand. The number is vCPUs, not instances.
Running On-Demand G and VT vCPUs	Account-specific (verify in console)	Region	Yes	GPU and video-transcode instance families have their own per-family vCPU pool, often 0 in new accounts.
Running On-Demand P vCPUs	Account-specific (verify in console)	Region	Yes	Same pattern — P-family (high-end GPU) has its own pool, requires explicit increase.
All Spot Instance Requests (vCPUs)	5 vCPUs	Region	Yes	Spot has independent vCPU pools per family, mirroring On-Demand.
Elastic IP addresses	5	Region	Yes	Counts both attached and unattached. Unattached EIPs accrue charges.
EBS-backed snapshots	100,000	Region	Yes	Snapshots accumulate over time; revisit lifecycle policies before requesting an increase.
AMIs	50,000	Region	Yes	Pruning policy strongly recommended; old AMIs hold snapshots.
Security groups per VPC	2,500	Per-VPC	Yes	Used to be 500; the cap was raised in 2020. Inbound + outbound rules per SG combine.
Rules per security group	60 inbound + 60 outbound	Per-SG	Yes	Can be raised up to 1,000 each, but (rules per SG) × (SGs per network interface) cannot exceed 1,000; that product is a hard limit.
EC2-Classic	Not available	—	—	Fully retired 2022-08-15. Mentioned here because legacy diagrams still reference it.
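
Because the On-Demand and Spot quotas above count vCPUs rather than instances, the number of instances that fit depends on instance size. A minimal sketch of that arithmetic follows; the function name `instances_that_fit` is illustrative.

```python
def instances_that_fit(vcpu_quota: int, vcpus_per_instance: int, vcpus_in_use: int = 0) -> int:
    """How many more instances of one size fit under a vCPU-based quota."""
    if vcpus_per_instance <= 0:
        raise ValueError("vcpus_per_instance must be positive")
    return max(0, (vcpu_quota - vcpus_in_use) // vcpus_per_instance)
```

With the 5-vCPU new-account default, two 2-vCPU instances fit, while a single 8-vCPU instance does not launch at all.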

AWS Lambda

Quota	Default	Scope	Adjustable	Notes
Concurrent executions	1,000	Region	Yes	Account-level pool shared across all functions. Reserved concurrency carves out a guaranteed slice.
Burst concurrency	500 to 3,000 concurrent executions	Region	No	Region-specific instantaneous burst limit. After the burst, each function scales at +1,000 concurrent executions every 10 seconds (changed from the older +500/minute account-level rate in December 2023). See AWS Lambda Cold Start Mitigation Guide.
Function memory	128 MB to 10,240 MB	Per-function	No	Memory determines vCPU allocation linearly. Above 1,769 MB you get more than 1 full vCPU.
Function timeout	900 seconds (15 min)	Per-function	No	Hard ceiling. For longer work, use Step Functions or ECS/Fargate Tasks.
Deployment package size (zipped, direct upload)	50 MB	Per-function	No	S3-deployed zips can be larger but unzipped size still caps at 250 MB.
Deployment package size (unzipped)	250 MB	Per-function	No	Zip deployments only. Container image deployments allow up to 10 GB.
Container image size	10 GB	Per-function	No	Lambda pulls from ECR; first cold start with a 10 GB image is noticeably slower.
Environment variables size	4 KB	Per-function	No	Total of all env vars including keys. Larger config should live in Parameter Store / Secrets Manager.
Layers per function	5	Per-function	No	Layer total unzipped size + function unzipped size must be ≤ 250 MB.
Ephemeral /tmp storage	512 MB to 10,240 MB	Per-function	No	Scratch space per execution environment. Configurable up to 10 GB; contents can persist across warm invocations but are not guaranteed to.
Function URL request timeout	900 seconds	Per-function	No	Matches function timeout. Streaming responses supported.
Payload size (sync)	6 MB	Per-invocation	No	Both request and response. Async invocation has its own limit (256 KB).
Payload size (async)	256 KB	Per-invocation	No	Async payload is queued via internal SQS-like buffer, hence the 256 KB ceiling.
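
The per-function scaling rate in the table can be turned into a rough ramp-up estimate. The sketch below is a simplification: it treats scaling as discrete steps of 1,000 every 10 seconds and ignores the initial burst allowance, so read the result as a conservative estimate rather than a guarantee. The function name and parameters are illustrative.

```python
import math


def seconds_to_reach(target: int, current: int = 0, step: int = 1_000,
                     interval_s: int = 10, account_limit: int = 1_000) -> float:
    """Rough time for one function to scale from `current` to `target` concurrency.

    Returns infinity when the target exceeds the account concurrency quota,
    because no amount of waiting gets past that ceiling.
    """
    if target > account_limit:
        return math.inf
    return math.ceil(max(0, target - current) / step) * interval_s
```

For instance, reaching 4,500 concurrent executions in an account raised to 10,000 takes roughly 50 seconds under this model; with the default 1,000 quota the same target is unreachable.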

AWS Fargate

Quota	Default	Scope	Adjustable	Notes
Fargate On-Demand vCPU resource count (ECS)	Account-specific (verify in console)	Region	Yes	Defaults vary by region and account age. Often 6 to 100 in new accounts.
Fargate Spot vCPU resource count (ECS)	Account-specific (verify in console)	Region	Yes	Separate pool from On-Demand.
Task CPU	0.25 to 16 vCPU	Per-task	No	Allowed: 0.25, 0.5, 1, 2, 4, 8, 16 vCPU. Memory is bounded by CPU choice.
Task memory	0.5 to 120 GB	Per-task	No	Memory increments allowed depend on the chosen CPU bucket.
Task ephemeral storage	20 GB to 200 GB	Per-task	No	Configurable per task definition. Default 20 GB.
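
Since task memory is bounded by the CPU choice, a small validator can catch invalid task sizes before deployment. The memory ranges below reflect the Fargate task-size combinations as commonly documented; they are not stated in the table above, so treat the mapping as an assumption and confirm it against current AWS documentation.

```python
# Assumed CPU -> allowed memory (GB) mapping; verify against current AWS docs.
FARGATE_MEMORY_GB = {
    0.25: [0.5, 1, 2],
    0.5: list(range(1, 5)),        # 1-4 GB in 1 GB steps
    1: list(range(2, 9)),          # 2-8 GB in 1 GB steps
    2: list(range(4, 17)),         # 4-16 GB in 1 GB steps
    4: list(range(8, 31)),         # 8-30 GB in 1 GB steps
    8: list(range(16, 61, 4)),     # 16-60 GB in 4 GB steps
    16: list(range(32, 121, 8)),   # 32-120 GB in 8 GB steps
}


def is_valid_task_size(cpu_vcpu: float, memory_gb: float) -> bool:
    """True when the CPU/memory pair is an allowed Fargate task size."""
    return memory_gb in FARGATE_MEMORY_GB.get(cpu_vcpu, [])
```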

AWS Batch

Quota	Default	Scope	Adjustable	Notes
Compute environments per Region	50	Region	Yes	Plan separate compute environments per workload class.
Job queues per Region	50	Region	Yes	Queues bind to compute environments by priority.
Job definitions per Region	Account-specific (verify in console)	Region	Yes	Versioned; old versions are retained unless explicitly deregistered.
Jobs in SUBMITTED state per queue	1,000,000	Per-queue	Yes	Large pipelines should batch-submit and watch for throttling.
Array job size	10,000 child jobs	Per-array-job	No	Hard cap on parallel array job size.
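
When a pipeline has more work items than the array job cap, the work has to be split across several array jobs. A minimal sketch of that split, using the 10,000 figure from the table; the function name is illustrative.

```python
def array_job_sizes(total_items: int, max_array_size: int = 10_000) -> list[int]:
    """Split `total_items` units of work into array-job sizes of at most `max_array_size`."""
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    full, rest = divmod(total_items, max_array_size)
    # Note: a remainder of 1 is likely better submitted as a regular (non-array) job.
    return [max_array_size] * full + ([rest] if rest else [])
```

For 25,000 items this yields three array jobs of 10,000, 10,000, and 5,000 children.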

4.2 Container

Amazon ECS

Quota	Default	Scope	Adjustable	Notes
Clusters per Region	10,000	Region	Yes	Most workloads stay under 50; one cluster per environment is typical.
Services per cluster	5,000	Per-cluster	Yes	Split the workload across clusters when approaching this.
Tasks launched per service (ECS Service)	5,000	Per-service	Yes	Tied to desired count.
Tasks per cluster	5,000	Per-cluster	Yes	Across all services and standalone tasks.
Container instances per cluster (EC2 launch type)	5,000	Per-cluster	Yes	Not applicable for Fargate launch type.
Task definition size	64 KB	Per-task-definition	No	JSON document hard cap.
Containers per task definition	10	Per-task	No	Sidecar patterns (logging, service mesh) consume this quickly.

Amazon EKS

Quota	Default	Scope	Adjustable	Notes
Clusters per Region	100	Region	Yes	One cluster per environment is the common pattern, with multi-tenancy handled via namespaces.
Managed node groups per cluster	30	Per-cluster	Yes	Separate node groups for taints / instance families.
Nodes per managed node group	450	Per-node-group	Yes	The limit applies per node group, not per cluster.
Pods per node	VPC-CNI / IP limits	Per-node	No	Bounded by ENI count and IPs per ENI; varies by instance type. Use prefix delegation to relax.
Fargate profiles per cluster	10	Per-cluster	Yes	Selector limit (5 per profile) often forces multiple profiles.
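
The "Pods per node" row can be expressed with the formula commonly used for the VPC CNI in secondary-IP mode. The sketch below assumes no prefix delegation and no custom networking; the ENI and per-ENI IP counts vary by instance type and should be confirmed for yours.

```python
def max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Max pods per node with the VPC CNI in secondary-IP mode (no prefix delegation).

    One address per ENI is the ENI's own primary IP; the +2 accounts for
    pods that use host networking (aws-node and kube-proxy).
    """
    return enis * (ipv4_per_eni - 1) + 2
```

An instance type with 3 ENIs and 10 IPv4 addresses per ENI (m5.large, for example) supports 29 pods under this formula.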

Amazon ECR

Quota	Default	Scope	Adjustable	Notes
Repositories per Region	10,000	Region	Yes	Pattern: one repo per micro-service; large fleets approach this.
Images per repository	10,000	Per-repository	Yes	Configure lifecycle policy to prune old tags.
Maximum image size	10 GiB	Per-image	No	Same as Lambda container image cap.
Image layer size	10 GiB	Per-layer	No	Single layer hard cap.
Rate of image pull (ECR Private)	Account-specific (verify in console)	Region	Yes	Pulls do throttle at scale; VPC endpoint and image caching mitigate.

4.3 Storage

Amazon S3

Quota	Default	Scope	Adjustable	Notes
Buckets per account	10,000	Account	Yes	Was 100 for many years; raised significantly. Approach with caution — most designs need far fewer.
Object size	5 TB	Per-object	No	Hard ceiling. Single PUT max 5 GB; above that requires multipart upload.
Multipart upload — parts	10,000	Per-upload	No	Part size 5 MB to 5 GB (last part may be smaller).
Multipart upload — part size	5 MB to 5 GB	Per-part	No	Last part exempted from 5 MB minimum.
PUT/COPY/POST/DELETE rate	3,500 RPS per prefix	Per-prefix	No (scales automatically)	Bucket scales horizontally as you add prefixes. See object key design.
GET/HEAD rate	5,500 RPS per prefix	Per-prefix	No (scales automatically)	Aggregate bucket throughput is effectively unlimited if keys are sharded.
Bucket policy size	20 KB	Per-bucket	No	JSON hard cap. Consider IAM policies or Access Points for complex auth.
Access Points per Region per account	10,000	Region	Yes	Per-bucket access patterns can be split out via Access Points.
S3 Object Lambda Access Points per account per Region	1,000	Region	Yes	Used when transforming objects on retrieval.
Lifecycle rules per bucket	1,000	Per-bucket	No	A typical tiering chain (S3 Standard → IA → Glacier) consumes 2 to 3 rules.
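
The multipart rows interact: with at most 10,000 parts, the minimum workable part size grows with the object. The sketch below computes that part size, assuming the table's MB/GB/TB figures are binary units (MiB/GiB/TiB); the function name is illustrative.

```python
MIB = 1024 ** 2
MAX_PARTS = 10_000
MIN_PART = 5 * MIB           # 5 MB minimum part size (assumed to mean MiB)
MAX_PART = 5 * 1024 * MIB    # 5 GB maximum part size (assumed to mean GiB)


def part_size_for(object_bytes: int) -> int:
    """Smallest part size, rounded up to a whole MiB, that fits the object in 10,000 parts."""
    needed = (object_bytes + MAX_PARTS - 1) // MAX_PARTS   # integer ceiling
    size = max(MIN_PART, ((needed + MIB - 1) // MIB) * MIB)
    if size > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    return size
```

A 5 TiB object needs parts of about 525 MiB, which is why fixed small part sizes fail on very large uploads.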

Amazon EBS

Quota	Default	Scope	Adjustable	Notes
Total storage for gp3 volumes (TiB)	50	Region	Yes	New-account default; production typically raised to several hundred TiB.
Total storage for io2 volumes (TiB)	20	Region	Yes	io2 reserved for critical DB workloads.
Snapshots per Region	100,000	Region	Yes	Lifecycle Manager strongly recommended at scale.
Volume size (gp3 / gp2 / io1 / io2)	16 TiB	Per-volume	No	io2 Block Express supports up to 64 TiB.
IOPS — gp3	3,000 baseline, 16,000 max	Per-volume	No	Provisionable up to 16,000.
IOPS — io2 Block Express	256,000	Per-volume	No	Highest single-volume IOPS in EBS.

Amazon EFS

Quota	Default	Scope	Adjustable	Notes
File systems per Region	1,000	Region	Yes	Most workloads use 1 to 10 file systems.
Mount targets per file system	1 per AZ	Per-file-system	No	One mount target per AZ regardless of subnet count in that AZ.
Access Points per file system	1,000	Per-file-system	Yes	Used for per-tenant POSIX root and UID/GID enforcement.
Maximum file size	47.9 TiB	Per-file	No	52,673,613,135,872 bytes per file (≈ 47.9 TiB / ≈ 52.7 TB decimal). NFSv4.1 protocol limit.
Throughput (Bursting mode)	Scales with stored data	Per-file-system	No	Baseline 50 MiB/s per TiB stored; burst up to 100 MiB/s per TiB (minimum 100 MiB/s). Switch to Provisioned or Elastic mode for higher.
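
The Bursting mode row reduces to two lines of arithmetic. A minimal sketch using only the figures from the table; the function name is illustrative.

```python
def efs_bursting_throughput(stored_tib: float) -> tuple[float, float]:
    """Return (baseline, burst) throughput in MiB/s for EFS Bursting mode."""
    baseline = 50.0 * stored_tib               # 50 MiB/s per TiB stored
    burst = max(100.0, 100.0 * stored_tib)     # 100 MiB/s per TiB, minimum 100 MiB/s
    return baseline, burst
```

A file system holding 0.5 TiB gets a 25 MiB/s baseline and a 100 MiB/s burst, which is often the point where Provisioned or Elastic mode becomes worth evaluating.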

Amazon FSx

Quota	Default	Scope	Adjustable	Notes
FSx for Lustre — file systems per Region	100	Region	Yes	HPC workloads cluster many small file systems per job.
FSx for Windows — file systems per Region	100	Region	Yes	SMB protocol; integrates with AD.
FSx for OpenZFS — file systems per Region	100	Region	Yes	—
FSx for NetApp ONTAP — file systems per Region	Account-specific (verify in console)	Region	Yes	—
Maximum file system size (Lustre)	Account-specific (verify in console)	Per-file-system	—	Varies by deployment type (Scratch 1 / Scratch 2 / Persistent 1 / Persistent 2).

Amazon S3 Glacier (Vaults, legacy API)

Quota	Default	Scope	Adjustable	Notes
Vaults per Region	1,000	Region	Yes	Glacier vault API is legacy; prefer S3 Glacier storage classes for new designs.
Archive size	40 TB	Per-archive	No	Hard ceiling for single archive via vault API.
Archive retrieval time (Standard)	3 to 5 hours	Per-job	No	Bulk: 5 to 12 hours; Expedited: 1 to 5 minutes (subject to capacity).
Job description size	1 KB	Per-job	No	Custom job parameters.

4.4 Database

Amazon RDS

Quota	Default	Scope	Adjustable	Notes
DB instances per Region	40	Region	Yes	Includes Aurora cluster members.
Manual DB snapshots	100	Region	Yes	Excludes automated backups.
Total storage for DB instances (TiB)	100	Region	Yes	Across all engines combined.
Read replicas per DB instance	5 (MySQL / MariaDB / PostgreSQL)	Per-instance	No	Aurora supports 15 readers per cluster instead.
Parameter groups	100	Region	Yes	Across all engine versions.
Maximum DB storage size — General Purpose SSD (MySQL, MariaDB, PostgreSQL)	64 TiB	Per-instance	No	SQL Server caps lower (16 TiB).
Tags per resource	50	Per-resource	No	Common across most AWS services.

Amazon Aurora

Quota	Default	Scope	Adjustable	Notes
DB clusters per Region	40	Region	Yes	Counts toward the 40 RDS DB instance ceiling indirectly.
Instances per cluster	16	Per-cluster	No	1 writer + 15 readers maximum.
Cluster storage size	128 TiB	Per-cluster	No	Auto-scaling; you pay for what you use.
Aurora Global Database — secondary Regions	5	Per-cluster	No	Plus 1 primary = 6 Regions total.
Aurora Serverless v2 — ACU range	0.5 to 256 ACUs	Per-instance	No	Auto-scaling capacity unit. v1 used a different scaling model.

Amazon DynamoDB

Quota	Default	Scope	Adjustable	Notes
Tables per Region	2,500	Region	Yes	Was 256 for years; revised upward. Single-table design typically needs <10.
Item size	400 KB	Per-item	No	Includes attribute names and values. Hard ceiling.
Partition key length	2,048 bytes	Per-key	No	UTF-8 encoded.
Sort key length	1,024 bytes	Per-key	No	UTF-8 encoded.
Global Secondary Indexes (GSI) per table	20	Per-table	Yes	Was 5, then raised. See DynamoDB Key Design GSI/LSI Dictionary.
Local Secondary Indexes (LSI) per table	5	Per-table	No	Hard, must be declared at table-create time.
Provisioned table throughput (per Region)	40,000 RCU / 40,000 WCU	Region	Yes	Account-level default ceiling shared across tables.
Provisioned table throughput (per table)	40,000 RCU / 40,000 WCU	Per-table	Yes	Per-table cap independent of account ceiling.
On-demand initial table throughput	4,000 WCU / 12,000 RCU	Per-table	No	Instantly accommodates up to double the previous traffic peak.
Partition throughput	1,000 WCU / 3,000 RCU	Per-partition	No	Hot partitions cap here even if table cap unused. See single-table design.
Transaction batch size	100 actions / 4 MB	Per-call	No	TransactWriteItems / TransactGetItems.
BatchWriteItem batch size	25 items / 16 MB	Per-call	No	—
BatchGetItem batch size	100 items / 16 MB	Per-call	No	—
Query / Scan result page size	1 MB	Per-page	No	Paginate via LastEvaluatedKey.
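
Two rows in this table translate directly into client-side helpers: the 25-item BatchWriteItem cap and the 1,000 WCU per-partition ceiling. The sketch below covers only the item-count limit (not the 16 MB or 400 KB size limits), and the write-sharding estimate assumes writes spread evenly across suffixes; both function names are illustrative.

```python
import math
from typing import Iterator, Sequence


def batches_of_25(items: Sequence[dict], size: int = 25) -> Iterator[Sequence[dict]]:
    """Yield slices small enough for one BatchWriteItem call (item-count limit only)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_shards_for_hot_key(peak_wcu: float, per_partition_wcu: int = 1_000) -> int:
    """Suffix count needed to spread one logical key's writes across partitions."""
    return max(1, math.ceil(peak_wcu / per_partition_wcu))
```

A logical key receiving 3,500 WCU at peak would need at least 4 suffixes under this estimate.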

Amazon ElastiCache

Quota	Default	Scope	Adjustable	Notes
Nodes per Region (across all clusters)	300	Region	Yes	Combined ElastiCache (Redis / Valkey / Memcached) ceiling.
Nodes per cluster (Redis cluster mode disabled)	6 (1 primary + 5 replicas)	Per-cluster	No	Cluster mode enabled has different limits.
Shards per cluster (Redis cluster mode enabled)	500	Per-cluster	Yes	Each shard has 1 primary and up to 5 replicas.
Replicas per shard	5	Per-shard	No	—
Manual snapshots per account per Region	150	Region	Yes	—

Amazon MemoryDB for Redis

Quota	Default	Scope	Adjustable	Notes
Clusters per Region	50	Region	Yes	—
Shards per cluster	500	Per-cluster	Yes	—
Replicas per shard	5	Per-shard	No	—
Snapshots per account per Region	Account-specific (verify in console)	Region	Yes	—

Amazon DocumentDB

Quota	Default	Scope	Adjustable	Notes
Clusters per Region	40	Region	Yes	—
Instances per cluster	16	Per-cluster	No	1 primary + 15 read replicas.
Storage per cluster	128 TiB	Per-cluster	No	Auto-scaling.
Document size	16 MB	Per-document	No	MongoDB-compatible BSON cap.
Index size	2,048 bytes	Per-index-entry	No	—

Amazon Neptune

Quota	Default	Scope	Adjustable	Notes
Clusters per Region	40	Region	Yes	—
Instances per cluster	16	Per-cluster	No	1 writer + 15 readers.
Storage per cluster	128 TiB	Per-cluster	No	Auto-scaling.
Concurrent loader jobs	1	Per-cluster	No	The bulk loader runs one load job at a time per cluster.
Query timeout	120 seconds default	Per-query	Yes	Adjustable via cluster parameter group up to several thousand seconds.

4.5 Network

Amazon VPC

Quota	Default	Scope	Adjustable	Notes
VPCs per Region	5	Region	Yes	Multi-account / multi-region designs are the long-term answer.
Subnets per VPC	200	Per-VPC	Yes	Three-tier (public / private / data) × 3 AZ = 9 typically.
Route tables per VPC	200	Per-VPC	Yes	Includes the main route table.
Routes per route table	50	Per-route-table	Yes	Can be raised to 1,000 but BGP-propagated routes count toward this.
Network ACLs per VPC	200	Per-VPC	Yes	—
Rules per network ACL	20 inbound + 20 outbound	Per-NACL	Yes	Can be raised, but evaluation order matters.
Internet gateways per Region	5	Region	Yes	Tied to VPC count.
NAT gateways per AZ	5	Per-AZ	Yes	One NAT gateway scales to 45 Gbps; usually you want 1 per AZ.
Elastic Network Interfaces per Region	5,000	Region	Yes	VPC-attached Lambda functions consume ENIs.
VPC peering connections per VPC	50	Per-VPC	Yes	Transit Gateway scales better above ~10 VPCs.

AWS Transit Gateway

Quota	Default	Scope	Adjustable	Notes
Transit gateways per account per Region	5	Region	Yes	One per Region is typical.
Attachments per TGW	5,000	Per-TGW	Yes	VPC / VPN / Direct Connect Gateway / Peering attachments combined.
Routes per TGW route table	10,000	Per-route-table	No	Hard ceiling.
Bandwidth per VPC attachment	50 Gbps burst	Per-attachment	No	VPC attachments scale up to 50 Gbps.
Bandwidth per VPN attachment	1.25 Gbps per tunnel	Per-tunnel	No	ECMP doubles this with 2 tunnels.

AWS PrivateLink (VPC Endpoints)

Quota	Default	Scope	Adjustable	Notes
Interface VPC endpoints per VPC	50	Per-VPC	Yes	See PrivateLink and VPC Endpoints Complete Guide.
Gateway VPC endpoints per VPC	20	Per-VPC	Yes	S3 and DynamoDB only.
VPC endpoint services per Region	50	Region	Yes	For providing your own service via PrivateLink.
Connections per VPC endpoint service	50	Per-service	Yes	Each consumer VPC counts as one connection.
Endpoint policy size	20,480 characters	Per-endpoint	No	JSON hard cap.

Amazon Route 53

Quota	Default	Scope	Adjustable	Notes
Hosted zones per account	500	Account	Yes	—
Records per hosted zone	10,000	Per-zone	Yes	Can be raised, though this is rarely needed.
Health checks per account	200	Account	Yes	—
Traffic policies per account	50	Account	Yes	—
Domains per account (Registrar)	20	Account	Yes	Registration is separate from DNS hosting.
Resolver endpoints per Region	4	Region	Yes	2 inbound + 2 outbound typical.

Amazon CloudFront

Quota	Default	Scope	Adjustable	Notes
Distributions per account	200	Account	Yes	—
Origins per distribution	25	Per-distribution	Yes	Multi-origin failover patterns approach this.
Cache behaviors per distribution	25	Per-distribution	Yes	One default + up to 24 path-pattern behaviors.
Alternate domain names (CNAMEs) per distribution	100	Per-distribution	Yes	—
Request rate per distribution	250,000 RPS	Per-distribution	Yes	Soft, raised on request for known launches.
Data transfer rate per distribution	150 Gbps	Per-distribution	Yes	—
File size	30 GB	Per-file	No	Use S3 multipart and signed URLs for larger transfers.
Lambda@Edge function size	1 MB (viewer-request / viewer-response) / 50 MB (origin-request / origin-response)	Per-function	No	Viewer events have a tight package cap; origin events allow larger bundles.

Amazon API Gateway

Quota	Default	Scope	Adjustable	Notes
Account-level steady-state request rate (REST API)	10,000 RPS	Region	Yes	Across all APIs in the account / Region.
Account-level burst (REST API)	5,000 requests	Region	Yes	Token-bucket burst before steady-state.
HTTP API request rate	10,000 RPS	Region	Yes	HTTP APIs have their own pool.
WebSocket connections per Region	500,000 concurrent	Region	Yes	—
Integration timeout	29 seconds (REST default), 30 seconds (HTTP)	Per-integration	REST: Yes (up to 300 seconds via Service Quotas) / HTTP: No	Since November 2024, REST API integration timeout is configurable up to 300 seconds (5 minutes) via a Service Quotas request. HTTP APIs remain capped at 30 seconds. For longer work, use async patterns or Step Functions.
Payload size	10 MB	Per-request	No	Both directions.
Resources per REST API	300	Per-API	Yes	—
Routes per HTTP API	300	Per-API	Yes	—
Stages per API	10	Per-API	Yes	—
Usage plans per account	300	Account	Yes	—

AWS Global Accelerator

Quota	Default	Scope	Adjustable	Notes
Accelerators per account	20	Account	Yes	—
Listeners per accelerator	10	Per-accelerator	Yes	—
Endpoint groups per listener	10	Per-listener	Yes	One per Region typically.
Endpoints per endpoint group	10	Per-group	Yes	—
Custom routing accelerators	Account-specific (verify in console)	Account	Yes	—

4.6 AI / ML

Amazon Bedrock

Bedrock quotas are model-, modality-, and region-specific, and AWS revises them more often than most services. Numbers below are illustrative defaults; always confirm against the Service Quotas Console for the exact model ID and region before sizing.

Quota	Default	Scope	Adjustable	Notes
InvokeModel requests per minute (per model)	Account-specific (verify in console)	Region	Yes	Per-model RPM. Anthropic Claude models have separate quotas from Nova, Llama, Cohere, etc.
InvokeModel tokens per minute (per model)	Account-specific (verify in console)	Region	Yes	Cross-region inference profiles aggregate across regions.
Provisioned model units per account	Account-specific (verify in console)	Region	Yes	Required for deterministic throughput.
Custom models per account	Account-specific (verify in console)	Region	Yes	Imported and fine-tuned models share quota in many regions.
Knowledge bases per account	Account-specific (verify in console)	Region	Yes	Bedrock Knowledge Bases.
Agents per account	Account-specific (verify in console)	Region	Yes	Bedrock Agents.
Guardrails per account	Account-specific (verify in console)	Region	Yes	Bedrock Guardrails policies.
Maximum prompt size	Model-dependent	Per-request	No	Claude Sonnet 4.x: 200K context. Nova Pro: 300K context. Confirm per model.
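
Because Bedrock enforces both a requests-per-minute and a tokens-per-minute quota per model, the effective request rate is whichever binds first. A minimal sketch of that calculation follows; the quota values must come from your own console, and the function name is illustrative.

```python
def effective_requests_per_minute(rpm_quota: float, tpm_quota: float,
                                  tokens_per_request: float) -> float:
    """Sustainable requests per minute given both the RPM and TPM quotas."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return min(rpm_quota, tpm_quota / tokens_per_request)
```

With hypothetical quotas of 200 RPM and 400,000 TPM, requests averaging 4,000 tokens are token-bound at 100 requests per minute, so raising only the RPM quota would not help.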

Amazon SageMaker

Quota	Default	Scope	Adjustable	Notes
Endpoints per Region	Account-specific (verify in console)	Region	Yes	Usually around 100 to several hundred; varies by instance type.
Endpoint configurations per Region	Account-specific (verify in console)	Region	Yes	—
Notebook instances per Region	Account-specific (verify in console)	Region	Yes	SageMaker AI Studio domains have a separate ceiling.
Training jobs concurrent per Region	Account-specific (verify in console)	Region	Yes	Per-instance-type concurrent training caps.
Model artifact size	5 GB	Per-model	No	For built-in algorithms; bring-your-own can be larger via ECR.
Maximum batch transform job duration	28 days	Per-job	No	—

Amazon Comprehend

Quota	Default	Scope	Adjustable	Notes
DetectEntities sync TPS	20 TPS	Region	Yes	—
DetectSentiment sync TPS	20 TPS	Region	Yes	—
Custom classification models per Region	10	Region	Yes	—
Custom entity recognition models per Region	10	Region	Yes	—
Document size (single request)	5,000 bytes	Per-request	No	Use async batch jobs for larger documents.

Amazon Textract

Quota	Default	Scope	Adjustable	Notes
DetectDocumentText sync TPS	10 TPS	Region	Yes	—
AnalyzeDocument sync TPS	2 TPS	Region	Yes	Tables / Forms / Queries features have separate sub-quotas.
StartDocumentTextDetection async jobs	600 concurrent	Region	Yes	—
Maximum document size (sync)	10 MB (image) / 5 MB (PDF)	Per-request	No	PDFs larger than 5 MB must use the asynchronous API.
Maximum pages per async document	3,000	Per-document	No	—

4.7 Integration

Amazon SNS

Quota	Default	Scope	Adjustable	Notes
Topics per account	100,000	Region	Yes	—
Subscriptions per topic	12.5 million	Per-topic	No	—
Subscriptions per account	200 million	Region	No	—
Message body size	256 KB	Per-message	No	SNS Extended Library uses S3 to send up to 2 GB by reference.
FIFO topic message throughput	300 TPS without batching, 3,000 with batching	Per-topic	No	—
Standard topic message throughput	~30,000 TPS (US, EU regions)	Region	Yes	Lower defaults in other regions.
Filter policies per subscription	5	Per-subscription	No	—

Amazon SQS

Quota	Default	Scope	Adjustable	Notes
Queues per account per Region	1,000,000	Region	No	Effectively unlimited.
Message body size	256 KB	Per-message	No	Extended Library to S3 for >256 KB up to 2 GB.
Visibility timeout	0 to 12 hours	Per-message	No	Default 30 seconds.
Message retention	1 minute to 14 days	Per-queue	No	Default 4 days.
Delay seconds	0 to 900 (15 min)	Per-message or queue	No	—
Inflight messages (Standard queue)	120,000	Per-queue	No	Inflight = received but not deleted.
Inflight messages (FIFO queue)	20,000	Per-queue	No	—
Throughput (Standard queue)	Effectively unlimited	Per-queue	No	Scales horizontally with traffic.
Throughput (FIFO queue, high-throughput mode)	9,000 TPS (us-east-1, us-west-2, eu-west-1) without batching	Per-queue	No	Other regions: 3,000 TPS. Batching multiplies by 10.
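
The in-flight ceilings above can be sanity-checked before a launch with Little's law: average in-flight messages are roughly the receive rate multiplied by the time a message stays received but not yet deleted. The sketch below is a minimal illustration of that arithmetic; the function names and the 80% headroom factor are my own assumptions, not anything SQS exposes.

```python
import math

STANDARD_INFLIGHT_LIMIT = 120_000  # per-queue value from the table above
FIFO_INFLIGHT_LIMIT = 20_000       # per-queue value from the table above


def estimated_inflight(receive_rate_per_sec: float, seconds_until_delete: float) -> float:
    """Little's law: average in-flight = arrival rate x time spent in flight."""
    return receive_rate_per_sec * seconds_until_delete


def queues_needed(receive_rate_per_sec: float, seconds_until_delete: float,
                  limit: int = STANDARD_INFLIGHT_LIMIT, headroom: float = 0.8) -> int:
    """Queues required so each stays under headroom * limit (headroom is an assumption)."""
    inflight = estimated_inflight(receive_rate_per_sec, seconds_until_delete)
    return max(1, math.ceil(inflight / (limit * headroom)))
```

For example, 5,000 receives per second with 60 seconds of processing per message averages 300,000 in flight, which would need four Standard queues at 80% headroom. Shortening the time to delete is usually the cheaper fix.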

Amazon EventBridge

Quota	Default	Scope	Adjustable	Notes
Event buses per account	100	Region	Yes	Includes default bus, partner buses, and custom buses.
Rules per event bus	300	Per-bus	Yes	—
Targets per rule	5	Per-rule	No	Hard ceiling.
Event size	256 KB	Per-event	No	JSON document including envelope.
PutEvents requests per second	10,000 RPS (us-east-1, us-west-2, eu-west-1)	Region	Yes	Lower in other regions, often 400 to 2,400.
Throttled events (DLQ)	Configurable	Per-rule	No	Failed deliveries routed to SQS DLQ if configured.
Archive size	Unlimited	Per-archive	—	Pricing applies per GB stored and replayed.
Schemas per registry	1,000	Per-registry	Yes	EventBridge Schemas / Schema Registry.

Amazon Kinesis Data Streams

Quota	Default	Scope	Adjustable	Notes
Shards per Region (Provisioned mode)	500 (us-east-1, us-west-2, eu-west-1) / 200 (other regions)	Region	Yes	Soft. Aggregate across all streams in the Region.
Data Streams per Region	50	Region	Yes	Independent of shard count.
PutRecord(s) throughput per shard	1 MiB/s or 1,000 records/s	Per-shard	No	Whichever limit hits first. Reshard or switch to On-Demand to scale.
GetRecords throughput per shard	2 MiB/s or 5 calls/s	Per-shard	No	Enhanced Fan-Out (EFO) gives each consumer a dedicated 2 MiB/s per shard.
Maximum record size	1 MiB	Per-record	No	Aggregate small records with the KPL to amortize overhead.
Data retention	24 hours (default), up to 365 days	Per-stream	No	Retention beyond 24 hours is billed as extended retention; beyond 7 days, as long-term retention.
On-Demand mode write throughput	200 MiB/s and 200,000 records/s	Per-stream	Yes	Default cap; raise via support case for higher.
On-Demand mode read throughput	400 MiB/s	Per-stream	No	2x the write throughput, sized for two consumer applications.
Consumers per stream (Enhanced Fan-Out)	20	Per-stream	Yes	Each EFO consumer gets a dedicated 2 MiB/s per shard.
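
The per-shard limits above translate directly into a shard count for Provisioned mode: whichever of the write-bytes, write-records, or shared read-bytes limits demands the most shards wins. The following is a rough sizing sketch under that assumption; `shards_required` is a hypothetical helper, and the read figure is the combined rate of all consumers that share the 2 MiB/s per shard (Enhanced Fan-Out consumers are excluded because each gets its own pipe).

```python
import math

WRITE_MIB_PER_SHARD = 1.0      # PutRecord(s) bytes limit per shard
WRITE_RECORDS_PER_SHARD = 1000  # PutRecord(s) records limit per shard
READ_MIB_PER_SHARD = 2.0       # shared GetRecords bytes limit per shard


def shards_required(write_mib_per_sec: float, records_per_sec: float,
                    shared_read_mib_per_sec: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits."""
    by_write_bytes = math.ceil(write_mib_per_sec / WRITE_MIB_PER_SHARD)
    by_write_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(shared_read_mib_per_sec / READ_MIB_PER_SHARD)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)
```

A stream ingesting 12 MiB/s as 30,000 small records per second is bound by the record limit (30 shards), not by bytes, which is the case KPL aggregation is meant to fix.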

AWS Step Functions

Quota	Default	Scope	Adjustable	Notes
State machines per account	10,000	Region	Yes	—
Execution history events (Standard)	25,000 events	Per-execution	No	Hard ceiling. Loops or Map states can blow this quickly.
Maximum execution duration (Standard)	1 year	Per-execution	No	—
Maximum execution duration (Express)	5 minutes	Per-execution	No	—
State machine definition size	1 MB	Per-state-machine	No	ASL JSON document hard cap.
Maximum input / output size	262,144 bytes (256 KB)	Per-state	No	Use S3 references for larger payloads.
StartExecution requests (Standard)	1,300 TPS / bucket size 5,000	Region	Yes	Express has higher TPS, ~100,000 RPS.
Map state max concurrency (Distributed Map)	10,000	Per-state	No	Distributed Map raised the cap from Inline Map's 40.

Amazon MWAA (Managed Workflows for Apache Airflow)

Quota	Default	Scope	Adjustable	Notes
Environments per Region	5	Region	Yes	One environment per team / DAG-set is typical.
DAGs per environment	Effectively unlimited (memory-bound)	Per-environment	No	Worker memory caps DAG count in practice.
Workers per environment	1 to 25 (small) up to 1 to 50 (large)	Per-environment	Yes (env size)	Environment class (mw1.small/medium/large) sets the ceiling.
Schedulers per environment	2 to 5	Per-environment	No	Tied to env class.

4.8 Security

AWS IAM

Quota	Default	Scope	Adjustable	Notes
Roles per account	1,000	Account	Yes	Adjustable up to 5,000.
Users per account	5,000	Account	No	For the long term, prefer IAM Identity Center (SSO) over IAM users.
Groups per account	300	Account	No	—
Customer-managed policies per account	1,500	Account	Yes	—
Managed policies attached to a role	10	Per-role	Yes	Adjustable up to 20.
Managed policies attached to a user	10	Per-user	Yes	Adjustable up to 20.
Inline policy size on a role	10,240 characters	Per-role	No	—
Inline policy size on a user	2,048 characters	Per-user	No	—
Inline policy size on a group	5,120 characters	Per-group	No	—
Policy versions per managed policy	5	Per-policy	No	Older versions must be deleted before adding new.
Access keys per IAM user	2	Per-user	No	For key rotation, use both slots.
MFA devices per user	8	Per-user	No	Hardware + virtual + Passkeys combined.
Server certificates per account	20	Account	Yes	Use ACM for TLS certificates instead.
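
The inline policy size caps are easy to check in a pipeline before IAM rejects a deployment. The sketch below assumes that IAM ignores whitespace when measuring policy size and that the cap applies to the combined size of all inline policies on one entity; compact JSON serialization is used as an approximation of that count. The helper names are hypothetical.

```python
import json
from typing import List, Tuple

# Inline policy size caps from the table above, in characters.
INLINE_LIMITS = {"role": 10_240, "user": 2_048, "group": 5_120}


def policy_chars(policy: dict) -> int:
    """Characters in the policy once serialized without insignificant whitespace."""
    return len(json.dumps(policy, separators=(",", ":")))


def fits_inline_quota(policies: List[dict], entity_type: str) -> Tuple[bool, int]:
    """Return (fits, total_chars) for all inline policies on one role, user, or group."""
    total = sum(policy_chars(p) for p in policies)
    return total <= INLINE_LIMITS[entity_type], total
```

When the total does not fit, moving statements into customer-managed policies is usually the way out, since those carry a separate per-policy size limit.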

AWS KMS

Quota	Default	Scope	Adjustable	Notes
Customer managed keys (CMKs) per Region	100,000	Region	Yes	Includes pending-deletion keys.
Aliases per CMK	50	Per-CMK	No	—
Grants per CMK	50,000	Per-CMK	No	Common with AWS services like DynamoDB / Lambda.
API request rate — symmetric Encrypt / Decrypt / GenerateDataKey (AES-256 / SYMMETRIC_DEFAULT)	30,000 RPS (us-east-1, us-west-2, eu-west-1)	Region	Yes	5,500 to 15,000 RPS in other regions.
API request rate — asymmetric Encrypt / Decrypt / Sign	500 to 1,000 RPS	Region	Yes	RSA / ECC keys are CPU-intensive.
Encrypt input size (symmetric)	4,096 bytes (4 KB)	Per-call	No	For larger data, encrypt with a data key locally.
Encrypt input size (asymmetric)	~190 to 470 bytes	Per-call	No	Depends on key spec (RSA-2048 / RSA-3072 / RSA-4096).
Key policy size	32 KB	Per-CMK	No	—

AWS WAF

Quota	Default	Scope	Adjustable	Notes
Web ACLs per account per Region	100	Region	Yes	CloudFront Web ACLs sit in us-east-1 (Global).
WCUs per Web ACL	1,500	Per-Web-ACL	Yes	Web ACL Capacity Units; each rule consumes WCUs.
WCUs per rule group	1,500	Per-rule-group	Yes	WAF v2 rule group capacity is WCU-bound. Per-rule WCU cost ranges from 1 to several dozen depending on match type and text transformations. Request a WCU increase via the Service Quotas Console.
Rule groups per Web ACL	20	Per-Web-ACL	No	—
IP sets per account per Region	100	Region	Yes	—
IP addresses per IP set	10,000	Per-IP-set	Yes	—
Maximum body inspection size	16 KB default, up to 64 KB (CloudFront, API Gateway)	Per-request	Yes (web ACL setting)	ALB and AppSync are fixed at 8 KB. Verify the current values in the AWS WAF documentation.

AWS Shield Advanced

Quota	Default	Scope	Adjustable	Notes
Protected resources per account	1,000	Account	Yes	Shield Standard is free and automatic; Shield Advanced is paid.
Health-based detection	Configurable	Per-resource	—	Route 53 health checks integrate.
Maximum custom mitigation	Account-specific (verify in console)	Account	Yes	SRT (Shield Response Team) can apply ad-hoc rules.

AWS Secrets Manager

Quota	Default	Scope	Adjustable	Notes
Secrets per account per Region	500,000	Region	Yes	—
Secret value size	65,536 bytes (64 KB)	Per-secret	No	Both ciphertext and plaintext.
GetSecretValue TPS	10,000 RPS	Region	Yes	Use a local cache for very high read volume.
Versions per secret	~100	Per-secret	No	Versions without a staging label are deprecated and pruned once the count passes ~100; versions newer than 24 hours are kept.

4.9 Observability

Amazon CloudWatch

Quota	Default	Scope	Adjustable	Notes
Custom metrics per account	Effectively unlimited	Region	No	Cost scales with metric count and resolution.
Metric resolution (high resolution)	1 second	Per-metric	No	1-second metrics retained 3 hours, then aggregated.
Alarms per Region	5,000	Region	Yes	—
Composite alarms per account	500	Account	Yes	—
PutMetricData TPS	150 TPS	Region	Yes	Soft. Use batch (up to 1,000 metrics per call) to reduce TPS.
Log groups per Region	1,000,000	Region	Yes	—
Log group retention	1 day to 10 years (or Never expire)	Per-log-group	No	Default "Never expire" — set retention to control cost.
Log streams per log group	Effectively unlimited	Per-log-group	No	—
PutLogEvents per stream	5 RPS (default)	Per-stream	Yes	—
Maximum log event size	256 KB	Per-event	No	—
Subscription filters per log group	2	Per-log-group	No	Hard limit. For fan-out, route via a Lambda or Firehose subscriber that re-publishes.
Dashboards per account	500	Region	Yes	Each dashboard up to 500 widgets.
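
The batching advice on the PutMetricData row comes down to chunking: group metric datums into lists of at most 1,000 and send one request per list. The following is a minimal sketch of that chunking plus the resulting request rate; the helper names are assumptions, and a real sender would also need to respect the per-request payload size limit, which this sketch ignores.

```python
import math
from typing import Iterator, List, Sequence

MAX_METRICS_PER_CALL = 1_000  # batch ceiling from the PutMetricData row above


def batch_metrics(metrics: Sequence[dict],
                  batch_size: int = MAX_METRICS_PER_CALL) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size metric datums."""
    for start in range(0, len(metrics), batch_size):
        yield list(metrics[start:start + batch_size])


def calls_per_second(metrics_per_second: float,
                     batch_size: int = MAX_METRICS_PER_CALL) -> int:
    """PutMetricData requests per second needed when every call is a full batch."""
    return math.ceil(metrics_per_second / batch_size)
```

At 45,000 datums per second, full batches need 45 requests per second, comfortably under the 150 TPS default; unbatched, the same load would be 300 times over it.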

AWS X-Ray

Quota	Default	Scope	Adjustable	Notes
Traces per second	Default sampling: first request each second + 5% of additional requests	Per-app	Yes	Sampling rules are configurable.
Trace document size	500 KB	Per-trace	No	—
Segment document size	64 KB	Per-segment	No	—
Trace retention	30 days	Region	No	Hard.
Groups per account per Region	25	Region	Yes	—
Insights enabled per group	Configurable	Per-group	—	X-Ray Insights detects anomalies in traffic patterns.

4.10 Management

AWS CloudFormation

Quota	Default	Scope	Adjustable	Notes
Stacks per Region	2,000	Region	Yes	—
Resources per stack	500	Per-stack	No	Was 200 for years; raised to 500. Use nested stacks above this.
Parameters per stack	200	Per-stack	No	—
Outputs per stack	200	Per-stack	No	—
Mappings per template	200	Per-template	No	—
Template body size (direct)	51,200 bytes (50 KB)	Per-template	No	API call limit.
Template body size (via S3)	1 MB	Per-template	No	Reference template via TemplateURL.
Number of stack sets per administrator account	100	Account	Yes	StackSets fan out across Organization.
Operations per StackSet	1,000 concurrent	Per-StackSet	Yes	Throttle by region / account.
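
The two template size rows imply a simple decision at deploy time: pass the template inline when it fits in 51,200 bytes, otherwise upload it to S3 and reference it by URL. A hedged sketch of that decision follows; `template_delivery` is a hypothetical helper, and reading "1 MB" as 1,048,576 bytes is my assumption.

```python
DIRECT_LIMIT_BYTES = 51_200   # inline template body limit from the table above
S3_LIMIT_BYTES = 1_048_576    # assumption: the table's "1 MB" read as 1 MiB


def template_delivery(template_body: str) -> str:
    """Pick how to hand a template to CloudFormation based on its UTF-8 size."""
    size = len(template_body.encode("utf-8"))
    if size <= DIRECT_LIMIT_BYTES:
        return "TemplateBody"   # pass inline in the API call
    if size <= S3_LIMIT_BYTES:
        return "TemplateURL"    # upload to S3 and reference by URL
    return "split"              # too large either way: break into nested stacks
```

Measuring the encoded byte length rather than `len(template_body)` matters when the template contains non-ASCII characters in descriptions or default values.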

AWS Organizations

Quota	Default	Scope	Adjustable	Notes
Member accounts per organization	10	Organization	Yes	Adjustable to thousands via support case. Some customers run 5,000+ accounts.
OUs per organization	1,000	Organization	No	—
OU nesting depth	5 levels	Organization	No	Plus Root = 6 total.
Service Control Policies (SCPs) per organization	1,000	Organization	No	—
SCPs attached to a single entity (account / OU / root)	5	Per-entity	No	—
SCP document size	5,120 characters	Per-SCP	No	JSON document hard cap.
Backup policies per organization	10	Organization	No	Organization-level Backup policies.
Tag policies per organization	10	Organization	No	—

5. Quotas Most Frequently Hit in Production

The list below is a ranked digest of quotas that, in my experience and from customer postmortems, are the ones engineering teams most often hit during launches, traffic spikes, and incidents. Mark these and pre-check them before scaling events.

Rank	Service	Quota	Default	Why it bites
1	EC2	Running On-Demand Standard vCPUs	5 vCPUs (new accounts)	Surprise during scale-out tests. New accounts often blocked at <100 vCPUs.
2	Lambda	Concurrent executions	1,000	Burst events (S3 event, EventBridge fan-out) saturate before steady-state warms.
3	DynamoDB	Partition throughput	1,000 WCU / 3,000 RCU per partition	A hot partition throttles even when table-level throughput is far from its cap.
4	S3	PUT/POST/COPY/DELETE rate per prefix	3,500 RPS	Sequential keys concentrate on one prefix; spreading keys across hashed prefixes fixes it.
5	API Gateway	Integration timeout	29 sec (REST default, up to 300 sec) / 30 sec (HTTP, hard)	Long-running backend. REST adjustable via Service Quotas; HTTP remains hard.
6	API Gateway	Steady-state RPS	10,000 RPS	Account-level, shared across all APIs.
7	KMS	Symmetric Decrypt RPS	30,000 RPS (us-east-1, us-west-2, eu-west-1) / 5,500 to 15,000 (others)	Per-Region throttle bites during bursts (sign-in storms, batch decrypt).
8	EventBridge	PutEvents RPS	10,000 (us-east-1, us-west-2, eu-west-1)	Lower in other regions (400 to 2,400).
9	Step Functions	Execution history events (Standard)	25,000 events	Loops or large Maps blow the cap.
10	CloudWatch	PutMetricData TPS	150 TPS	EMF logs at high cardinality saturate.
11	CloudFormation	Resources per stack	500	Hard; forces nested stacks.
12	EC2	Elastic IPs per Region	5	Multi-AZ NAT + reservations exhaust quickly.
13	VPC	VPCs per Region	5	Multi-environment in one account hits this.
14	SQS	Inflight messages (Standard queue)	120,000	Slow consumer causes inflight to stack.
15	RDS	DB instances per Region	40	Microservice-per-DB pattern hits early.
16	Lambda	Burst concurrency	500 to 3,000 concurrent executions (region-dependent)	Cold start storm at deploy or after dormant period. Post-burst scaling now +1,000 concurrent per 10 seconds, per function.
17	CloudFront	Lambda@Edge function size (viewer)	1 MB	Bundling several SDKs easily pushes the package past the cap.
18	IAM	Managed policies attached to a role	10 (adjustable to 20)	Permission-set sprawl hits quickly.
19	Organizations	SCPs attached to a single entity	5	Layered guardrails approach 5 quickly.
20	Bedrock	InvokeModel RPM per model	Varies by model / region	Production launches blocked until per-model quota raised. Plan 7+ days lead time.

A common pattern across these quotas: the issue is rarely "AWS can't scale here" — it is "AWS does not auto-scale this quota for you and you forgot to ask in advance." For the adjustable rows, set Service Quotas Console utilization alarms at 80%.
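
Row 4 in the table above names key design as the fix for the per-prefix S3 request rate. One common way to do that is to prepend a short, deterministic hash to each key so that sequential names land on many prefixes. The sketch below illustrates the idea; `hashed_key` and the two-character prefix length are assumptions, not an S3 feature.

```python
import hashlib


def hashed_key(key: str, prefix_chars: int = 2) -> str:
    """Prefix the key with the first hex characters of its SHA-256 digest."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_chars]}/{key}"
```

Two hex characters give 256 possible prefixes. The trade-off is that listing objects in date order now requires fanning out over every prefix, so this suits write-heavy keys more than keys you browse.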

6. How to Request Quota Increases

The mechanics of getting an adjustable quota raised changed substantially in 2019 with the introduction of the unified Service Quotas service and again in subsequent years with finer-grained per-service quota lists. As of 2026-05, three paths exist.

6.1 Path 1 — Service Quotas Console (most quotas)

The fastest path for most adjustable quotas:

Open the Service Quotas Console for the target Region.
Search for the quota by name or service.
Click Request quota increase, enter the new value, and submit.
Many quotas auto-approve within minutes (especially EC2 vCPU for established accounts and low-multiple increases).
Larger increases (10x or more) escalate to a human reviewer; SLA typically 1 to 5 business days.

You can also do this via API: aws service-quotas request-service-quota-increase --service-code <code> --quota-code <code> --desired-value <n>.

6.2 Path 2 — Support case (legacy quotas, complex changes)

Some service-specific limits still live in legacy support categories rather than Service Quotas. SES sending limit, Route 53 domain registration limit, and certain regional EC2 capacity pools fall in this group.

Open Support Center → Create case.
Choose Service limit increase, then the service and Region.
Provide workload description, expected steady-state, and peak.
SLA depends on support plan tier. Business and Enterprise plans typically respond within hours; Basic / Developer support may take days.

6.3 Path 3 — Account team escalation (large or time-sensitive)

For launches that need quota increases in multiple services simultaneously, large multi-account fleets, or quotas near the regional cap, work with your AWS Technical Account Manager (TAM) or Solutions Architect (SA):

Submit a structured launch plan: services, current quotas, desired quotas, expected go-live date.
Account teams can coordinate parallel approvals across teams (EC2 capacity, KMS, Bedrock, Lambda concurrency).
Lead time: plan 7 to 14 business days for cross-service launches. Bedrock model quotas in particular benefit from this path.

6.4 Strategies before requesting

Before filing a request, exhaust the architectural options:

Shard across accounts (Organizations + Account Vending) to multiply account-scoped quotas.
Shard across Regions (multi-region active-active) to multiply Region-level quotas.
Use reserved concurrency (Lambda) to carve a guaranteed slice without raising the account ceiling.
Use Provisioned Throughput (Bedrock, DynamoDB) where deterministic capacity is a tighter constraint than per-region quotas.
Use Service-Linked Roles and managed policies instead of large inline policies to avoid per-role IAM policy size caps.
Move config out of environment variables (Lambda 4 KB cap) into Parameter Store or Secrets Manager.

When you do file, always include expected steady-state RPS, expected peak RPS, peak duration, and go-live date. AWS reviewers approve faster when the numbers are concrete than when the request reads "we might grow."

6.5 Quotas you can NOT raise

Be realistic: roughly one-third of the quotas in this article are hard. Filing a support case for these wastes everyone's time. The hard quotas you should design around — never raise — include:

DynamoDB item size 400 KB
S3 object size 5 TB
SQS / SNS message size 256 KB
Lambda function timeout 900 seconds
Lambda payload (sync) 6 MB
Step Functions execution history 25,000 events
API Gateway HTTP API integration timeout 30 seconds (REST is now adjustable up to 300 seconds)
CloudFormation resources per stack 500
IAM inline policy size (per role) 10,240 characters
KMS symmetric Encrypt input 4 KB
Aurora cluster readers 15

For these, the design pattern is to compose smaller calls (Step Functions Map, Lambda chaining, multipart upload) or to externalize state to S3.
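
Externalizing state is often called the claim-check pattern: when a payload would exceed the message cap, store it elsewhere and send only a reference. The sketch below shows the shape of it with an in-memory dictionary standing in for S3, so it runs without AWS access. `InMemoryStore`, `to_message`, and `from_message` are hypothetical names, and the 262,144-byte limit is the 256 KB figure used in this article.

```python
import json
import uuid
from typing import Dict

MAX_INLINE_BYTES = 262_144  # 256 KB message / state payload cap


class InMemoryStore:
    """Stand-in for an S3 bucket: put bytes, get them back by key."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = f"payloads/{uuid.uuid4()}.json"
        self.objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


def to_message(payload: dict, store: InMemoryStore,
               limit: int = MAX_INLINE_BYTES) -> dict:
    """Send the payload inline if the whole envelope fits, else by reference."""
    inline = {"inline": payload}
    if len(json.dumps(inline).encode("utf-8")) <= limit:
        return inline
    return {"ref": store.put(json.dumps(payload).encode("utf-8"))}


def from_message(message: dict, store: InMemoryStore) -> dict:
    """Resolve a message produced by to_message back into the payload."""
    if "inline" in message:
        return message["inline"]
    return json.loads(store.get(message["ref"]))
```

The size check is applied to the full envelope rather than the bare payload, so the wrapper's own bytes cannot push a borderline message over the cap.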

7. Frequently Asked Questions

7.1 Where are quotas different between Regions?

Compute and rate quotas are usually larger in us-east-1, us-west-2, and eu-west-1 than elsewhere. Examples:

KMS symmetric Decrypt: 30,000 RPS in those three Regions; 5,500 to 15,000 RPS elsewhere.
SNS Publish: ~30,000 TPS in those Regions; lower elsewhere.
EventBridge PutEvents: 10,000 RPS in those Regions; 400 to 2,400 RPS elsewhere.
SQS FIFO high-throughput: 9,000 TPS in those Regions; 3,000 TPS elsewhere.

When you deploy in less-trafficked Regions (e.g. ap-northeast-1 for some quotas, eu-central-1, ap-south-1), check Region-specific defaults before sizing.

7.2 Are quotas the same in all accounts of an Organization?

No. Each member account has its own per-Region quotas, defaulted to the new-account values. A newly created account does not inherit increases from the management account or sibling accounts. To close the gap, use AWS Control Tower account-baseline automation, a Service Quotas quota request template, or aws service-quotas request-service-quota-increase in account-vending pipelines.

7.3 Are quotas the same for AWS Free Tier accounts?

Yes — the quota architecture is identical; Free Tier only affects pricing. New accounts (free or paid) share the same conservative defaults.

7.4 How do I check my account's current quota for a specific service?

Programmatically:

aws service-quotas list-service-quotas --service-code <code>
aws service-quotas get-service-quota --service-code <code> --quota-code <code>
aws service-quotas list-aws-default-service-quotas --service-code <code>

Console: Service Quotas Console → choose service → filter by quota name.

Note that some quotas are not yet onboarded to the Service Quotas API. For those, use the Trusted Advisor "Service Limits" check (Business / Enterprise support) or the service-specific console.

7.5 How quickly do quota changes take effect?

Auto-approved increases: minutes.
Reviewed increases (Service Quotas): 1 to 5 business days.
Cross-service launch increases (TAM-coordinated): 7 to 14 business days.
Hard quotas: never.

7.6 Can I set alerts when I am approaching a quota?

Yes. Service Quotas integrates with CloudWatch — for each adjustable quota that supports monitoring, you can enable CloudWatch usage metrics and create alarms (e.g. alarm at 80% of quota). Combine with EventBridge to notify an SRE channel.

Programmatic check pattern:

aws service-quotas get-service-quota \
  --service-code lambda \
  --quota-code L-B99A9384 \
  --query 'Quota.Value'

then compare against GetMetricStatistics for the relevant CloudWatch usage metric (AWS/Usage namespace).
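
The comparison step can be sketched as a pure function. It assumes you have already fetched the quota value and a list of datapoints shaped like the Datapoints entries that GetMetricStatistics returns when the Maximum statistic is requested; the function names and the 0.8 threshold default are my own choices.

```python
from typing import Iterable, Mapping


def peak_usage(datapoints: Iterable[Mapping[str, float]]) -> float:
    """Highest 'Maximum' value across the datapoints; 0.0 when there are none."""
    return max((dp["Maximum"] for dp in datapoints), default=0.0)


def utilization(usage: float, quota_value: float) -> float:
    """Fraction of the quota in use."""
    if quota_value <= 0:
        raise ValueError("quota_value must be positive")
    return usage / quota_value


def should_alert(datapoints: Iterable[Mapping[str, float]], quota_value: float,
                 threshold: float = 0.8) -> bool:
    """True when peak usage has reached the threshold share of the quota."""
    return utilization(peak_usage(datapoints), quota_value) >= threshold
```

Using the peak rather than the average is deliberate: quotas are breached by the worst minute, not the typical one.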

7.7 Why is the Bedrock model RPM quota so low?

Bedrock model quotas are intentionally conservative because the underlying inference capacity is shared and pricing is per-token. AWS sizes these quotas based on aggregate region capacity and revises them often. For production workloads, you will almost certainly need to file an increase request 7+ days before launch. Cross-region inference profiles (model IDs prefixed with us. or eu., such as us.anthropic.claude-sonnet-4-6-...-v1:0) aggregate quotas across multiple Regions and are the preferred high-throughput path.
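
When you file that request, it helps to derive the RPM and TPM figures from workload numbers rather than guess. The sketch below is a back-of-the-envelope estimate that counts input plus output tokens one-for-one and adds a percentage of headroom; actual token accounting is model-specific and may weight output tokens differently, so treat the result as a starting point. The function name and the 30% default are assumptions.

```python
import math
from typing import Dict


def required_bedrock_quota(peak_requests_per_sec: float, avg_input_tokens: int,
                           avg_output_tokens: int, headroom_pct: int = 30) -> Dict[str, int]:
    """Estimate per-model RPM and TPM to request, with headroom on top of peak."""
    rpm = peak_requests_per_sec * 60
    tpm = rpm * (avg_input_tokens + avg_output_tokens)
    factor_num = 100 + headroom_pct  # applied as an integer ratio to avoid float drift
    return {
        "rpm": math.ceil(rpm * factor_num / 100),
        "tpm": math.ceil(tpm * factor_num / 100),
    }
```

A workload peaking at 5 requests per second with 1,500 input and 500 output tokens per call works out to 390 RPM and 780,000 TPM at 30% headroom.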

7.8 What is the difference between a quota and a throttle?

A quota is a ceiling scoped to a Region, account, or resource, measured either as a point-in-time count or as an aggregate over a period. Exceeding it yields LimitExceededException, ServiceQuotaExceededException, or a service-specific error code.
A throttle is a short-term rate limit, often token-bucket based, that returns ThrottlingException / Rate exceeded and is generally retryable with backoff.

Many adjustable quotas are technically throttle policies (KMS request rates, EventBridge RPS, API Gateway RPS). The distinction matters for retry strategy: throttles are transient and should be retried with exponential backoff and jitter; hard quota violations should fail closed and alert.
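
The retry strategy above can be sketched as follows. The two exception classes are hypothetical stand-ins for whatever your SDK raises for throttles and hard quota violations; the delay uses "full jitter", a random wait between zero and an exponentially growing cap. The `sleep` and `rng` parameters exist only so the behaviour can be tested without real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a retryable throttle (e.g. ThrottlingException)."""


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a hard quota violation (e.g. LimitExceededException)."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * 2 ** attempt)


def call_with_retry(operation: Callable[[], T], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random) -> T:
    """Retry throttles with backoff; let hard quota errors propagate at once."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the throttle to the caller
            sleep(backoff_delay(attempt, rng=rng))
    raise ValueError("max_attempts must be at least 1")
```

Only `ThrottledError` is caught, so a `QuotaExceededError` fails on the first attempt without sleeping, which is the fail-closed behaviour described above. Most AWS SDKs ship their own retry modes; this sketch is for the cases where you need to control the policy yourself.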

7.9 Do quotas apply differently to AWS-internal callers (e.g. EventBridge → Lambda)?

Yes, in some cases. EventBridge invocations to Lambda do count toward Lambda's concurrent execution limit but use a separate retry / DLQ mechanism. S3 events to Lambda likewise count. However, AWS services calling each other (e.g. CloudWatch Logs → Lambda for subscription filters) often use service-linked roles and may have their own internal back-pressure independent of customer quotas.

7.10 What changes the most often?

In my observation, in roughly this order:

Bedrock model RPM / TPM (revised multiple times per year as inference capacity scales).
EC2 new-account vCPU pools (AWS adjusts conservative defaults as fraud / abuse models improve).
Lambda burst concurrency (Region-specific, occasionally revised upward).
DynamoDB on-demand initial throughput (revised upward over time).
CloudFormation resources per stack (was 200, raised to 500).
DynamoDB GSIs per table (was 5, raised to 20).
S3 buckets per account (was 100, raised significantly).

Treat any quota value older than 12 months in a third-party blog with suspicion — including this one. The Service Quotas Console is always authoritative.

8. Summary

This article condensed the AWS Service Quotas surface area into a single-page reference for the major AWS services that production engineers touch most. The principles to internalize:

Default ≠ maximum. Most adjustable quotas can be raised; "we hit a limit" usually means "we did not ask in advance."
Per-prefix, per-partition, per-tunnel quotas hide inside per-region quotas. Look one level deeper than the headline number.
Hard quotas are design constraints. DynamoDB 400 KB, Lambda 900 sec, SQS 256 KB, Step Functions 25,000 events, CloudFormation 500 resources — design around them.
Regions are not equal. us-east-1, us-west-2, eu-west-1 carry higher per-Region rate quotas than other Regions.
Lead time matters. 7 to 14 business days for cross-service launches; the time to file the case is the day you start the project, not the day before launch.

The Service Quotas Console is the source of truth. This article is the index.

9. References

Related Articles on hidekazu-konishi.com

AWS Lambda Cold Start Mitigation Guide — Concurrency, burst, and provisioned-concurrency mechanics referenced in section 4.1.
Amazon DynamoDB Single-Table Design Guide — Partition throughput and hot-partition mechanics referenced in section 4.4.
Amazon DynamoDB Key Design — GSI and LSI Dictionary — GSI and LSI quotas in depth.
Amazon S3 Object Key Design Best Practices — Per-prefix request rate scaling.
AWS PrivateLink and VPC Endpoints Complete Guide — Interface and gateway endpoint quotas.

Written by Hidekazu Konishi