AWS Service Quotas - A Practical Cheat Sheet for Major AWS Services

This article is a single-page numeric reference of the AWS service quotas (formerly known as service limits) that production engineers, capacity planners, and SREs hit most often in real workloads. It is intentionally not a complete catalog of every quota in every service — that catalog already exists in the AWS Service Quotas Console. Instead, this article curates the 5 to 15 quotas per service that actually matter when you size a workload, design for blast radius, or write a quota-increase request.

All numbers in this article are the default quotas as of 2026-05 and assume a standard commercial AWS region (us-east-1 / us-west-2 / eu-west-1). Quotas that AWS frequently revises, or that vary per account due to age and usage history, are labeled Account-specific (verify in console) so that you re-confirm the live value before designing a change.

This article is intentionally text-and-table only — no diagrams. Numeric references should be skimmable and searchable, not pictorial.

1. Overview — Why a Service Quotas Cheat Sheet

Service Quotas (the public name AWS adopted to replace the older term "service limits") are the per-account, per-region (or per-resource) ceilings that AWS enforces on its APIs and resources. They exist for three overlapping reasons: to protect the underlying control plane from runaway consumption, to give AWS a forecasting signal for capacity provisioning, and to give customers a soft-fail boundary instead of a cliff.

In day-to-day work, three patterns repeat:

  1. A team finishes a load test in dev and is surprised when prod rejects the same call — because the prod account is newer and has lower default On-Demand vCPU and Lambda concurrent execution quotas than the older dev account.
  2. An incident postmortem cites "service limit reached". Kinesis shard count, EventBridge target count, KMS request rate, and CloudFormation resources-per-stack are the classics here; two of those four (Kinesis shard count and KMS request rate) are adjustable but were never adjusted.
  3. A capacity planning spreadsheet uses the maximum theoretical throughput of a service — but the per-account default is a fraction of that maximum, and the gap is the lead time for a quota-increase request.

This cheat sheet exists so that all three problems can be resolved from a single page, with each quota labeled Adjustable or Non-adjustable so you can immediately answer "can we raise this in time?"

For deeper service-specific design treatments — particularly around concurrency, single-table design, and key/prefix scaling — see the related internal references at the bottom of this article.

2. How to Use This Cheat Sheet

The intended workflow is:

  1. Find the service in section 4 (Service-by-Service Quotas). Services are grouped by category — Compute, Container, Storage, Database, Network, AI/ML, Integration, Security, Observability, Management.
  2. Read the quota row. Each table has five columns: Quota, Default, Scope, Adjustable, Notes. Scan the Notes column for the gotcha. For example, S3's request rate of 3,500 PUT/COPY/POST/DELETE per second is per prefix, not per bucket, which changes the entire layout strategy.
  3. If the workload depends on a value near the default, open the AWS Service Quotas Console (aws-region.console.aws.amazon.com/servicequotas/home) and confirm the live, account-specific value. A cell labeled Account-specific (verify in console) in this article means the value drifts between accounts or has been revised by AWS recently, so this article does not commit to a single default.
  4. If a quota is Adjustable: Yes, plan the request. Most adjustable quotas are processed in hours but some (regional vCPU pools, KMS request rates above 30,000 RPS, SES sending limits) can take 2 to 14 business days, and account managers can prioritize for production launches if you cite a date and a workload.
  5. If a quota is Adjustable: No, it is a hard architectural constraint. Design around it. Examples: DynamoDB item size 400 KB, SQS message size 256 KB, Lambda function timeout 900 seconds, S3 object size 5 TB. These will never be raised by a support ticket.
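
Steps 3 to 5 of the workflow can be sketched in code. The example below is a minimal sketch, not a definitive implementation: it assumes `client` behaves like `boto3.client("service-quotas")` and uses its `get_service_quota` call, while `QuotaStatus`, `fetch_quota`, and `next_step` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class QuotaStatus:
    name: str
    value: float
    adjustable: bool


def fetch_quota(client, service_code: str, quota_code: str) -> QuotaStatus:
    """Read the live, account-specific value of one quota (workflow step 3).

    `client` is assumed to behave like boto3.client("service-quotas").
    """
    quota = client.get_service_quota(
        ServiceCode=service_code, QuotaCode=quota_code
    )["Quota"]
    return QuotaStatus(quota["QuotaName"], quota["Value"], quota["Adjustable"])


def next_step(status: QuotaStatus, required: float) -> str:
    """Map workflow steps 4 and 5 onto a decision."""
    if required <= status.value:
        return "ok"
    # Adjustable quotas get a request; non-adjustable ones force a redesign.
    return "request-increase" if status.adjustable else "redesign"
```

With boto3 installed and credentials configured, a call would look like `fetch_quota(boto3.client("service-quotas"), "lambda", quota_code)`, where `quota_code` is looked up first (for example with `list_service_quotas`) rather than hard-coded.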

A short legend on scope:

  • Account: a single global ceiling per AWS account (e.g. IAM users, S3 buckets — though S3's bucket count is now soft and adjustable).
  • Region: per-account, per-region (e.g. Lambda concurrent executions, EC2 vCPUs, DynamoDB tables).
  • Per-resource: bounded per individual resource (e.g. GSIs per DynamoDB table, parts per S3 multipart upload, rules per security group).
  • Per-API: a TPS or request-rate ceiling on a specific API operation (e.g. KMS Decrypt requests per second).

3. Quota Categories — Five Axes for Reasoning

Every AWS quota falls along five axes simultaneously. Understanding the axes is the difference between filing the right support case and the wrong one.

3.1 Adjustable vs. Non-adjustable

Adjustable (also called "soft") quotas are ceilings AWS chose for safety, not for technical reasons. They can be raised via a support case, an AWS account team request, or programmatically through the Service Quotas API. Most account-level quotas — vCPUs, Lambda concurrency, EBS storage, IAM roles, EventBridge rules — are adjustable.

Non-adjustable (also called "hard") quotas are technical or architectural ceilings of the underlying system. They will not be raised even for the largest customers. Examples:

  • DynamoDB item size: 400 KB.
  • SQS / SNS message body: 256 KB.
  • Lambda function timeout: 900 seconds (15 min).
  • S3 object size: 5 TB.
  • Step Functions Standard execution history: 25,000 events.

When you read a quota row, the Adjustable column is the first thing to look at. If it says No, your design choices are constrained.
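
Because hard limits cannot be raised, it can help to enforce them in code before a request ever reaches AWS. The sketch below is illustrative only: `check_payload` and `HARD_LIMIT_BYTES` are hypothetical names, "KB" is assumed to mean 1,024 bytes, and the JSON-serialized size is only an approximation of what each service actually counts (DynamoDB, for instance, counts attribute names and values in its own encoding).

```python
import json

KIB = 1024  # assumption: "KB" in the quota lists is read as 1,024 bytes

# Two of the hard limits listed above, expressed in bytes.
HARD_LIMIT_BYTES = {
    "dynamodb_item": 400 * KIB,
    "sqs_message": 256 * KIB,
}


def check_payload(kind: str, payload: dict) -> int:
    """Return the serialized size, or raise if it exceeds the hard limit."""
    size = len(json.dumps(payload).encode("utf-8"))
    limit = HARD_LIMIT_BYTES[kind]
    if size > limit:
        raise ValueError(f"{kind}: {size} bytes exceeds hard limit of {limit} bytes")
    return size
```

A payload that fails this check needs a design change (for example, storing the body in S3 and passing a reference), not a support ticket.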

3.2 Account-level vs. Region-level vs. Per-resource

  • Account-level quotas are global to the AWS account. The classic examples are IAM users (5,000), IAM roles (1,000, adjustable), Organizations OUs and accounts, and Service Control Policies.
  • Region-level quotas are the most common kind. EC2 vCPUs, Lambda concurrency, DynamoDB tables per region, VPC count — all enforced per (account, region) pair.
  • Per-resource quotas bound a specific resource. Examples: parts in an S3 multipart upload (10,000), rules per security group (60+60 default), GSIs per DynamoDB table (20), subnets per VPC (200), targets per EventBridge rule (5), resources per CloudFormation stack (500).

When you scale horizontally — sharding across accounts via Organizations or across regions via active-active — you are typically buying yourself more copies of a region-level quota pool, not raising an account-level one.

3.3 Per-API and Throttle (Rate) Quotas

A second class of region-level quotas governs API request rate rather than resource count. These are usually expressed as TPS or requests per second:

  • KMS Decrypt, GenerateDataKey, Encrypt: 5,500 to 30,000 RPS depending on key spec and region.
  • API Gateway: 10,000 RPS account-level steady-state, 5,000 burst.
  • EventBridge PutEvents: 10,000 events per second per region (us-east-1, us-west-2, eu-west-1; lower in other regions).
  • CloudWatch PutMetricData: 150 TPS per region.

Rate quotas are often the first quotas to bite at scale because they multiply with traffic, not with resource count. A workload that pre-creates resources well under the resource-count limits can still hit a per-API rate limit at peak.
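
The arithmetic behind that last point is simple enough to put in a capacity spreadsheet or a quick script. The sketch below uses a hypothetical workload (4,000 requests per second at peak, two KMS Decrypt calls per request) and the low end of the KMS range quoted above; `quota_utilization` is an illustrative name, not an AWS API.

```python
def quota_utilization(
    peak_requests_per_s: float, calls_per_request: float, quota_rps: float
) -> float:
    """Fraction of a per-API rate quota consumed at peak.

    A result above 1.0 means the workload would be throttled at peak.
    """
    return peak_requests_per_s * calls_per_request / quota_rps


# Hypothetical workload: 4,000 requests/s, 2 Decrypt calls each, 5,500 RPS quota.
print(f"{quota_utilization(4_000, 2, 5_500):.2f}")  # prints 1.45
```

Resource counts do not appear anywhere in this calculation, which is why a workload can sit comfortably under every resource-count quota and still be throttled.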

3.4 Per-Partition / Per-Prefix Quotas

A subtler category exists in storage and serverless services: quotas applied at a sub-resource level that the customer has indirect control over.

  • S3 request rate: 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix. The bucket has effectively unlimited aggregate throughput, but each prefix has a ceiling, so the bucket's effective throughput is a function of key design. See Amazon S3 Object Key Design Best Practices for the full treatment.
  • DynamoDB partition throughput: 3,000 RCU / 1,000 WCU per partition. A hot partition exhausts its throughput even when the table-level provisioned throughput is far from its quota. See Amazon DynamoDB Single-Table Design Guide for partition-key design.
  • Lambda burst concurrency: account-level burst limit of 500 to 3,000 concurrent executions at the moment of a traffic surge (depending on region), independent of the steady-state concurrency quota. After the burst, each function scales at +1,000 concurrent executions every 10 seconds.

These quotas often appear "phantom" because the high-level metric is fine but a sub-bucket of it is saturated.
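
For the S3 case, the per-prefix figures above translate directly into a minimum prefix count. The following is a minimal sketch under stated assumptions: `prefixes_needed` and `sharded_key` are hypothetical helpers, and the hash-derived prefix is one common way to spread keys, not the only one.

```python
import hashlib
import math

S3_WRITE_RPS_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE, from the text above
S3_READ_RPS_PER_PREFIX = 5_500   # GET/HEAD, from the text above


def prefixes_needed(write_rps: float, read_rps: float) -> int:
    """Smallest prefix count that keeps both rates under the per-prefix ceilings."""
    return max(
        1,
        math.ceil(write_rps / S3_WRITE_RPS_PER_PREFIX),
        math.ceil(read_rps / S3_READ_RPS_PER_PREFIX),
    )


def sharded_key(key: str, shards: int) -> str:
    """Prepend a stable hash-derived shard prefix so load spreads evenly."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return f"{shard:03d}/{key}"
```

For example, a target of 10,000 writes and 20,000 reads per second gives `prefixes_needed(10_000, 20_000) == 4`, assuming traffic is spread evenly across the prefixes.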

3.5 Snapshot vs. Aggregate Quotas

Lastly, distinguish between a quota measured at an instant (snapshot) and a quota measured over a window (aggregate).

  • Snapshot: number of running EC2 instances, count of active S3 multipart uploads, currently allocated Elastic IPs, in-flight messages in an SQS queue.
  • Aggregate: requests per second to an API, events per second to EventBridge, sent messages per day in SES.

Adjustment lead times differ. Snapshot quotas often increase the moment AWS processes the case; aggregate (rate) quotas may require a 24- to 72-hour warm-up.

4. Service-by-Service Quotas

Forty-five services follow, grouped by category. Each table lists 5 to 15 quotas, the ones that actually constrain production. The Service Quotas Console contains hundreds more per service; the values selected here are the ones you reach for in capacity reviews and incident analysis.

4.1 Compute

Amazon EC2
Quota | Default | Scope | Adjustable | Notes
Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) vCPUs | 5 vCPUs | Region | Yes | New-account default is intentionally low; production accounts typically raised to 64 to several thousand. The number is vCPUs, not instances.
Running On-Demand G and VT vCPUs | Account-specific (verify in console) | Region | Yes | GPU and video-transcode instance families have their own per-family vCPU pool, often 0 in new accounts.
Running On-Demand P vCPUs | Account-specific (verify in console) | Region | Yes | Same pattern: P-family (high-end GPU) has its own pool, requires explicit increase.
All Spot Instance Requests (vCPUs) | 5 vCPUs | Region | Yes | Spot has independent vCPU pools per family, mirroring On-Demand.
Elastic IP addresses | 5 | Region | Yes | Counts both attached and unattached. Unattached EIPs accrue charges.
EBS-backed snapshots | 100,000 | Region | Yes | Stack-up effect over time; revisit lifecycle policies before requesting.
AMIs | 50,000 | Region | Yes | Pruning policy strongly recommended; old AMIs hold snapshots.
Security groups per VPC | 2,500 | Per-VPC | Yes | Used to be 500; the cap was raised in 2020. Inbound + outbound rules per SG combine.
Rules per security group | 60 inbound + 60 outbound | Per-SG | Yes | Can be raised up to 1,000 each, but rule count × SG-per-ENI is a hard product limit (1,000).
EC2-Classic | Not available | | | Fully retired 2022-08-15. Mentioned here because legacy diagrams still reference it.
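
Because the On-Demand quota counts vCPUs rather than instances, the number of instances you can launch depends on the instance size. A minimal sketch of that arithmetic follows; `launchable_instances` is a hypothetical helper, and the vCPU counts in the usage note are example inputs.

```python
def launchable_instances(
    quota_vcpus: int, vcpus_in_use: int, vcpus_per_instance: int
) -> int:
    """How many more instances of one size fit under a vCPU-based quota."""
    free_vcpus = max(0, quota_vcpus - vcpus_in_use)
    return free_vcpus // vcpus_per_instance
```

Under the 5-vCPU new-account default, `launchable_instances(5, 0, 4)` is 1 and `launchable_instances(5, 0, 8)` is 0: a single 8-vCPU instance cannot launch at all until the quota is raised.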

AWS Lambda
Quota | Default | Scope | Adjustable | Notes
Concurrent executions | 1,000 | Region | Yes | Account-level pool shared across all functions. Reserved concurrency carves out a guaranteed slice.
Burst concurrency | 500 to 3,000 concurrent executions | Region | No | Region-specific instantaneous burst limit. After the burst, each function scales at +1,000 concurrent executions every 10 seconds (changed from the older +500/minute account-level rate in December 2023). See AWS Lambda Cold Start Mitigation Guide.
Function memory | 128 MB to 10,240 MB | Per-function | No | Memory determines vCPU allocation linearly. Above 1,769 MB you get more than 1 full vCPU.
Function timeout | 900 seconds (15 min) | Per-function | No | Hard ceiling. For longer work, use Step Functions or ECS/Fargate Tasks.
Deployment package size (zipped, direct upload) | 50 MB | Per-function | No | S3-deployed zips can be larger but unzipped size still caps at 250 MB.
Deployment package size (unzipped) | 250 MB | Per-function | No | Zip deployments only. Container image deployments allow up to 10 GB.
Container image size | 10 GB | Per-function | No | Lambda pulls from ECR; first cold start with a 10 GB image is noticeably slower.
Environment variables size | 4 KB | Per-function | No | Total of all env vars including keys. Larger config should live in Parameter Store / Secrets Manager.
Layers per function | 5 | Per-function | No | Layer total unzipped size + function unzipped size must be ≤ 250 MB.
Ephemeral /tmp storage | 512 MB to 10,240 MB | Per-function | No | Scratch space per execution environment. Configurable up to 10 GB; contents can persist across warm invocations, so do not rely on a clean /tmp.
Function URL request timeout | 900 seconds | Per-function | No | Matches function timeout. Streaming responses supported.
Payload size (sync) | 6 MB | Per-invocation | No | Both request and response. Async invocation has its own limit (256 KB).
Payload size (async) | 256 KB | Per-invocation | No | Async payload is queued via internal SQS-like buffer, hence the 256 KB ceiling.
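
The per-function scaling rate in the table (+1,000 concurrent executions every 10 seconds) sets a floor on how quickly a function can absorb a surge. The sketch below models only that rate and the account concurrency quota; it ignores the initial burst allowance, and `ramp_seconds` is a hypothetical helper rather than anything Lambda exposes.

```python
import math

SCALE_STEP = 1_000     # concurrent executions added per interval, per function
SCALE_INTERVAL_S = 10  # seconds per scaling interval


def ramp_seconds(target: int, current: int = 0, account_quota: int = 1_000) -> int:
    """Lower bound on the time to scale one function from `current` to `target`."""
    if target > account_quota:
        raise ValueError(
            "target exceeds the account concurrency quota; request an increase first"
        )
    steps = math.ceil(max(0, target - current) / SCALE_STEP)
    return steps * SCALE_INTERVAL_S
```

For instance, `ramp_seconds(4_500, current=500, account_quota=10_000)` returns 40: four scaling intervals are needed to add 4,000 concurrent executions.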

AWS Fargate
Quota | Default | Scope | Adjustable | Notes
Fargate On-Demand vCPU resource count (ECS) | Account-specific (verify in console) | Region | Yes | Defaults vary by region and account age. Often 6 to 100 in new accounts.
Fargate Spot vCPU resource count (ECS) | Account-specific (verify in console) | Region | Yes | Separate pool from On-Demand.
Task CPU | 0.25 to 16 vCPU | Per-task | No | Allowed: 0.25, 0.5, 1, 2, 4, 8, 16 vCPU. Memory is bounded by CPU choice.
Task memory | 0.5 to 120 GB | Per-task | No | Memory increments allowed depend on the chosen CPU bucket.
Task ephemeral storage | 20 GB to 200 GB | Per-task | No | Configurable per task definition. Default 20 GB.

AWS Batch
Quota | Default | Scope | Adjustable | Notes
Compute environments per Region | 50 | Region | Yes | Plan separate compute environments per workload class.
Job queues per Region | 50 | Region | Yes | Queues bind to compute environments by priority.
Job definitions per Region | Account-specific (verify in console) | Region | Yes | Versioned; old versions are retained unless explicitly deregistered.
Jobs in SUBMITTED state per queue | 1,000,000 | Per-queue | Yes | Large pipelines should batch-submit and watch for throttling.
Array job size | 10,000 child jobs | Per-array-job | No | Hard cap on parallel array job size.

4.2 Container

Amazon ECS
Quota | Default | Scope | Adjustable | Notes
Clusters per Region | 10,000 | Region | Yes | Most workloads stay under 50; one cluster per environment is typical.
Services per cluster | 5,000 | Per-cluster | Yes | Split the cluster if approaching this.
Tasks launched per service (ECS Service) | 5,000 | Per-service | Yes | Tied to desired count.
Tasks per cluster | 5,000 | Per-cluster | Yes | Across all services and standalone tasks.
Container instances per cluster (EC2 launch type) | 5,000 | Per-cluster | Yes | Not applicable for Fargate launch type.
Task definition size | 64 KB | Per-task-definition | No | JSON document hard cap.
Containers per task definition | 10 | Per-task | No | Sidecar patterns (logging, service mesh) consume this quickly.

Amazon EKS
Quota | Default | Scope | Adjustable | Notes
Clusters per Region | 100 | Region | Yes | One cluster per environment is the common pattern, multi-tenant via namespaces.
Managed node groups per cluster | 30 | Per-cluster | Yes | Separate node groups for taints / instance families.
Nodes per managed node group | 450 | Per-node-group | Yes | Scaling limit applies per group, not per cluster.
Pods per node | VPC-CNI / IP limits | Per-node | No | Bounded by ENI count and IPs per ENI; varies by instance type. Use prefix delegation to relax.
Fargate profiles per cluster | 10 | Per-cluster | Yes | Selector limit (5 per profile) often forces multiple profiles.
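
The "Pods per node" row can be made concrete with the formula commonly used for the VPC CNI when prefix delegation is not enabled. This is a sketch under that assumption: `max_pods` is a hypothetical helper, and the ENI and IP counts must come from the instance type's published networking limits.

```python
def max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Approximate pod ceiling for a node using secondary-IP mode.

    Each ENI reserves one address for itself (hence the -1); the +2 accounts
    for pods that run on the host network and do not need a pod IP.
    """
    return enis * (ipv4_per_eni - 1) + 2
```

An instance type with 3 ENIs and 10 IPv4 addresses per ENI gives `max_pods(3, 10) == 29`, which is why small instance types fill up on pods long before they run out of CPU.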

Amazon ECR
Quota | Default | Scope | Adjustable | Notes
Repositories per Region | 10,000 | Region | Yes | Pattern: one repo per micro-service; large fleets approach this.
Images per repository | 10,000 | Per-repository | Yes | Configure lifecycle policy to prune old tags.
Maximum image size | 10 GiB | Per-image | No | Same as Lambda container image cap.
Image layer size | 10 GiB | Per-layer | No | Single layer hard cap.
Rate of image pull (ECR Private) | Account-specific (verify in console) | Region | Yes | Pulls do throttle at scale; VPC endpoint and image caching mitigate.

4.3 Storage

Amazon S3
Quota | Default | Scope | Adjustable | Notes
Buckets per account | 10,000 | Account | Yes | Was 100 for many years; raised significantly. Approach with caution; most designs need far fewer.
Object size | 5 TB | Per-object | No | Hard ceiling. Single PUT max 5 GB; above that requires multipart upload.
Multipart upload: parts | 10,000 | Per-upload | No | Part size 5 MB to 5 GB (last part may be smaller).
Multipart upload: part size | 5 MB to 5 GB | Per-part | No | Last part exempted from 5 MB minimum.
PUT/COPY/POST/DELETE rate | 3,500 RPS per prefix | Per-prefix | No (scales automatically) | Bucket scales horizontally as you add prefixes. See object key design.
GET/HEAD rate | 5,500 RPS per prefix | Per-prefix | No (scales automatically) | Aggregate bucket throughput is effectively unlimited if keys are sharded.
Bucket policy size | 20 KB | Per-bucket | No | JSON hard cap. Consider IAM policies or Access Points for complex auth.
Access Points per Region per account | 10,000 | Region | Yes | Per-bucket access patterns can be split out via Access Points.
S3 Object Lambda Access Points per account per Region | 1,000 | Region | Yes | Used when transforming objects on retrieval.
Lifecycle rules per bucket | 1,000 | Per-bucket | No | Stack pattern: tier S3 Standard → IA → Glacier consumes 2 to 3 rules.
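
The two multipart rows interact: with at most 10,000 parts, the part size has to grow with the object. The sketch below picks a part size that respects both limits; `plan_parts` and the 8 MiB preferred size are assumptions for illustration, and "MB"/"GB" are read as binary units.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB   # minimum part size (last part exempt)
MAX_PART = 5 * GIB   # maximum part size
MAX_PARTS = 10_000   # parts per multipart upload


def plan_parts(object_size: int, preferred_part: int = 8 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) that fits within the multipart limits."""
    part_size = max(preferred_part, MIN_PART, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    return part_size, max(1, math.ceil(object_size / part_size))
```

A 100 MiB object keeps the preferred 8 MiB parts (13 parts), while a 1 TiB object is pushed to parts of roughly 105 MiB so that the count stays within 10,000.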

Amazon EBS
Quota | Default | Scope | Adjustable | Notes
Total storage for gp3 volumes (TiB) | 50 | Region | Yes | New-account default; production typically raised to several hundred TiB.
Total storage for io2 volumes (TiB) | 20 | Region | Yes | io2 reserved for critical DB workloads.
Snapshots per Region | 100,000 | Region | Yes | Lifecycle Manager strongly recommended at scale.
Volume size (gp3 / gp2 / io1 / io2) | 16 TiB | Per-volume | No | io2 Block Express supports up to 64 TiB.
IOPS: gp3 | 3,000 baseline, 16,000 max | Per-volume | No | Provisionable up to 16,000.
IOPS: io2 Block Express | 256,000 | Per-volume | No | Highest single-volume IOPS in EBS.

Amazon EFS
Quota | Default | Scope | Adjustable | Notes
File systems per Region | 1,000 | Region | Yes | Most workloads use 1 to 10 file systems.
Mount targets per file system | 1 per AZ | Per-file-system | No | One mount target per AZ regardless of subnet count in that AZ.
Access Points per file system | 1,000 | Per-file-system | Yes | Used for per-tenant POSIX root and UID/GID enforcement.
Maximum file size | 47.9 TiB | Per-file | No | 52,673,613,135,872 bytes per file (≈ 47.9 TiB / ≈ 52.7 TB decimal). NFSv4.1 protocol limit.
Throughput (Bursting mode) | Scales with stored data | Per-file-system | No | Baseline 50 MiB/s per TiB stored; burst up to 100 MiB/s per TiB (minimum 100 MiB/s). Switch to Provisioned or Elastic mode for higher.

Amazon FSx
Quota | Default | Scope | Adjustable | Notes
FSx for Lustre: file systems per Region | 100 | Region | Yes | HPC workloads cluster many small file systems per job.
FSx for Windows: file systems per Region | 100 | Region | Yes | SMB protocol; integrates with AD.
FSx for OpenZFS: file systems per Region | 100 | Region | Yes |
FSx for NetApp ONTAP: file systems per Region | Account-specific (verify in console) | Region | Yes |
Maximum file system size (Lustre) | Account-specific (verify in console) | Per-file-system | | Varies by deployment type (Scratch 1 / Scratch 2 / Persistent 1 / Persistent 2).

Amazon S3 Glacier (Vaults, legacy API)
Quota | Default | Scope | Adjustable | Notes
Vaults per Region | 1,000 | Region | Yes | Glacier vault API is legacy; prefer S3 Glacier storage classes for new designs.
Archive size | 40 TB | Per-archive | No | Hard ceiling for single archive via vault API.
Archive retrieval time (Standard) | 3 to 5 hours | Per-job | No | Bulk: 5 to 12 hours; Expedited: 1 to 5 minutes (subject to capacity).
Job description size | 1 KB | Per-job | No | Custom job parameters.

4.4 Database

Amazon RDS
Quota | Default | Scope | Adjustable | Notes
DB instances per Region | 40 | Region | Yes | Includes Aurora cluster members.
Manual DB snapshots | 100 | Region | Yes | Excludes automated backups.
Total storage for DB instances (TiB) | 100 | Region | Yes | Across all engines combined.
Read replicas per DB instance | 5 (MySQL / MariaDB / PostgreSQL) | Per-instance | No | Aurora supports 15 readers per cluster instead.
Parameter groups | 100 | Region | Yes | Across all engine versions.
Maximum DB storage size: General Purpose SSD (MySQL, MariaDB, PostgreSQL) | 64 TiB | Per-instance | No | SQL Server caps lower (16 TiB).
Tags per resource | 50 | Per-resource | No | Common across most AWS services.

Amazon Aurora
Quota | Default | Scope | Adjustable | Notes
DB clusters per Region | 40 | Region | Yes | Counts toward the 40 RDS DB instance ceiling indirectly.
Instances per cluster | 16 | Per-cluster | No | 1 writer + 15 readers maximum.
Cluster storage size | 128 TiB | Per-cluster | No | Auto-scaling; you pay for what you use.
Aurora Global Database: secondary Regions | 5 | Per-cluster | No | Plus 1 primary = 6 Regions total.
Aurora Serverless v2: ACU range | 0.5 to 256 ACUs | Per-instance | No | Auto-scaling capacity unit. v1 used a different model.

Amazon DynamoDB
Quota | Default | Scope | Adjustable | Notes
Tables per Region | 2,500 | Region | Yes | Was 256 for years; revised upward. Single-table design typically needs <10.
Item size | 400 KB | Per-item | No | Includes attribute names and values. Hard ceiling.
Partition key length | 2,048 bytes | Per-key | No | UTF-8 encoded.
Sort key length | 1,024 bytes | Per-key | No | UTF-8 encoded.
Global Secondary Indexes (GSI) per table | 20 | Per-table | Yes | Was 5, then raised. See DynamoDB Key Design GSI/LSI Dictionary.
Local Secondary Indexes (LSI) per table | 5 | Per-table | No | Hard, must be declared at table-create time.
Provisioned table throughput (per Region) | 40,000 RCU / 40,000 WCU | Region | Yes | Account-level default ceiling shared across tables.
Provisioned table throughput (per table) | 40,000 RCU / 40,000 WCU | Per-table | Yes | Per-table cap independent of account ceiling.
On-demand initial table throughput | 4,000 WCU / 12,000 RCU | Per-table | No | Automatically accommodates up to double the previous peak.
Partition throughput | 1,000 WCU / 3,000 RCU | Per-partition | No | Hot partitions cap here even if table cap unused. See single-table design.
Transaction batch size | 100 actions / 4 MB | Per-call | No | TransactWriteItems / TransactGetItems.
BatchWriteItem batch size | 25 items / 16 MB | Per-call | No |
BatchGetItem batch size | 100 items / 16 MB | Per-call | No |
Query / Scan result page size | 1 MB | Per-page | No | Paginate via LastEvaluatedKey.
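
Two rows in this table shape client code directly: the 25-item BatchWriteItem cap and the 1 MB page size. The sketch below shows one way to handle both. `chunked` and `paginate` are hypothetical helpers; `query` is assumed to behave like a boto3 `Table.query` callable, which returns `LastEvaluatedKey` and accepts `ExclusiveStartKey`.

```python
from itertools import islice


def chunked(items, size=25):
    """Split items into batches no larger than the BatchWriteItem cap."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def paginate(query, **kwargs):
    """Yield items across result pages by following LastEvaluatedKey."""
    while True:
        page = query(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return
        # Resume the next page where the previous one stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

A production version would also retry any `UnprocessedItems` returned by BatchWriteItem; that handling is left out here to keep the sketch short.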

Amazon ElastiCache
Quota | Default | Scope | Adjustable | Notes
Nodes per Region (across all clusters) | 300 | Region | Yes | Combined ElastiCache (Redis / Valkey / Memcached) ceiling.
Nodes per cluster (Redis cluster mode disabled) | 6 (1 primary + 5 replicas) | Per-cluster | No | Cluster mode enabled has different limits.
Shards per cluster (Redis cluster mode enabled) | 500 | Per-cluster | Yes | Each shard has 1 primary and up to 5 replicas.
Replicas per shard | 5 | Per-shard | No |
Manual snapshots per account per Region | 150 | Region | Yes |

Amazon MemoryDB for Redis
Quota | Default | Scope | Adjustable | Notes
Clusters per Region | 50 | Region | Yes |
Shards per cluster | 500 | Per-cluster | Yes |
Replicas per shard | 5 | Per-shard | No |
Snapshots per account per Region | Account-specific (verify in console) | Region | Yes |

Amazon DocumentDB
Quota | Default | Scope | Adjustable | Notes
Clusters per Region | 40 | Region | Yes |
Instances per cluster | 16 | Per-cluster | No | 1 primary + 15 read replicas.
Storage per cluster | 128 TiB | Per-cluster | No | Auto-scaling.
Document size | 16 MB | Per-document | No | MongoDB-compatible BSON cap.
Index size | 2,048 bytes | Per-index-entry | No |

Amazon Neptune
Quota | Default | Scope | Adjustable | Notes
Clusters per Region | 40 | Region | Yes |
Instances per cluster | 16 | Per-cluster | No | 1 writer + 15 readers.
Storage per cluster | 128 TiB | Per-cluster | No | Auto-scaling.
Concurrent loader jobs | 1 | Per-cluster | No | Loader is single-threaded per cluster.
Query timeout | 120 seconds default | Per-query | Yes | Adjustable via cluster parameter group up to several thousand seconds.

4.5 Network

Amazon VPC
Quota | Default | Scope | Adjustable | Notes
VPCs per Region | 5 | Region | Yes | Multi-account / multi-region designs are the long-term answer.
Subnets per VPC | 200 | Per-VPC | Yes | Three-tier (public / private / data) × 3 AZ = 9 typically.
Route tables per VPC | 200 | Per-VPC | Yes | Includes the main route table.
Routes per route table | 50 | Per-route-table | Yes | Can be raised to 1,000 but BGP-propagated routes count toward this.
Network ACLs per VPC | 200 | Per-VPC | Yes |
Rules per network ACL | 20 inbound + 20 outbound | Per-NACL | Yes | Can be raised, but evaluation order matters.
Internet gateways per Region | 5 | Region | Yes | Tied to VPC count.
NAT gateways per AZ | 5 | Per-AZ | Yes | One NAT gateway scales to 45 Gbps; usually you want 1 per AZ.
Elastic Network Interfaces per Region | 5,000 | Region | Yes | Lambda VPC attaches consume ENIs.
VPC peering connections per VPC | 50 | Per-VPC | Yes | Transit Gateway scales better above ~10 VPCs.

AWS Transit Gateway
Quota | Default | Scope | Adjustable | Notes
Transit gateways per account per Region | 5 | Region | Yes | One per Region is typical.
Attachments per TGW | 5,000 | Per-TGW | Yes | VPC / VPN / Direct Connect Gateway / Peering attachments combined.
Routes per TGW route table | 10,000 | Per-route-table | No | Hard ceiling.
Bandwidth per VPC attachment | 50 Gbps burst | Per-attachment | No | VPC attachments scale up to 50 Gbps.
Bandwidth per VPN attachment | 1.25 Gbps per tunnel | Per-tunnel | No | ECMP doubles this with 2 tunnels.

AWS PrivateLink (VPC Endpoints)
Quota | Default | Scope | Adjustable | Notes
Interface VPC endpoints per VPC | 50 | Per-VPC | Yes | See PrivateLink and VPC Endpoints Complete Guide.
Gateway VPC endpoints per VPC | 20 | Per-VPC | Yes | S3 and DynamoDB only.
VPC endpoint services per Region | 50 | Region | Yes | For providing your own service via PrivateLink.
Connections per VPC endpoint service | 50 | Per-service | Yes | Each consumer VPC counts as one connection.
Endpoint policy size | 20,480 characters | Per-endpoint | No | JSON hard cap.

Amazon Route 53
Quota | Default | Scope | Adjustable | Notes
Hosted zones per account | 500 | Account | Yes |
Records per hosted zone | 10,000 | Per-zone | Yes | Can be raised; rare.
Health checks per account | 200 | Account | Yes |
Traffic policies per account | 50 | Account | Yes |
Domains per account (Registrar) | 20 | Account | Yes | Registration is separate from DNS hosting.
Resolver endpoints per Region | 4 | Region | Yes | 2 inbound + 2 outbound typical.

Amazon CloudFront
Quota | Default | Scope | Adjustable | Notes
Distributions per account | 200 | Account | Yes |
Origins per distribution | 25 | Per-distribution | Yes | Multi-origin failover patterns approach this.
Cache behaviors per distribution | 25 | Per-distribution | Yes | One default + up to 24 path-pattern behaviors.
Alternate domain names (CNAMEs) per distribution | 100 | Per-distribution | Yes |
Request rate per distribution | 250,000 RPS | Per-distribution | Yes | Soft, raised on request for known launches.
Data transfer rate per distribution | 150 Gbps | Per-distribution | Yes |
File size | 30 GB | Per-file | No | Use S3 multipart and signed URLs for larger transfers.
Lambda@Edge function size | 1 MB (viewer-request / viewer-response) / 50 MB (origin-request / origin-response) | Per-function | No | Viewer events have a tight package cap; origin events allow larger bundles.

Amazon API Gateway
Quota | Default | Scope | Adjustable | Notes
Account-level steady-state request rate (REST API) | 10,000 RPS | Region | Yes | Across all APIs in the account / Region.
Account-level burst (REST API) | 5,000 requests | Region | Yes | Token-bucket burst before steady-state.
HTTP API request rate | 10,000 RPS | Region | Yes | HTTP APIs have their own pool.
WebSocket connections per Region | 500,000 concurrent | Region | Yes |
Integration timeout | 29 seconds (REST default), 30 seconds (HTTP) | Per-integration | REST: Yes (up to 300 seconds via Service Quotas) / HTTP: No | Since November 2024, REST API integration timeout is configurable up to 300 seconds (5 minutes) via a Service Quotas request. HTTP APIs remain capped at 30 seconds. For longer work, use async patterns or Step Functions.
Payload size | 10 MB | Per-request | No | Both directions.
Resources per REST API | 300 | Per-API | Yes |
Routes per HTTP API | 300 | Per-API | Yes |
Stages per API | 10 | Per-API | Yes |
Usage plans per account | 300 | Account | Yes |
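
The steady-state and burst rows describe a token bucket: the burst value is the bucket capacity and the steady-state rate is the refill speed. The sketch below is a generic token bucket for reasoning about that behavior, not API Gateway's actual implementation; `TokenBucket` and its explicit `at` timestamp (used instead of a wall clock so the behavior is easy to reason about) are illustrative choices.

```python
class TokenBucket:
    """Generic token bucket: `rate` tokens/s refill, `burst` tokens capacity."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0

    def allow(self, at: float) -> bool:
        """Return True if a request arriving at time `at` (seconds) is admitted."""
        elapsed = at - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = at
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # would be throttled (HTTP 429 in API Gateway terms)
```

With `TokenBucket(rate=10, burst=5)`, eight simultaneous requests at `at=0.0` admit five and reject three; the same scaled-up shape applies to the 10,000 RPS and 5,000-request figures in the table.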

AWS Global Accelerator
Quota | Default | Scope | Adjustable | Notes
Accelerators per account | 20 | Account | Yes |
Listeners per accelerator | 10 | Per-accelerator | Yes |
Endpoint groups per listener | 10 | Per-listener | Yes | One per Region typically.
Endpoints per endpoint group | 10 | Per-group | Yes |
Custom routing accelerators | Account-specific (verify in console) | Account | Yes |

4.6 AI / ML

Amazon Bedrock
Bedrock quotas are model-, modality-, and region-specific, and AWS revises them more often than most services. Numbers below are illustrative defaults; always confirm against the Service Quotas Console for the exact model ID and region before sizing.

Quota | Default | Scope | Adjustable | Notes
InvokeModel requests per minute (per model) | Account-specific (verify in console) | Region | Yes | Per-model RPM. Anthropic Claude models have separate quotas from Nova, Llama, Cohere, etc.
InvokeModel tokens per minute (per model) | Account-specific (verify in console) | Region | Yes | Cross-region inference profiles aggregate across regions.
Provisioned model units per account | Account-specific (verify in console) | Region | Yes | Required for deterministic throughput.
Custom models per account | Account-specific (verify in console) | Region | Yes | Imported and fine-tuned models share quota in many regions.
Knowledge bases per account | Account-specific (verify in console) | Region | Yes | Bedrock Knowledge Bases.
Agents per account | Account-specific (verify in console) | Region | Yes | Bedrock Agents.
Guardrails per account | Account-specific (verify in console) | Region | Yes | Bedrock Guardrails policies.
Maximum prompt size | Model-dependent | Per-request | No | Claude Sonnet 4.x: 200K context. Nova Pro: 300K context. Confirm per model.

Amazon SageMaker
Quota | Default | Scope | Adjustable | Notes
Endpoints per Region | Account-specific (verify in console) | Region | Yes | Usually around 100 to several hundred; varies by instance type.
Endpoint configurations per Region | Account-specific (verify in console) | Region | Yes |
Notebook instances per Region | Account-specific (verify in console) | Region | Yes | SageMaker AI Studio domains have a separate ceiling.
Training jobs concurrent per Region | Account-specific (verify in console) | Region | Yes | Per-instance-type concurrent training caps.
Model artifact size | 5 GB | Per-model | No | For built-in algorithms; bring-your-own can be larger via ECR.
Maximum batch transform job duration | 28 days | Per-job | No |

Amazon Comprehend
Quota | Default | Scope | Adjustable | Notes
DetectEntities sync TPS | 20 TPS | Region | Yes |
DetectSentiment sync TPS | 20 TPS | Region | Yes |
Custom classification models per Region | 10 | Region | Yes |
Custom entity recognition models per Region | 10 | Region | Yes |
Document size (single request) | 5,000 bytes | Per-request | No | Use async batch jobs for larger documents.

Amazon Textract
Quota | Default | Scope | Adjustable | Notes
DetectDocumentText sync TPS | 10 TPS | Region | Yes |
AnalyzeDocument sync TPS | 2 TPS | Region | Yes | Tables / Forms / Queries features have separate sub-quotas.
StartDocumentTextDetection async jobs | 600 concurrent | Region | Yes |
Maximum document size (sync) | 10 MB (image) / 5 MB (PDF) | Per-request | No | PDFs larger than 5 MB are supported only through the async APIs.
Maximum pages per async document | 3,000 | Per-document | No |

4.7 Integration

Amazon SNS
Quota | Default | Scope | Adjustable | Notes
Topics per account | 100,000 | Region | Yes |
Subscriptions per topic | 12.5 million | Per-topic | No |
Subscriptions per account | 200 million | Region | No |
Message body size | 256 KB | Per-message | No | SNS Extended Library uses S3 to send up to 2 GB by reference.
FIFO topic message throughput | 300 TPS without batching, 3,000 with batching | Per-topic | No |
Standard topic message throughput | ~30,000 TPS (US, EU regions) | Region | Yes | Lower defaults in other regions.
Filter policies per subscription | 5 | Per-subscription | No |

Amazon SQS
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Queues per account per Region | 1,000,000 | Region | No | Effectively unlimited. |
| Message body size | 256 KB | Per-message | No | Extended Library to S3 for >256 KB up to 2 GB. |
| Visibility timeout | 0 to 12 hours | Per-message | No | Default 30 seconds. |
| Message retention | 1 minute to 14 days | Per-queue | No | Default 4 days. |
| Delay seconds | 0 to 900 (15 min) | Per-message or queue | No | |
| Inflight messages (Standard queue) | 120,000 | Per-queue | No | Inflight = received but not deleted. |
| Inflight messages (FIFO queue) | 20,000 | Per-queue | No | |
| Throughput (Standard queue) | Effectively unlimited | Per-queue | No | Scales horizontally with traffic. |
| Throughput (FIFO queue, high-throughput mode) | 9,000 TPS (us-east-1, us-west-2, eu-west-1) without batching | Per-queue | No | Other regions: 3,000 TPS. Batching multiplies by 10. |
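
The in-flight ceilings above are easiest to reason about with Little's Law: average in-flight messages are roughly the receive rate multiplied by the time a consumer holds a message before deleting it. The following is a minimal sketch under that assumption; the function names are illustrative, and the caps are the figures from this table rather than values read from your account.

```python
# Per-queue in-flight caps as listed in the table above (verify in console).
STANDARD_INFLIGHT_CAP = 120_000
FIFO_INFLIGHT_CAP = 20_000


def estimated_inflight(receive_rate_per_s: float, hold_seconds: float) -> float:
    """Little's Law estimate: messages received per second x seconds held."""
    return receive_rate_per_s * hold_seconds


def inflight_headroom(receive_rate_per_s: float, hold_seconds: float,
                      cap: int = STANDARD_INFLIGHT_CAP) -> float:
    """Fraction of the cap still unused; negative means the estimate exceeds it."""
    return 1.0 - estimated_inflight(receive_rate_per_s, hold_seconds) / cap


if __name__ == "__main__":
    # 2,000 msg/s held for 30 s each is about 60,000 in flight on a Standard queue.
    print(estimated_inflight(2_000, 30))
    print(inflight_headroom(2_000, 30))
```

A negative headroom suggests shortening processing time, deleting messages sooner, or splitting traffic across queues before the consumer fleet is scaled further.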

Amazon EventBridge
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Event buses per account | 100 | Region | Yes | Includes default bus, partner buses, and custom buses. |
| Rules per event bus | 300 | Per-bus | Yes | |
| Targets per rule | 5 | Per-rule | No | Hard ceiling. |
| Event size | 256 KB | Per-event | No | JSON document including envelope. |
| PutEvents requests per second | 10,000 RPS (us-east-1, us-west-2, eu-west-1) | Region | Yes | Lower in other regions, often 400 to 2,400. |
| Throttled events (DLQ) | Configurable | Per-rule | No | Failed deliveries routed to SQS DLQ if configured. |
| Archive size | Unlimited | Per-archive | | Pricing applies per GB stored and replayed. |
| Schemas per registry | 1,000 | Per-registry | Yes | EventBridge Schemas / Schema Registry. |

Amazon Kinesis Data Streams
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Shards per Region (Provisioned mode) | 500 (us-east-1, us-west-2, eu-west-1) / 200 (other regions) | Region | Yes | Soft. Aggregate across all streams in the Region. |
| Data Streams per Region | 50 | Region | Yes | Independent of shard count. |
| PutRecord(s) throughput per shard | 1 MiB/s or 1,000 records/s | Per-shard | No | Whichever limit hits first. Reshard or switch to On-Demand to scale. |
| GetRecords throughput per shard | 2 MiB/s or 5 calls/s | Per-shard | No | Enhanced Fan-Out (EFO) gives each consumer a dedicated 2 MiB/s per shard. |
| Maximum record size | 1 MiB | Per-record | No | Aggregate small records with the KPL to amortize overhead. |
| Data retention | 24 hours (default), up to 365 days | Per-stream | No | Beyond 7 days incurs extended retention pricing. |
| On-Demand mode write throughput | 200 MiB/s and 200,000 records/s | Per-stream | Yes | Default cap; raise via support case for higher. |
| On-Demand mode read throughput | 400 MiB/s | Per-stream | No | 2x the write throughput, sized for two consumer applications. |
| Consumers per stream (Enhanced Fan-Out) | 20 | Per-stream | Yes | Each EFO consumer gets a dedicated 2 MiB/s per shard. |
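
Because "whichever limit hits first" applies per shard, provisioned-mode sizing is the maximum of three separate ceilings. The sketch below shows that arithmetic using the per-shard figures from this table. The function name is illustrative, and `read_mib_per_s` is assumed to be the combined read rate of all shared-throughput (non-EFO) consumers.

```python
import math

# Per-shard figures from the table above.
WRITE_MIB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1_000
SHARED_READ_MIB_PER_SHARD = 2.0


def shards_needed(write_mib_per_s: float, records_per_s: float,
                  read_mib_per_s: float = 0.0) -> int:
    """Smallest shard count that satisfies every per-shard ceiling at once."""
    by_write_bytes = math.ceil(write_mib_per_s / WRITE_MIB_PER_SHARD)
    by_write_records = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    by_shared_reads = math.ceil(read_mib_per_s / SHARED_READ_MIB_PER_SHARD)
    return max(1, by_write_bytes, by_write_records, by_shared_reads)


if __name__ == "__main__":
    # Small records: the record-count ceiling dominates the byte ceiling.
    print(shards_needed(write_mib_per_s=3, records_per_s=20_000))
```

This assumes an even partition-key distribution; a hot key can exhaust a single shard long before the aggregate numbers suggest.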

AWS Step Functions
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| State machines per account | 10,000 | Region | Yes | |
| Execution history events (Standard) | 25,000 events | Per-execution | No | Hard ceiling. Loops or Map states can blow this quickly. |
| Maximum execution duration (Standard) | 1 year | Per-execution | No | |
| Maximum execution duration (Express) | 5 minutes | Per-execution | No | |
| State machine definition size | 1 MB | Per-state-machine | No | ASL JSON document hard cap. |
| Maximum input / output size | 262,144 bytes (256 KB) | Per-state | No | Use S3 references for larger payloads. |
| StartExecution requests (Standard) | 1,300 TPS / bucket size 5,000 | Region | Yes | Express has higher TPS, ~100,000 RPS. |
| Map state max concurrency (Distributed Map) | 10,000 | Per-state | No | Distributed Map raised the cap from Inline Map's 40. |
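
To see how quickly a loop approaches the 25,000-event history ceiling, a rough budget calculation helps. The sketch below assumes about 5 history events per Task-state iteration and 2 fixed events for execution start and end. Both figures are illustrative assumptions; the real count depends on the state types in the loop, so measure one iteration in your own execution history before relying on the result.

```python
# Hard ceiling from the table above.
STANDARD_HISTORY_CAP = 25_000


def estimated_history_events(iterations: int, events_per_iteration: int = 5,
                             fixed_overhead: int = 2) -> int:
    """Rough event count for a loop; both defaults are assumptions."""
    return fixed_overhead + iterations * events_per_iteration


def max_safe_iterations(events_per_iteration: int = 5, fixed_overhead: int = 2,
                        safety_percent: int = 80) -> int:
    """Iterations that keep the execution under safety_percent of the cap."""
    budget = STANDARD_HISTORY_CAP * safety_percent // 100 - fixed_overhead
    return budget // events_per_iteration


if __name__ == "__main__":
    print(estimated_history_events(1_000))
    print(max_safe_iterations())
```

Past that iteration count, the usual options are to continue in a new execution or to move the loop body into a Distributed Map, whose child executions keep their own histories.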

Amazon MWAA (Managed Workflows for Apache Airflow)
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Environments per Region | 5 | Region | Yes | One environment per team / DAG-set is typical. |
| DAGs per environment | Effectively unlimited (memory-bound) | Per-environment | No | Worker memory caps DAG count in practice. |
| Workers per environment | 1 to 25 (small) up to 1 to 50 (large) | Per-environment | Yes (env size) | Environment class (mw1.small/medium/large) sets the ceiling. |
| Schedulers per environment | 2 to 5 | Per-environment | No | Tied to env class. |

4.8 Security

AWS IAM
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Roles per account | 1,000 | Account | Yes | Adjustable up to 5,000. |
| Users per account | 5,000 | Account | No | For long term, prefer IAM Identity Center (SSO) over IAM users. |
| Groups per account | 300 | Account | No | |
| Customer-managed policies per account | 1,500 | Account | Yes | |
| Managed policies attached to a role | 10 | Per-role | Yes | Adjustable up to 20. |
| Managed policies attached to a user | 10 | Per-user | Yes | Adjustable up to 20. |
| Inline policy size on a role | 10,240 characters | Per-role | No | |
| Inline policy size on a user | 2,048 characters | Per-user | No | |
| Inline policy size on a group | 5,120 characters | Per-group | No | |
| Policy versions per managed policy | 5 | Per-policy | No | Older versions must be deleted before adding new. |
| Access keys per IAM user | 2 | Per-user | No | For key rotation, use both slots. |
| MFA devices per user | 8 | Per-user | No | Hardware + virtual + Passkeys combined. |
| Server certificates per account | 20 | Account | Yes | Use ACM for TLS certificates instead. |
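
The inline policy size caps are easy to check before deployment. The sketch below measures policies in compact JSON form and sums them per principal, on the assumption that IAM ignores whitespace and applies the cap to the combined inline policies on an entity. Treat the result as an approximation of what IAM counts, not an exact reproduction; the function names are illustrative.

```python
import json

# Inline policy caps (characters) from the table above.
INLINE_POLICY_CAPS = {"role": 10_240, "user": 2_048, "group": 5_120}


def policy_size(policy: dict) -> int:
    """Characters in the compact JSON form, without structural whitespace."""
    return len(json.dumps(policy, separators=(",", ":")))


def fits_inline(policies: list, principal_type: str) -> bool:
    """True if the combined policies stay within the cap for this principal."""
    total = sum(policy_size(p) for p in policies)
    return total <= INLINE_POLICY_CAPS[principal_type]


if __name__ == "__main__":
    doc = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"}],
    }
    print(policy_size(doc), fits_inline([doc], "user"))
```

When a set of policies does not fit, moving statements into a customer-managed policy is usually simpler than trimming characters.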

AWS KMS
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Customer managed keys (CMKs) per Region | 100,000 | Region | Yes | Includes pending-deletion keys. |
| Aliases per CMK | 50 | Per-CMK | No | |
| Grants per CMK | 50,000 | Per-CMK | No | Common with AWS services like DynamoDB / Lambda. |
| API request rate: symmetric Encrypt / Decrypt / GenerateDataKey (AES-256 / SYMMETRIC_DEFAULT) | 30,000 RPS (us-east-1, us-west-2, eu-west-1) | Region | Yes | 5,500 to 15,000 RPS in other regions. |
| API request rate: asymmetric Encrypt / Decrypt / Sign | 500 to 1,000 RPS | Region | Yes | RSA / ECC keys are CPU-intensive. |
| Encrypt input size (symmetric) | 4,096 bytes (4 KB) | Per-call | No | For larger data, encrypt with a data key locally. |
| Encrypt input size (asymmetric) | ~190 to 470 bytes | Per-call | No | Depends on key spec (RSA-2048 / RSA-3072 / RSA-4096). |
| Key policy size | 32 KB | Per-CMK | No | |

AWS WAF
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Web ACLs per account per Region | 100 | Region | Yes | CloudFront Web ACLs sit in us-east-1 (Global). |
| WCUs per Web ACL | 1,500 | Per-Web-ACL | Yes | Web ACL Capacity Units; each rule consumes WCUs. |
| WCUs per rule group | 1,500 | Per-rule-group | Yes | WAF v2 rule group capacity is WCU-bound. Per-rule WCU cost ranges from 1 to several dozen depending on match type and text transformations. Request a WCU increase via the Service Quotas Console. |
| Rule groups per Web ACL | 20 | Per-Web-ACL | No | |
| IP sets per account per Region | 100 | Region | Yes | |
| IP addresses per IP set | 10,000 | Per-IP-set | Yes | |
| Maximum body inspection size (CloudFront) | 64 KB | Per-request | No | For ALB / API Gateway: 8 KB default, adjustable to 64 KB. |

AWS Shield Advanced
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Protected resources per account | 1,000 | Account | Yes | Shield Standard is free and automatic; Shield Advanced is paid. |
| Health-based detection | Configurable | Per-resource | | Route 53 health checks integrate. |
| Maximum custom mitigation | Account-specific (verify in console) | Account | Yes | SRT (Shield Response Team) can apply ad-hoc rules. |

AWS Secrets Manager
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Secrets per account per Region | 500,000 | Region | Yes | |
| Secret value size | 65,536 bytes (64 KB) | Per-secret | No | Both ciphertext and plaintext. |
| GetSecretValue TPS | 10,000 RPS | Region | Yes | Use a local cache for very high read volume. |
| Versions per secret | ~100 | Per-secret | No | Old versions deprecated after 24 hours unless labeled. |

4.9 Observability

Amazon CloudWatch
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Custom metrics per account | Effectively unlimited | Region | No | Cost scales with metric count and resolution. |
| Metric resolution (high resolution) | 1 second | Per-metric | No | 1-second metrics retained 3 hours, then aggregated. |
| Alarms per Region | 5,000 | Region | Yes | |
| Composite alarms per account | 500 | Account | Yes | |
| PutMetricData TPS | 150 TPS | Region | Yes | Soft. Use batch (up to 1,000 metrics per call) to reduce TPS. |
| Log groups per Region | 1,000,000 | Region | Yes | |
| Log group retention | 1 day to 10 years (or Never expire) | Per-log-group | No | Default is "Never expire"; set retention to control cost. |
| Log streams per log group | Effectively unlimited | Per-log-group | No | |
| PutLogEvents per stream | 5 RPS (default) | Per-stream | Yes | |
| Maximum log event size | 256 KB | Per-event | No | |
| Subscription filters per log group | 2 | Per-log-group | No | Hard limit. For fan-out, route via a Lambda or Firehose subscriber that re-publishes. |
| Dashboards per account | 500 | Region | Yes | Each dashboard up to 500 widgets. |
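
The PutMetricData note above (batch up to 1,000 metrics per call) translates directly into a request-rate calculation. The sketch below shows how batching changes the calls per second needed for a given datapoint rate. The function names are illustrative, and request payload size limits, which also apply, are not modeled here.

```python
import math

# Figures from the table above.
METRICS_PER_CALL = 1_000
DEFAULT_PUT_TPS = 150


def put_calls_per_second(datapoints_per_second: int,
                         batch_size: int = METRICS_PER_CALL) -> int:
    """Calls per second needed to ship the datapoints at a given batch size."""
    if not 1 <= batch_size <= METRICS_PER_CALL:
        raise ValueError("batch_size must be between 1 and 1,000")
    return math.ceil(datapoints_per_second / batch_size)


def chunk(items: list, size: int = METRICS_PER_CALL) -> list:
    """Split a list of metric datums into batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # 4,000 datapoints/s: far over 150 TPS unbatched, trivial when batched.
    print(put_calls_per_second(4_000, batch_size=1), put_calls_per_second(4_000))
```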

AWS X-Ray
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Traces per second | Default sampling 1 req/sec + 5% | Per-app | Yes | Sampling rules configurable. |
| Trace document size | 500 KB | Per-trace | No | |
| Segment document size | 64 KB | Per-segment | No | |
| Trace retention | 30 days | Region | No | Hard. |
| Groups per account per Region | 25 | Region | Yes | |
| Insights enabled per group | Configurable | Per-group | | X-Ray Insights detects anomalies in traffic patterns. |

4.10 Management

AWS CloudFormation
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Stacks per Region | 2,000 | Region | Yes | |
| Resources per stack | 500 | Per-stack | No | Was 200 for years; raised to 500. Use nested stacks above this. |
| Parameters per stack | 200 | Per-stack | No | |
| Outputs per stack | 200 | Per-stack | No | |
| Mappings per template | 200 | Per-template | No | |
| Template body size (direct) | 51,200 bytes (50 KB) | Per-template | No | API call limit. |
| Template body size (via S3) | 1 MB | Per-template | No | Reference template via TemplateURL. |
| Number of stack sets per administrator account | 100 | Account | Yes | StackSets fan out across Organization. |
| Operations per StackSet | 1,000 concurrent | Per-StackSet | Yes | Throttle by region / account. |
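
The two template size rows imply a simple decision in a deployment script: pass small templates inline and stage larger ones in S3. The sketch below is one way to express that, assuming "1 MB" means 1,048,576 bytes. The function name is illustrative; `TemplateBody` and `TemplateURL` are the CloudFormation request parameters the table refers to.

```python
# Size caps from the table above; the S3 cap assumes 1 MB = 1,048,576 bytes.
DIRECT_BODY_CAP_BYTES = 51_200
S3_BODY_CAP_BYTES = 1_048_576


def template_delivery(template_text: str) -> str:
    """Return which request parameter a template of this size can use."""
    size = len(template_text.encode("utf-8"))
    if size <= DIRECT_BODY_CAP_BYTES:
        return "TemplateBody"
    if size <= S3_BODY_CAP_BYTES:
        return "TemplateURL"
    raise ValueError(
        f"template is {size} bytes; split it into nested stacks"
    )


if __name__ == "__main__":
    print(template_delivery("Resources: {}"))
```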

AWS Organizations
| Quota | Default | Scope | Adjustable | Notes |
| --- | --- | --- | --- | --- |
| Member accounts per organization | 10 | Organization | Yes | Adjustable to thousands via support case. Some customers run 5,000+ accounts. |
| OUs per organization | 1,000 | Organization | No | |
| OU nesting depth | 5 levels | Organization | No | Plus Root = 6 total. |
| Service Control Policies (SCPs) per organization | 1,000 | Organization | No | |
| SCPs attached to a single entity (account / OU / root) | 5 | Per-entity | No | |
| SCP document size | 5,120 characters | Per-SCP | No | JSON document hard cap. |
| Backup policies per organization | 10 | Organization | No | Organization-level Backup policies. |
| Tag policies per organization | 10 | Organization | No | |

5. Quotas Most Frequently Hit in Production

The list below is a ranked digest of quotas that, in my experience and from customer postmortems, are the ones engineering teams most often hit during launches, traffic spikes, and incidents. Mark these and pre-check them before scaling events.

| Rank | Service | Quota | Default | Why it bites |
| --- | --- | --- | --- | --- |
| 1 | EC2 | Running On-Demand Standard vCPUs | 5 vCPUs (new accounts) | Surprise during scale-out tests. New accounts often blocked at <100 vCPUs. |
| 2 | Lambda | Concurrent executions | 1,000 | Burst events (S3 event, EventBridge fan-out) saturate before steady-state warms. |
| 3 | DynamoDB | Partition throughput (1,000 WCU / 3,000 RCU) | Per partition | Hot partition exhausts even when table-level is far from cap. |
| 4 | S3 | PUT/POST/COPY/DELETE rate per prefix | 3,500 RPS | Sequential keys serialize to one prefix; a hashed-prefix key scheme fixes this. |
| 5 | API Gateway | Integration timeout | 29 sec (REST default, up to 300 sec) / 30 sec (HTTP, hard) | Long-running backend. REST adjustable via Service Quotas; HTTP remains hard. |
| 6 | API Gateway | Steady-state RPS | 10,000 RPS | Account-level, shared across all APIs. |
| 7 | KMS | Symmetric Decrypt RPS | 30,000 RPS (us-east-1, us-west-2, eu-west-1) / 5,500 to 15,000 (others) | Per-region throttle hits at burst (sign-in storms, batch decrypt). |
| 8 | EventBridge | PutEvents RPS | 10,000 (us-east-1, us-west-2, eu-west-1) | Lower in other regions (400 to 2,400). |
| 9 | Step Functions | Execution history events (Standard) | 25,000 events | Loops or large Maps blow the cap. |
| 10 | CloudWatch | PutMetricData TPS | 150 TPS | EMF logs at high cardinality saturate. |
| 11 | CloudFormation | Resources per stack | 500 | Hard; forces nested stacks. |
| 12 | EC2 | Elastic IPs per Region | 5 | Multi-AZ NAT + reservations exhaust quickly. |
| 13 | VPC | VPCs per Region | 5 | Multi-environment in one account hits this. |
| 14 | SQS | Inflight messages (Standard queue) | 120,000 | Slow consumer causes inflight to stack. |
| 15 | RDS | DB instances per Region | 40 | Microservice-per-DB pattern hits early. |
| 16 | Lambda | Burst concurrency | 500 to 3,000 concurrent executions (region-dependent) | Cold start storm at deploy or after dormant period. Post-burst scaling now +1,000 concurrent per 10 seconds, per function. |
| 17 | CloudFront | Lambda@Edge function size (viewer) | 1 MB | Bundling several SDKs pushes the package past the size cap. |
| 18 | IAM | Managed policies attached to a role | 10 (adjustable to 20) | Permission-set sprawl hits quickly. |
| 19 | Organizations | SCPs attached to a single entity | 5 | Layered guardrails approach 5 quickly. |
| 20 | Bedrock | InvokeModel RPM per model | Varies by model / region | Production launches blocked until per-model quota raised. Plan 7+ days lead time. |
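
For rank 4, the usual fix for sequential S3 keys is to prepend a short hash so that writes spread over many prefixes. The sketch below is one minimal way to do that. The function name and the choice of SHA-256 with two hex characters (256 prefixes) are illustrative assumptions, not a prescribed scheme.

```python
import hashlib


def hashed_key(original_key: str, prefix_hex_chars: int = 2) -> str:
    """Prepend a deterministic hash fragment so keys fan out across prefixes."""
    digest = hashlib.sha256(original_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_hex_chars]}/{original_key}"


if __name__ == "__main__":
    # Sequential names land under unrelated two-character prefixes.
    for i in range(3):
        print(hashed_key(f"logs/2026-05-01/{i:08d}.json"))
```

The trade-off is that listing by date no longer maps to a single prefix, so keep an index elsewhere if you need ordered scans.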

A common pattern across these quotas: the issue is rarely "AWS can't scale here" — it is "AWS does not auto-scale this quota for you and you forgot to ask in advance." For the adjustable rows, set Service Quotas Console utilization alarms at 80%.

6. How to Request Quota Increases

The mechanics of getting an adjustable quota raised changed substantially in 2019 with the introduction of the unified Service Quotas service and again in subsequent years with finer-grained per-service quota lists. As of 2026-05, three paths exist.

6.1 Path 1 — Service Quotas Console (most quotas)

The fastest path for most adjustable quotas:

  1. Open the Service Quotas Console for the target Region.
  2. Search for the quota by name or service.
  3. Click Request quota increase, enter the new value, and submit.
  4. Many quotas auto-approve within minutes (especially EC2 vCPU for established accounts and low-multiple increases).
  5. Larger increases (10x or more) escalate to a human reviewer; SLA typically 1 to 5 business days.

You can also do this via API: aws service-quotas request-service-quota-increase --service-code <code> --quota-code <code> --desired-value <n>.
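
In an automation pipeline it can help to build that command programmatically and refuse requests that would not raise the quota. The sketch below assembles the argument list for the CLI call shown above. The function name and the validation rule are illustrative, and the Lambda quota code is the one used elsewhere in this article.

```python
from typing import List, Optional


def increase_request_argv(service_code: str, quota_code: str,
                          current_value: float, desired_value: float,
                          region: Optional[str] = None) -> List[str]:
    """Build the CLI argument list; reject values that do not raise the quota."""
    if desired_value <= current_value:
        raise ValueError("desired_value must exceed the current quota value")
    argv = [
        "aws", "service-quotas", "request-service-quota-increase",
        "--service-code", service_code,
        "--quota-code", quota_code,
        "--desired-value", str(desired_value),
    ]
    if region:
        argv += ["--region", region]  # global AWS CLI option
    return argv


if __name__ == "__main__":
    print(" ".join(increase_request_argv("lambda", "L-B99A9384", 1000, 5000)))
```

The returned list can be handed to `subprocess.run(argv, check=True)` on a host with the AWS CLI and credentials configured.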

6.2 Path 2 — Support case (legacy quotas, complex changes)

Some service-specific limits still live in legacy support categories rather than Service Quotas. SES sending limit, Route 53 domain registration limit, and certain regional EC2 capacity pools fall in this group.

  1. Open Support Center → Create case.
  2. Choose Service limit increase, then the service and Region.
  3. Provide workload description, expected steady-state, and peak.
  4. SLA depends on support plan tier. Business and Enterprise plans typically respond within hours; Basic / Developer support may take days.

6.3 Path 3 — Account team escalation (large or time-sensitive)

For launches that need quota increases in multiple services simultaneously, large multi-account fleets, or quotas near the regional cap, work with your AWS Technical Account Manager (TAM) or Solutions Architect (SA):

  • Submit a structured launch plan: services, current quotas, desired quotas, expected go-live date.
  • Account teams can coordinate parallel approvals across teams (EC2 capacity, KMS, Bedrock, Lambda concurrency).
  • Lead time: plan 7 to 14 business days for cross-service launches. Bedrock model quotas in particular benefit from this path.

6.4 Strategies before requesting

Before filing a request, exhaust the architectural options:

  • Shard across accounts (Organizations + Account Vending) to multiply per-account quotas, since every account gets its own per-Region allocation.
  • Shard across Regions (multi-region active-active) to multiply Region-level quotas.
  • Use reserved concurrency (Lambda) to carve a guaranteed slice without raising the account ceiling.
  • Use Provisioned Throughput (Bedrock, DynamoDB) where deterministic capacity is a tighter constraint than per-region quotas.
  • Use Service-Linked Roles and managed policies instead of large inline policies to avoid per-role IAM policy size caps.
  • Move config out of environment variables (Lambda 4 KB cap) into Parameter Store or Secrets Manager.

When you do file, always include expected steady-state RPS, expected peak RPS, peak duration, and go-live date. AWS reviewers approve faster when the numbers are concrete than when the request reads "we might grow."

6.5 Quotas you can NOT raise

Be realistic: roughly one-third of the quotas in this article are hard. Filing a support case for these wastes everyone's time. The hard quotas you should design around — never raise — include:

  • DynamoDB item size 400 KB
  • S3 object size 5 TB
  • SQS / SNS message size 256 KB
  • Lambda function timeout 900 seconds
  • Lambda payload (sync) 6 MB
  • Step Functions execution history 25,000 events
  • API Gateway HTTP API integration timeout 30 seconds (REST is now adjustable up to 300 seconds)
  • CloudFormation resources per stack 500
  • IAM inline policy size (per role) 10,240 characters
  • KMS symmetric Encrypt input 4 KB
  • Aurora cluster readers 15

For these, the design pattern is to compose smaller calls (Step Functions Map, Lambda chaining, multipart upload) or to externalize state to S3.
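
Externalizing state usually takes the form of a "claim check": send the payload inline when it fits and send a pointer to S3 when it does not. The sketch below illustrates the decision for a 256 KB message cap. `put_object` is a hypothetical callable that you would back with your own S3 upload code, and the envelope field names are made up for this example.

```python
import json
from typing import Callable

# 256 KB message cap used by several services in this article.
INLINE_CAP_BYTES = 262_144


def wrap_payload(payload: dict, put_object: Callable[[str, str, bytes], None],
                 bucket: str, key: str, cap: int = INLINE_CAP_BYTES) -> dict:
    """Return an inline envelope if it fits under the cap, else an S3 pointer."""
    envelope = {"inline": payload}
    if len(json.dumps(envelope).encode("utf-8")) <= cap:
        return envelope
    body = json.dumps(payload).encode("utf-8")
    put_object(bucket, key, body)  # hypothetical uploader supplied by the caller
    return {"s3_ref": {"bucket": bucket, "key": key, "bytes": len(body)}}


if __name__ == "__main__":
    store = {}
    upload = lambda b, k, data: store.__setitem__((b, k), data)
    print(wrap_payload({"blob": "a" * 300_000}, upload, "my-bucket", "msg/1.json"))
```

The consumer then checks which field is present and fetches the object when it receives a pointer.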

7. Frequently Asked Questions

7.1 Where are quotas different between Regions?

Compute and rate quotas are usually larger in us-east-1, us-west-2, and eu-west-1 than elsewhere. Examples:

  • KMS symmetric Decrypt: 30,000 RPS in those three Regions; 5,500 to 15,000 RPS elsewhere.
  • SNS Publish: ~30,000 TPS in those Regions; lower elsewhere.
  • EventBridge PutEvents: 10,000 RPS in those Regions; 400 to 2,400 RPS elsewhere.
  • SQS FIFO high-throughput: 9,000 TPS in those Regions; 3,000 TPS elsewhere.

When you deploy in less-trafficked Regions (e.g. ap-northeast-1 for some quotas, eu-central-1, ap-south-1), check Region-specific defaults before sizing.

7.2 Are quotas the same in all accounts of an Organization?

No. Each member account has its own per-Region quotas, defaulted to the new-account values. A newly created account does not inherit increases from the management account or from sibling accounts. To apply increases consistently, use AWS Control Tower account-baseline automation or call service-quotas request-service-quota-increase from your account-vending pipeline.

7.3 Are quotas the same for AWS Free Tier accounts?

Yes — the quota architecture is identical; Free Tier only affects pricing. New accounts (free or paid) share the same conservative defaults.

7.4 How do I check my account's current quota for a specific service?

Programmatically:

aws service-quotas list-service-quotas --service-code <code>
aws service-quotas get-service-quota --service-code <code> --quota-code <code>
aws service-quotas list-aws-default-service-quotas --service-code <code>

Console: Service Quotas Console → choose service → filter by quota name.

Note that some quotas are not yet onboarded to Service Quotas API. For those, use the Trusted Advisor "Service Limits" check (Business / Enterprise support) or the service-specific console.

7.5 How quickly do quota changes take effect?

  • Auto-approved increases: minutes.
  • Reviewed increases (Service Quotas): 1 to 5 business days.
  • Cross-service launch increases (TAM-coordinated): 7 to 14 business days.
  • Hard quotas: never.

7.6 Can I set alerts when I am approaching a quota?

Yes. Service Quotas integrates with CloudWatch — for each adjustable quota that supports monitoring, you can enable CloudWatch usage metrics and create alarms (e.g. alarm at 80% of quota). Combine with EventBridge to notify an SRE channel.

Programmatic check pattern:

aws service-quotas get-service-quota \
  --service-code lambda \
  --quota-code L-B99A9384 \
  --query 'Quota.Value'

Then compare the result against GetMetricStatistics for the relevant CloudWatch usage metric (AWS/Usage namespace).
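
The comparison step can be sketched as follows. It assumes you already have the text printed by the command above (a bare number such as 1000.0, because of the `--query` filter) and the peak value of the matching usage metric. The function names and the 0.8 threshold are illustrative.

```python
import json


def quota_utilization(cli_output: str, usage_maximum: float) -> float:
    """Ratio of observed peak usage to the quota value printed by the CLI."""
    quota_value = float(json.loads(cli_output))
    if quota_value <= 0:
        raise ValueError("quota value must be positive")
    return usage_maximum / quota_value


def should_alarm(utilization: float, threshold: float = 0.8) -> bool:
    """True once utilization reaches the alert threshold."""
    return utilization >= threshold


if __name__ == "__main__":
    util = quota_utilization("1000.0\n", usage_maximum=850)
    print(util, should_alarm(util))
```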

7.7 Why is the Bedrock model RPM quota so low?

Bedrock model quotas are intentionally conservative because the underlying inference capacity is shared and pricing is per-token. AWS sizes these quotas based on aggregate region capacity and revises them often. For production workloads, you will almost certainly need to file an increase request 7+ days before launch. Cross-region inference profiles (model IDs prefixed with us. or eu., such as us.anthropic.claude-sonnet-4-6-...-v1:0) aggregate quotas across multiple Regions and are the preferred high-throughput path.

7.8 What is the difference between a quota and a throttle?

  • A quota is a ceiling enforced over a region / account / resource over a period (snapshot or aggregate). Going over yields LimitExceededException, ServiceQuotaExceededException, or service-specific error codes.
  • A throttle is a short-term rate limit, often token-bucket based, that returns ThrottlingException / Rate exceeded and is generally retryable with backoff.

Many adjustable quotas are technically throttle policies (KMS request rates, EventBridge RPS, API Gateway RPS). The distinction matters for retry strategy: throttles are transient and should be retried with exponential backoff and jitter; hard quota violations should fail closed and alert.
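
That retry split can be made explicit in client code. The sketch below classifies error codes into "retry" and "fail closed" and computes a full-jitter backoff delay. The code sets use the error names mentioned above plus Lambda's TooManyRequestsException as an added example; the base delay and cap are illustrative defaults, not AWS recommendations.

```python
import random
from typing import Callable

# Error codes named in this section, plus one added example for Lambda.
RETRYABLE_CODES = {"ThrottlingException", "TooManyRequestsException"}
FAIL_CLOSED_CODES = {"LimitExceededException", "ServiceQuotaExceededException"}


def classify(error_code: str) -> str:
    """Map an error code to a handling strategy."""
    if error_code in RETRYABLE_CODES:
        return "retry"
    if error_code in FAIL_CLOSED_CODES:
        return "fail_closed"
    return "unknown"


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 20.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


if __name__ == "__main__":
    for attempt in range(5):
        print(attempt, round(backoff_delay(attempt), 3))
```

Treat "unknown" codes conservatively: log them and decide per service whether they belong in the retryable set.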

7.9 Do quotas apply differently to AWS-internal callers (e.g. EventBridge → Lambda)?

Yes, in some cases. EventBridge invocations to Lambda do count toward Lambda's concurrent execution limit but use a separate retry / DLQ mechanism. S3 events to Lambda likewise count. However, AWS services calling each other (e.g. CloudWatch Logs → Lambda for subscription filters) often use service-linked roles and may have their own internal back-pressure independent of customer quotas.

7.10 What changes the most often?

In my observation, in roughly this order:

  1. Bedrock model RPM / TPM (revised multiple times per year as inference capacity scales).
  2. EC2 new-account vCPU pools (AWS adjusts conservative defaults as fraud / abuse models improve).
  3. Lambda burst concurrency (Region-specific, occasionally revised upward).
  4. DynamoDB on-demand initial throughput (revised upward over time).
  5. CloudFormation resources per stack (was 200, raised to 500).
  6. DynamoDB GSIs per table (was 5, raised to 20).
  7. S3 buckets per account (was 100, raised significantly).

Treat any quota value older than 12 months in a third-party blog with suspicion — including this one. The Service Quotas Console is always authoritative.

8. Summary

This article condensed the AWS Service Quotas surface area into a single-page reference for the major AWS services that production engineers touch most. The principles to internalize:

  1. Default ≠ maximum. Most adjustable quotas can be raised; "we hit a limit" usually means "we did not ask in advance."
  2. Per-prefix, per-partition, per-tunnel quotas hide inside per-region quotas. Look one level deeper than the headline number.
  3. Hard quotas are design constraints. DynamoDB 400 KB, Lambda 900 sec, SQS 256 KB, Step Functions 25,000 events, CloudFormation 500 resources — design around them.
  4. Regions are not equal. us-east-1, us-west-2, eu-west-1 carry higher per-Region rate quotas than other Regions.
  5. Lead time matters. 7 to 14 business days for cross-service launches; the time to file the case is the day you start the project, not the day before launch.

The Service Quotas Console is the source of truth. This article is the index.


Written by Hidekazu Konishi