
AWS History and Timeline regarding Amazon Simple Queue Service - Overview, Functions, Features, Summary of Updates, and Introduction to SQS

First Published: 2024-05-14
Last Updated: 2024-05-14

This is the seventh installment in the series I started with "AWS History and Timeline - Almost All AWS Services List, Announcements, General Availability(GA)", in which I trace the history and timeline of individual AWS services and extract their features (I have previously covered Amazon S3, AWS Systems Manager, Amazon Route 53, Amazon EventBridge, and AWS KMS).

This time, I have created a historical timeline for Amazon Simple Queue Service (Amazon SQS), which was first announced as an AWS infrastructure service in November 2004 and provides a fully managed message queuing service.
As Amazon SQS will celebrate the 20th anniversary of its announcement in November 2024, I wrote this article as an early celebration.
Just like before, I am summarizing the main features while following the birth of Amazon SQS and tracking its feature additions and updates as a "Current Overview, Functions, Features of Amazon SQS".
I hope these will provide clues as to what has remained the same and what has changed, in addition to the features and concepts of each AWS service.

Background and Method of Creating Amazon SQS Historical Timeline

The main reason for creating a historical timeline of Amazon SQS this time is that 2024 marks the 20th anniversary of its announcement.
Amazon SQS was the first AWS service announced as an AWS infrastructure service in November 2004 (*1).
Therefore, 2024 is also a milestone year marking the 20th anniversary of the announcement of AWS infrastructure services.

Another reason is that since Amazon SQS emerged as the first AWS infrastructure service in November 2004, it has continued to be used as a fully managed message queuing service in distributed systems, microservices, serverless applications, and more, even as IT trends have changed. I therefore wanted to organize information about Amazon SQS with the following approaches.
  • Tracking the history of Amazon SQS and organizing the transition of updates
  • Summarizing the feature list and characteristics of Amazon SQS
This timeline primarily references the following blogs and document history regarding Amazon SQS.
There may be slight variations in the dates on the timeline due to differences in the timing of announcements or article postings in the references used.
The content posted is limited to major features related to the current Amazon SQS and necessary for the feature list and overview description.
In other words, please note that the items on this timeline are not all updates to Amazon SQS features, but are representative updates that I have picked out.
*1) Refer to Introducing the Amazon Simple Queue Service.
Apart from infrastructure services, a service called "Alexa Web Information Service (AWIS)" was listed in "What's New | 2004" prior to Amazon SQS on 2004-10-04.
Also, the first service to reach General Availability (GA) was Amazon Simple Storage Service (Amazon S3), which was announced and became GA on 2006-03-14.

Amazon SQS Historical Timeline (Updates from November 3, 2004)

Now, here is a timeline related to the functions of Amazon SQS. As of the time of writing this article, the history of Amazon SQS spans about 19 years and 6 months, with its 20th anniversary coming in November 2024.

Date Summary
2004-11-03 Amazon Simple Queue Service (Amazon SQS) is announced.
2006-07-11 Amazon Simple Queue Service (Amazon SQS) becomes generally available (GA).
2011-10-06 Support for Amazon SQS is added to the AWS Management Console.
2011-10-21 Support for Delay Queues, Message Timers, and Batch APIs is introduced.
2012-11-05 Announcement of Amazon SQS API version 2012-11-05.
2012-11-08 Support for long polling is introduced.
2012-11-21 Ability to subscribe Amazon SQS queues to Amazon SNS topics using the AWS Management Console for Amazon SQS.
2013-06-18 Maximum payload size is increased from 64KB to 256KB.
2014-01-29 Support for dead-letter queues is added.
2014-05-06 Support for message attributes is added.
2014-07-16 AWS CloudTrail now logs API actions for Amazon SQS.
2014-12-08 Ability to delete all messages in a queue using the PurgeQueue API action is added.
2014-12-29 Amazon SQS Java Messaging Library for JMS becomes available.
2015-10-27 Amazon SQS Extended Client Library for Java, which allows sending and receiving messages with payloads of up to 2GB using Amazon S3, becomes available.
2016-03-30 Amazon CloudWatch Events now supports Amazon SQS queues as event targets.
2016-08-31 ApproximateAgeOfOldestMessage CloudWatch metric becomes available for monitoring the elapsed time of the oldest message in a queue.
2016-11-17 FIFO (First-In-First-Out) queues become available, and existing queues are now named Standard queues.
2017-04-24 Amazon SQS Extended Client Library for Java and Amazon SQS Java Messaging Library for JMS begin supporting FIFO queues.
2017-04-28 Support for server-side encryption (SSE) is introduced.
2017-05-01 Amazon SQS becomes a HIPAA eligible service.
2017-05-19 Amazon SQS Extended Client Library for Java can now be used in conjunction with Amazon SQS Java Messaging Library for JMS.
2017-10-19 Tags become usable for Amazon SQS queues, allowing for the tracking of cost allocations using cost allocation tags.
2017-12-07 All API actions except ListQueues now support resource-level permissions.
2018-04-10 Amazon CloudWatch Events now supports Amazon SQS FIFO queues as event targets.
2018-06-28 AWS Lambda now supports Amazon SQS Standard queues as event sources.
2018-12-13 Support for AWS PrivateLink using Amazon VPC Endpoints is introduced, allowing private connections to AWS services supporting VPCs.
2019-04-04 VPC endpoint policy support is introduced.
2019-07-25 Temporary queue client becomes available.
2019-08-22 Support for tag-on-create during queue creation is introduced. The IAM condition keys 'aws:TagKeys' and 'aws:RequestTag' can be specified.
2019-08-28 Support for troubleshooting queues using AWS X-Ray is introduced.
2019-11-19 AWS Lambda now supports Amazon SQS FIFO queues as event sources.
2019-12-11 One-minute Amazon CloudWatch metrics become available.
2020-06-22 Pagination support for the ListQueues and ListDeadLetterSourceQueues APIs is introduced, allowing for the specification of the maximum number of results returned from a request.
2020-12-17 High throughput mode for messages in FIFO queues is preview released.
2021-05-27 High throughput mode for messages in FIFO queues becomes generally available (GA).
2021-11-23 Managed server-side encryption (SSE-SQS) using Amazon SQS-managed encryption keys becomes available.
2021-12-01 Support for redrive in Standard queues' dead-letter queues is introduced.
2022-10-04 Support for default enabling of server-side encryption (SSE-SQS) using Amazon SQS-managed encryption keys is introduced.
2022-11-17 Attribute-Based Access Control (ABAC) is introduced.
2023-07-27 API requests using the AWS JSON protocol are preview released in Amazon SQS.
2023-11-27 Support for redrive in FIFO queues' dead-letter queues is introduced.
2024-02-06 Amazon SQS Extended Client Library for Python, which allows sending and receiving messages with payloads of up to 2GB using Amazon S3, becomes available.

Current Overview, Functions, Features of Amazon SQS

Here, I will explain in detail the main features of the current Amazon SQS.
Amazon Simple Queue Service (Amazon SQS) is a highly reliable, scalable, and fully managed message queuing service for microservices, distributed systems, and serverless applications.

Amazon SQS provides a messaging service with high reliability and durability, enabling asynchronous processing and efficient message exchange between applications.
Furthermore, its auto-scaling and data redundancy capabilities accommodate system load fluctuations and prevent message loss.
Moreover, its pay-as-you-go pricing model delivers a cost-effective solution, and high security is achieved through access management with AWS Identity and Access Management (IAM) and data encryption in transit and at rest.
In addition to these features, easy integration with other AWS services allows for rapid development and flexible operation of applications.

Use Cases for Amazon SQS

Amazon SQS is utilized in various scenarios, designed to enhance system fault tolerance and scalability.
The main use cases for the message queuing service provided by Amazon SQS include:
  • Asynchronous communication
    Transfers messages asynchronously between applications, allowing system components to scale and process independently.
  • Load leveling
    Accommodates peak-time loads and sudden traffic spikes by holding messages in the queue, preventing overload in subsequent systems.
  • Integration of distributed systems
    Exchanges messages between different system components within a distributed architecture.

Specific Examples of Use Cases

For example, here are some specific use case scenarios:
  • E-commerce order system
    Transmits order information received from a web application to SQS, and the order processing service retrieves and processes messages from the queue sequentially. This ensures that order data is processed reliably without loss, even when many customers place orders at the same time, minimizing the risk of the order processing system going down.
  • Communication between microservices
    When adopting a microservices architecture, message exchanges between services can be conducted using SQS. Using a message queue helps maintain loose coupling between services, allowing the overall system to operate more resiliently.
  • Big data applications
    Collects data through SQS from data aggregation points and delivers it for batch or stream processing. This enables the processing of data at various speeds.
As such, Amazon SQS continues to provide a robust and scalable messaging queuing service for various use cases, enabling flexible application design tailored to business needs.

Amazon SQS Standard Queues

Amazon SQS Standard Queues are a robust, managed message queue type that provides a mechanism for asynchronously sending and receiving messages between distributed systems.
The main features of standard queues include:
  • Unlimited transactions
    Standard queues support a nearly unlimited number of transactions per second, with no practical upper limit on the number of calls to API actions such as SendMessage, ReceiveMessage, and DeleteMessage. This makes them suitable for applications requiring high processing volume or throughput.
  • At least once delivery guarantee
    Standard queues guarantee that messages will be delivered at least once. However, it is important to note that standard queues may deliver messages multiple times.
  • Best effort order maintenance
    In most cases, messages in standard queues are received in the order they were sent. However, standard queues do not guarantee strict ordering, and messages may occasionally be delivered out of order, so it is recommended to use FIFO queues when strict ordering is required.
Thus, standard queues can be effectively utilized in various applications seeking messaging solutions with durability and scalability. However, it is assumed that the application can accommodate duplicate message deliveries and order changes.

Amazon SQS FIFO Queues

Amazon SQS FIFO Queues (FIFO queues) are queue services designed for specific use cases that provide strict message ordering and eliminate duplicate messages.
FIFO queues ensure that messages are retrieved from the queue in the order they were sent, thanks to the order maintenance feature, and achieve exactly-once processing through the deduplication feature.

Strict Message Ordering (Order Maintenance Feature)

In FIFO queues, the order of messages sent is strictly maintained. Messages are processed based on the first-in-first-out principle, and their order is not changed within the same message group.

Exactly Once Processing (Deduplication Feature)

Unlike standard queues, FIFO queues minimize duplicate messages. By default, FIFO queues use a five-minute deduplication interval: if a message with the same message deduplication ID is sent again within this period, it is not added to the queue again. There are two methods for deduplication:
  • Automatic generation of Message Deduplication ID
    When content-based deduplication is enabled, Amazon SQS automatically generates a unique message deduplication ID using a SHA-256 hash of the message content. This ID is generated from the message body content, and message attributes are not considered.
  • Explicit specification of Message Deduplication ID
    Developers can specify their own message deduplication ID when sending a message. If a message with the same ID is sent within the same deduplication window, the corresponding message will be added to the queue only once.
By utilizing these features, FIFO queues achieve "exactly once processing," ensuring that messages are processed without duplication or loss.
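
As an illustration of these two deduplication methods, here is a minimal sketch using the AWS SDK for Python (boto3). It assumes a FIFO queue already exists; the queue URL, message group IDs, and deduplication ID below are hypothetical examples, not values from the article.

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.ap-northeast-1.amazonaws.com/123456789012/orders.fifo"  # hypothetical FIFO queue

    # Explicitly specifying a message deduplication ID:
    # a second send with the same ID within the 5-minute window is accepted but not enqueued again.
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody='{"orderId": "1001", "action": "create"}',
        MessageGroupId="customer-1001",             # messages in the same group keep FIFO order
        MessageDeduplicationId="order-1001-create", # hypothetical ID, unique per logical message
    )

    # If the queue has ContentBasedDeduplication enabled, the deduplication ID can be omitted
    # and SQS derives it from a SHA-256 hash of the message body.
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody='{"orderId": "1002", "action": "create"}',
        MessageGroupId="customer-1002",
    )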

Transition from Standard Queues to FIFO Queues

When an existing application using standard queues comes to require strict message ordering and exactly-once processing, an appropriate transition to FIFO queues is necessary.

However, direct conversion from standard queues to FIFO queues is not possible, and the transition is made using one of the following procedures:
  • Delete the existing standard queue and create a new FIFO queue
  • Simply add a new FIFO queue and retain the existing standard queue
Either method requires appropriate configuration changes to fully utilize the characteristics of FIFO queues and maintain order and processing consistency. Through this process, applications can accommodate more advanced messaging requirements.

Summary of transition process and points to consider when transitioning from standard queues to FIFO queues:
Transition Process
  1. Queue recreation
    Direct conversion from a standard queue to a FIFO queue is not possible. You must either create a new FIFO queue or delete the existing standard queue and recreate it as a FIFO queue.
  2. Application adjustment
    Changes to the application code are required to utilize the unique features of FIFO queues (such as message group ID and deduplication ID).
Points to Consider
  • Using high throughput mode
    Enabling high throughput mode for FIFO queues maximizes message sending throughput. However, this mode has specific limitations, so please check the details in the AWS documentation before use.
  • Setting delays
    In FIFO queues, delays can be set for the entire queue rather than individual messages. If individual message delays are necessary, corresponding adjustments on the application side are required.
  • Using message groups
    Grouping similar or related tasks allows messages to maintain their order while being processed in parallel by multiple processors. Proper design of message group IDs is important for effective scaling.
  • Deduplication strategy
    Using deduplication IDs helps prevent the same message from being processed multiple times. Additionally, content-based deduplication functionality can also be enabled.
  • Visibility timeout management
    If message processing takes a long time and an extension of the visibility timeout is necessary, consider adding a receive request attempt ID to each ReceiveMessage action to manage retry attempts for receive requests.
By making these changes appropriately, applications can fully utilize the characteristics of FIFO queues and achieve more reliable and efficient message processing.

High Throughput for FIFO Queues

High throughput mode for FIFO queues increases the number of requests per second that can be processed per API action.
To increase the number of requests for FIFO queues, you can increase the number of message groups used.

Amazon SQS stores FIFO queue data in partitions.
Partitions, which are automatically replicated across multiple availability zones within an AWS region, are the storage allocated for the queue.
Users do not manage partitions; Amazon SQS handles partition management.

High throughput can be enabled for new FIFO queues or existing FIFO queues.
The following three options are provided when creating or editing FIFO queues:
  • Enable high throughput FIFO
    Enables higher throughput for messages in the current FIFO queue.
  • Deduplication scope
    Specifies whether deduplication occurs across the entire queue or on a per-message-group basis.
  • FIFO throughput limit
    Specifies whether the throughput limit for messages in the FIFO queue applies to the entire queue or to each message group ID.
To enable high throughput for FIFO queues, select the option to enable high throughput FIFO when creating or editing the queue. This automatically applies the following settings:
  • The deduplication scope is set to message group
  • The FIFO throughput limit is set per message group ID
When high throughput FIFO is enabled, the deduplication scope and FIFO throughput limit are therefore set on a per-message-group basis. These settings are required for high throughput; if either of them is changed, normal throughput applies to the queue and deduplication occurs as specified.

After creating or editing the queue, you can send, receive, and delete messages at a high transaction rate.
In high throughput mode for FIFO queues, up to 3,000 messages per second can be sent and received per message group ID.
However, this limit is applied per message group ID, so increasing the number of message groups can enhance the overall throughput of the queue.
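
For reference, here is a minimal boto3 sketch of creating a high throughput FIFO queue with the settings described above; the queue name is a hypothetical example.

    import boto3

    sqs = boto3.client("sqs")

    # Create a FIFO queue with high throughput settings:
    # deduplication scoped to the message group and throughput limited per message group ID.
    response = sqs.create_queue(
        QueueName="orders-high-throughput.fifo",   # hypothetical name; FIFO queue names must end with ".fifo"
        Attributes={
            "FifoQueue": "true",
            "DeduplicationScope": "messageGroup",
            "FifoThroughputLimit": "perMessageGroupId",
        },
    )
    print(response["QueueUrl"])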

Main Features of Amazon SQS

Here, I will describe the main features available for both Amazon SQS standard queues and FIFO queues.

Short Polling and Long Polling

Amazon SQS offers two message reception methods: short polling and long polling.
Short polling is the default setting and is effective in environments where messages arrive frequently, while long polling helps reduce costs and improve efficiency when messages arrive infrequently.
Short Polling
During short polling, ReceiveMessage requests are made to a subset of the queue (a randomly selected group of servers), and available messages are immediately returned.
Even if no messages exist, SQS returns a response immediately.
Because a single request samples only a subset of the servers, it may not return all of the messages in the queue, especially when the queue contains few messages.
Repeating the request eventually samples all of the servers and retrieves the remaining messages, but repeated requests against a queue with no messages still incur charges and increase traffic.
Long Polling
Long polling involves waiting for the time specified by the ReceiveMessageWaitTimeSeconds parameter, querying all SQS servers during this time.
SQS holds off on the response until at least one message is found or the specified wait time has expired.
When messages become available, SQS returns a response after collecting at least one available message, up to the maximum number of messages specified in the request.
If no messages are found when the wait time ends, an empty response is returned.
Long polling reduces unnecessary requests and minimizes the occurrence of false empty responses (cases where messages are available but not included in the response), improving cost efficiency.
Long polling is enabled by setting ReceiveMessageWaitTimeSeconds to a value greater than zero, with a maximum of 20 seconds of waiting allowed.
Since no additional traffic occurs during the wait time, cost performance is improved.
This allows for efficient message processing when the frequency of message arrival is low or when the application has sufficient processing capacity.
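
To make the difference concrete, the following is a minimal long polling sketch in boto3; the queue URL is a hypothetical example. Setting WaitTimeSeconds to a value greater than zero enables long polling for that request.

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.ap-northeast-1.amazonaws.com/123456789012/example-queue"  # hypothetical

    while True:
        # Long polling: wait up to 20 seconds for at least one message before returning.
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,   # 0 would mean short polling
        )
        for message in response.get("Messages", []):
            print(message["Body"])
            # Delete the message after it has been processed successfully.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])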

Dead-letter Queues

Dead-letter queues are used to isolate messages that fail to process, aiding in debugging and issue analysis.
A message is automatically moved to a dead-letter queue after exceeding a specified number of receive attempts (maxReceiveCount). This feature is crucial for isolating problematic messages and facilitating cause analysis.

Dead-letter queues are not created automatically. You must create a queue to use as a dead-letter queue.
Additionally, a FIFO queue's dead-letter queue must also be a FIFO queue, and similarly, a standard queue's must be a standard queue.
Dead-letter queues and other queues must exist within the same AWS account and region.

maxReceiveCount specifies how many times a message is received from the source queue before it is moved to the dead-letter queue.
Setting a lower value moves messages to the dead-letter queue after fewer receive attempts, which can prematurely dead-letter messages that failed only because of transient issues such as network errors or client dependency errors.
Therefore, set the value of maxReceiveCount high enough to allow sufficient retry opportunities so the system can recover from such errors.

Once the cause of a message's failure has been identified, or when messages are consumable again, a Redrive Policy can be used to move messages from the dead-letter queue back to the source queue.
Moving messages from the dead-letter queue incurs charges based on the number of API calls, billed according to Amazon SQS pricing.

Dead-letter queues are configured using the Amazon SQS console. First, you create a queue to use as the dead-letter queue.
Then, in the settings for the source queue, you specify the queue to use as the dead-letter queue and set the value of maxReceiveCount.
This ensures that messages that fail to process are automatically moved to the designated dead-letter queue.
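
The same configuration can also be done programmatically. The following is a minimal boto3 sketch, with hypothetical queue names, that creates a dead-letter queue and attaches it to a source queue via a redrive policy.

    import boto3
    import json

    sqs = boto3.client("sqs")

    # Create the queue that will act as the dead-letter queue and look up its ARN.
    dlq_url = sqs.create_queue(QueueName="example-dlq")["QueueUrl"]
    dlq_arn = sqs.get_queue_attributes(
        QueueUrl=dlq_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # Point the source queue at the dead-letter queue and set maxReceiveCount.
    source_url = sqs.create_queue(QueueName="example-source")["QueueUrl"]
    sqs.set_queue_attributes(
        QueueUrl=source_url,
        Attributes={
            "RedrivePolicy": json.dumps({
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": "5",   # move a message to the DLQ after 5 failed receive attempts
            })
        },
    )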

Scenarios for using dead-letter queues include cases where message processing fails, and it is necessary to identify the problem's cause and take appropriate action.
For instance, when integration with an external service fails or when the message format is incorrect.

To move messages from a dead-letter queue, you use the Redrive Policy.
Redrive Policy is a setting used to move messages back to the original source queue from a dead-letter queue.
By setting a Redrive Policy, you can reprocess messages accumulated in the dead-letter queue.
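
With recent SDK versions, the redrive can also be started programmatically using the StartMessageMoveTask API (the console's DLQ redrive offers the same capability). The following is a minimal boto3 sketch; the dead-letter queue ARN is a hypothetical example.

    import boto3

    sqs = boto3.client("sqs")

    # Hypothetical ARN of the dead-letter queue whose messages should be redriven.
    dlq_arn = "arn:aws:sqs:ap-northeast-1:123456789012:example-dlq"

    # Start a message move task; without DestinationArn, messages return to their original source queues.
    task = sqs.start_message_move_task(SourceArn=dlq_arn)
    print(task["TaskHandle"])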

For debugging and troubleshooting dead-letter queues, you can utilize CloudWatch metrics.
By monitoring metrics related to the dead-letter queue, you can understand the number of failed messages and the number of messages lingering in the dead-letter queue.
Based on this information, you can take appropriate action.

When setting up and using dead-letter queues, there are numerous implementation considerations and best practices, making proper design and operation crucial.
For example, setting an appropriate message retention period for the dead-letter queue and regularly monitoring messages lingering in the dead-letter queue are recommended.
These considerations should be carefully addressed to ensure proper setup and operation.

Visibility Timeout

The "visibility timeout" is the period during which other consumers (applications that receive and process messages) cannot receive a message while it is being processed.
During this timeout period, Amazon SQS blocks the message from being received by other consumers.
The default visibility timeout is 30 seconds, but it can be set between 0 seconds and 12 hours.

The correct setting of the visibility timeout is crucial. The consumer must complete the processing and deletion of the message within this period.
If the timeout expires before processing is completed, the message becomes visible again and may be received by another consumer. Setting an appropriate visibility timeout therefore reduces the risk of the same message being processed multiple times.

However, even with the visibility timeout set, standard queues cannot completely prevent the duplicate reception of messages (due to the "at-least-once delivery" policy).
On the other hand, FIFO queues use the message deduplication ID and the receive request attempt ID, allowing producers and consumers to retry sending and receiving as needed.
In-flight Messages
Amazon SQS messages have three states:
  • State after being sent by the producer to the queue
  • State after being received from the queue by the consumer
  • State after being deleted from the queue
Messages that have been sent by the producer but not yet received are considered "stored" and there is no limit to their number.
Conversely, messages that have been received by the consumer but not deleted are in "in-flight" state, and there is a limit to their number.
The limit for in-flight messages is 120,000 for standard queues and 20,000 for FIFO queues.

You can set and change the visibility timeout using the AWS console.
You can change the default value for each queue or specify a specific timeout value when receiving a message.
If you need a processing time of more than 12 hours, consider using AWS Step Functions.
Best Practices
If the processing time for a message is uncertain, implement a heartbeat in the consumer process.
For example, set the initial visibility timeout to 2 minutes and continue to add 2 minutes of timeout every minute as long as the consumer is processing.
If you find that the timeout is insufficient after starting the process, you can specify a new timeout value using the ChangeMessageVisibility action to extend or shorten the visibility timeout.
Also, if you decide not to process, you can set the timeout to zero using the ChangeMessageVisibility action, making the message immediately visible to other components for reprocessing.
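
The heartbeat pattern described above could look roughly like the following boto3 sketch. The queue URL is hypothetical, and process_chunk() is a hypothetical placeholder for one unit of the consumer's work; a real heartbeat would typically run on a timer rather than after each chunk.

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.ap-northeast-1.amazonaws.com/123456789012/example-queue"  # hypothetical


    def process_chunk(message):
        # Hypothetical placeholder: perform one unit of work and return True when processing is finished.
        return True


    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        VisibilityTimeout=120,  # initial visibility timeout: 2 minutes
        WaitTimeSeconds=20,
    )
    for message in response.get("Messages", []):
        receipt_handle = message["ReceiptHandle"]
        # Heartbeat: keep extending the visibility timeout while the work is still in progress.
        while not process_chunk(message):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=120,  # grant another 2 minutes
            )
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)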

Delay Queues

Amazon SQS delay queues allow you to withhold messages from being delivered to consumers for a specified period.
This feature is useful, for example, when consumer applications need additional time to process messages.
Messages sent to a delay queue remain invisible to consumers within the queue until the specified delay period has elapsed.

The default (minimum) delay time is 0 seconds, and the maximum delay time is as follows:
  • For standard queues, the maximum delay time is 900 seconds (15 minutes)
  • For FIFO queues, the maximum delay time is 900 seconds (15 minutes)
Delay queues can be configured using AWS Console, AWS CLI, AWS SDK, or AWS CloudFormation.

In standard queues, changing the per-queue delay setting does not affect messages already in the queue; the new setting applies only to messages added after the change.
In FIFO queues, by contrast, the per-queue delay setting is retroactive, so changing it also affects messages already in the queue.

Delay queues are similar to visibility timeouts, but the main difference is:
  • Delay queues: messages are hidden immediately after being added to the queue
  • Visibility timeouts: messages are hidden after being retrieved from the queue
Instead of setting a uniform delay time for all messages, if you want to specify a delay time for individual messages, use message timers.
Using message timers, you can set a unique delay time for each message, up to a maximum of 900 seconds (15 minutes), which is the same maximum value for the queue-level DelaySeconds attribute.
If both the queue-level DelaySeconds attribute and the message timer DelaySeconds value are set, the message timer value takes precedence.
In other words, if a message timer is set, the queue's DelaySeconds value is ignored.
By using delay queues and message timers together, you can set a default delay time for the entire queue while specifying individual delay times for specific messages.
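
As a concrete illustration, the following boto3 sketch sets a default delay for a standard queue and then overrides it for one message with a message timer; the queue name is a hypothetical example.

    import boto3

    sqs = boto3.client("sqs")

    # Standard queue with a default delivery delay of 30 seconds for every message.
    queue_url = sqs.create_queue(
        QueueName="example-delay-queue",           # hypothetical name
        Attributes={"DelaySeconds": "30"},
    )["QueueUrl"]

    # This message uses the queue-level delay of 30 seconds.
    sqs.send_message(QueueUrl=queue_url, MessageBody="uses the queue default delay")

    # This message sets a message timer of 120 seconds, which overrides the queue-level DelaySeconds.
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody="uses an individual message timer",
        DelaySeconds=120,
    )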

Temporary Queues

Amazon SQS's temporary queue feature saves development and deployment costs when using common messaging patterns such as request-response.
The "Temporary Queue Client" creates dynamic, on-demand temporary queues for specific processes, offering high throughput and cost efficiency under application control.

This client automatically maps multiple temporary queues created for a specific process to a single Amazon SQS queue.
This allows the application to reduce API calls and improve throughput, even if traffic to each temporary queue is low.
When a temporary queue is no longer needed, the client automatically deletes it, even if the processes that use the client do not shut down cleanly.
Benefits of temporary queues include:
  • Functioning as lightweight communication channels dedicated to specific threads or processes
  • Capability to be created and deleted without incurring additional costs
  • API compatibility with non-temporary (regular) Amazon SQS queues (existing code can send and receive messages)
For example, on the server side, you could create a LoginServer class to process login requests from clients, launching a thread that polls the queue for login requests and calls the handleLoginRequest() method for each message.
Inside the handleLoginRequest() method, you would call the doLogin() method to perform the login process.
To ensure proper cleanup of queues, it is necessary to call the shutdown() method when the application no longer uses the temporary queue client.
Similarly, you can use the shutdown() method of the AmazonSQSRequester interface. This ensures that temporary queues are properly deleted, preventing resource wastage.

Message Timers

Amazon SQS message timers allow you to specify the initial invisibility period for each message added to the queue.
This means that even though a message has arrived in the queue, it will not be visible to the consumer until after the set time has passed.
For example, a message with a 45-second timer will be invisible to the consumer for the first 45 seconds after it reaches the queue.

The minimum (default) delay time is 0 seconds, but you can set a delay of up to 15 minutes.
Message timers can be set when sending messages using the AWS Management Console or AWS SDK.

However, message timers cannot be set for individual messages in FIFO (first-in, first-out) queues.
If you want to set a delay for messages in a FIFO queue, you need to use delay queues, which allow you to set a delay period for the entire queue.
Also, for standard queues, the message timer set on an individual message takes precedence over the queue-level DelaySeconds value.
In other words, if a message timer is set, it overrides the delay queue setting.

Attribute-Based Access Control (ABAC)

Amazon SQS allows you to manage access permissions finely using Attribute-Based Access Control (ABAC), which utilizes IAM policies based on tags assigned to users or AWS resources.
This lets you control which authenticated IAM principals can access an Amazon SQS queue based on the tags associated with it, without having to edit policies or manage permissions for each individual resource.

Using ABAC, you can scale permission management by setting IAM access permissions using tags added for different business roles.
This also saves the effort of updating policies every time new resources are added.
Additionally, you can create ABAC policies by tagging IAM principals, and design policies that allow Amazon SQS operations when the tags on the Amazon SQS queue match those on the IAM user role.

Benefits of using ABAC include reducing the number of different policies needed for different functions, thus reducing operational burdens.
It also enables rapid scaling of teams, with automatic permission grants when new resources are properly tagged.
Furthermore, you can use the permissions on IAM principals to restrict resource access and track which users are accessing resources through AWS CloudTrail.

Specific control with ABAC includes creating IAM policies that allow operations on Amazon SQS queues only under conditions where resource tags or request tags match specific keys and values.
You can also use the AWS Management Console or AWS CloudFormation to create IAM users with ABAC policies or Amazon SQS queues.

Proper use of ABAC enables more secure and flexible operation of Amazon SQS, particularly in large-scale environments, meeting organizational security requirements while managing resources efficiently.
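
To make the matching condition concrete, here is a minimal sketch of an ABAC-style identity policy expressed as a Python dictionary. The tag key "team", the account ID, and the region are hypothetical examples; the policy allows SQS actions only when the queue's resource tag matches the calling principal's tag.

    import json

    # Hypothetical ABAC policy: allow sending, receiving, and deleting messages only on queues
    # whose "team" tag matches the "team" tag attached to the calling IAM principal.
    abac_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["sqs:SendMessage", "sqs:ReceiveMessage", "sqs:DeleteMessage"],
                "Resource": "arn:aws:sqs:ap-northeast-1:123456789012:*",
                "Condition": {
                    "StringEquals": {
                        "aws:ResourceTag/team": "${aws:PrincipalTag/team}"
                    }
                },
            }
        ],
    }
    print(json.dumps(abac_policy, indent=2))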

Amazon SQS Best Practices

Best Practices Common to Standard Queues and FIFO Queues

To use Amazon SQS efficiently and reduce costs, the following best practices are recommended. These practices apply to both Standard Queues and FIFO Queues.
  • Efficient Message Processing
    Set the visibility timeout based on the time it takes to receive, process, and delete a message. The maximum visibility timeout is 12 hours, but simply applying the maximum to every message is not appropriate. If the message processing time is unknown, consider setting a short initial visibility timeout and then extending it periodically while work continues.
  • Handling Request Errors
    If you're using the AWS SDK, automatic retries and backoff logic are available. If you're not using the SDK, make sure to wait a certain amount of time before retrying the ReceiveMessage action. It's desirable to back off exponentially; for example, wait 1 second before the first retry, 2 seconds before the second, and 4 seconds before the third.
  • Setting Long Polling
    Use long polling to avoid unnecessary polling and reduce costs. The WaitTimeSeconds parameter can be set up to a maximum of 20 seconds. Make sure that the HTTP response timeout is longer than this parameter. For example, if WaitTimeSeconds is set to 20 seconds, set the HTTP response timeout to at least 30 seconds.
  • Capturing Problematic Messages
    Set up to move messages that are difficult to process to the Dead-letter Queue and obtain accurate CloudWatch metrics. The Dead-letter Queue is where messages that fail to process are moved. This allows you to isolate problematic messages without disrupting the processing of the main queue and investigate the causes of errors.
  • Dead-letter Queue Retention Period Settings
    It's best practice to set the retention period of the Dead-letter Queue longer than that of the original queue. Also, in standard queues, the expiry of messages is always based on the original enqueue time, but in FIFO queues, the timestamp is reset when moved to the Dead-letter Queue (the expiry of dead-lettered messages starts from the point of dead-lettering). This allows you to investigate and address messages that have moved to the Dead-letter Queue over an adequate period.
  • Avoiding Inconsistent Message Processing
    A common issue in distributed systems is that messages are marked as delivered but not received by the consumer. Therefore, it is not recommended to set the maximum receive count to 1 for the Dead-letter Queue. Setting the maximum receive count to 2 or more reduces the risk of messages being lost due to temporary errors.
  • Implementing a Request-Response System
    When implementing an RPC (Remote Procedure Call) system or a request-response pattern, it is recommended to create a reply queue for each producer at startup and associate requests and responses using a correlation ID, instead of creating a reply queue for each message. This simplifies the management of reply queues and improves system performance.
  • Cost Reduction
    Batch processing of message actions and the use of the Buffered Asynchronous Client included in the AWS SDK for Java can combine client-side buffering and request batching (see the batching sketch after this list). This reduces the number of API requests and lowers costs.
  • Using the Appropriate Polling Mode
    Long Polling is recommended to reduce unnecessary ReceiveMessage actions on empty queues. Short Polling is useful when an immediate response is needed, but be aware that the same charges apply. Choosing the appropriate polling mode based on application requirements is crucial.
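
As a rough sketch of the batching approach mentioned in the cost reduction item above, the following boto3 example sends ten messages in a single SendMessageBatch request instead of ten separate SendMessage requests; the queue URL is a hypothetical example.

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.ap-northeast-1.amazonaws.com/123456789012/example-queue"  # hypothetical

    # One SendMessageBatch call can carry up to 10 messages, reducing the number of billed requests.
    entries = [
        {"Id": str(i), "MessageBody": f"message {i}"}
        for i in range(10)
    ]
    response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
    print(len(response.get("Successful", [])), "messages sent")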

Best Practices for FIFO Queues

To make the most efficient use of Amazon Simple Queue Service (SQS) FIFO queues, it is essential to understand and properly apply several best practices. Below, I explain the optimal ways to use message deduplication IDs and message group IDs.
Using Message Deduplication IDs
  • Ensuring Uniqueness of Messages
    The message deduplication ID is a token used to ensure that the messages sent are not duplicates. Once a message with a particular deduplication ID has been sent successfully, any further messages sent with the same ID within the 5-minute deduplication interval are accepted but not delivered. This prevents duplicate messages and ensures uniqueness.
  • Providing Message Deduplication IDs
    Producers should provide a message deduplication ID for messages that have identical bodies but must be treated as unique, messages with identical content but different attributes that must be treated as unique, and messages with different content (for example, retry counts included in the body) that SQS should nevertheless treat as duplicates. This allows SQS to handle duplicate messages properly.
Handling Duplicates in a Single Producer/Single Consumer System
In systems with a single producer and a single consumer, enable content-based deduplication. In this case, producers can omit the message deduplication ID. Consumers do not need to provide a receive request attempt ID, but doing so is considered best practice. This allows for a simpler system design.
Designing for Outage Recovery
The deduplication process of FIFO queues is time-sensitive. When designing applications, it's essential that both producers and consumers can recover from client or network outages. Especially since SQS's deduplication interval is 5 minutes, producers must be aware that if an outage lasts more than 5 minutes, they need to resend messages with a new deduplication ID.
Using Message Group IDs
  • Implementing Message Grouping
    Messages with the same message group ID are processed sequentially in a strict order for that message group (order is not guaranteed between different message groups). By processing multiple ordered message groups alternately within the queue, multiple consumers can handle the tasks, but each user's session data is processed in FIFO order. This ensures the order of messages within the group while enabling parallel processing.
Handling Duplicates in a Multiple Producer/Multiple Consumer System
In systems prioritizing throughput and latency with multiple producers and consumers, it is recommended to generate a unique message group ID for each message. In this case, ordering is not guaranteed, but duplicates are still avoided, maintaining system scalability while preventing duplicate messages.
Using Receive Request Attempt IDs
  • Duplicate Request Prevention
    During long-term network outages, if you experience connection issues between the SDK and Amazon SQS, provide a receive request attempt ID and use the same ID to retry if the SDK operation fails. This prevents the same message from being received multiple times due to network issues.
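
As a minimal sketch of this practice in boto3, retrying a failed ReceiveMessage call on a FIFO queue with the same receive request attempt ID returns the same set of messages instead of a new one. The queue URL and attempt ID are hypothetical examples, and the broad exception handling is only for illustration.

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.ap-northeast-1.amazonaws.com/123456789012/orders.fifo"  # hypothetical
    attempt_id = "retry-session-0001"  # hypothetical; reuse the same value when retrying

    try:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,
            ReceiveRequestAttemptId=attempt_id,
        )
    except Exception:
        # On a networking error, retry with the same ReceiveRequestAttemptId
        # so that SQS returns the same messages as the original attempt.
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,
            ReceiveRequestAttemptId=attempt_id,
        )
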
By properly applying the above best practices, you can maximize the performance and reliability of applications using Amazon SQS FIFO queues. It is crucial to select and implement the appropriate best practices based on the system's requirements.

Reference:
AWS Documentation (Amazon Simple Queue Service)

Summary

In this article, I created a historical timeline for Amazon SQS and explored the list and overview of Amazon SQS features.

Amazon Simple Queue Service (Amazon SQS), a message queuing service for distributed computing environments, was announced as the first AWS infrastructure service in 2004 and reached GA in 2006.
When it was initially announced in November 2004, the reaction at the time was "Huh? Why would Amazon do that?" (Reference: The AWS Blog: The First Five Years).
Almost 20 years later, Amazon SQS continues to provide an extremely important component of messaging and queuing for modern computing such as microservices architecture, distributed systems, and serverless computing, continually updating its features.
As is evident from the current situation, the vision and design philosophy behind the initial announcement of Amazon SQS, which were difficult to understand at the time, accurately anticipated future needs and the evolution of system architecture.

Amazon SQS was already embodying the concepts of serverless, microservices, and fully managed services at that time: by prioritizing use through APIs rather than a GUI, it offered a flexible infrastructure service that can be loosely coupled and easily integrated with other systems and can adapt to changing purposes of use.

I would like to continue watching the trends of what kind of features Amazon SQS will continue to provide in the future.

In addition, there is also a timeline of the entire AWS services including Amazon SQS, so please have a look if you are interested.

AWS History and Timeline - Almost All AWS Services List, Announcements, General Availability(GA)


Written by Hidekazu Konishi


Copyright © Hidekazu Konishi ( hidekazu-konishi.com ) All Rights Reserved.