Many articles explain how to get AWS SQS deployed, and this blog also gives a short overview of how to do a CloudFormation-based deployment of an AWS SQS queue into the AWS Cloud.
However, this article is primarily about the benefit and value Cloud native services can provide, and about how they can ease operations and add operational stability, with AWS SQS taken as the example.
What is AWS SQS?
AWS SQS stands for Amazon Simple Queue Service and provides a message queuing service, fully managed by AWS.
Similar to other queuing technologies, it helps to design and build decoupled, event-driven applications in which a queue or broker receives messages from producers and delivers them to consumers for further processing.
Costs
As with other cloud native services, you only pay for what you actually consume. This allows very cost-effective usage, since there is no idle time on resources to pay for.
Services such as AWS SQS are ideally deployed via CloudFormation templates, which provide a quick, reusable pattern for a fully automated setup, for example event-driven or on-demand; a minimal sketch follows below. Of course, a setup through the AWS CLI or the AWS Console works as well.
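As a minimal sketch of such a CloudFormation-based deployment, the snippet below creates an SQS queue from a small inline template using boto3. The stack name, queue name and property values are illustrative assumptions, not a prescription.

```python
# Minimal sketch: deploying an SQS queue via a CloudFormation stack with boto3.
# Stack name, queue name and property values are illustrative assumptions.
import boto3

TEMPLATE = """
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  DemoQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: demo-queue
      VisibilityTimeout: 30              # seconds a received message stays hidden
      ReceiveMessageWaitTimeSeconds: 20  # enable long polling by default
Outputs:
  QueueUrl:
    Value: !Ref DemoQueue
"""

cfn = boto3.client("cloudformation")
cfn.create_stack(StackName="sqs-demo-stack", TemplateBody=TEMPLATE)
cfn.get_waiter("stack_create_complete").wait(StackName="sqs-demo-stack")
print("Queue deployed via CloudFormation")
```

The same template can be reused across accounts and stages, which is exactly the quick, repeatable pattern described above.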
A significant cost factor to take into consideration is the deployment itself: the time and effort required to host services such as RabbitMQ, AMQ and similar has to be included in a cost model and compared with native cloud services such as AWS SQS. Once deployed, follow-up efforts have to be added to the full cost model as well, such as OS patching, service updates/upgrades and service health monitoring, all of which again cost time and effort. Surrounding frameworks used to install and maintain such a service, for example Chef or Puppet, themselves need maintenance, uptime and monitoring, and have to be included in a full 360° view too. Very quickly this can end up in a hairball of dependencies where various teams have to interact just to get a service deployed and maintained.
Conclusion: the chain of dependencies influences how complex, time consuming and ultimately how costly a self-hosted deployment of a given service can become. These efforts are a significant parameter to take into account when Cloud native services are compared with self-maintained deployments.
Performance and Scale
AWS SQS can handle messages in a FIFO pattern when message ordering is a key requirement; however, FIFO queues have limits on message throughput. Standard queues are built for maximum scale and throughput: while an SQS FIFO queue can handle up to 3,000 messages per second (with batching), a standard SQS queue supports a nearly unlimited number of transactions per second, the so-called TPS value.
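To illustrate the difference in usage, the hedged sketch below creates a FIFO queue (note the mandatory `.fifo` suffix) and sends a message with a message group ID, which is what preserves ordering within a group. The queue name, group ID and message body are assumptions for illustration only.

```python
# Sketch: creating a FIFO queue and sending an ordered message with boto3.
# Queue name, message group and body are illustrative assumptions.
import boto3

sqs = boto3.client("sqs")

# FIFO queue names must end in ".fifo"; content-based deduplication avoids
# having to supply an explicit MessageDeduplicationId per message.
queue_url = sqs.create_queue(
    QueueName="orders.fifo",
    Attributes={"FifoQueue": "true", "ContentBasedDeduplication": "true"},
)["QueueUrl"]

# Messages with the same MessageGroupId are delivered in order.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"orderId": 42}',
    MessageGroupId="customer-123",
)
```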
While this article takes AWS SQS as a specific example for comparing Cloud native services with self-maintained deployments, the performance and scale described above can typically be expected from Cloud native services in general. They scale up or down to meet ongoing needs, which also makes it possible to manage performance and costs effectively.
Deployment
Since AWS SQS is a fairly simple service from a usage perspective, the deployment experience itself is similarly easy. Some attention should be paid to how producers and consumers handle messages, specifically how often a read/poll is planned for: each call to the SQS API to receive a message is charged, regardless of whether the response contains messages or comes back empty.
AWS SQS configuration parameters such as the receive message wait time (long polling) help to control costs and reduce empty replies. On the application side, logic that slows polling down when the queue is found empty is another way to control costs for empty calls; a consumer sketch combining both follows below. Since only what is used is paid for, it makes sense to avoid empty calls as much as possible so that costs stay as low as possible.
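As a hedged illustration of these two levers, the sketch below combines long polling (`WaitTimeSeconds`) with a simple application-side backoff. The queue URL, sleep intervals and batch size are illustrative assumptions.

```python
# Sketch of a cost-aware consumer: long polling plus a simple backoff when
# the queue is empty. Queue URL, sleep times and limits are assumptions.
import time
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-central-1.amazonaws.com/123456789012/demo-queue"  # placeholder

idle_sleep = 1  # seconds to wait after an empty reply

while True:
    # WaitTimeSeconds=20 enables long polling: the call returns early when a
    # message arrives, but an empty queue only produces one (billed) request
    # every 20 seconds instead of a tight polling loop.
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    messages = response.get("Messages", [])

    if not messages:
        # Back off a little more after each empty reply to keep empty calls rare.
        time.sleep(idle_sleep)
        idle_sleep = min(idle_sleep * 2, 60)
        continue

    idle_sleep = 1  # reset the backoff once messages are flowing again
    for message in messages:
        print("processing", message["Body"])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```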
Operations, Maintenance
An alternative to a cloud managed service is a self-deployed, self-hosted and self-managed service. Earlier in this article the impact of such a setup was reflected from a cost and effort perspective. Even though queuing services such as RabbitMQ or AMQ are easy to deploy, the ongoing effort to deploy, maintain and monitor them can become cumbersome.
Typically a certain uptime and reliability is required, which may call for failover and can end up in a clustered setup to support business needs. Even rather simple tasks such as OS patching can cause all sorts of challenges: when to patch, what the rollback path is, how long a given service can be taken offline, which failover options exist, what happens if patching is skipped and potential vulnerabilities are not closed, and so on; the list can be endless. In highly regulated environments certain actions, such as OS patching, are mandatory.
Under the shared responsibility model in a Cloud environment, questions and tasks such as patching and environmental stability are handled by the Cloud provider, for example Amazon AWS, Google GCP or Microsoft Azure, to name just a few.
What does AWS SQS provide from an operations and maintenance perspective?
Essentially, AWS ensures high reliability. SQS is a regional service that stores messages redundantly across Availability Zones, so a compute node failure, a network outage or even an AZ (Availability Zone) failure does not impact the SQS service itself.
In a self-hosted and self-maintained environment this level of reliability is typically not reached. If it is, it comes with considerable effort, such as containerization to abstract the dependency on compute and cluster functionality to ensure High-Availability and resiliency.
If producers and consumers rely on a queuing service that is always available and never slows down, a true 24×7 availability, a self-hosted queue service will hit limitations:
- What if the underlying compute needs to be patched from an OS perspective?
- Who can be reached at any time (24×7) if the service needs maintenance?
- What are the impacts and follow-up costs and efforts if an internally hosted queue service becomes unresponsive?
- Which efforts does it take to ensure High-Availability?
- Will the queue be able to store messages reliably?
- What compute needs to be provided to cover average as well as peak loads?
- What if producers keep sending messages but consumers cannot process them?
- How large does the backlog need to be?
Use case experience
After a couple of years of running a queuing service (RabbitMQ in this case) in a self-maintained environment, and having to handle all the implications described throughout this article, a transition was made to AWS SQS as a replacement.
The adaptation from a code and software perspective was straightforward; with a little planning and design beforehand there were no issues at all.
Overall conclusion
While AWS SQS was taken as the specific example for this article, the overall outcome is obvious and likely not a secret:
Cloud native services can outperform self-hosted, self-maintained services on various levels. Especially when it comes to deployment, maintenance and operations, Cloud native services provide a significant advantage and should always be kept in mind as valid alternatives and options.
If planning is done wisely, costs can be controlled very precisely and predictably, and overhead costs and dependencies on various layers can be removed entirely.
Ad-hoc changes in requirements can be fulfilled easily, up to a fully automated fashion.
Last but not least, under the shared responsibility model a public cloud provider such as AWS takes away much of the burden of operating and maintaining a given service.