Previous articles on this blog discussed High Speed Logging (HSL) as well as designs for processing high-speed logs.
There’s more than one way to skin a cat …
This article, along with a series of others, covers design and architectural ideas and options for turning a self-hosted, AWS EC2 based infrastructure into a fully automated, fully service-based deployment by using public cloud provider offerings such as those AWS provides.
It also shows which benefits the various deployment options provide since, as always, there’s more than one way to skin a cat.
AWS ELB logs serve as the source for infrastructure and security monitoring. The log format itself looks similar to a web access log. Typically an AWS ELB (Elastic Load Balancer) creates a new log file every 5 minutes. In very high traffic situations this can happen more often.
- The log file is stored in an AWS S3 bucket as set in the AWS ELB’s configuration.
- AWS S3 sends an event notification to an AWS SQS queue that an AWS Lambda function listens to.
- Whenever new messages arrive in AWS SQS, Lambda picks them up. Each message contains the object’s key (that is, the path to the log file as stored by AWS ELB).
- The AWS Lambda function reads the object and breaks each log line into key/value pairs for further processing. In this use case, where data is processed into AWS Elasticsearch, the key/value pairs are translated into a JSON-formatted string. This ensures that, later in the chain, AWS Elasticsearch can consume and index the document easily.
- The AWS Lambda function code can also enrich the data where needed to meet business requirements.
- The next step is to push the data forward, which is where AWS Kinesis comes in, as it is well suited to consuming large numbers of records in a short amount of time.
- AWS Firehose acts as the glue in the chain: it consumes records from AWS Kinesis and moves them forward into AWS Elasticsearch, which is the use case in this article. If required, AWS Firehose can also transform records via AWS Lambda.
- The final step is for AWS Elasticsearch to index and store the data for further consumption, analytics, or search in general.
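The Lambda steps above (reading the object key from the queued message and turning a log line into a JSON document) can be sketched roughly as follows. This is a minimal illustration, not the original function: it assumes the Classic Load Balancer access log layout and the standard S3 event notification wrapped in an SQS message body. The names `ELB_FIELDS`, `object_keys_from_sqs_event`, and `log_line_to_json` are hypothetical, and fetching the object from S3 and writing to AWS Kinesis are left out.

```python
import json
import shlex
from urllib.parse import unquote_plus

# Assumed field order: Classic Load Balancer access log entries.
ELB_FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]


def object_keys_from_sqs_event(event):
    """Return (bucket, key) pairs from S3 notifications carried in SQS messages."""
    pairs = []
    for message in event.get("Records", []):
        # Each SQS message body is the S3 notification as a JSON string.
        notification = json.loads(message["body"])
        for record in notification.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            pairs.append((bucket, key))
    return pairs


def log_line_to_json(line):
    """Split one log line into key/value pairs and serialize them as JSON."""
    # shlex keeps the quoted request and user-agent fields together.
    values = shlex.split(line)
    return json.dumps(dict(zip(ELB_FIELDS, values)))
```

Enrichment to meet business requirements would fit between splitting the line and serializing the document, for example by adding fields to the dictionary before `json.dumps` is called.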
Which needs does the architecture help to meet?
Increasing traffic patterns
As traffic volumes grow, more log files are generated. A different article described the processing of ELB logs as an asynchronous, self-learning environment hosted on EC2 instances. While that solution has its pros, it has one hard limitation: throughput at high scale. The architecture described above is meant to allow high throughput in a given AWS Region, with the intention of storing the data in a search engine.
All of the chosen components can be deployed with AWS CloudFormation, which allows a full abstraction away from traditional deployment strategies and is therefore a move towards Infrastructure as Code principles. A different article covers AWS CloudFormation Stacks and StackSets, so that a full environment can be turned into a push-button deployment.
All in all, even when deployed manually through the AWS Console or the AWS CLI, CloudFormation templates make a deployment reusable, so that deployments per region or into various AWS accounts can be repeated in the same pattern while consistency is ensured.
In traditional instance-based deployments, certain routines have to be taken care of every once in a while. A prominent example is the OS security patch routine that keeps an EC2 based deployment up to date with the latest security patches. This procedure typically requires a lot of overhead and attention from application owners, IT teams, and the teams that handle all maintenance efforts. Each patch cycle also poses a risk to stability, as each patching effort is followed by an instance reboot. That reboot can cause trouble for applications running on those EC2 instances, and for dependent applications that may struggle while the instances are unavailable.
The components chosen for the architecture described in this article are meant to scale and to keep up with traffic as it rises. Monitoring can be implemented to let AWS Kinesis increase or decrease the number of shards when certain thresholds are crossed. Certain traffic patterns may even require incoming events into AWS Kinesis to be throttled. A variety of options is possible.
Lambda scales with the number of messages waiting in AWS SQS to be picked up. Scaling here is very likely faster than a full AWS EC2 deployment including application deployment and configuration. During low-traffic time windows it runs with little to no overhead, whereas AWS EC2 instances could otherwise end up sitting idle.
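As a rough illustration of such a threshold-based decision, the sketch below estimates how many shards a stream would need for an observed ingest rate. It assumes the commonly documented per-shard write limits of about 1 MB per second and 1,000 records per second. The function name `target_shard_count` and the `headroom` parameter are hypothetical, and the call that actually resizes the stream is not shown.

```python
import math

# Assumed per-shard write limits for AWS Kinesis (roughly 1 MB/s and 1,000 records/s).
SHARD_MAX_BYTES_PER_SEC = 1_000_000
SHARD_MAX_RECORDS_PER_SEC = 1_000


def target_shard_count(records_per_sec, bytes_per_sec, headroom=0.25, min_shards=1):
    """Estimate the shard count needed for the observed ingest rate.

    headroom adds spare capacity on top of the measured rate so that short
    bursts do not immediately hit the per-shard limits.
    """
    by_records = records_per_sec / SHARD_MAX_RECORDS_PER_SEC
    by_bytes = bytes_per_sec / SHARD_MAX_BYTES_PER_SEC
    # The tighter of the two limits decides how many shards are needed.
    needed = max(by_records, by_bytes) * (1 + headroom)
    return max(min_shards, math.ceil(needed))
```

A monitoring job could compare this estimate with the current shard count and only trigger a resize when the two differ, which avoids resizing on every small fluctuation.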