In previous articles on this blog, we discussed High-Speed Logging (HSL) and various designs for processing HSL logs.
As the saying goes, “There’s more than one way to skin a cat.” This article, along with a series of others, will explore design and architectural ideas for transforming a self-hosted, AWS EC2-based infrastructure into a fully automated, service-based deployment built on managed public cloud offerings such as those from AWS. Along the way, we will highlight the benefits of the various deployment options available.
The Architecture
ELB Logs in a Serverless High-Speed Log Processing Architecture
AWS Elastic Load Balancer (ELB) logs are essential for infrastructure and security monitoring. The log format resembles a web access log, and an AWS ELB typically publishes a new log file every 5 minutes (the interval is configurable). Under heavy traffic, the load balancer can even produce multiple log files covering the same period.
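For illustration, an access log entry from a classic ELB looks roughly like the following; the hostnames, addresses, and timings here are made up, and an Application Load Balancer adds further fields to this layout.

```
2023-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -
```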
The Steps:
- The log file is stored in an AWS S3 bucket as configured in the AWS ELB’s settings.
- AWS S3 sends an event notification to an AWS SQS queue, which an AWS Lambda function listens to.
- Whenever new messages arrive in AWS SQS, Lambda picks them up. Each message contains information about the object’s key (the path to the log file stored by AWS ELB).
- The AWS Lambda function reads the object and breaks each log line into key/value pairs for further processing. In this use case, where the data ends up in AWS Elasticsearch, the key/value pairs are serialized into a JSON-formatted string so that AWS Elasticsearch can easily consume and index the document later in the processing chain (a minimal handler sketch follows this list).
- The AWS Lambda function can also enrich data as needed to meet business requirements.
- The next step is to push the data forward; this is where AWS Kinesis comes into play, because it can ingest large volumes of records quickly.
- AWS Firehose acts as the intermediary that consumes records from AWS Kinesis and forwards them into AWS Elasticsearch. If desired, AWS Firehose can also manipulate records via an AWS Lambda function (a transformation sketch also follows the list).
- The final step is for AWS Elasticsearch to index and store the data for further consumption, analytics, or general search purposes.
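To make the middle steps more concrete, here is a minimal sketch of such a Lambda handler, assuming the SQS queue receives standard S3 event notifications. The stream name, environment variable, and field list are illustrative assumptions rather than code from the deployment described above; the field names follow the classic ELB access log layout (ALB logs carry more fields and are gzip-compressed).

```python
import gzip
import json
import os
import shlex
from urllib.parse import unquote_plus

import boto3

# Illustrative names only; the real stream and field mapping depend on the deployment.
KINESIS_STREAM = os.environ.get("KINESIS_STREAM", "elb-log-records")

# Field names follow the classic ELB access log layout.
ELB_FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time", "response_processing_time",
    "elb_status_code", "backend_status_code", "received_bytes", "sent_bytes",
    "request", "user_agent", "ssl_cipher", "ssl_protocol",
]

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")


def handler(event, context):
    """Triggered by SQS messages carrying S3 notifications for newly stored ELB log files."""
    records = []
    for sqs_record in event["Records"]:
        s3_event = json.loads(sqs_record["body"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            if key.endswith(".gz"):  # ALB logs are gzip-compressed
                body = gzip.decompress(body)
            for line in body.decode("utf-8").splitlines():
                # shlex.split keeps the quoted request and user-agent fields intact.
                doc = dict(zip(ELB_FIELDS, shlex.split(line)))
                # Enrichment to meet business requirements would happen here.
                records.append({
                    "Data": json.dumps(doc).encode("utf-8"),
                    "PartitionKey": doc.get("elb", "unknown"),
                })
    # PutRecords accepts at most 500 records per call, so push in batches.
    for i in range(0, len(records), 500):
        kinesis.put_records(StreamName=KINESIS_STREAM, Records=records[i:i + 500])
```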
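The optional Firehose transformation mentioned above also runs as a Lambda function. The sketch below assumes the record payloads are the JSON documents produced earlier; the health-check filter and the added source field are purely illustrative.

```python
import base64
import json


def handler(event, context):
    """Firehose transformation: invoked on each batch before delivery to Elasticsearch."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        if "ELB-HealthChecker" in payload.get("user_agent", ""):
            # Drop health-check noise instead of indexing it.
            result, data = "Dropped", record["data"]
        else:
            payload["source"] = "elb-access-log"  # illustrative enrichment
            result = "Ok"
            data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"], "result": result, "data": data})
    return {"records": output}
```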
Benefits of This Architecture
Increasing Traffic Patterns
With growing traffic volumes, more log files will be generated. In another article, we described the processing of ELB logs as an asynchronous and self-learning environment hosted on EC2 instances. While this solution has its advantages, it has a significant limitation regarding throughput at high scale. The architecture outlined above is designed to allow high throughput within a given AWS Region while efficiently storing data in a search engine.
Automated Deployment
All components can be deployed using AWS CloudFormation, allowing for full abstraction from traditional deployment strategies. This approach aligns with Infrastructure as Code principles. A separate article discusses AWS CloudFormation Stacks and StackSets, enabling push-button deployments for entire environments.
Even when deployed manually through the AWS Console or the CLI, CloudFormation templates keep deployments reusable across regions and AWS accounts while ensuring consistency.
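As a rough illustration of the push-button idea, the same template can also be driven programmatically. The stack name, template file, and region below are placeholders, not resources from this article; switching the region or the credentials is all it takes to deploy the identical stack elsewhere.

```python
import boto3

TEMPLATE_PATH = "elb-log-pipeline.yaml"  # hypothetical template file


def deploy(stack_name="elb-log-pipeline", region="eu-west-1"):
    cfn = boto3.client("cloudformation", region_name=region)
    with open(TEMPLATE_PATH) as f:
        template_body = f.read()
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],  # needed when the template creates IAM roles
    )
    # Block until the stack finishes creating.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)


if __name__ == "__main__":
    deploy()
```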
Maintenance
In traditional instance-based deployments, certain routines must be managed periodically. A prominent example is OS patching, which is necessary to keep an EC2-based deployment current with the latest security fixes. This process typically demands significant overhead and attention from application owners and IT teams, and each patch cycle introduces stability risks because patching often requires instance reboots that can disrupt the applications running on those EC2 instances. With the fully managed services used in this architecture, that operational burden shifts to AWS.
Scalability
The components chosen for this architecture are designed to scale with changing traffic patterns. Monitoring can be implemented so that the AWS Kinesis stream gains or sheds shards when specific thresholds are crossed, and some traffic patterns may even call for throttling the events flowing into AWS Kinesis.
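A hedged sketch of such a monitoring check, using boto3: the stream name, thresholds, and the per-shard throughput estimate are made-up illustration values, and a real implementation would also respect the resharding limits AWS imposes.

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")
kinesis = boto3.client("kinesis")

STREAM = "elb-log-records"            # placeholder stream name
RECORDS_PER_SHARD_5MIN = 1000 * 300   # ~1,000 records/s per shard over a 5-minute window


def maybe_reshard():
    """Double or halve the shard count when recent ingest crosses illustrative thresholds."""
    shard_count = len(kinesis.list_shards(StreamName=STREAM)["Shards"])
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Kinesis",
        MetricName="IncomingRecords",
        Dimensions=[{"Name": "StreamName", "Value": STREAM}],
        StartTime=datetime.utcnow() - timedelta(minutes=5),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Sum"],
    )
    incoming = stats["Datapoints"][0]["Sum"] if stats["Datapoints"] else 0

    if incoming > 0.8 * shard_count * RECORDS_PER_SHARD_5MIN:
        target = shard_count * 2
    elif shard_count > 1 and incoming < 0.2 * shard_count * RECORDS_PER_SHARD_5MIN:
        target = max(1, shard_count // 2)
    else:
        return
    kinesis.update_shard_count(StreamName=STREAM, TargetShardCount=target,
                               ScalingType="UNIFORM_SCALING")
```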
AWS Lambda scales with the number of messages waiting in AWS SQS for processing. This scaling is considerably faster than growing a full EC2 deployment, which also involves application deployment and configuration changes. During low-traffic periods, it likewise keeps overhead minimal compared to idle EC2 instances.
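Two knobs worth knowing in that setup are the SQS batch size on the Lambda event source mapping and an optional reserved-concurrency cap on the function. The ARNs, names, and limits below are placeholders used only to show where those settings live.

```python
import boto3

lambda_client = boto3.client("lambda")

# Placeholder identifiers for illustration only.
QUEUE_ARN = "arn:aws:sqs:eu-west-1:123456789012:elb-log-events"
FUNCTION_NAME = "elb-log-parser"

# Each invocation receives up to BatchSize messages; Lambda adds concurrent pollers
# automatically as the queue backlog grows.
lambda_client.create_event_source_mapping(
    EventSourceArn=QUEUE_ARN,
    FunctionName=FUNCTION_NAME,
    BatchSize=10,
)

# An optional concurrency cap keeps the function from overwhelming the downstream Kinesis stream.
lambda_client.put_function_concurrency(
    FunctionName=FUNCTION_NAME,
    ReservedConcurrentExecutions=50,
)
```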

