The article HSL – High Speed Logging introduced High Speed Logging itself and showed how this purely technical, infrastructure-sourced data can be used to support SLA reporting.
This article provides insight into how to process this volume of data and ship it from its source into a data store. The design deliberately avoids cloud-native services to keep it reusable across various deployment strategies.
Source data generation on premises vs in the Cloud
In an on-premises environment, data generation and shipping have to be handled differently than in a Cloud-based deployment.
Log data collected on a load balancer has to be shipped to a central syslog host. Often some data manipulation comes on top, such as date format translation, rewriting of individual fields, or adding information for later processing. Logstash’s syslog input plugin can be used to create a central syslog message endpoint, and Logstash’s filters can then massage the data as needed. The so-called ELK stack is covered in more depth elsewhere in this blog.
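As a minimal sketch of such a central syslog endpoint (the port, field names and output target below are illustrative assumptions, not taken from the actual setup):

```
input {
  syslog {
    port => 5514
  }
}

filter {
  # Example of the massaging mentioned above: normalize the date format
  # and add a field for later processing
  date {
    match => [ "timestamp", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss" ]
  }
  mutate {
    add_field => { "log_source" => "loadbalancer" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```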
In a Cloud-based deployment the data collection changes entirely: the Cloud provider takes care of delivering the logs into storage, from where they can be picked up. AWS uses Simple Storage Service (S3) for this purpose.
Taking AWS as an example: an AWS ELB can store its access logs in an S3 bucket, from where the logs (actually objects located in the bucket) can be pulled individually for further processing.
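A minimal sketch of pulling the object inventory from such a bucket. The function only assumes an object with boto3’s S3 client interface, so in practice it would be handed `boto3.client("s3")`; bucket and prefix names are whatever the ELB’s access-log configuration points at:

```python
def list_log_keys(s3_client, bucket, prefix):
    """Yield the key of every ELB log object under a prefix.

    Uses the list_objects_v2 paginator so buckets holding more than
    1000 objects are handled correctly.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry at all
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With boto3 installed this would be called as `list_log_keys(boto3.client("s3"), "my-elb-logs", "AWSLogs/")`, where the bucket name and prefix are placeholders.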
Each component in the stack is chosen so that it can be replaced by a different solution and/or deployment strategy. Connections and communication between the components are loosely coupled to avoid a hard-wired setup.
Redis in this design provides a flexible, fast-acting data structure store used as an in-memory database.
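One concrete use of Redis here is the ELB bookkeeping described further below (first seen vs. last reviewed). A sketch, assuming redis-py semantics; the key layout and field names are invented for illustration:

```python
import time

def touch_elb(redis_client, elb_name):
    """Record when an ELB was first seen and last reviewed, in a Redis hash."""
    key = f"elb:{elb_name}"
    now = int(time.time())
    # hsetnx writes first_seen only once; last_reviewed is updated every pass
    redis_client.hsetnx(key, "first_seen", now)
    redis_client.hset(key, "last_reviewed", now)
```

With redis-py this would be called as `touch_elb(redis.Redis(), "my-elb")`.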
RabbitMQ in this design provides the broker mechanism that handles the work packages for the actual objects (logs).
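Producing such a work package could look like the following sketch. The queue name and message shape are assumptions; the `channel` argument is anything with pika’s `basic_publish` interface:

```python
import json

def publish_work_package(channel, queue, bucket, keys):
    """Publish one work package (a chunk of S3 object keys) to the broker."""
    body = json.dumps({"bucket": bucket, "keys": keys})
    # With pika this publishes to the default exchange, routed by queue name
    channel.basic_publish(exchange="", routing_key=queue, body=body)
    return body
```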
Containers are used to ensure reliability, availability and scalability for the individual microservices. A single container image is designed in which an environment parameter switches the container’s behavior during startup. Docker Swarm services provide an easy-to-use environment for scaling as well as for recovering a microservice.
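The startup switch could be sketched like this; `HSL_ROLE` is a hypothetical variable name, since the source only says an environment parameter selects the behavior:

```python
import os

def run(role=None):
    """Select which microservice this container instance runs."""
    role = role if role is not None else os.environ.get("HSL_ROLE", "")
    dispatch = {
        "listing": lambda: "listing started",
        "receiving": lambda: "receiving started",
        "parsing": lambda: "parsing started",
    }
    if role not in dispatch:
        # Fail fast so the orchestrator restarts the container
        raise SystemExit(f"unknown HSL_ROLE: {role!r}")
    return dispatch[role]()
```

In a Swarm service definition the same image would then be deployed three times, differing only in the environment variable.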
Three software components
- Listing – builds a library of the available, configured ELBs in a region. The listing function sets the stage for the next two components.
- Receiving – an orchestration job which creates work packages for processing.
- Parsing – the actual workhorse of the design, with a need to scale up and down; it translates the ELB logs into JSON arrays and enriches the data for later analytics.
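The parsing step can be sketched as follows. The field list reflects the classic ELB access log format; it should be verified against the AWS documentation for the ELB generation in use, and the enrichment shown is just one illustrative example:

```python
import shlex

# Field names of a classic ELB access log entry (assumed order)
FIELDS = [
    "time", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]

def parse_elb_line(line):
    """Translate one ELB log line into a dict ready for JSON serialization."""
    values = shlex.split(line)  # respects the quoted request/user-agent fields
    record = dict(zip(FIELDS, values))
    # Enrich for later analytics, e.g. split out the HTTP verb
    if "request" in record:
        record["http_method"] = record["request"].split(" ", 1)[0]
    return record
```

A list of such dicts can then be serialized with `json.dumps` into the JSON arrays mentioned above.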
- Region-based deployment, such as us-east-1 or eu-west-1
- Self-learning system which identifies the ELBs in a given region; this eliminates ongoing hands-on tasks to maintain log processing for newly added ELBs
- Automated discovery, for reporting and alerting purposes, of ELBs whose configuration has logging disabled
- The system tracks when an ELB was seen for the first time, when it was last reviewed, and when its objects were last processed
- Message brokering to handle work packages which process (download and unpack) chunks of objects (logs)
- Automated removal of objects from the various S3 buckets after processing, which turns each S3 bucket into a buffer and adds a layer that allows fail-safe operation
- In case a process dies inside a container, it recovers itself after a moment to ensure highly available processing mechanics.
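Putting the broker and buffer behavior together, the consumer side of a work package might look like this sketch. The message shape, the key names and the gzip check are assumptions; `s3_client` is any object with boto3’s `get_object`/`delete_object` interface, and `parse_line` is any callable turning a log line into a dict:

```python
import gzip
import json

def handle_message(s3_client, body, parse_line):
    """Process one work package: fetch, parse and then delete each object."""
    package = json.loads(body)
    records = []
    for key in package["keys"]:
        obj = s3_client.get_object(Bucket=package["bucket"], Key=key)
        data = obj["Body"].read()
        # Some ELB log objects are delivered gzip-compressed ("unpack" step)
        if data[:2] == b"\x1f\x8b":
            data = gzip.decompress(data)
        for line in data.decode().splitlines():
            records.append(parse_line(line))
        # Deleting only after successful parsing turns the bucket into a
        # buffer and allows a crashed worker's package to be re-processed
        s3_client.delete_object(Bucket=package["bucket"], Key=key)
    return records
```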