AWS offers managed and serverless options for container-based workloads through services such as Amazon ECS and Amazon EKS. With AWS Fargate, AWS takes over the provisioning, operation, and maintenance of the underlying compute, so teams no longer have to handle those tasks and can focus on application development.
However, there may be use cases where none of the Fargate, ECS, or EKS options fit and a custom solution is the better choice. One example is a fleet of containers that needs to run for only a couple of hours a day, at the lowest possible runtime cost, and with a highly automated deployment process.
High-level functional description
- CloudFormation-based deployment to manage requirements and dependencies. This also gives a naturally clean teardown: when the stack is deleted, all components it references are removed as well, so no leftovers remain that could incur costs.
- The CloudFormation template contains:
  - Creation of the instance profile.
  - One Auto Scaling group (ASG) that creates a Docker Swarm deployment to carry the baseline workload (runs 24/7). These hosts handle ongoing workloads.
  - A second Auto Scaling group (ASG) that consists of 100% Spot Instances. It is configured with a cron-style schedule; instances are terminated outside the time window, so there are no idle costs. These hosts are used for time-boxed batch jobs such as reporting calculations.
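One possible way to express the cron-style time window is a pair of scheduled actions on the Spot ASG: one that scales the group out when the window opens and one that scales it to zero when it closes. The sketch below only builds the two request payloads. The key names follow the parameters of the EC2 Auto Scaling scheduled-action API; the helper name `scheduled_window`, the action names, the ASG name, and the example times are hypothetical and not taken from the original deployment.

```python
from typing import Dict, List, Union

Payload = Dict[str, Union[str, int]]


def scheduled_window(asg_name: str, open_cron: str, close_cron: str,
                     capacity: int) -> List[Payload]:
    """Build two scheduled-action payloads for a daily time window.

    Hypothetical helper: the first payload scales the group to `capacity`
    when the window opens, the second scales it to zero when it closes.
    """
    def action(suffix: str, cron: str, size: int) -> Payload:
        return {
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": f"{asg_name}-{suffix}",
            "Recurrence": cron,  # cron expression, evaluated in UTC by default
            "MinSize": size,
            "MaxSize": size,
            "DesiredCapacity": size,
        }

    return [
        action("window-open", open_cron, capacity),
        action("window-close", close_cron, 0),
    ]


# Example: four Spot instances between 06:00 and 08:00 UTC every day.
payloads = scheduled_window("swarm-spot-workers", "0 6 * * *", "0 8 * * *", 4)

# With boto3 installed and credentials configured, each payload could be sent as:
#   boto3.client("autoscaling").put_scheduled_update_group_action(**payload)
```

Setting the minimum, maximum, and desired capacity to zero at the end of the window is what removes the idle cost: the ASG terminates every instance until the next opening action fires.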
During EC2 instance deployment via the first ASG, the cloud-init code does the following:
- If no cluster exists, a new Swarm is created.
- If a Swarm exists but has fewer than X manager nodes in total, the EC2 instance joins the cluster as a manager node.
- If a Swarm exists and already has X manager nodes, the EC2 instance joins the Swarm as a worker node.
- If a manager node is gone, its node configuration is removed from the Swarm and also from the SSM Parameter Store.
- If a worker node is gone (e.g. because its time window has closed), the node is removed from the Swarm.
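The join decision in the list above can be sketched as a small pure function. This is a minimal illustration and not the original cloud-init code: the names `Action`, `decide_join_action`, and `docker_command` are hypothetical, and `max_managers` stands in for the threshold X. Docker Swarm calls the master role "manager", and whether a node joins as manager or worker is determined by which join token it presents.

```python
from enum import Enum
from typing import List


class Action(Enum):
    INIT_SWARM = "init"
    JOIN_AS_MANAGER = "join-manager"
    JOIN_AS_WORKER = "join-worker"


def decide_join_action(swarm_exists: bool, manager_count: int,
                       max_managers: int) -> Action:
    """Pick what a freshly booted instance should do."""
    if not swarm_exists:
        return Action.INIT_SWARM
    if manager_count < max_managers:
        return Action.JOIN_AS_MANAGER
    return Action.JOIN_AS_WORKER


def docker_command(action: Action, manager_addr: str = "",
                   token: str = "") -> List[str]:
    """Translate the decision into a Docker CLI invocation.

    The caller is assumed to pass the manager or worker join token that
    matches the chosen action; 2377 is Swarm's default management port.
    """
    if action is Action.INIT_SWARM:
        return ["docker", "swarm", "init"]
    return ["docker", "swarm", "join", "--token", token, f"{manager_addr}:2377"]


# Example: a Swarm exists with 2 of 3 allowed managers, so the node joins as manager.
action = decide_join_action(swarm_exists=True, manager_count=2, max_managers=3)
command = docker_command(action, manager_addr="10.0.0.10", token="<manager-token>")
```

Keeping the decision separate from the command execution makes the branching easy to test without a running cluster. How the script learns whether a Swarm exists and how many managers it has (for example from the SSM Parameter Store mentioned above) is left out of this sketch.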