AWS CloudFormation best practices

When looking into AWS CloudFormation for the first time, the sheer number of options can be overwhelming, and it may not be obvious how to put a working deployment together.

However, once acclimated, it becomes a very useful, very powerful and rather handy toolset which is highly recommended for workload deployments into AWS. Especially at the beginning it might seem faster to just get a deployment done quickly through the AWS Console, the AWS CLI or the relevant AWS API.

Certainly, the initial investment of time to get to know CloudFormation and to follow an Infrastructure as Code deployment methodology may feel like a burden compared to the traditional click-and-hack through the Console, CLI etc. However, it is worth making that initial investment to become familiar with CloudFormation.

Keep it all clean

Utilizing CloudFormation for workload deployments helps to keep deployments clean. Naming conventions can be carried forward across accounts and VPCs, which helps to avoid a proliferation of naming schemes. Furthermore, everything that was generated and deployed as part of a stack deployment – parameters written into the AWS SSM Parameter Store, Security Groups, Subnets, VPCs and so on – can be removed again whenever the stack is torn down.

This not only keeps a given AWS account clean, it also keeps things consistent to a point where audits become much easier and quicker.

Reusable code and templates

While a deployment done directly through the AWS Console, CLI or API cannot simply be re-used in a different account, VPC or deployment in general, taking advantage of AWS CloudFormation templates tremendously simplifies the reusability of Infrastructure as Code.
Example: a deployment of a number of EC2 instances into a VPC – with Security Groups, Subnets, EBS volumes and so on defined and described within the CloudFormation template – can simply be re-used in a different account, VPC and so on, provided the template is described independently enough, for example by utilizing pseudo parameters.

Agnostic to accounts and regions

AWS CloudFormation provides so-called pseudo parameters. These do not have to be declared, as they are predefined by AWS. Parameters such as AWS::Region and AWS::AccountId are available in every AWS account and region. These predefined parameters help to write templates which are agnostic to accounts and regions, so the focus can stay on the Infrastructure as Code description itself.

Ideally, templates should be written in a way that they are agnostic to the AWS account and region, so that the same CloudFormation template can be used in various accounts (such as testing, development and production).
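
As an illustration, a minimal template fragment that stays account- and region-agnostic by relying on pseudo parameters only (the bucket resource and its naming are hypothetical):

AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal sketch relying on pseudo parameters only

Resources:
  DeploymentBucket:
    Type: AWS::S3::Bucket
    Properties:
      # account id and region are resolved at deploy time - nothing is hardcoded
      BucketName: !Sub 'deployment-artifacts-${AWS::AccountId}-${AWS::Region}'
      Tags:
        - Key: Stack
          Value: !Ref AWS::StackName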

Infrastructure as Code

Utilizing AWS CloudFormation literally means describing the AWS infrastructure to be deployed into a given (or to be created) VPC in a given region, and so on.

Infrastructure as Code (IaC) is basically a procedure and methodology to create a definition (written in JSON or YAML, for example) which simply describes what should be deployed and how.

AWS provides CloudFormation as the framework and toolset to automate IaC-defined deployments.

In the on-premises world, tools such as Chef, Puppet, Terraform and many more help to declare and describe a deployment.

Custom resources

AWS improves its toolsets continuously, and frameworks such as CloudFormation keep being enriched with features which help to describe the infrastructure to be deployed. However, not everything can be described natively, and sometimes so-called custom resources have to be added. An example would be attaching a given ENI or EBS volume to a certain EC2 instance in a unique way – a custom resource based on a Lambda function can then do what is desired.

Basically, a custom resource is added when a required resource does not exist otherwise but needs to be created during stack deployment.
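
A minimal sketch of how such a custom resource could look; the backing Lambda function and the referenced instance/ENI resources are assumptions of this sketch, not part of this post:

Resources:
  AttachExtraEni:
    Type: Custom::EniAttachment
    Properties:
      # ServiceToken points to the Lambda function that implements the custom logic
      ServiceToken: !GetAtt EniAttachmentFunction.Arn
      # any further properties arrive in the Lambda event as ResourceProperties
      InstanceId: !Ref AppInstance
      NetworkInterfaceId: !Ref ExtraEni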

Export Values

Values generated throughout a stack deployment can be referenced by other resources belonging to that stack. However, it quickly becomes necessary to use such values outside the given stack deployment. Luckily, values can be exported for further usage by other AWS CloudFormation stack deployments.

A very useful and powerful service is the AWS SSM Parameter Store, which is part of AWS Systems Manager (SSM). Values generated throughout a deployment can easily be written into the Parameter Store, where they can then be used by applications, via the API, the CLI and the various software SDKs.
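
A short sketch of both approaches – exporting a value for use in another stack and writing it to the SSM Parameter Store (the VPC resource and the parameter name are placeholders):

Resources:
  VpcIdParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: /myapp/network/vpc-id
      Type: String
      Value: !Ref Vpc

Outputs:
  VpcId:
    Description: VPC id for usage by other stacks
    Value: !Ref Vpc
    Export:
      Name: !Sub '${AWS::StackName}-vpc-id'

Another stack can then consume the value via Fn::ImportValue, or via a template parameter of type AWS::SSM::Parameter::Value<String>.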

Last but not least

  • Make good use of the Description field. The Description field in the various CloudFormation resource types not only makes the IaC description easier to read – more importantly, these descriptions show up again alongside the deployed resources. As mentioned above – keep things clean – a meaningful description makes deployments much easier to maintain and reduces the time needed to search and audit (see the sketch below)
  • Tag, tag, tag … For a number of reasons, tagging resources is a key need – not just a best practice but rather a must-have. The better the tagging, the better cost assignment, security audits and so on can be done
  • The AWS resource and property types reference … super useful and the go-to reference for writing templates
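
To illustrate the first two points, a small fragment with a meaningful description and a set of tags (the tag keys are just an example – use whatever the tagging policy requires):

Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allows application traffic from the internal load balancer only
      VpcId: !Ref Vpc
      Tags:
        - Key: CostCenter
          Value: '4711'
        - Key: Environment
          Value: development
        - Key: Owner
          Value: team-backend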

AWS CloudFormation – AWS ElastiCache deployment

Scenario

  • Creation of a repeatable, reusable and less time-consuming option to deploy a service into AWS
  • AWS ElastiCache (fully managed Redis and Memcached) is the example used for this post
  • Automated creation of Security Groups during deployment to keep it all simple, tight and clean
  • Output of a connection string after successful deployment for next-level usage

Benefits

  • Because the Security Group is created as part of the AWS CloudFormation based deployment, the template stays region-agnostic: no Security Group id has to be supplied to CloudFormation (neither in the template nor as a parameter during execution).
  • During the stack deployment CF (= CloudFormation) creates the Security Group. Its id becomes a value which can be referenced within the stack for the lifetime of the deployment – e.g. to assign it to the ElastiCache configuration (see the sketch below)
  • No need to maintain extra (e.g. homegrown) tools on top of native cloud services and tools
  • Deployments can be done outside a production AWS account while the exact same template may be used for production-level workloads. Core teams who maintain the production account can then act as gatekeepers, and the burden of deploying depends less on a single team – workloads are distributed while costs and compliance can still be maintained closely
  • Teams are able to own their stack deployments, which not only increases decoupled processing but also helps teams identify better with what they 'own'.
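
A shortened sketch of such a template – Redis engine assumed; node type, port and CIDR are placeholders:

Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>

Resources:
  CacheSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Access to the ElastiCache cluster
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 6379
          ToPort: 6379
          CidrIp: 10.0.0.0/16   # example VPC CIDR - adjust or reference another SG instead

  CacheSubnetGroup:
    Type: AWS::ElastiCache::SubnetGroup
    Properties:
      Description: Subnets used by the cache cluster
      SubnetIds: !Ref SubnetIds

  CacheCluster:
    Type: AWS::ElastiCache::CacheCluster
    Properties:
      Engine: redis
      CacheNodeType: cache.t3.micro
      NumCacheNodes: 1
      CacheSubnetGroupName: !Ref CacheSubnetGroup
      # the Security Group created above is referenced by its generated id -
      # no hardcoded sg-... value anywhere in the template
      VpcSecurityGroupIds:
        - !GetAtt CacheSecurityGroup.GroupId

Outputs:
  ConnectionString:
    Description: Redis endpoint for next-level usage
    Value: !Sub '${CacheCluster.RedisEndpoint.Address}:${CacheCluster.RedisEndpoint.Port}'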

AWS EC2 Automated Docker Swarm deployment

Obviously AWS provides serverless data processing for container-based workloads via products and services such as AWS ECS and AWS EKS. Utilizing services such as Fargate lets AWS handle deployment, operation and maintenance, so these topics no longer need to be handled and the focus can shift towards application development.

However, there might be use cases where none of the Fargate, ECS or EKS options fit and an individual solution is needed. One such need could be a requirement to run a fleet of containers for a couple of hours throughout the day, with the added requirement to execute the workload at the lowest runtime cost in conjunction with a highly automated deployment process.

Context

High level function description

CloudFormation

  • CloudFormation-based deployment to maintain needs and dependencies. This also allows a naturally clean deployment: in case of a deletion/removal all referenced components can be removed as well – therefore no leftovers which would lead to unnecessary costs
  • The CloudFormation template contains
    • InstanceProfile creation
    • One Auto Scaling Group (aka ASG) which creates a Docker Swarm deployment to maintain a baseline workload (runs 24/7). These hosts perform the ongoing workloads.
    • Another Auto Scaling Group (ASG) which is based 100% on Spot instances. The deployment is configured with a cron-type schedule so that instances outside the time window are terminated and destroyed – therefore no idle costs. These hosts are used to distribute batch executions such as reporting calculations (a sketch follows below).
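
A heavily shortened sketch of the second, Spot-only ASG together with its schedule; the launch template, subnet parameter, instance types and times are placeholders:

Resources:
  BatchSpotAsg:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: '0'
      MaxSize: '10'
      DesiredCapacity: '0'
      VPCZoneIdentifier: !Ref SubnetIds
      MixedInstancesPolicy:
        InstancesDistribution:
          OnDemandPercentageAboveBaseCapacity: 0   # 100% Spot
        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref SwarmLaunchTemplate
            Version: !GetAtt SwarmLaunchTemplate.LatestVersionNumber
          Overrides:
            - InstanceType: m5.large
            - InstanceType: m5a.large

  OpenBatchWindow:
    Type: AWS::AutoScaling::ScheduledAction
    Properties:
      AutoScalingGroupName: !Ref BatchSpotAsg
      DesiredCapacity: 10
      Recurrence: '0 18 * * *'   # scale out when the batch window opens (UTC)

  CloseBatchWindow:
    Type: AWS::AutoScaling::ScheduledAction
    Properties:
      AutoScalingGroupName: !Ref BatchSpotAsg
      DesiredCapacity: 0
      Recurrence: '0 22 * * *'   # terminate everything outside the window - no idle costs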

Docker Swarm

During EC2 instance deployment via the first ASG, the initiating cloud-init code does the following (a compressed sketch follows the list):

  • if no Swarm exists yet, a new Swarm is created
  • if a Swarm exists but there are fewer than X manager nodes in total, the EC2 instance joins the cluster as a manager node
  • if a Swarm exists and X manager nodes are already present, the EC2 instance joins the Swarm as a worker node
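
The launch template referenced by the Auto Scaling Groups could carry user data along these lines. The SSM parameter names, the manager limit and the instance profile are assumptions of this sketch, Docker is expected to be preinstalled in the AMI, and locking/retries for simultaneously booting instances are left out:

SwarmLaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateData:
      IamInstanceProfile:
        Arn: !GetAtt SwarmInstanceProfile.Arn   # needs ssm:GetParameter / ssm:PutParameter
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          REGION=${AWS::Region}
          LIMIT=3   # desired number of Swarm manager nodes
          get() { aws ssm get-parameter --region "$REGION" --name "$1" --query Parameter.Value --output text 2>/dev/null; }
          put() { aws ssm put-parameter --region "$REGION" --name "$1" --type String --value "$2" --overwrite; }
          MANAGER_IP=$(get /swarm/manager-ip || true)
          if [ -z "$MANAGER_IP" ]; then
            # no Swarm yet: initialize it and publish the join information
            docker swarm init
            put /swarm/manager-ip "$(hostname -I | awk '{print $1}')"
            put /swarm/manager-token "$(docker swarm join-token -q manager)"
            put /swarm/worker-token "$(docker swarm join-token -q worker)"
            put /swarm/manager-count 1
          elif [ "$(get /swarm/manager-count)" -lt "$LIMIT" ]; then
            # Swarm exists, but the manager limit is not reached yet: join as a manager
            docker swarm join --token "$(get /swarm/manager-token)" "$MANAGER_IP":2377
            put /swarm/manager-count "$(( $(get /swarm/manager-count) + 1 ))"
          else
            # manager limit reached: join as a worker node
            docker swarm join --token "$(get /swarm/worker-token)" "$MANAGER_IP":2377
          fi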

Cleanup functionality

  • if a manager node is gone, the node configuration is removed from the Swarm and also from the SSM Parameter Store
  • if a worker node is gone (i.e. the time window has closed), the node is removed from the Swarm

Strategies to Migrate into the Cloud

Formulating a strategy

Many companies consider migrating either individual applications or entire data center infrastructures into the cloud. There are a number of strategies which describe how to migrate a given on-premises infrastructure into the cloud.

While those strategies may help to identify the best path for an upcoming migration, they can only guide an individual business on its way towards planning its own migration strategy.

Overall, a key factor for any migration is to get a strategy defined, as it saves time and money and provides a realistic – as well as adjustable – scenario.

Know your installation

Typically it should be known what is installed, deployed and in use. However, depending on the company's size and the diversity of how applications and services are deployed and maintained, it may become necessary to look into options to discover what is out there.

Unless a repository already exists (e.g. based on ITSM and ITIL mechanics) which provides an overview of applications and installations, various additional options exist which can be utilized to get such a repository created.

In the case of AWS, services and toolsets such as AWS SSM and AWS ADS (Application Discovery Service) can be a great help to either generate a full repository from scratch or to complete an existing one towards a good 360° view.

The better the repository and the understanding of a given infrastructure, the more likely a successful migration can be planned while reducing the risk of unknown parameters.

Know your abilities

This is a case-by-case topic. Depending on how a given company plans to migrate into the cloud – and certainly depending on overall size and budget (fully self-driven vs. fully supported by consultancy, or something in between) – it may be helpful for teams looking into the cloud for the first time to consider introduction-level courses. Nothing large or complex, but it surely helps to get a baseline set.

Re-Host / Lift-and-Shift

Many companies consider the re-host – also known as lift-and-shift – strategy their best option to get a given on-premises infrastructure migrated into the public cloud, such as AWS. Especially in large infrastructure deployments – supported by an understanding of the above-mentioned repository – this alone can already lead to savings.

Large parts of a re-hosting process can be automated, which can simplify the migration process and speed up the delivery timeline. This option, though, has to be evaluated case by case.

Re-Platforming

Either as part of a lift-and-shift process or after a deployment into the cloud, a re-platform option might be considered. What does that mean? After a successful deployment and a review of budget spending, individually maintained deployments – such as an RDBMS, caches such as Redis or Memcached, or brokers such as RabbitMQ – could be reviewed and changed into cloud-based, managed services.

For example, a local, EC2-instance-deployed RabbitMQ setup could be changed into a managed service such as Amazon MQ, or caches such as Redis or Memcached could be replaced by AWS ElastiCache.

Re-platforming may lead to another level of cost savings – some thoughts on this are put together in another article on this blog. It also supports interchangeability, operability as well as maintainability.

All in all, the core architecture of a given application is not touched or changed during a re-platforming process.

Re-Architecting

This concept is certainly the most complex one, as it changes the core architecture of a given application. A number of factors may require, or lead to, the need to re-architect a given environment or application.

Drivers such as the need to scale out massively, support serverless execution or meet certain performance levels can be the reason to utilize cloud-native services.

Example: an application with decoupled processing uses RabbitMQ (or Amazon MQ) as an EMB (enterprise message bus) to handle message brokerage between the components of the application. An alternative could be a native cloud service such as AWS SQS.

While the effort to change tends to be the highest – and re-architecting may therefore be expensive – it can be a very beneficial option, as agility, performance and high availability at typically lower cost can be the outcome. It does, however, require an initial investment and effort.

AWS Config to monitor compliance

AWS Config is a service which helps to keep track of the resources deployed in and associated with an AWS account.

It monitors configurations based on rules, their relationships to other entities and resources, and how those change – if at all – over time.

The dashboard itself provides a timeline which shows which changes were tracked over time and when a resource became compliant or noncompliant.

Configuring AWS Config

Resource types to record

Select "All resources" to get all changes tracked.

S3 bucket

S3 bucket to store configuration history

In my case a bucket was already available; otherwise AWS Config can create one. The syntax will be "config-bucket-<account-id>/<prefix>/AWSLogs".

SNS topic

An SNS topic – the default name will be "config-topic"

An added SNS topic helps to notify other resources about changes. The SNS topic can also be used to build automations, for example to let an automated remediation resolve a noncompliant status or to invoke something else whenever a compliance state is triggered. For the sake of convenience, the SNS topic created in this example has an email subscription.

Role

AWS Config role selection

As with any AWS resource that is configured to act on behalf of the account, it needs to be allowed to do so through a role. AWS Config provides an existing service-linked role, which adds some convenience.
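
For reference, the same basic setup can also be expressed in CloudFormation; a minimal sketch, where the bucket name follows the default mentioned above and ConfigTopic stands for an SNS topic created elsewhere in the template:

Resources:
  ConfigRecorder:
    Type: AWS::Config::ConfigurationRecorder
    Properties:
      # service-linked role for AWS Config (must exist / be created once per account)
      RoleARN: !Sub 'arn:aws:iam::${AWS::AccountId}:role/aws-service-role/config.amazonaws.com/AWSServiceRoleForConfig'
      RecordingGroup:
        AllSupported: true            # corresponds to the "All resources" selection
        IncludeGlobalResourceTypes: true

  ConfigDeliveryChannel:
    Type: AWS::Config::DeliveryChannel
    Properties:
      S3BucketName: !Sub 'config-bucket-${AWS::AccountId}'
      SnsTopicARN: !Ref ConfigTopic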

Creating rules

Rules provide the actual monitoring logic that is executed to track a compliant or noncompliant status.

In this case two pre-configured rules will be added to the configuration (a CloudFormation sketch of the second one follows the list):

  • ebs-optimized-instance
    This rule monitors whether the EC2 instances in the AWS account are configured with the EBS-optimized option. AWS does, however, seem to provide more and more instance types where EBS optimization is simply enabled by default, so that turning it on or off no longer makes a difference.
  • ec2-instance-detailed-monitoring-enabled
    This (likewise pre-configured) AWS Config rule monitors whether a given EC2 instance has detailed monitoring enabled. If detailed monitoring is desired, an automated remediation could be implemented to let AWS Config take corrective action on a noncompliant finding (watch out for the additional monitoring costs, though).
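
Continuing the Resources section of the sketch above, the second rule could be expressed like this; managed rules are referenced via their source identifier, and a rule requires the configuration recorder to be in place:

  DetailedMonitoringRule:
    Type: AWS::Config::ConfigRule
    DependsOn: ConfigRecorder
    Properties:
      ConfigRuleName: ec2-instance-detailed-monitoring-enabled
      Source:
        Owner: AWS                    # pre-configured (managed) rule provided by AWS
        SourceIdentifier: EC2_INSTANCE_DETAILED_MONITORING_ENABLED
      Scope:
        ComplianceResourceTypes:
          - AWS::EC2::Instance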

AWS Config Timeline

Once everything is configured and the AWS Config service has run its first compliance-tracking iterations, a timeline can be opened to visualize the configuration status of a given resource over time.

The timeline can also be drilled into to see which changes occurred around a compliance or noncompliance finding.

Grouping Jobs in Rundeck

An overview of how to create job groups in Rundeck

When starting a project in Rundeck it might be easy to just create new jobs as requests arrive. Over time, however, this may end up in a long list which becomes a bit cumbersome to maintain.

Rundeck provides an option to organize jobs into groups. A group really is just a logical set of jobs, and a given group (or groups) can exist inside another one. In the jobs menu this is then exposed as a folder structure.

Here's a quick how-to on creating groups in Rundeck:

During a new job creation, enter a job name and the group name into the second field. In this case it is simply group_level_1.

Rundeck job name and group level

When creating another job it then becomes possible to directly choose the already existing group (or folder) name via the Choose button:

Rundeck group structure

It is also possible to simply type the grouping structure, similar to typing a URL path:

Rundeck job name and group level with slash forward typing
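
The grouping can also be set directly in a job definition, for example when importing jobs via the CLI or API. A minimal sketch in Rundeck's YAML job format (job name and command are placeholders):

- name: example_job
  group: group_level_1/group_level_2
  description: Demonstrates nested job groups
  loglevel: INFO
  sequence:
    keepgoing: false
    strategy: node-first
    commands:
      - exec: echo "hello from a grouped job"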

Create a Python package for Lambda

In some cases the modules available out of the box (such as boto3) do not include those a project requires to let Lambda act as desired.

Preparing a function package

Create a folder and install the modules to be packaged inside it – pip is helpful here. Afterwards create a recursive zip from inside the folder. In my case I had to install the DataStax Cassandra driver.

mkdir aws_lambda && cd ./aws_lambda
sudo pip3 install cassandra-driver -t .    # install the required modules into the package folder
zip -r ../aws_lambda.zip ./*               # package everything, including the handler code placed in this folder

Now the zip file needs to be uploaded to the Lambda function – depending on the size of the package, the code may no longer be shown in the editor inside the UI.

If everything else is set up correctly, a quick test should show whether the function works and whether the modules can be loaded.

In this case a quick select has been executed to verify functionality.

from cassandra.query import dict_factory  # 'session' is assumed to be an established Cluster().connect() session
session.row_factory = dict_factory
cql = 'SELECT max_index_interval, min_index_interval, gc_grace_seconds, speculative_retry from tables LIMIT 1000'
rows = session.execute(cql)

AWS: Access private EC2 resources with Lambda functions

Various jobs and functions can be handled via AWS Lambda functions. That way serverless components can be utilized, which reduces the need to deploy EC2 instances and with them ongoing maintenance such as system patching, environment configuration, monitoring and so on.

In the example outlined here there is no trigger attached which would otherwise invoke the Lambda function – that may be outlined in a different blog entry. The focus is to let Lambda access a private EC2 instance.

AWS settings

VPC, Subnets, Gateways

  • A VPC with 2 subnets should be available for redundancy reasons
  • In case Lambda needs to be able to access public resources (i.e. the internet), the private subnets need a route to a NAT gateway, since the ENI the function gets attached to only carries a private IP address
  • Routes need to be created accordingly – verify the default routes
    • Public subnet: Destination 0.0.0.0/0, target an igw-…
    • Private subnet: Destination 0.0.0.0/0, target a nat-… (or an eni-… of a NAT instance)

IAM

As is common at AWS, to let one resource interact with another it has to be allowed to do so, in this case through a role. To make it a bit easier, AWS provides the preconfigured managed policy AWSLambdaVPCAccessExecutionRole, which extends AWSLambdaBasicExecutionRole. Specifically, the ec2:* permissions shown below are necessary to allow a given Lambda function to reach an internal EC2 instance; the logs:* permissions are needed in any case for Lambda to function.
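
Roughly, the permissions correspond to the following role definition, shown here as a CloudFormation sketch with an inline policy instead of the managed policy:

LambdaVpcRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: lambda.amazonaws.com
          Action: sts:AssumeRole
    Policies:
      - PolicyName: lambda-vpc-access
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                # ENI handling so the function can be placed into the VPC
                - ec2:CreateNetworkInterface
                - ec2:DescribeNetworkInterfaces
                - ec2:DeleteNetworkInterface
                # CloudWatch Logs access needed by every Lambda function
                - logs:CreateLogGroup
                - logs:CreateLogStream
                - logs:PutLogEvents
              Resource: '*'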

Once IAM is set up, Lambda can be looked into.

Lambda

Via the Console, choose the Lambda menu option and create a new function. In this case I used the latest Python runtime and selected an existing role under the permissions section – the role created in the IAM section above.

Finally network settings need to be set accordingly:

In my case I chose one subnet only – AWS allows it but shows a warning; as mentioned earlier, a second subnet would be needed to ensure reliability. Make sure to choose the correct VPC along with a Security Group that allows the target resources to be reached from within that SG.
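
For completeness, the equivalent network settings expressed in a CloudFormation template; the subnet and Security Group references as well as the runtime version are placeholders:

InternalFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: python3.12
    Handler: index.handler
    Role: !GetAtt LambdaVpcRole.Arn
    Code:
      ZipFile: |
        def handler(event, context):
            # placeholder - the real function would reach the private EC2 resource here
            return 'ok'
    VpcConfig:
      SubnetIds:
        - !Ref PrivateSubnetA
        - !Ref PrivateSubnetB     # second subnet for the redundancy mentioned above
      SecurityGroupIds:
        - !Ref LambdaSecurityGroup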