Introduction:
Automation is the most effective path to a reliable, robust, and flexible cloud-based CICD solution that can grow with your organization’s needs. To fully utilize the cloud you must automate; this is non-negotiable. Given this simple fact, it is worth periodically reviewing the philosophy and practice of automation in the cloud.
Automation works so well because it keeps engineers, developers, and testers out of the critical path of important CICD processes. Reducing the need for manual intervention improves and maintains operational efficiency while reducing the opportunity for errors.
However, it is important to note that automation is an action and can be implemented with varying degrees of success.
A good DevOps engineer commits to implementing and maintaining the highest possible level of automation for the service(s) they support. Yes, it’s a trade-off, and one that engineers and leadership keep under continual review for improvements (CICD).
Definition:
Before reviewing, let’s refine our definition of automation with the help of IBM[1]:
Automation is the application of technology, programs, robotics or processes to achieve outcomes with minimal human input.
Keeping this simple definition in mind when designing, building, and maintaining a CICD platform will help guide you to a robust, reliable, and flexible CICD solution for your organization.
Benefits:
Automation offers benefits at many levels that a DevOps engineer can leverage, from a simple, repeatable, and reliable build process to an overall increase in operational efficiency and a reduction in cost. There are two additional important reasons to invest in automation: the effects of manual operations and organizational policies. But first, let’s review the benefits of automation as described by the major cloud providers[2][3].
Enabler
- A critical component of cloud environments
- Standardization of CICD processes
- Enabler for operational excellence
- Automated build, deploy, and test enable recovery
- Entry point for automated recovery
- Recovery enables automated remediation
Reduction in operational overhead
- Automated recovery is cheaper and less time consuming than manual recoveries
- Automated processes have a lower error rate in comparison to manual processes
- Improved process reliability and robustness
Cost Savings
- Manual intervention is slower and more expensive than automated solutions
- Reduced time spent manually fixing errors leads to cost savings
- Automated deployments are cheaper
- Free engineers to work on more important tasks
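The chain described in the Enabler list above, where automated deployment enables recovery and recovery enables remediation, can be illustrated with a minimal sketch. The function name `deploy_with_rollback` and its `deploy` and `healthy` callables are hypothetical stand-ins for whatever your platform provides, not any specific tool's API.

```python
from typing import Callable


def deploy_with_rollback(
    new_version: str,
    current_version: str,
    deploy: Callable[[str], None],   # hypothetical: deploys the given version
    healthy: Callable[[], bool],     # hypothetical: post-deploy health check
) -> str:
    """Deploy new_version and return the version left running.

    If the health check fails, recovery is simply another automated
    deployment, this time of the last known good version.
    """
    deploy(new_version)
    if healthy():
        return new_version
    # Recovery step: only possible because deployment itself is automated.
    deploy(current_version)
    return current_version
```

The point of the sketch is that the rollback path reuses the same automated deploy step, so no manual intervention is needed to recover.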
Infrastructure as Code:
Without infrastructure as code (IaC) you cannot practice automation. No IaC, no automation in the cloud; it really is that simple. This is because the infrastructure code is the mechanism by which change is delivered. When IaC is not used, manual configuration is the only alternative, which by today’s computing standards is not an acceptable solution and will eventually harm your organization.
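To make "the infrastructure code is the mechanism by which change is delivered" concrete, here is a minimal sketch of the declarative idea behind IaC: compare the state declared in code with the state that actually exists and derive the changes to apply. The `Change` type, the `plan` function, and the resource names are illustrative assumptions and do not correspond to any particular IaC tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    action: str  # "create", "update", or "delete"
    name: str    # resource name (hypothetical)


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[Change]:
    """Return the changes needed to move `actual` to `desired`."""
    changes = []
    for name in sorted(desired):
        if name not in actual:
            changes.append(Change("create", name))
        elif desired[name] != actual[name]:
            changes.append(Change("update", name))
    # Anything that exists but is no longer declared gets removed.
    for name in sorted(actual):
        if name not in desired:
            changes.append(Change("delete", name))
    return changes


# Usage: the declared state lives in version control, not in someone's memory.
desired = {"web": {"size": "small", "count": 2}, "db": {"size": "large"}}
actual = {"web": {"size": "small", "count": 1}, "cache": {"size": "small"}}
for change in plan(desired, actual):
    print(change.action, change.name)
```

Because every change is expressed as a difference between two declared states, it can be reviewed, tested, and applied without manual configuration.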
Human Interactions:
The DevOps engineer’s aim is not to remove human input but to formalize and codify where and when those human interactions happen. To do this, the DevOps engineer and leadership must identify the organizational roles involved in the CICD processes and the information those roles require from them. In my experience, once engineers can access the information they need on their own, they are more than happy to consume it and have that task removed from their workload.
Logging:
The time to recover from an incident or failure is directly affected by how logging is implemented for a given process. Logging is a required component of error handling and, in many cases, its first step.
Consistency in logging errors and failures enables standardization and promotes team interoperability, since everyone implements and reads the same logging style.
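One way to get a shared logging style is to have every CICD process use the same formatter. The sketch below uses Python's standard `logging` module to emit one JSON object per log line. The field names `pipeline` and `stage` and the helper `get_pipeline_logger` are assumptions chosen for illustration, not an established schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every record with the same set of JSON fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical context fields, passed through `extra=...`.
            "pipeline": getattr(record, "pipeline", None),
            "stage": getattr(record, "stage", None),
        }
        return json.dumps(entry)


def get_pipeline_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger


# Usage: every team logs failures with the same fields.
log = get_pipeline_logger("cicd")
log.error("deploy failed", extra={"pipeline": "web", "stage": "deploy"})
```

With one format in place, the same queries and tooling work across every process, which is where the reduction in recovery time comes from.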
Dashboards:
Engineers want observability and visibility into their CICD processes. Dashboards are a great way to provide this and offer a flexible solution that can grow with your CICD processes. Dashboards allow engineers to monitor key performance indicators and receive alerts in real time from a central location. Dashboards do come with their own overhead, but I believe it’s well worth the investment.
Notifications:
Notifications are triggered by alerts and state changes in your CICD processes. They record these alerts and state changes and inform team members and stakeholders of the outcome. Often this is the first step in error handling procedures.
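As a rough sketch of notifications driven by state changes, the example below records each change and passes it to every registered notifier. `ProcessTracker`, `StateChange`, and the state names are hypothetical; a real notifier would post to chat, email, or a paging system rather than print.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class StateChange:
    process: str
    old: str
    new: str


Notifier = Callable[[StateChange], None]


@dataclass
class ProcessTracker:
    notifiers: list[Notifier] = field(default_factory=list)
    states: dict[str, str] = field(default_factory=dict)
    history: list[StateChange] = field(default_factory=list)

    def update(self, process: str, new_state: str) -> bool:
        """Record a state and notify only if it actually changed."""
        old = self.states.get(process, "unknown")
        if old == new_state:
            return False
        change = StateChange(process, old, new_state)
        self.states[process] = new_state
        self.history.append(change)  # the record of what happened
        for notify in self.notifiers:
            notify(change)           # the message to team members
        return True


# Usage
tracker = ProcessTracker(
    notifiers=[lambda c: print(f"{c.process}: {c.old} -> {c.new}")]
)
tracker.update("nightly-build", "running")
tracker.update("nightly-build", "failed")
```

Notifying only on an actual change is a design choice in this sketch: it keeps repeated status reports from drowning out the transitions people need to act on.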
Manual Operations:
When a CICD process has manual operations, the process becomes honor bound. Each time the manual operation is required in the process’s lifecycle, an individual team member must honor the design of the CICD process. Put bluntly, every manual operation is an opportunity to fail, even when the process input parameters are valid.
Manual operations may also become part of an organization’s culture simply because team members have executed the task repeatedly over a number of years. A DevOps engineer has little to no chance of changing an organization’s culture without leadership’s support, and even then many leaders recognize that cultural changes can be hard to implement.
Organizational Policies:
I have found there is a one-to-one relationship between engineering organizational policies and CICD processes. A common example in my experience is the pull request that runs no checks or tests and needs only a manual approval to merge the new changes into the code base.
An organization’s testing policies can have a dramatic impact on the trustworthiness of its CICD processes. Whatever the reason, if an organization’s CICD processes cannot enforce a “green means go, red means stop” approach to testing, then the process is compromised and will produce undefined outcomes. That has a direct bearing on software engineers’ perception of the CICD process and their ability to trust it.
Green == passing tests, red == failing tests
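The green/red rule can be expressed as a small gate that a pipeline evaluates before merging. This is a sketch under assumptions: `gate` and `may_merge` are hypothetical names, and treating a pull request with no checks at all as red is my own choice, made to rule out the approval-only merge described above.

```python
from enum import Enum


class Status(Enum):
    GREEN = "green"  # all tests passing
    RED = "red"      # at least one failing test, or nothing was run


def gate(check_results: dict[str, bool]) -> Status:
    """Map check name -> passed? onto a single go/stop signal."""
    if not check_results:
        # Assumption: no evidence of passing tests means stop.
        return Status.RED
    return Status.GREEN if all(check_results.values()) else Status.RED


def may_merge(check_results: dict[str, bool], approved: bool) -> bool:
    """Approval is necessary but never sufficient on its own."""
    return approved and gate(check_results) is Status.GREEN


# Usage
print(may_merge({"unit": True, "lint": True}, approved=True))   # merge allowed
print(may_merge({"unit": True, "lint": False}, approved=True))  # blocked
print(may_merge({}, approved=True))                             # blocked
```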
Conclusion:
On consideration of the above I would like to add to IBM’s definition. I find that time and its effects are absent from the original definition. Automation is an iterative process and that requires the inclusion of time.
Automation is the application of technology, programs, robotics or processes over time to achieve outcomes with minimal human input.
Once again I happily arrive at the conclusion that automation is a foundational requirement for CICD processes. Yet automation is an action which benefits from discipline and can be executed either elegantly or poorly. Leadership influences CICD processes and automation efforts by the policies they choose to enact and the support they provide them.
Manual operations are a blocker to further automation efforts and represent a scheduled opportunity to fail every time, regardless of valid input parameters! They also have a cultural component.
CICD that is automated and relies on testing provides the highest level of trust. Testing must be a first-class citizen in your CICD solution.
Automation builds on itself. For example, to remediate a failure you need to be able to recover from it, and that recovery relies on the ability to roll back the new changes. I constantly find myself thinking of a digital toolbox of CICD process components.
Resources:
[1] https://www.ibm.com/think/topics/automation
[2] https://docs.aws.amazon.com/whitepapers/latest/introduction-devops-aws/automation.html
[3] https://learn.microsoft.com/en-us/azure/automation/overview