Making DevOps work for you

Making DevOps work is one of those things everyone wants: out with the old and in with the new. But what drives that demand, and how attainable are the benefits? To answer those questions, it’s worth first settling on a working definition of what DevOps actually is. Wikipedia comes to our rescue here and defines it as:

DevOps (a clipped compound of development and operations) is a culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information technology (IT) professionals while automating the process of software delivery and infrastructure changes. It aims at establishing a culture and environment where building, testing, and releasing software can happen rapidly, frequently, and more reliably.

So there we have it. There are many other definitions used elsewhere, but I personally feel the Wikipedia one isn’t a bad starting point.

Through my own professional work and conversations with the leaders of other businesses, I’ve seen a common set of themes and drivers behind the wish to adopt DevOps, or at least DevOps as those leaders tend to understand it. The keywords and phrases are “agility,” “collaboration,” “time to market,” “repeatability,” “reliability” and “cost.” There are, of course, many others, but these are typically the ones heard in the first few sentences of any conversation with CIOs, CTOs and other technology business leaders.

1.  What does the old world look like?

Many people compare organizational practices such as the ITIL service-management framework unfavorably with DevOps, as if the two covered the same functional ground. They don’t. They serve different needs, and although they share some touchpoints, the two can operate very effectively together. Adopting ITIL (or an equivalent service-management framework) can provide a mechanism for dependable service management, but it is not an effective means of moving software products quickly along the organization’s conveyor belt. Making DevOps work means thinking of it as a change to the software development process rather than to service management.

When an organization tries to speed up the production line of a legacy software development methodology, roles and responsibilities quickly become either unclear or too restrictive, and gaps open up that no RACI matrix can promptly fix. The accumulated process becomes unwieldy, and therein lie the problems of poor agility, slow and infrequent release cycles, and inflexibility. That’s the old world, and it is still the present world for a great many organizations.

I find the comparison with software development itself quite illuminating. There have always been many software development life cycle (SDLC) approaches (SSADM, Waterfall, etc.), and a similar set of criticisms was leveled at them. Out of this thinking in the 1980s and 1990s came rapid application development (RAD), an approach that evolved (with varying success) over the years. Its descendants include several aggressive, almost extreme methodologies aimed at agility and time to market, but these have tended to sacrifice reliability, repeatability, scalability and security. In some cases (prototypes, for example) this matters little, so the approach remains a perfectly valid one.

In the case of DevOps (that marriage between the tail end of software engineering and the take-on by an operational function), a similar set of factors exists. Clearly, it’s a great thing to have the best brains of an engineering and development function working hand in hand with the best minds in IT operations management to create an effective flow of software and product onto production platforms. The challenge lies in defining a common set of responsibilities, lightweight processes, measurable outcomes and an operational framework that allow the methodology to run smoothly and without conflict.

So, where the stretch of the SDLC between engineering and operations previously ran on a formalized set of gates and acceptance criteria, the new world will change some of those gates, and it will most certainly change the hand-offs between the two key functional groups.

2.  Will it work for us, and what do we need to put in place?

Not every organization will be suited to the changes required to make DevOps work. It’s as much a cultural shift as a technological one. Much will depend on the nature of the organization’s business and its need for regular product releases. If the software product set is small and mainly off-the-shelf, and releases or deployments are infrequent, the case for DevOps will be much weaker. If there are many products (built largely in-house using open-source software), released regularly and to different target environments (think on-premises, web, cloud, mobile, etc.), then the case is a strong one and the effort is very likely to be worth it.

Beyond that qualification, you will need to judge whether the organization has an appetite for a culture of collaboration and transparent communication. If it does, the change program then needs the full support of an appropriate executive sponsor within the organization who is willing to communicate its intentions. Once that is established, a future operating model can be designed and communicated more widely.

The people in the engineering and operations teams may already have many of the required skills, or may be able and willing to embrace the change and learn the new processes and tools involved. If not, changes within the teams will most likely be needed. I do not intend to go into example toolsets in this report, but suffice it to say that Automation (deliberately with a capital “A”) will play a key part, and the tools supporting automated configuration management, deployment and testing will be the enabling technologies of your DevOps capability.

Defining and agreeing on the operating model, success criteria and division of responsibilities outlined above is essential. If a DevOps function is plugged into an organization that doesn’t change its engineering or operations boundaries, hand-offs and expectations, it will not work effectively and may even add to the pain by introducing further confusion. A point often missed by teams keen to press ahead with collaboration in their new DevOps world is that the rigor of testing, configuration management and change control extends to every automation script they use. There is little joy in discovering after the fact that a product doesn’t work as expected because something changed, not in the product itself, but in how it was tested or deployed.
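As a minimal sketch of that point, assuming Python and a hypothetical in-house deployment helper, the script logic below is kept under the same test discipline as product code, so a change to how a product is deployed can be caught before it reaches production. The `deploy` command, its `--env` and `--version` flags, and the environment names are illustrative placeholders, not a real tool.

```python
from __future__ import annotations

# Hypothetical environment names; substitute your organization's own.
ALLOWED_ENVIRONMENTS = {"staging", "production"}


def build_deploy_command(environment: str, version: str) -> list[str]:
    """Build the argument list for a hypothetical `deploy` command.

    Keeping this logic in a function (rather than inline shell) means it can
    be version-controlled, reviewed and unit-tested like any product code.
    """
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if not version or version.strip() != version:
        raise ValueError("version must be non-empty with no surrounding whitespace")
    return ["deploy", "--env", environment, "--version", version]


def test_build_deploy_command() -> None:
    # The happy path produces exactly the expected arguments.
    assert build_deploy_command("staging", "1.4.2") == [
        "deploy", "--env", "staging", "--version", "1.4.2",
    ]
    # A mistyped environment must be rejected rather than silently deployed.
    try:
        build_deploy_command("prod", "1.4.2")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an unknown environment")


if __name__ == "__main__":
    test_build_deploy_command()
    print("deployment script tests passed")
```

A test like this would typically run in the same pipeline as the product’s own tests, so the deployment logic is versioned and change-controlled alongside the product it ships.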

The people you are looking for will have an unusual blend of strong skills in infrastructure, open-source software, tooling, automation and testing. Without these, you will struggle to succeed. At the same time, you will find individuals who possess the necessary skills but do not want to work outside an architecture or build function. They are not the right people for a DevOps function.

3.  Where will the tensions lie?

Let’s not pretend otherwise: development and operations teams have always tended to be somewhat at odds. The engineering side wants and expects IT operations staff to have a much stronger understanding of the architecture and of the implications of any change or condition, and to be able to troubleshoot in detail before bothering engineers with anything. Operations staff, meanwhile, usually work to strict service hours, hand off between shifts, focus on SLAs, and prioritize keeping systems running (or recovering them) over the engineering nuances of the platforms.

What is required are development-leaning people who understand more about the underlying infrastructure and operations, and operations-leaning people who understand more about the workings of the software. Merge these and something like a DevOps function can start to emerge. This function must bring in the ability to automate functional testing from the start, never as an afterthought. At the same time, serious consideration must be given to non-functional requirements such as performance, security and time to recover (resilience, or RPO/RTO). These are absolutely critical to the operations function but are rarely at the forefront of the design and engineering teams’ thinking. Within DevOps, these requirements must be communicated up the chain to the engineering functions and delivered through to the operations functions.

So where does QA fit into this picture? It simply becomes part of the process: engineered solutions pass through the DevOps function and must successfully pass QA (the usual kind, against defined and controlled sets of tests) before the production deployment scripts can be executed. Again, the same emphasis on automation must exist. The ability to shift to continuous deployment models will rely very heavily on automation and testing, and QA must remain a critical part of that deployment workflow.
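The QA gate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `QAResult` record, the suite names and the `deploy` callable (standing in for the production deployment scripts) are all hypothetical, and a real pipeline would read results from its test tooling rather than construct them by hand.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QAResult:
    suite: str   # name of a defined, controlled test suite (hypothetical)
    passed: int
    failed: int


def qa_gate(results: list[QAResult], required_suites: set[str]) -> tuple[bool, list[str]]:
    """Allow deployment only if every required suite ran and none had failures."""
    reasons: list[str] = []
    seen = {r.suite for r in results}
    for missing in sorted(required_suites - seen):
        reasons.append(f"required suite not run: {missing}")
    for r in results:
        if r.failed > 0:
            reasons.append(f"{r.suite}: {r.failed} failing test(s)")
        elif r.passed == 0:
            reasons.append(f"{r.suite}: no tests executed")
    return (not reasons, reasons)


def run_production_deployment(
    results: list[QAResult],
    required_suites: set[str],
    deploy: Callable[[], None],
) -> bool:
    """Execute the (hypothetical) deployment step only when the QA gate is satisfied."""
    allowed, reasons = qa_gate(results, required_suites)
    if not allowed:
        for reason in reasons:
            print("blocked:", reason)
        return False
    deploy()
    return True


if __name__ == "__main__":
    results = [QAResult("unit", 120, 0), QAResult("integration", 40, 2)]
    ok = run_production_deployment(
        results, {"unit", "integration", "performance"}, lambda: print("deploying...")
    )
    print("deployed:", ok)
```

Treating a missing suite as a failure is a deliberate choice in this sketch: a gate that only checks whichever suites happened to run would let an untested change slip through.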

Make no mistake: introducing DevOps will add to the workload and stress of your development staff, because their responsibilities now extend across a much wider portion of the lifecycle. And that’s just how it should be, because the operational success factors that separate a good product, well executed, from a bad product, badly executed, have to be built in from the very start.

One function that will benefit everyone, but especially the teams responsible for the ongoing operation of platforms and products, is a functional dashboard. At its most basic, think traffic light: a red/amber/green indicator showing the status and health of a product. Going deeper, the dashboard should be able to show health at a much more granular level, covering infrastructure, platform, product and operational use. Conceptually, the dashboard should be seen as the single pane of glass through which operations staff observe overall health. Taken further, common operational processes, along with the automation behind them, can be built into the dashboard itself.
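A minimal Python sketch of the red/amber/green roll-up, assuming a simple “worst status wins” policy in which the product shows the most severe status of any of its layers. The layer names come from the paragraph above; the policy itself, and treating an absence of data as amber, are assumptions you may well want to change.

```python
from __future__ import annotations

from enum import IntEnum


class Status(IntEnum):
    # Ordered by severity so that max() returns the worst status.
    GREEN = 0
    AMBER = 1
    RED = 2


def rollup(layer_status: dict[str, Status]) -> Status:
    """Return the overall status as the worst status among the layers."""
    if not layer_status:
        # Assumption: no data is treated as a warning, never as healthy.
        return Status.AMBER
    return max(layer_status.values())


product_health = {
    "infrastructure": Status.GREEN,
    "platform": Status.GREEN,
    "product": Status.AMBER,
    "operational use": Status.GREEN,
}

if __name__ == "__main__":
    print(rollup(product_health).name)  # AMBER
```

Because each layer’s result is itself just a status, the same function could be applied at every level, rolling individual checks up into layers and layers up into a product.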

How to integrate legacy platforms into the DevOps function is beyond the scope of this report, but rest assured that integration is at least partially possible and that benefits can be derived. It may not be possible to integrate the measurement points that drive dashboards as seamlessly with legacy products and platforms, but there are usually ways to collect key indicators from log and alert data, from the execution of simulated or synthetic transactions, or from existing monitoring systems. Encourage your DevOps teams to aim for a single interface rather than hand operations a requirement to monitor a large number of different tools and dashboards. This will improve the reliability of the operation, reduce costs, and raise both internal and customer satisfaction.
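For legacy platforms that can’t be instrumented directly, a key indicator can often be derived from existing log data. The Python sketch below assumes a hypothetical log format (`<timestamp> <LEVEL> <message>`) and illustrative amber/red thresholds of 1% and 5% errors; both would need adapting to the real system. Its output is a red/amber/green status that could feed the same single dashboard used for newer products.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed log format: "<ISO timestamp> <LEVEL> <message>",
# e.g. "2024-05-01T10:00:01 ERROR db timeout".
LOG_LINE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s")


def error_rate(lines: Iterable[str]) -> Optional[float]:
    """Fraction of parseable log lines at ERROR level, or None if none parsed."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines in other formats are skipped, not guessed at
            counts[match.group("level")] += 1
    total = sum(counts.values())
    return counts["ERROR"] / total if total else None


def status_from_error_rate(rate: Optional[float], amber: float = 0.01, red: float = 0.05) -> str:
    """Map an error rate to red/amber/green using illustrative thresholds."""
    if rate is None:
        return "AMBER"  # no usable data is a warning in itself
    if rate >= red:
        return "RED"
    if rate >= amber:
        return "AMBER"
    return "GREEN"


if __name__ == "__main__":
    sample = [
        "2024-05-01T10:00:00 INFO service started",
        "2024-05-01T10:00:01 ERROR db timeout",
        "2024-05-01T10:00:02 INFO request ok",
        "2024-05-01T10:00:03 INFO request ok",
    ]
    rate = error_rate(sample)
    print(rate, status_from_error_rate(rate))  # 0.25 RED
```

The same status function could equally be fed by the success rate of synthetic transactions or by figures pulled from an existing monitoring system, which keeps legacy and newer products on one interface.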