Why DevOps Fails: How To Get Back On Track

Have you ever watched ‘Kitchen Nightmares’ with Gordon Ramsay?

The show is about a British chef who goes into mom & pop restaurants all over the US and works with the owners and staff to fix their failing operations.

Usually, Gordon fixes all the problems by the end of a 60-minute episode (and with the magic of video editing). Each episode normally climaxes and ends with a big group hug and a lot of tears; then Gordon walks away into the night after making a snarky comment.

You’re probably wondering how this relates to DevOps.

Here’s how. If you’ve read my free DevOps ebook then you know I talk a lot about hang-ups. If not, here’s the quick and dirty meaning…

Hang-ups are habits and behaviors that keep pulling us back to the BAD old ways.

Kitchen Nightmares has a couple of special episodes where Gordon returns to some of his projects and guess what he finds?

Well, sometimes he finds success.

And you guessed it, other times he finds failure!

That’s right, Hang-ups have caused the BAD old ways to return.

Hmm… It’s interesting how restaurants and IT organizations have similar hang-ups…

Unpacking Why DevOps Fails

No two DevOps initiatives are the same, so I’m only going to explain 4 common failures I have dealt with. You are welcome to share your own successes and failures in the comments area below.

Let’s not pull any punches and get straight to it.

1. Infrastructure Hoops

I’ve seen DevOps initiatives fail right out of the gate because IT departments are still not dealing with all the technical debt that has accumulated in their data center over the years.

Avoiding this hoop is one key reason start-ups kick butt…they are not hung up on managing data centers or haggling over which server, network, or storage vendor to use.

Click! Click! Click! They are going straight to the cloud and leveraging advanced cloud services via an API on day-one.

Whereas, many self-hosted IT operations are still struggling with all of the above as well as huge (and rigorous) processes for building their infrastructure.

Even if they are using VMware or OpenStack, in the beginning, they are still stuck throwing tickets back and forth between teams.

But this isn’t the only problem faced.

Let’s not forget in some cases (or should I say most) how hard it is to get approvals to buy new hardware.

I love the idea of re-purposing old hardware to save money but sometimes you need to buy shiny new hardware to get the full value from your DevOps skills.

For example, running 50 virtual servers on an old HP server with 24GB of memory is just flat-out wrong!

I see all of the above as technical debt and IT operations like this will cause DevOps efforts to fail.

Why?

It’s because too much time and effort is going into tasks that should be transparent. Another reason is underlying problems waiting to bubble up, such as performance and over-subscription issues.

If building and adding capacity to your infrastructure is taking days, weeks, or months, then DevOps will not fix your service delivery problems.

First, you will need to take a good look at why it is taking so much time and effort to build infrastructure and solve this problem.

Here are some ideas to help get your DevOps transition plan on track:

  1. New hardware approval process – put some clear guidelines in place and be reasonable.
  2. No more over-subscribing old hardware – if you have to re-use old hardware then do it with care. Set thresholds that don’t allow oversubscribed CPU or memory.
  3. Reduce or eliminate build rigor – automate, or at least streamline, the build process to cut down on tickets and tasks. This may be the biggest challenge to solve because unless you’ve already converged your teams, there are likely 4 or 5 teams owning different parts of the workflow.
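To make the over-subscription idea concrete, here is a rough sketch of a threshold check. It is only an illustration: the `Host` class, the `oversubscription` function, the 1.0 default limits, and the VM sizes are all my own assumptions, not numbers from a real environment.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_cores: int
    memory_gb: float


def oversubscription(host, vms, max_cpu_ratio=1.0, max_mem_ratio=1.0):
    """Return allocation ratios for a host and whether they exceed the limits.

    vms is a list of (vcpus, memory_gb) tuples, one per virtual server.
    A ratio above 1.0 means more has been promised to VMs than the host has.
    """
    cpu_ratio = sum(vcpus for vcpus, _ in vms) / host.cpu_cores
    mem_ratio = sum(mem for _, mem in vms) / host.memory_gb
    return {
        "cpu_ratio": cpu_ratio,
        "mem_ratio": mem_ratio,
        "cpu_over": cpu_ratio > max_cpu_ratio,
        "mem_over": mem_ratio > max_mem_ratio,
    }


# Hypothetical numbers: 50 VMs, each given 1 vCPU and 2 GB,
# on an old 16-core host with 24 GB of memory.
old_host = Host("old-hp", cpu_cores=16, memory_gb=24)
report = oversubscription(old_host, [(1, 2.0)] * 50)
print(report)
```

With these made-up sizes the host has promised roughly four times the memory it actually has. A check like this could run before any new VM is approved for an old box.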

And finally, if these suggestions don’t fix the problem here’s a hint from my free eBook: is there a CONTROL PROBLEM?

2. Sprawl & Drift

The next reason has to do with software. To keep things simple, I’m including operating systems and hypervisors along with applications in this topic.

Here’s some interesting history. At a few places where I have consulted, I’ve had to deal with 4 different hypervisors running and getting managed by different teams (silos) within the same organization. VMs aren’t the only servers that can sprawl.

We’re talking free-for-all to do anything in Dev, QA and Prod.

But that’s not all.

Then there’s a mixture of Windows and Linux still being supported that goes back to the days before the Internet.

Here’s an example most of us have seen: 

A key business application that is slow and crashes randomly, especially when under load.

In almost every case where I’ve had to troubleshoot this problem, it comes down to drift in the configuration of the servers or the app. When I say drift, I am referring to inconsistencies between configurations on database nodes, web servers, or application servers.

What do I mean?

If your application is designed to run in an application pool then you need to be able to validate each server has the same build config. This is the same requirement for virtual or physical hardware and it covers everything all the way up to the application settings.
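Here is a minimal sketch of what validating "the same build config" could look like. It assumes you have already collected each server's settings into a dictionary (how you collect them is up to you). The `find_drift` function, the server names, and the settings are hypothetical examples, not output from any particular tool.

```python
def find_drift(configs):
    """Return {setting: {server: value}} for settings that differ across servers.

    configs maps server name -> dict of settings. A setting that is absent
    on a server is reported as the string "<missing>". Values must be
    hashable (strings, numbers) for the comparison to work.
    """
    all_keys = set()
    for settings in configs.values():
        all_keys.update(settings)

    drift = {}
    for key in sorted(all_keys):
        values = {
            server: settings.get(key, "<missing>")
            for server, settings in configs.items()
        }
        # More than one distinct value means the pool is not consistent.
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


# Hypothetical pool of three web servers; names and settings are made up.
pool = {
    "web01": {"os_patch": "2024-03", "max_threads": 200, "tls": "1.2"},
    "web02": {"os_patch": "2024-03", "max_threads": 200, "tls": "1.2"},
    "web03": {"os_patch": "2023-11", "max_threads": 50},
}

for setting, values in find_drift(pool).items():
    print(setting, values)
```

In this made-up pool, web03 is the odd one out on every setting, which is exactly the kind of server that makes an application slow or crash randomly under load.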

Build and configuration management, as well as standardization, are desired outcomes when DevOps is working. But they are not easy outcomes to achieve without unique scripting skills.

Here are a few more ideas to help your plan:

  1. Rein in the rogue environments – stop allowing a free-for-all. Pick one or two hypervisors and set some standards. This includes halting upgrades to new versions until a plan is in place to ensure compatibility with other interfacing resources.
  2. Deal with OS drift – update all OS templates and/or images used for deploying servers.
  3. Test and evaluate some of the more common DevOps tools that are used for automating environment deployments, and are also used for configuration management. Here’s a link for popular DevOps Tools.

And finally, if these suggestions are not helpful here’s another hint from my free eBook: is there a TRUST problem?

3. Communication Breakdown

Let me ask a question.

Think about the last 30 days. How many times have you had to deal with friction and tension between system admins, database admins, network admins, data center staff, and developers because something was built or configured wrong? Take your time…

If this is happening in your IT shop then people are not communicating.

And, yes, this happens even when you use automation!

I’ve seen this over and over in my career as an engineer, and as an Operations Manager, and it drives me nuts.

Unless you have a cookie-cutter process for adding capacity to your application, someone needs to be talking with others to figure out (and agree) on the plan from the beginning.

Here are a few planning suggestions:

  1. Technical discussion – get the right people talking with the person(s) making the request. Far too often a business analyst is having this conversation, and the information and requirements are not getting vetted until the environment is built and doesn’t work.
  2. Technical drawings – understanding how a new environment will be laid out before building servers should be a no-brainer, but far too often server requests are made without even a drawing on a whiteboard. Remember GIGO?
  3. Hardware and software requirements – validate beforehand that there is a plan and approval for resources. Vendors will often leave out these facts when they are selling the business a shiny new software application. Find out early in the process if MS SQL licenses, new servers, and storage hardware are needed. These are important facts that could cause big problems later.

Good communication practices are at the core of DevOps, which is why discovering a problem sooner rather than later helps to avoid time-consuming disruptions and the friction caused by the blame game.

4. Fuzzy Leadership

Like most IT professionals, I want to know what the requirements are: what’s expected?

This is why a clear message and mission are absolutely important for DevOps to succeed. People (especially IT people) need clarity or we will not change the way we operate.

People also rally around leaders who are decisive. Notice I did not say ‘perfect’ leader!

Even great leaders make mistakes. There is nothing wrong with setting a course and then adjusting it as you go.

DevOps is about the journey to a destination, and the continuous improvements that are made as often as are needed.

Let’s review these destinations…

With hardware, the destination is transparency; with software, the destination is consistency; with communication, the destination is awareness (which may not always mean agreement); and with leadership, the destination is clear expectations about what the outcome should look like.

Here are suggestions for leaders who are struggling with a failing DevOps initiative:

  1. Review your message – is your mission clear on what you want to achieve by making the changes you are asking people to make? Have you clearly defined what the outcome should look like? Is it realistic given the staff and resources you are working with?
  2. Review your organization – are there people and processes causing friction that are not necessary? Trust and control are 2 hang-ups I’ve covered over and over that hinder DevOps from really getting up-to-speed. Are they why your DevOps efforts are failing?
  3. Have a plan – let’s be real, okay! People will always be people. Getting from transition to transformation will require leadership and not just expecting people to do the right thing. Setting clear expectations for behavior is important. And so is enforcing those expectations…
  4. Rewards – always look for ways to publicly recognize people who do the right thing without being asked.
  5. And here’s a final tip. Don’t be phony because IT Pros resent phony leaders!

Reality check…

I really hope this list helps you get back on track.

Here’s the truth. Improving service delivery isn’t easy, even with DevOps. And if you haven’t already dealt with these 4 DevOps failures (or the many others that are awaiting you) then you are not trying hard enough to change.

Failure is painful, I know. But I’m NOT suggesting you avoid what I’ve been talking about. Not at all. I want you to face these challenges head-on and then figure out why your hardware problems exist, why your software stack keeps crashing the service, or why you spend so much time on rework because nobody is communicating.

And finally, why nobody is taking you seriously and DevOps has become a four-letter word – or worse, a joke – because it’s not happening except in the stories you are telling yourself.

What’s next?

Read The DevOps Handbook by Gene Kim and his co-authors, then rethink your DevOps plan…
