If you’ve been following my blog for any time, or if you’ve downloaded my free VMware beginner’s guide, then you probably know where I stand on OpenStack.
If not, here it is: I think OpenStack is awesome!
With that said, I wasn’t surprised when I read Ben Kepes’s post on Forbes about PayPal’s decision to axe VMware. (I did also notice this has been going on since 2013, but it looks like it’s finally happened.)
This decision by PayPal doesn’t surprise me one bit.
I’ve managed IT operations on both cloud platforms, and the decision makes sense. Here’s why.
It takes a lot of time and staff to manage a private OpenStack cloud, but it takes a bit more of both to manage a vSphere infrastructure of the same size.
Less hassle means more productivity.
Let me explain my point, because it’s not only about workload. It’s also about emotions.
Why You Should Boost Your OpenStack Skills
For those who are new to both VMware and OpenStack, let’s begin with the three reasons given in the Forbes article by PayPal’s VP of Global Infrastructure, Sri Shivananda:
OpenStack gives them the agility, availability and manageability they need…
In VMware’s defense, I would say vSphere also gives agility. For example, isn’t ESXi running under the hood for all VMware products, from vCloud to Horizon desktops? And can’t it support Windows and Linux?
Yes! But I don’t think that’s what Sri is talking about.
I’m going to read into Ben’s article a little because ‘Agility’ must mean something else.
Let’s think about this from a cloud architect’s perspective. Hmm… it must be something to do with the OpenStack API versus the vCloud or vSphere API.
Yup, there it is…
…PayPal has converted almost 100 percent of the traffic of its web/API and mid-tier services to run on an internal private cloud that has been built on OpenStack.
Here’s the rationale behind this comment.
The ability to easily couple applications through RESTful (JSON) and SOAP APIs is a critical requirement for applications running in the cloud (even internal clouds).
As a matter of fact, use cases like PayPal’s are why I wrote about why companies need a cloud readiness assessment.
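To make the API point a little more concrete, here is a minimal Python sketch of the kind of JSON body a client sends to the OpenStack Identity service (Keystone v3) when it asks for a token. The user name, password, and project name are made-up placeholders, and none of this comes from PayPal’s setup. It only illustrates what coupling to a cloud through a RESTful JSON API looks like.

```python
import json


def build_token_request(username, password, project, domain_id="default"):
    """Build the JSON body for a Keystone v3 password-auth token request.

    All argument values are illustrative placeholders, not real credentials.
    """
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"id": domain_id},
                        "password": password,
                    }
                },
            },
            # Scope the token to one project so it can be used with other services.
            "scope": {
                "project": {"name": project, "domain": {"id": domain_id}}
            },
        }
    }


if __name__ == "__main__":
    body = build_token_request("demo", "secret", "demo-project")
    print(json.dumps(body, indent=2))
```

A client would POST this body to the `/v3/auth/tokens` path of the identity endpoint and read the token from the `X-Subject-Token` response header. The sketch stops at building the payload so it runs without a live cloud.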
Let’s see. Ben mentions 26 languages, 203 markets, and 162 million customers.
Wow, speaking as an Ops Manager, that would require serious availability – I’m guessing 24x7x365 and zero downtime to meet their SLA.
The largest VMware environment I’ve supported had almost 650 ESXi hosts and 7,000+ VMs running across multiple data centers to support a couple hundred thousand customers. But this doesn’t come close to supporting 162 million users.
Lessons I’ve learned.
In the last 3 to 5 years I can recall some cases when my VMware team was scrambling to recover from different types of wide-scale outages that were VMware specific. And to be fair, there were also incidents that were network, storage, and server hardware related.
What I’ve learned along the way from these experiences is to take availability very seriously, which is why I now ‘think service first.’
Ops managers should always be thinking about (and avoiding) impact to customers (not servers). Always!
Building VMware or OpenStack environments to expect failure, and to transparently transfer workloads when it happens, is crucial to availability.
OpenStack running with Mirantis or CoreOS does have true clustering versus VMware HA, which restarts VMs on a surviving host and so causes minutes of downtime while HA handles the failure. (Yes, I know vSphere Fault Tolerance now supports multi-vCPU VMs in 6.x, which is good for those who can afford the overhead of running duplicate copies of their business-critical servers.)
Admins, even a few minutes of downtime can have a huge impact on customers!
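To put ‘a few minutes’ in perspective, here is a small Python sketch that turns an availability target into a yearly downtime budget. The targets in the loop are my own illustrative picks; PayPal’s actual SLA isn’t stated in the article.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # minutes in a non-leap year


def downtime_budget_minutes(availability_pct):
    """Return the minutes of downtime per year allowed by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Illustrative targets only; not taken from any real SLA.
    for target in (99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(target)
        print(f"{target}% availability allows about {budget:.1f} minutes of downtime per year")
```

At ‘five nines’ the whole yearly budget is roughly five minutes, so a single HA restart that takes a few minutes can use up most of it.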
I like to read blogs about VMware. And there are popular guys who get a lot of attention writing a lot about shiny new objects (SNO). But take it from someone managing large teams of admins and engineers, who deals with the day-to-day care and feeding of a growing vSphere environment: it’s not all SNO – most of the time the job is hard work!
Getting back to Ben’s article though, I can understand where Sri is coming from on manageability because every quarter or so my team had 10,000+ VMs that needed to reboot multiple times because of VMware updates. (This is on top of regular Windows and Linux updates.)
Disruptive and time-consuming.
Now to be fair, OpenStack releases updates as well. And depending on the hypervisor, reboots are going to happen, too. But will it be as bad as putting 650 ESXi hosts into maintenance mode and doing rolling patches and reboots for thousands of virtual servers?
A key takeaway here for admins and managers – disruptions are customer impacting and painful!
Cost isn’t mentioned in the article, but I’m sure it was on the list of reasons for Sri’s decision.
Without many of the tools VMware charges additional fees for, you are limited to building and provisioning VMs manually or from templates, which unfortunately is what many IT departments still do.
Yup, this means teams are still throwing tickets over the fence back and forth across silos. I’ve seen this problem everywhere I’ve worked and it’s a huge manageability issue that sucks time down the drain.
OpenStack, on the other hand, is open-source ‘for now,’ and supported by huge open-source communities. It also has third-party partners who can offer paid support, if required. And the list of open-source tools available to automate environments, and server build and configuration via API calls, is amazing.
But let me note, open-source tools like Chef and Puppet can also work for VMware, which is a GREAT opportunity for VMware admins to get creative!
Regardless of which cloud solution or tools you use, scripting always comes in handy.
‘Return On Investment’ is another reason that wasn’t mentioned, but I will speak honestly about the fallacy of saving money by running ESXi 5.5 on old hardware.
Yes, it can be done. And the virtual machines will run pretty well. But when you do the math, is the return on a $2,000 – 4,000 license justified when you’re running 4 – 8 virtual machines on an old Dell 1950 with 24 GB of memory? Not!
Whereas, is running free OpenStack with KVM on old hardware and getting the same or better performance a better ROI? Absolutely YES!
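Here is that math as a quick Python sketch, using only the license range and VM counts quoted above. It ignores hardware, power, support, and staff costs, so treat it as a rough per-VM license figure and not a full ROI model.

```python
def license_cost_per_vm(license_cost, vm_count):
    """Return the hypervisor license cost carried by each VM on one host."""
    if vm_count <= 0:
        raise ValueError("vm_count must be a positive number")
    return license_cost / vm_count


if __name__ == "__main__":
    # Figures from the text: a $2,000 - 4,000 license and 4 - 8 VMs per old host.
    for cost in (2000, 4000):
        for vms in (4, 8):
            per_vm = license_cost_per_vm(cost, vms)
            print(f"${cost} license across {vms} VMs = ${per_vm:.0f} per VM")
    # With a free hypervisor such as KVM the license term is simply zero.
    print(f"free hypervisor across 8 VMs = ${license_cost_per_vm(0, 8):.0f} per VM")
```

On those numbers, each VM on the old box carries somewhere between $250 and $1,000 of license cost before it does any work.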
To clarify, I’m only bringing up old server hardware because not everyone can buy new UCS hardware. Some IT teams don’t have SNO to play with and are still running production on hardware they have had for 5 – 7 years.
Is there anything wrong with getting 2 – 3 more years out of your old hardware? Well, yes and no!
Yes, if you want to save on power and cooling, and get higher density.
And no if it doesn’t matter, and budget is the constraint.
But, I also want to mention that KVM runs on new hardware too, so the difference comes down to ROI.
I don’t know which operating systems PayPal uses to run their applications on (probably several), but I’m going to take a guess they have a huge Linux presence in their cloud. This means they have a bunch of sharp Linux admins, engineers, and architects who love scripting.
Now from my experience working with Linux teams, I’ve known more than 10 Linux admins who all want to run their favorite Linux distro on something other than ESXi. You know it’s true!
Does Linux run on VMware? Yes, it does. And it runs well. But I think it’s more of an emotional choice.
Here’s why I say it’s emotional…
Many vSphere clouds have Linux and Windows sharing the same host and storage. And hardcore Linux admins just don’t dig that. Which, I bet, is another reason PayPal is axing VMware.
My gut tells me their Linux teams have shared their input and OpenStack is a more natural fit for their way of life. It makes them feel good, which is important!
Hence, here’s my prediction and basis for this list…
Larger infrastructure environments that support huge numbers of Linux servers that are supported by large Linux teams will soon start axing VMware for OpenStack purely out of preference (emotional choice).
There it is. Emotions. Write this down and remember it!
This is pure speculation, and I don’t have the data to support it, but I have dealt with it sometimes on a smaller scale with rogue deployments of KVM and OpenStack. This is why Linux skills are also important for VMware admins.
I think it’s crucial for new and old VMware admins to learn Linux ASAP because it allows an easy transition to OpenStack in the event your VP or CTO axes VMware, too!
6. Bare Metal Servers
I’ve always wanted to see my bare metal servers (not running ESXi) in the vCenter inventory, but it’s not possible (anyways not out-of-the-box). But OpenStack has a feature called Bare Metal Services that can provision and manage bare metal servers. This is another difference I like about OpenStack.
The Bare Metal Service is a collection of components that provides support to manage and provision physical machines.
Why is this so important? Because unfortunately, there are still applications that require bare metal servers, even in the cloud.
Yes, you can virtualize them and run a single VM that consumes all the system resources as a way around this. But that comes at the cost of another license – and in some cases, heightened emotions from the DBA or application owner who wants dedicated physical hardware.
I’m not going to go too deep on the compatibility topic.
If it runs ESXi, 99.999% chance it will run KVM.
8. Converged Technologies
Let’s talk about converging technology for a moment.
Many IT leaders are reconsidering the old method of putting thousands of VMs in a single large pool of storage. Why?
Because even the best storage solutions can crash, which makes the risk of something going wrong very real.
Like many other Ops Managers, I’ve had my share of storage-related outages, so I understand why some CTOs might want smaller aggregates of VMs. And on a global scale, VMware has also heard and seen the fallout from these mishaps and has come to the rescue with VSAN.
What is VSAN?
VSAN (virtual SAN) is a cool new way VMware is using local disk to build shared scale-out storage.
But I must add, converging storage with server hardware is not only an additional cost in SAS disks, SSD, and upgraded controllers; it’s also another VMware license cost.
On the other hand, many OpenStack environments are converging technologies in similar ways using open-source products such as Ceph for handling storage. And Ceph fits right into OpenStack with Chef cookbooks for automating configurations.
And while I’m on the topic of converged technology, I might as well throw out another buzzword – ‘software-defined networking.’
Yes, the same can be said for networking, which is another reason I keep harping on admins to update their scripting and coding skills.
If you haven’t already played with Docker, then you are missing out.
I admit I went bonkers for a while when I was reading The Docker Book.
Yes, Docker containers run on VMware VMs running Linux, and they’ll even run on Windows using Boot2Docker. But the best place to run Docker containers is on bare metal Linux servers.
OpenStack works well with Docker and has the magic to provision bare metal Linux servers, and then with Chef or another tool, spin up containers running hundreds of different applications and configs from GitHub.
I understand that Google Search and Gmail run in Linux containers, similar to how Docker works. Read more about Google containers.
Ben’s article doesn’t mention it but I think it’s worth mentioning here. I’m talking about DevOps. Let me put a bit of context around this.
I’ve managed a few large Ops teams and environments, and all of them had trouble automating vSphere. Why?
Because we were trying to do it without the VMware vCloud and Orchestration components. The money just wasn’t in the budget to add the licenses.
So we did what most IT departments are doing. We had a couple of good VMware engineers building infrastructure. And a few Windows systems admins building and configuring Windows VMs from templates.
But I noticed something different going on with the Linux team.
They were figuring out how to automate the deployment of complete Linux and MySQL environments. And they were doing it without vCloud Director. How were they doing it?
They were doing it the DevOps way – using Chef and Ansible.
DevOps is the new normal.
Along with better agility, availability, and manageability, I assure you DevOps is really the reason why PayPal (and many more Linux shops to come) will be moving to OpenStack.
Here’s my proof!
In my research for this list, I found this video on the Puppet Labs website about PayPal. Coincidentally, it was made in 2013 when PayPal started talking about moving away from VMware.
Puppet Video – Case Study of PayPal
The Writing on the Wall
My closing thought is that PayPal didn’t opt for OpenStack as a VMware alternative only because of better agility, availability, and manageability; it was also because they have a large Linux team using DevOps to run their operations.
This is why I keep advising anyone considering a career in IT to learn Linux and OpenStack, and to become a highly skilled script creator who can write code, fix bugs, and automate everything possible via an API.
VMware, do you see the writing on the wall? It’s become an emotional choice that you are losing with Linux admins!
Where to get training.
There you have it. A list of 10 reasons why I say VMware admins should start developing OpenStack skills and experience. And not only OpenStack: also learn to be a scripting guru who fits into a DevOps culture.
There are a lot of places online where you can learn for free, and many vendors offer classes in just about every city. There’s also CBT Nuggets and PluralSight online, but here’s a good low-cost place to get DevOps training that covers the key scripting and cloud skills that will get you started today.
Finally, build yourself a home lab on VirtualBox and get going building your OpenStack skills!
Build it and tear it down until you are confident enough to put it on your resume.
Please feel free to share this list! Thanks…