Years of managing virtualization have given me a perspective that most IT Managers will soon share as their vSphere Operations Management tasks multiply.
That perspective: vSphere Operations Management is time-consuming and labor-intensive, to say the least.
Before we get started, I also recommend reading my new guide to the best step-by-step VMware vSphere training online.
Sharing a Lesson Learned
Yes, virtualization is cool and fun to set up in the beginning, but the lesson I learned is that later on, vSphere operations management can cause nightmares because of all the tasks required to maintain health and performance service levels. Tasks such as:
- Keeping server and storage hardware operational, with firmware and drivers up to date.
- Updating vCenter and VMware Tools, and installing ESXi patches and hotfixes.
- Performing hundreds of server P2V conversions and storage migrations that can go on for months.
These are all operational tasks that need to be done on a regular basis (see the larger list of tasks below).
In my book, VCP for Hire, I share more lessons learned and provide job descriptions that cover many of the engineering and operations duties required for maintaining most vSphere environments.
Job Description Excerpt from VCP for Hire:
The Virtualization Administrator (VA) is responsible for handling build, “Support” and “Operations”.
They perform day-to-day incident management, build and expand environments and perform daily operational tasks.
The VA role is responsible for remediation of issues reported via service desk tickets that are related to anything affecting hardware, software, host or VM.
They also support the hosted operating system running on the virtual machine. They are responsible for building and expanding the virtual environment as it grows, i.e., adding more ESXi hosts, memory, or disk space to one or many VMs.
They solve issues related to the data network interfaces on a host or cluster, and may deal with other technology teams on changes that have been requested by the…
Excerpt of VA Skills
- 2 – 4 years of hands-on experience as a Windows or Linux system administrator. Providing day-to-day operating system support. Engineering and designing server environments that include a combination of application, web, and databases, as well as AD, DNS, and file and print services.
- 1 – 3 years of hands-on experience as a hybrid SysAdmin supporting a small virtual infrastructure of 5 – 20 ESXi hosts with a hundred or more VMs. Experience handling daily duties, incident management, and changes to the environment that are required to develop the knowledge and instincts required for success as a VA.
- At least 1 year of hands-on network and storage experience. This provides the foundation required for supporting environments that could impact, or be impacted by, the other technologies that converge in virtual infrastructure.
- The ability to read and understand detailed Visio drawings of virtual environments which include deep-dive views of network, storage and server hardware configurations. An understanding of data-flow traffic diagrams and network port ladders.
The point I’m making is that you’ll need someone who can handle all of these operations tasks.
Tools for Keeping up with Operations
As virtualization technology has evolved, many vendors have created tools to help with vSphere Operations Management duties.
On VMinstall, I’ve written about Veeam, vKernel, HotLink, VMExplore and ManageEngine. They all offer bits and pieces that help the infrastructure team deal with the tasks required to keep virtual environments optimized and healthy.
A more recent addition is VMware vCenter Operations Manager. It’s definitely a few steps up from most tools, and as of February 2013, it’s being offered to SMB customers at a discount.
In hindsight, I can understand why VMware would offer Operations Manager to SMB customers: vSphere operations management is a nightmare for small teams.
vSphere Operations Task List
- Updating ESXi security patches and hotfixes. (This can be done using rolling maintenance-mode outages in clustered environments.)
- Updating security patches on the virtual server operating system. (Sometimes this is handled by server owners.)
- Updating VMware Tools after major ESXi upgrades. (This may or may not require rebooting virtual servers; in the worst case, it can require rebooting all VMs.)
- Server P2V migrations. (Sometimes this can require complete server rebuilding when the old physical server will not P2V.)
- VM deployments. (Unless self-service is set up, this can be a big “time” consumer.)
- Building and keeping Gold Templates up-to-date. (This can be time-consuming and requires a lot of testing if 3rd party apps are included in the template.)
- Routine virtual environment health checks to reclaim system resources. (Zombie VMs, storage, IPs, snapshots and orphaned files.)
- Physical and virtual server “Break-Fix” or incident tickets. (Seems like they never stop.)
- Adding and changing networks and datastores. (Something is always changing.)
- Adding and changing vCenter permissions. (Worst case, server admins want admin rights to vCenter.)
- Adding or changing ESXi host capacity. (Time-consuming and you normally don’t know what’s coming until it’s too late.)
- Life-cycle of old server hardware. (Decommissioning old hardware can be time-consuming.)
- Creating and updating virtualization process and procedure documentation. (Very unpopular task.)
- Creating daily, weekly and monthly operations and capacity reports. (Extremely unpopular task!)
- Handling NOC escalations for performance problems and service outages. (Unfortunately, VMware seems to get blamed first. Much of the time, the virtualization team must prove it’s not VMware before the network or storage teams will look at their own environments for the root cause of a problem.)
- Handling “Drive By” requests for just about anything and everything that someone wants to be done ASAP!
- Almost forgot! Handling backups, monitoring, audits and vCenter upgrades…
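As a rough illustration of the routine health check item in the list above, here is a minimal Python sketch that flags reclaim candidates from an inventory export. Everything in it is an assumption for illustration: the record fields (`name`, `power_state`, `last_powered_on`, `snapshots`), the 90-day and 7-day thresholds, and the idea that the inventory has already been exported as plain data. It does not call any VMware API.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; tune them to your own policies.
ZOMBIE_AFTER = timedelta(days=90)     # powered off this long -> review as a zombie
SNAPSHOT_MAX_AGE = timedelta(days=7)  # snapshots older than this -> review


def find_reclaim_candidates(vms, now):
    """Return (zombie_vm_names, stale_snapshots) from exported inventory records.

    Each record is a dict with hypothetical keys: 'name', 'power_state',
    'last_powered_on' (datetime) and 'snapshots' (a list of dicts with
    'name' and 'created').
    """
    zombies = []
    stale_snapshots = []
    for vm in vms:
        powered_off = vm["power_state"] == "poweredOff"
        if powered_off and now - vm["last_powered_on"] > ZOMBIE_AFTER:
            zombies.append(vm["name"])
        for snap in vm.get("snapshots", []):
            if now - snap["created"] > SNAPSHOT_MAX_AGE:
                stale_snapshots.append((vm["name"], snap["name"]))
    return zombies, stale_snapshots


if __name__ == "__main__":
    now = datetime(2013, 2, 1)
    # Made-up sample inventory, for demonstration only.
    inventory = [
        {"name": "web01", "power_state": "poweredOn",
         "last_powered_on": datetime(2013, 1, 15),
         "snapshots": [{"name": "pre-patch", "created": datetime(2013, 1, 10)}]},
        {"name": "old-app", "power_state": "poweredOff",
         "last_powered_on": datetime(2012, 9, 1), "snapshots": []},
    ]
    zombies, stale = find_reclaim_candidates(inventory, now)
    print("Zombie VM candidates:", zombies)
    print("Stale snapshots:", stale)
```

The output is only a review list. Someone still has to confirm with the server owner before anything is deleted.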
As you can see, the vSphere operations management task list is long. Many of the items cannot be handled by a tool, and IT Managers will have to schedule after-hours work to complete some of the service-impacting tasks. I also recommend that you take a look at all the VMware lessons and best practices in case there is something else you can use.
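Rolling maintenance-mode patching, mentioned in the task list, is a typical candidate for that after-hours work. Before booking the window, it helps to check which hosts can actually be evacuated one at a time. The sketch below is a back-of-the-envelope check under stated assumptions: the function name, the input format (host name mapped to memory capacity and memory in use, in GB) and the 10% headroom are all hypothetical, and it looks only at aggregate memory. It ignores CPU, HA admission control, affinity rules and whether individual VMs fit on a given host.

```python
def plan_rolling_patch(hosts, headroom=0.10):
    """Split hosts into those that can be evacuated one at a time and those that cannot.

    hosts maps a host name to (memory_capacity_gb, memory_used_gb).
    A host qualifies when the remaining hosts could carry the whole
    cluster's used memory while keeping `headroom` of their capacity free.
    """
    can_patch, blocked = [], []
    total_capacity = sum(cap for cap, _ in hosts.values())
    total_used = sum(used for _, used in hosts.values())
    for name, (cap, _used) in hosts.items():
        # Capacity left in the cluster while this host is in maintenance mode.
        usable_elsewhere = (total_capacity - cap) * (1 - headroom)
        if total_used <= usable_elsewhere:
            can_patch.append(name)
        else:
            blocked.append(name)
    return can_patch, blocked


if __name__ == "__main__":
    # Made-up cluster figures, for demonstration only.
    cluster = {"esx01": (256, 150), "esx02": (256, 140), "esx03": (128, 60)}
    ok, blocked = plan_rolling_patch(cluster)
    print("Can patch in a rolling window:", ok)
    print("Needs VMs powered off or extra capacity first:", blocked)
```

Hosts that land in the blocked list are the ones where the window is likely to involve VM downtime, which is worth knowing before the change request goes out.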
Do you have tips or suggestions about vSphere Operations Management? Please leave a comment.