Re-Thinking Storage Architecture (Is VSAN A Better Plan For Failures?)


Bulletproofing Storage!

My body still cringes when I remember the moment I heard the news… “The storage crashed!”

Four years ago (2010), I lived through a large-scale outage caused by a SAN failure. According to the experts, BTW, it “should never have happened,” but it did.

For 7 straight days my team and I orchestrated the recovery of nearly 500 virtual servers that vanished when the storage crashed. Needless to say, it occurred during peak business hours.

Hour after hour, as we brought servers back online, I had the dreaded task of sending email updates to developers, business partners, VPs and C-level staff.

They all kept asking me the same question: how did this happen?

A LIFE-changing event.

Let me tell you: when something like this happens, you’re not the same afterward.

When it’s over, you walk away a different person. And no matter what any vendor [or zealot] says about a cool SAN, server, or network device, you still can’t completely get it out of your head. You’ve learned firsthand that nothing is perfect.

Re-Thinking Storage Architecture

Since the experience I just shared, many new storage solutions have come to market. Names such as Pure Storage, Nimble, and Tintri come to mind.

They all advertise some new concept for handling scaling and performance. But recently, the solution that has captured my attention, for a more practical reason, is VSAN.

Why? Because it spreads out the failure domain almost as wide as you want it, and it can be built on lower-cost server hardware. The benefit: your VMs are no longer all sitting in one aggregate of spindles.

But let me be clear! I am not naive; there are still pros and cons to using VSAN that will have to be worked out.

I know it won’t be easy to change how IT thinks about storage (SAN, NAS, NFS, FC, iSCSI).

We love BIG storage! But face it: there’s too much risk when all the data and VMs are in one place, and the cost goes up when you replicate and snapshot to yet more SANs.

Big is out, Wide is in.

How do we move away from a BIG storage architecture with hundreds or thousands of spindles? I say with smaller, wider storage pools running directly on ESXi. For me it’s a no-brainer because I see the value.
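To put rough numbers on "Big is out, Wide is in," here is a minimal sketch of the blast-radius arithmetic. It assumes a deliberately simplified model: VMs are spread as evenly as possible across independent fault domains, and one domain fails outright with no replication to fall back on. The 500-VM figure comes from the outage above. The function name `blast_radius` is hypothetical and is used only for illustration.

```python
def blast_radius(total_vms: int, fault_domains: int) -> int:
    """Worst-case number of VMs lost when a single fault domain fails.

    Hypothetical, simplified model: VMs are spread as evenly as possible
    and the failed domain has no replica elsewhere.
    """
    if fault_domains < 1:
        raise ValueError("need at least one fault domain")
    # Ceiling division: the fullest domain holds ceil(total / domains) VMs.
    return -(-total_vms // fault_domains)


TOTAL_VMS = 500  # roughly the number of VMs lost in the 2010 SAN outage

for domains in (1, 4, 8, 16):
    affected = blast_radius(TOTAL_VMS, domains)
    print(f"{domains:>2} domain(s): up to {affected} VMs down ({affected / TOTAL_VMS:.0%})")
```

With one big pool, a single failure takes down every VM. Split the same workload across 8 domains and the worst single failure drops to 63 VMs, about 13%. That is the whole argument for wide over big, before replication is even considered.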

Recently I met with one of VMware’s leading experts on the topic, Rawlinson Rivera, VCDX, and we went deep into the nuts and bolts of how VSAN works. Yes, his pitch was more about software-defined storage, but my interest was more in the practical application: smaller failure domains.

Big storage SANs will always have their place, but spreading your applications across multiple ESXi hosts with multiple copies really interests me. Sure, snapshots and replication do this too, but unless you have a proven and tested disaster recovery strategy (which we thought we had when our storage crashed), you’re just banking on [HOPE].
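"Multiple copies" has a price, in both hosts and raw disk. As I understand VSAN’s RAID-1 mirroring policy, tolerating n host failures means keeping n+1 full copies of each object, plus witness components so that a majority survives. That works out to at least 2n+1 hosts. The sketch below assumes that rule of thumb and ignores slack space, witness size, and other overheads. `plan_mirroring` and `MirrorPlan` are hypothetical names, not part of any VMware API.

```python
from dataclasses import dataclass


@dataclass
class MirrorPlan:
    failures_to_tolerate: int  # host failures the data should survive
    copies: int                # full replicas kept of each object
    min_hosts: int             # replica hosts plus witness hosts for a majority
    raw_tb: float              # raw capacity consumed (ignores slack and overhead)


def plan_mirroring(usable_tb: float, failures_to_tolerate: int) -> MirrorPlan:
    """Rough mirroring math, assuming the n+1 copies / 2n+1 hosts rule of thumb."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be >= 0")
    copies = failures_to_tolerate + 1
    min_hosts = 2 * failures_to_tolerate + 1
    return MirrorPlan(failures_to_tolerate, copies, min_hosts, usable_tb * copies)


for ftt in (1, 2):
    print(plan_mirroring(usable_tb=20.0, failures_to_tolerate=ftt))
```

Under these assumptions, 20 TB of usable VM storage that must survive one host failure needs 3 hosts and 40 TB of raw disk. Surviving two failures needs 5 hosts and 60 TB. Cheaper server hardware is what makes that overhead easier to swallow.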

I’ve learned my lesson.

As a tempered OPS manager, I don’t hope anymore. I test. I make sure we are not making assumptions. My thinking since the SAN failure has changed. I want smaller failure domains with less impact to the business when something goes BANG!

Do your own research and you’ll find there are many solutions on the market for spreading fault domains to reduce the impact of hardware failures. But for me, I am listening to what my GUT tells me and what VMware has to say about VSAN. Why? Because my thinking about storage has changed.

This is obviously only one use case, and for now VSAN only provides shared storage for VMware products. I’m guessing this might change in the future. As for how to architect storage using VSAN, there’s much still to come: according to a quick check on Amazon, the books on designing with VSAN have yet to be written.

The other option is to use a smaller data storage server for each workload. That’s what we’ll cover next time.

Wrap Up:

I’ve briefly covered what changed my mind about BIG storage solutions. I also covered what intrigues me about VSAN, and why.

I see a new trend coming, which is why VMware is getting ahead of it with its own storage solution. Why? Because the risk is too high running all your VMs from one big pool of storage, and it’s our responsibility to protect the business when technology fails. Spreading the risk out onto smaller fault domains makes sense.

What I took away from the experience I started this post with was eye-opening: a reality check that even a million-dollar storage platform can fail! And when it happens, it isn’t the vendor who rebuilds…
