Re-Thinking Storage Architecture (Is VSAN A Better Plan For Failures?)

Bullet Proofing Storage!

My body still cringes when I remember the moment I heard the news: “The storage crashed!”

It was 4 years ago (2010) when I lived through a large-scale outage that was caused by a SAN failure. According to the experts, it “should have never happened,” but it did.

For 7 straight days my team and I orchestrated the recovery of nearly 500 virtual servers that vanished when the storage crashed. Needless to say, it occurred during peak business hours.

And hour after hour as we brought servers back online, I had the dreaded task of sending email updates to developers, business partners, VPs and C-level staff.

They all kept asking me the same question…how did this happen?

LIFE changing event.

Let me tell you, when something like this happens, you’re not the same afterwards.

When it’s over, you walk away a different person. And no matter what any vendor [or zealot] says about a cool SAN, server, or network device, you still can’t completely get it out of your head. You’ve experienced firsthand that nothing is perfect.

Re-Thinking Storage Architecture

Since the experience I just shared, many new storage solutions have come to market. Names such as Pure Storage, Nimble, and Tintri come to mind.

They all advertise some new concept for handling scaling and performance. But recently the solution which has captured my attention for a more practical reason is VSAN.

Why? Because it spreads out the failure domain almost as wide as you want it. And it can be built on lower-cost server hardware. The benefit is that your VMs are not all in one aggregate of spindles.
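To put rough numbers on that idea, here is a minimal Python sketch of the worst-case "blast radius" when a single storage pool fails. It is only back-of-the-envelope arithmetic, not anything VSAN itself computes: the function name `vms_impacted` is hypothetical, and it assumes VMs are spread evenly across pools with no extra copies elsewhere.

```python
def vms_impacted(total_vms: int, pools: int) -> int:
    """Worst-case number of VMs hit when ONE storage pool fails.

    Assumptions (illustrative only): VMs are spread evenly across
    the pools, and no VM has a second copy in another pool.
    """
    if total_vms < 0 or pools < 1:
        raise ValueError("need total_vms >= 0 and pools >= 1")
    # Ceiling division: the fullest pool holds this many VMs.
    return -(-total_vms // pools)


if __name__ == "__main__":
    # 500 VMs is roughly the size of the outage described earlier.
    for pools in (1, 4, 8, 16):
        hit = vms_impacted(500, pools)
        print(f"{pools:>2} pool(s): up to {hit} VMs down if one pool fails")
```

With one big pool, every VM is in the failure domain. The wider you go, the smaller the slice a single failure can take out.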

But let me be clear! I am not naive. There are still pros and cons to using VSAN that will have to be worked out.

I know it won’t be easy to change how IT thinks about storage, i.e. SAN, NAS, NFS, FC, iSCSI.

We love BIG storage! But face it, there’s too much risk when all the data and VMs are in one place, and the cost goes up when you replicate and snap to yet more SANs.

Big is out, Wide is in.

How do we move away from BIG storage architecture with hundreds or thousands of spindles? I say with smaller, wider storage pools running directly on ESXi. For me it’s a no-brainer because I see the value.

Recently I met with one of VMware’s leading experts on the topic, Rawlinson Rivera, VCDX, and we went deep into the nuts and bolts of how VSAN works. Yes, his pitch was more about software-defined storage, but my interest was more in the practical application: smaller failure domains.

Big storage SANs will always have their place, but spreading your applications across multiple ESXi hosts with multiple copies really interests me. Sure, snapshots and replication do this too, but unless you have a proven and tested disaster recovery strategy (which we thought we had when our storage crashed), you’re just banking on [HOPE].
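Here is a toy Python sketch of what "multiple hosts with multiple copies" buys you. To be clear, this is not VSAN's actual placement logic: the round-robin scheme, the host names, and the helpers `place_copies` and `survivors` are all assumptions for illustration, and it ignores details such as quorum.

```python
def place_copies(vms, hosts, copies=2):
    """Put each VM's copies on distinct hosts, round-robin (illustrative only)."""
    if not 1 <= copies <= len(hosts):
        raise ValueError("copies must be between 1 and the number of hosts")
    placement = {}
    for i, vm in enumerate(vms):
        # Consecutive host slots guarantee the copies land on different hosts.
        placement[vm] = [hosts[(i + k) % len(hosts)] for k in range(copies)]
    return placement


def survivors(placement, failed_host):
    """VMs that still have at least one copy on a host that did not fail."""
    return [vm for vm, where in placement.items()
            if any(host != failed_host for host in where)]


if __name__ == "__main__":
    hosts = ["esx1", "esx2", "esx3", "esx4"]   # hypothetical host names
    vms = [f"vm{n}" for n in range(12)]        # hypothetical VMs

    for copies in (1, 2):
        alive = survivors(place_copies(vms, hosts, copies), "esx1")
        print(f"copies={copies}: {len(alive)} of {len(vms)} VMs keep a copy after esx1 fails")
```

With a single copy per VM, losing one host takes a quarter of the VMs with it in this setup. With two copies on different hosts, every VM still has data to run from.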

I’ve learned my lesson.

As a tempered OPS manager, I don’t hope anymore. I test. I make sure we are not making assumptions. My thinking since the SAN failure has changed. I want smaller failure domains with less impact to the business when something goes BANG!

Do your own research and you’ll find there are many solutions on the market for spreading fault domains to reduce the impact of hardware failures. But for me, I am listening to what my GUT tells me and to what VMware has to say about VSAN. Why? Because my thinking about storage has changed.

This is obviously only one use case, and for now VSAN only provides shared storage for VMware products. I’m guessing this might change in the future. As for how to architect storage using VSAN, there’s much to come: according to a quick check on Amazon, the books on how to design VSAN are yet to be written.

The other option is to use a smaller data storage server for each workload. That’s what we’ll cover next time.

Wrap Up:

I’ve briefly covered what has changed my mind about BIG storage solutions. I also covered what intrigues me about VSAN and why.

I see a new trend coming, which is why VMware is getting ahead of it with their own storage solution. Why? Because the risk is too high when all your VMs run from one big pool of storage, and it’s our responsibility to protect the business when technology fails. Spreading the risk out onto smaller fault domains makes sense.

What I took away from the experience I started this post with was eye-opening: a reality check that even a million-dollar storage platform can fail! And when it happens it isn’t the vendor who rebuilds…
