Re-Thinking Storage Architecture (Is VSAN A Better Plan For Failures?)


Bullet Proofing Storage!

My body still cringes when I remember the moment I heard the news…“The storage crashed!”

It was 4 years ago (2010) when I lived through a large-scale outage caused by a SAN failure. One that, BTW, according to the experts, “should never have happened.” But it did.

For 7 straight days my team and I orchestrated the recovery of nearly 500 virtual servers that vanished when the storage crashed. Needless to say, it occurred during peak business hours.

And hour after hour as we brought servers back online, I had the dreaded task of sending email updates to developers, business partners, VPs and C-level staff.

They all kept asking me the same question…how did this happen?

LIFE changing event.

Let me tell you, when something like this happens, you’re not the same afterwards.

When it’s over, you walk away a different person. And no matter what any vendor [or zealot] says about a cool SAN, server, or network device, you still can’t completely get it out of your head. You’ve learned firsthand that nothing is perfect.

Re-Thinking Storage Architecture

Since the experience I just shared, many new storage solutions have come to market. Names such as Pure Storage, Nimble, and Tintri come to mind.

They all advertise some new concept for handling scaling and performance. But recently the solution which has captured my attention for a more practical reason is VSAN.

Why? Because it spreads out the failure domain almost as wide as you want it. And it can be built on lower-cost server hardware. The benefit is that your VMs are not all in one aggregate of spindles.

But let me be clear! I am not naive; there are still pros and cons to using VSAN that will have to be worked out.

I know it won’t be easy to change how IT thinks about storage, i.e. SAN, NAS, NFS, FC, iSCSI.
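To put rough numbers on the failure-domain idea, here is a minimal Python sketch. It is a back-of-the-napkin model and not anything VSAN itself does: it assumes the VMs are spread evenly across equal-sized pools, that a failed pool takes every VM on it offline, and that there are no replicas elsewhere. The function name `vms_lost` is made up for this illustration, and the 500 figure is simply the outage size from my story.

```python
def vms_lost(total_vms: int, pools: int, failed_pools: int = 1) -> int:
    """Worst-case number of VMs offline when some storage pools fail.

    Assumptions (illustrative only): VMs are spread evenly across
    equal-sized pools, and a failed pool takes all of its VMs down.
    """
    if pools < 1:
        raise ValueError("need at least one pool")
    if not 0 <= failed_pools <= pools:
        raise ValueError("failed_pools must be between 0 and pools")
    # Ceiling division: the fullest pool holds at most this many VMs.
    per_pool = -(-total_vms // pools)
    return min(total_vms, per_pool * failed_pools)


if __name__ == "__main__":
    for pools in (1, 4, 10):
        lost = vms_lost(500, pools)
        print(f"{pools:>2} pool(s): one failure takes down up to {lost} of 500 VMs")
```

With one big pool, a single failure takes out everything. Split the same VMs across ten pools and the same kind of failure touches a tenth of them. That is all "wide" really means here.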

We love BIG storage! But face it, there’s too much risk when all the data and VMs are in one place and the cost goes up when you replicate and snap to yet other SANs.

Big is out, Wide is in.

How do we move away from BIG storage architecture with hundreds or thousands of spindles? I say with smaller, wider storage pools running directly on ESXi. For me it’s a no-brainer because I see the value.

Recently I met with one of VMware’s leading experts on the topic, Rawlinson Rivera, VCDX, and we went deep into the nuts and bolts of how VSAN works. Yes, his pitch was more about software-defined storage, but let me say my interest was more in the practical application: smaller failure domains.

Big storage SANs will always have their place, but spreading your applications across multiple ESXi hosts with multiple copies really interests me. Sure, snapshots and replication do this too, but unless you have a proven and tested disaster recovery strategy (which we thought we had when our storage crashed), then you’re just banking on [HOPE].
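The "multiple hosts, multiple copies" idea can be sketched in a few lines of Python. This is a toy model under loud assumptions: two copies per VM, placed round-robin on neighboring hosts, and a VM counts as available if any one copy sits on a surviving host. Real VSAN placement and availability rules are more involved (for one, it also uses witness components and quorum), so treat the host names and the helper functions `place_copies` and `surviving_vms` as hypothetical.

```python
def place_copies(vm_count: int, hosts: list[str], copies: int = 2) -> dict[str, list[str]]:
    """Place each VM's copies on distinct hosts, round-robin (illustrative only)."""
    if copies > len(hosts):
        raise ValueError("need at least as many hosts as copies")
    return {
        f"vm{i}": [hosts[(i + k) % len(hosts)] for k in range(copies)]
        for i in range(vm_count)
    }


def surviving_vms(placement: dict[str, list[str]], failed_hosts: list[str]) -> list[str]:
    """Return the VMs that still have at least one copy on a healthy host."""
    failed = set(failed_hosts)
    return [
        vm for vm, where in placement.items()
        if any(host not in failed for host in where)
    ]


if __name__ == "__main__":
    hosts = ["esxi-1", "esxi-2", "esxi-3", "esxi-4"]  # hypothetical host names
    placement = place_copies(8, hosts)
    print(len(surviving_vms(placement, ["esxi-1"])))            # one host down
    print(len(surviving_vms(placement, ["esxi-1", "esxi-2"])))  # two hosts down
```

In this toy setup, losing one host costs you no VMs at all, and losing two neighboring hosts costs only the VMs whose copies both lived on that pair. The point is not the exact numbers but that you can test the question instead of hoping.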

I’ve learned my lesson.

As a tempered OPS manager, I don’t hope anymore. I test. I make sure we are not making assumptions. My thinking since the SAN failure has changed. I want smaller failure domains with less impact to the business when something goes BANG!

Do your own research and you’ll find there are many solutions on the market for spreading fault domains to reduce the impact of hardware failures. But for me, I am listening to what my GUT tells me and what VMware has to say about VSAN. Why? Because my thinking about storage has changed.

This is obviously only one use case, and for now VSAN only provides shared storage for VMware products. I’m guessing this might change in the future. As for how to architect storage using VSAN, there’s much more to come; judging by a quick check on Amazon, the books on how to design VSAN are yet to be written.

The other option is to use a smaller data storage server for each workload. That’s what we’ll cover next time.

Wrap Up:

I’ve briefly covered what has changed my mind about BIG storage solutions. I also covered what intrigues me about VSAN and why.

I see a new trend coming, which is why VMware is getting ahead of it with its own storage solution. Why? Because the risk is too high when all your VMs run from one big pool of storage, and it’s our responsibility to protect the business when technology fails. Spreading the risk out onto smaller fault domains makes sense.

What I took away from the experience I started this post with was eye-opening: A reality check that even a million dollar storage platform can fail! And when it happens it isn’t the vendor who rebuilds…
