Re-Thinking Storage Architecture (Is VSAN A Better Plan For Failures?)


Bullet Proofing Storage!

My body still cringes when I remember the moment I heard the news: “The storage crashed!”

It was four years ago (2010) that I lived through a large-scale outage caused by a SAN failure. BTW, according to the experts, it “should never have happened,” but it did.

For 7 straight days my team and I orchestrated the recovery of nearly 500 virtual servers that vanished when the storage crashed. Needless to say, it occurred during peak business hours.

And hour after hour, as we brought servers back online, I had the dreaded task of sending email updates to developers, business partners, VPs and C-level staff.

They all kept asking me the same question: how did this happen?

A LIFE-changing event.

Let me tell you, when something like this happens, you’re not the same afterwards.

When it’s over, you walk away a different person. And no matter what any vendor [or zealot] says about a cool SAN, server, or network device, you still can’t completely get it out of your head. You’ve learned firsthand that nothing is perfect.

Re-Thinking Storage Architecture

Since the experience I just shared, many new storage solutions have come to market. Names such as Pure Storage, Nimble, and Tintri come to mind.

They all advertise some new concept for handling scaling and performance. But recently the solution which has captured my attention for a more practical reason is VSAN.

Why? Because it spreads out the failure domain almost as wide as you want it. And it can be built on lower-cost server hardware. The benefit is that your VMs are not all in one aggregate of spindles.
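To put rough numbers on that idea, here is a back-of-the-napkin sketch in Python. It is a toy model, not how VSAN actually places data: the round-robin placement, the host counts, and the function names `place_vms` and `vms_offline` are all my own assumptions for illustration. The only figure taken from my story is the roughly 500 VMs.

```python
def place_vms(total_vms, hosts, copies):
    """Toy round-robin placement (an assumption, not VSAN's real algorithm):
    VM i keeps its copies on `copies` consecutive hosts."""
    if not 1 <= copies <= hosts:
        raise ValueError("copies must be between 1 and the number of hosts")
    return {vm: {(vm + k) % hosts for k in range(copies)}
            for vm in range(total_vms)}


def vms_offline(placement, failed_hosts):
    """Count VMs whose every copy sits on a failed host."""
    failed = set(failed_hosts)
    return sum(1 for holders in placement.values() if holders <= failed)


TOTAL_VMS = 500  # roughly the number of VMs in the outage described above

# One BIG pool: every VM depends on the same single failure domain.
big = place_vms(TOTAL_VMS, hosts=1, copies=1)

# Wide: ten smaller pools (hypothetical count), one copy of each VM.
wide = place_vms(TOTAL_VMS, hosts=10, copies=1)

# Wide plus a second copy of each VM on a neighboring host.
mirrored = place_vms(TOTAL_VMS, hosts=10, copies=2)

print("big pool fails:          ", vms_offline(big, [0]))
print("one of ten hosts fails:  ", vms_offline(wide, [0]))
print("one host fails, 2 copies:", vms_offline(mirrored, [0]))
print("two hosts fail, 2 copies:", vms_offline(mirrored, [0, 1]))
```

In this toy model the big pool takes all 500 VMs down with it, one failed host out of ten takes 50, and with a second copy a single host failure takes none. Two neighboring hosts failing together still hurts, which is one reason I say there are Pros and Cons to work out.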

But let me be clear! I am not naive; there are still Pros and Cons to using VSAN that will have to be worked out.

I know it won’t be easy to change how IT thinks about storage, i.e. SAN, NAS, NFS, FC, iSCSI.

We love BIG storage! But face it, there’s too much risk when all the data and VMs are in one place, and the cost goes up when you replicate and snap to yet other SANs.

Big is out, Wide is in.

How do we move away from BIG storage architecture with 100s or 1000s of spindles? I say with smaller, wider storage pools running directly on ESXi. For me it’s a no-brainer because I see the value.

Recently I met with one of VMware’s leading experts on the topic, Rawlinson Rivera, VCDX, and we went deep into the nuts and bolts of how VSAN works. Yes, his pitch was more about software-defined storage, but my interest was more in the practical application: smaller failure domains.

Big storage SANs will always have their place, but spreading your applications across multiple ESXi hosts with multiple copies really interests me. Sure, snapshots and replication do this also, but unless you have a proven and tested disaster recovery strategy (which we thought we had when our storage crashed), you’re just banking on [HOPE].

I’ve learned my lesson.

As a tempered OPS manager, I don’t hope anymore. I test. I make sure we are not making assumptions. My thinking since the SAN failure has changed. I want smaller failure domains with less impact on the business when something goes BANG!

Do your own research and you’ll find there are many solutions on the market for spreading fault domains to reduce the impact of hardware failures. But for me, I am listening to what my GUT tells me and to what VMware has to say about VSAN. Why? Because my thinking about storage has changed.

This is obviously only one use-case, and for now VSAN will only work for providing shared storage for VMware products. I’m guessing this might change in the future. As for how to architect storage using VSAN, there’s much to come: according to a quick check on Amazon, the books on how to design VSAN are yet to be written.

The other option is to use a smaller data storage server for each workload. That’s what we’ll cover next time.

Wrap Up:

I’ve briefly covered what has changed my mind about BIG storage solutions. I also covered what intrigues me about VSAN and why.

I see a new trend coming, which is why VMware is getting ahead of it with their own storage solution. Why? Because the risk is too high running all your VMs from a big pool of storage, and it’s our responsibility to protect the business when technology fails. Spreading the risk out across smaller fault domains makes sense.

What I took away from the experience I started this post with was eye-opening: a reality check that even a million-dollar storage platform can fail! And when it happens it isn’t the vendor who rebuilds…
