
Yellow Bricks

by Duncan Epping



What happens in a VSAN cluster in the case of an SSD failure?

Duncan Epping · Dec 19, 2013 ·

The question that keeps coming up over and over again at VMUG events, on my blog and on the various forums is: what happens in a VSAN cluster in the case of an SSD failure? I answered the question in one of my blog posts about failure scenarios a while back, but figured I would write it down in a separate post considering people keep asking for it. That makes it a bit easier to point people to the answer, and also a bit easier to find the answer on Google. Let's sketch a situation first; what does (or will) the average VSAN environment look like?

In this case what you are looking at is:

  • 4 host cluster
  • Each host with 1 disk group
  • Each disk group has 1 SSD and 3 HDDs
  • Virtual machine running with a “failures to tolerate” of 1

As you hopefully know by now, a VSAN disk group can hold up to 7 HDDs and requires an SSD on top of that. The SSD is used as a read cache (70%) and a write buffer (30%) for the components stored in that disk group. The SSD is literally the first location where IO is stored, as depicted in the diagram above. So what happens when the SSD fails?

When the SSD fails, the whole disk group and all of its components will be reported as degraded or absent. The state (degraded vs absent) depends on the type of failure. Typically though, when an SSD fails VSAN will recognize this, mark the components as degraded, and as such instantly create new copies of your objects (disks, vmx files, etc.) as depicted in the diagram above.

From a design perspective it is good to realize the following (for the current release):

  • A disk group can only hold 1 SSD
  • A disk group can be seen as a failure domain
    • E.g. there could be a benefit in creating 2 disk groups of 3 HDDs + 1 SSD each versus a single disk group of 6 HDDs + 1 SSD (see the sketch after this list)
  • SSD availability is critical, so select a reliable SSD! Yes, some consumer grade SSDs do deliver great performance, but they typically also burn out fast.
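
To put some rough numbers behind the failure domain point, here is a minimal sketch in Python; the 2 TB disk size is just an assumption for illustration:

    # Rough sketch: capacity that goes offline when a single SSD fails,
    # comparing one large disk group with two smaller ones.
    # The 2 TB HDD size is an assumption for illustration only.
    HDD_SIZE_TB = 2

    def capacity_offline_on_ssd_failure(hdds_per_disk_group):
        # An SSD failure takes down its entire disk group,
        # so every HDD behind that SSD becomes unavailable.
        return hdds_per_disk_group * HDD_SIZE_TB

    # Layout A: a single disk group of 6 HDDs + 1 SSD
    print(capacity_offline_on_ssd_failure(6))  # 12 TB offline after one SSD failure

    # Layout B: two disk groups of 3 HDDs + 1 SSD each
    print(capacity_offline_on_ssd_failure(3))  # 6 TB offline after one SSD failure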

Let it be clear that if you run with the default storage policy you are protecting yourself against 1 component failure. This means that 1 SSD can fail, or 1 host can fail, or 1 disk group can fail, without loss of data; and as mentioned, VSAN will typically quickly recreate the impacted objects on top of that.
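
As a back-of-the-envelope illustration of what "failures to tolerate" means for capacity, here is a minimal sketch; the FTT + 1 mirror-copy rule is the key point, the VM disk size is just an assumption, and the small witness components are ignored:

    # Minimal sketch: raw capacity consumed by a VM disk for a given
    # "failures to tolerate" (FTT) setting. VSAN keeps FTT + 1 copies
    # of the data; small witness components are ignored here.
    def raw_capacity_gb(vmdk_size_gb, failures_to_tolerate=1):
        replicas = failures_to_tolerate + 1
        return vmdk_size_gb * replicas

    # A 100 GB VMDK with the default FTT=1 consumes roughly 200 GB of raw
    # capacity, with the two copies placed on different hosts/disk groups.
    print(raw_capacity_gb(100, 1))  # 200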

That doesn't mean you should try to save money on reliability, if you ask me. If you are wondering which SSD to select for your VSAN environment, I recommend reading this post by Wade Holmes on the VMware vSphere Blog. Especially take note of the Endurance Requirements section! If I had to give a recommendation though, the Intel S3700 still seems to be the sweet spot when it comes to price / endurance / performance!

Re: VMware VSAN VS the simplicity of hyperconvergence

Duncan Epping · Dec 11, 2013 ·

I was reading this awesome article by “the other” Scott Lowe. (That is how he refers to himself on Twitter.) I really enjoyed the article and think it is a pretty fair write-up, although I'm not sure I really agree with some of the statements or conclusions drawn. Again, do not get me wrong… I really like the article and the effort Scott has put in, and I hope everyone takes the time to read it!

A couple of things I want to comment on:

VMware VSAN VS the simplicity of hyperconvergence

I guess I should start with the title… Just as for companies like SimpliVity (hey guys, congrats on winning the well-deserved award for best converged solution) and Nutanix, their software is the enabler of their hyper-converged solution. Virtual SAN could be that as well, provided of course that you buy a certain type of hardware.

Hyper-converged infrastructure takes an appliance-based approach to convergence using, in general, commodity x86-based hardware and internal storage rather than traditional storage array architectures. Hyper-converged appliances are purpose-built hardware devices.

The keyword in this sentence, if you ask me, is “purpose-built”. In most cases there is nothing purpose-built about the hardware. (Except for SimpliVity, as they use a purpose-built component for deduplication.) In May of 2011 I wrote about these HPC servers that SuperMicro was selling and how they could be a nice platform for virtualization; I even asked in my article which company would be the first to start using these in a different way. Funny, as I didn't know back then that Nutanix was planning on leveraging them, something I found out in August of 2011. The servers used by most of the hyper-converged players today are those HPC servers, and they are very much generic hardware devices. The magic is not the hardware being used; the magic is the software, if you ask me, and I am guessing vendors like Nutanix will agree with me on that.

Due to its VMware-centric nature and the fact that VSAN doesn’t present typical storage constructs, such as LUNs and volumes, some describe it as a VMDK storage server.

Not sure I agree with this statement. What I personally like about VSAN is that it does present a “typical storage construct”, namely a (Virtual SAN) datastore. From a UI point of view it just looks like a regular datastore. When you deploy a virtual machine, the only difference is that you will be picking a VM Storage Policy on top of that; other than that it is just business as usual. For users there is nothing new or confusing about it!

As is the case in some hybrid storage systems, VSAN can accelerate the I/O operations destined for the hard disk tier, providing many of the benefits of flash storage without all of the costs. This kind of configuration is particularly well-suited for VDI scenarios with a high degree of duplication among virtual machines where the caching layer can provide maximum benefit. Further, in organizations that run many virtual machines with the same operating system, this breakdown can achieve similar performance goals. However, in organizations in which there isn’t much benefit from cached data — highly heterogeneous, very mixed workloads — the overall benefit would be much less.

VSAN can accelerate ANY type of I/O if you ask me. It has a write buffer and a read cache. Depending on the size of your working set (active data), the size of the cache and the type of policy used, you should always benefit regardless of the type of workload used. From a write perspective, as mentioned, I/O will always go to the buffer; from a read perspective, your working set should be in cache. Of course there are always types of workloads where this will not apply, but for the majority it should.

VSAN is very much a “build your own” approach to the storage layer and will, theoretically, work with any hardware on VMware Hardware Compatibility list. However, not every hardware combination is tested and validated. This will be one of the primary drawbacks to VSAN…

This is not entirely true. VMware is working on a program called Virtual SAN Ready Nodes. These Virtual SAN Ready Nodes will be pre-configured, certified and tested configurations which are optimized for things like performance and capacity. I haven’t seen the final list yet, but I can imagine certain vendors, like for instance Dell and HP, will want to list specific types of servers with a certain number of disks and specific SSD types to ensure an optimal user experience. So although VSAN is indeed a “bring your own hardware” solution, I think that is the great thing about VSAN… you have the flexibility to use the hardware you want to use. No need to change your operational procedures because you are introducing a new type of hardware; just use what you are familiar with.

PS: I want to point out there are some technical inaccuracies in Scott’s post. I’ve pointed these out and am guessing they will be corrected soon.

Removing a disk group from a VSAN host

Duncan Epping · Dec 4, 2013 ·

I had been playing around with my VSAN cluster for a bit over the last couple of weeks and it had literally become messy. I created many VMs and many snapshots and removed many of those again, all of this while pulling cables and pulling disks from servers; basically stress testing VSAN while injecting faults to see how it responds. It was time to clean up and upgrade to a later build, as the beta refresh had just been released. After deleting a bunch of VMs I noticed that not everything was removed; I had also uploaded ISOs and some other random stuff which I probably should not have. Anyway, I needed to clean one of my hosts up.

I figured I would use RVC for the exercise, just to get a bit more familiar with it. First I wanted to check what the current state of my cluster was, so I used the “vsan.disks_stats” command:

Then I figured I would simply remove the disk group for server “prmb-esx08” using “vsan.host_wipe_vsan_disks”:
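
Roughly speaking, both commands are run against the cluster and host objects in the RVC inventory; the paths below are purely illustrative placeholders and will depend on your own datacenter and cluster names:

    # from an RVC session connected to vCenter
    vsan.disks_stats <path-to-cluster>
    vsan.host_wipe_vsan_disks <path-to-cluster>/hosts/prmb-esx08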

Note that you can also do this using the UI:

  • Go to your cluster
  • Click “Manage” and “Virtual SAN” -> “Disk Management”
  • Select the “Disk Group” and click the “Remove the Disk Group” icon

VSAN VDI Benchmarking and Beta refresh!

Duncan Epping · Nov 26, 2013 ·

I was reading this blog post on VSAN VDI benchmarking today on Vroom, the VMware performance blog. You see a lot of people doing synthetic tests (max IOPS with sequential reads) on all sorts of storage devices, but lately more and more vendors are doing these more “real world” performance tests. While reading this article about VDI benchmarking (and I suggest you check out all parts: part 1, part 2, part 3), one thing stood out to me: the comparison between VSAN and an all-flash array.

The following quotes show the strength of VSAN if you ask me:

we see that VSAN can consolidate 677 heavy users (VDImark) for 7-node and 767 heavy users for 8-node cluster. When compared to the all flash array, we don’t see more than 5% difference in the user consolidation.

Believe me when I say that 5% is not a lot. If you are actively looking at various solutions, I would highly recommend including the “overhead cost” in your criteria list, as depending on the solution chosen this could make a substantial difference. I have seen other solutions requiring a lot more resources. But what about response time? Because that is where the typical all-flash array shines… ultra-low latency. How about VSAN?

Similar to the user consolidation, the response time of Group-A operations in VSAN is similar to what we saw with the all flash array.

Both are very interesting results if you ask me. Especially the < 5% difference in user consolidation is what stood out to me the most! Once again, for more details on these tests read the VDI benchmarking blog: part 1, part 2, part 3!

Beta Refresh

For those who are testing VSAN, there is a beta refresh available as of today. This release has a fix for the AHCI driver issue… and it increases the number of HDDs per disk group from 6 to 7. From a disk group perspective this will come in handy, as many servers have 8, 16 or 24 disk slots, allowing you to do 7 HDDs + 1 SSD per group. Also, some additional RVC commands have been added in the storage policy space; I am sure they will come in handy!

A nice side effect of the number of HDDs per disk group going up is an increase in maximum capacity:

(8 hosts * (5 diskgroups * 7 HDDs)) * Size of HDD = Total capacity

With 2 TB disks this would result in:

(8 * (5 * 7)) * 2TB = 560TB
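
Or, expressed as a quick sketch using the same example numbers:

    # Minimal sketch of the raw capacity formula above.
    def vsan_raw_capacity_tb(hosts, disk_groups_per_host, hdds_per_group, hdd_size_tb):
        return hosts * disk_groups_per_host * hdds_per_group * hdd_size_tb

    # 8 hosts, 5 disk groups per host, 7 HDDs per group, 2 TB HDDs
    print(vsan_raw_capacity_tb(8, 5, 7, 2))  # 560 TB raw, before any FTT overhead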

Now keep on testing with VSAN and don’t forget to report feedback through the community forums or your VMware rep.

Virtual SAN and maintenance windows…

Duncan Epping · Nov 25, 2013 ·

After writing the article stating that “4 is the minimum number of hosts for VSAN”, I received a lot of questions via email, on Twitter, etc. about the cost associated with it and whether this was a must. Let me start by saying that I wrote that article to get people thinking about sizing their VSAN environment. When it comes down to it, Virtual SAN and maintenance windows can be a difficult topic.

I guess there are a couple of things to consider here. Even in a regular storage environment you typically do upgrades in a rolling fashion, meaning that if you have two controllers, one will be upgraded while the other handles I/O. In that case you are also at risk. The thing is though, as a virtualization administrator you have a bit more flexibility, and you expect certain features, like for instance vSphere HA, to work as expected. You need to ask yourself: what is the level of risk I am willing to take, and what is the level of risk I can take?

When it comes to placing a host into Maintenance Mode, from a VSAN point of view you will need to ask yourself:

  • Do I want to move data from one host to another to maintain availability levels?
  • Do I just want to ensure data accessibility and take the risk of potential downtime during maintenance?

I guess there is something to say for either. When you move data from one node to another to maintain availability levels, your “maintenance window” could stretch extremely long. As you would potentially be copying TBs over the network from host to host, it could take hours to complete, as the rough calculation below illustrates. If your ESXi upgrade, including a host reboot, takes about 20 minutes, is it acceptable to wait hours for the data to be migrated? Or do you take the risk, inform your users about the potential downtime, and do the maintenance with a higher risk but complete it in minutes rather than hours? After those 20 minutes VSAN would sync up again automatically, so no data loss etc.
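
To put “hours” into perspective, here is a rough back-of-the-envelope sketch; both the amount of data and the effective throughput are assumptions, and real resync speeds depend heavily on your network and disks:

    # Rough estimate of how long evacuating data from a host could take.
    # Both inputs are assumptions; actual resync throughput varies a lot.
    def migration_hours(data_tb, effective_throughput_mb_s):
        data_mb = data_tb * 1024 * 1024
        return data_mb / effective_throughput_mb_s / 3600

    # Evacuating 4 TB at an effective 200 MB/s over the VSAN network:
    print(round(migration_hours(4, 200), 1))  # roughly 5.8 hours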

It is impossible for me to give you advice on this one, to be honest. I would highly recommend also sitting down with your storage team: look at what their current procedures are today, what they have included in their SLA to the business (if there is one), and how they handle upgrades and periodic maintenance.

 

