Virtual SAN Read IO – cache / buffer / spindles

I had a question around how Virtual SAN read IO is handled when data can be anywhere: read cache, write buffer, or disks. On VMTN one of the engineers recently explained this, so I figured I would create a quick diagram to illustrate it. Basically it works like this: VSAN first checks the read cache; if the block that needs to be read is not available in the read cache, it checks whether the block is in the write buffer or on disk. Simple right?

In the scenario I drew below, two blocks need to be read. Block 1 is actively served by ESXi-01 and Block 2 is actively served by ESXi-03. In the case of ESXi-01 the block resides in the read cache, so it is read from the cache. In the case of ESXi-03 the block is neither in the read cache nor in the write buffer, hence it is read from the magnetic disks. Do note that this is a single virtual machine, so reads are being served from two hosts, and depending on which host is actively serving IO for a given block, the block can reside in that host's read cache. The host which is not actively serving IO for that block will not place the block in its read cache either! (Of course, if the host which is actively serving IO for a block fails, the other host will take over.)
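The lookup order described above can be sketched in a few lines of Python. This is purely illustrative; the function and data structure names are my own, not the actual VSAN implementation.

```python
# Illustrative sketch of the VSAN read path described above: check the
# read cache first, then the write buffer, and finally the magnetic disks.
# All names here are hypothetical; dicts stand in for the real tiers.

def read_block(block_id, read_cache, write_buffer, disk):
    if block_id in read_cache:
        return read_cache[block_id]      # hit: served from the SSD read cache
    if block_id in write_buffer:
        return write_buffer[block_id]    # written recently, not yet destaged
    data = disk[block_id]                # fall back to magnetic disk
    read_cache[block_id] = data          # only the actively serving host caches it
    return data
```

Note that only the host actively serving IO for the block warms its read cache, matching the behavior described above.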

I hope that helps.

How to remove a host from your Virtual SAN cluster

The question “How to remove a host from your Virtual SAN cluster” has popped up various times now, so I figured I would write a short article on the current procedure. It is fairly straightforward to be honest; here we go:

  1. Place host in maintenance mode
  2. Delete the disk group once the host has entered maintenance mode
  3. Move host out of the cluster
  4. Remove the VSAN VMkernel (not a requirement, but I prefer to clean things up)

That is it, now you can re-purpose the host for anything else.

Access Denied on wp-admin / wp-login page

I have had this problem various times in the last 6 months on my WordPress blog. For whatever reason, all of a sudden when I access my wp-admin page I receive an “access denied” on wp-admin / wp-login.php. Really annoying, as you can imagine, because it means you cannot get into the back-end of your blog, making it impossible to manage it. The blog itself is still available when this happens, though, so I figured I would write down the fix, as I somehow keep forgetting it.

  • FTP into the host
  • Check the file permissions on wp-login.php; in my case the permissions on this file are somehow magically always “00”
  • Change the permissions back to 0644
  • Fixed, you should be able to log in again
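If you have shell access to the host rather than just FTP, the same fix can be scripted. A minimal sketch, assuming you run it from the WordPress root directory; the function name is my own:

```python
# Sketch of the fix above: check wp-login.php's permissions and reset
# them to 0644 if they have been mangled (e.g. to "00").
import os
import stat

def fix_login_permissions(path="wp-login.php"):
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o644:
        print(f"{path} has mode {oct(mode)}, resetting to 0644")
        os.chmod(path, 0o644)
    # return the resulting mode so the caller can verify the fix
    return stat.S_IMODE(os.stat(path).st_mode)
```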

A simple solution for a strange problem. If I ever find the root cause I will post it here as well.

How about an All Flash Virtual SAN?

Yeah, that title got your attention, right… For now it is just me writing about it; nothing has been announced or promised. At VMworld I believe it was Intel who demonstrated the possibilities in this space: an All Flash Virtual SAN. A couple of weeks back, during my holiday, someone pointed me to a couple of articles around SSD endurance. Typically these types of articles deal with the upper end of the spectrum and as such are irrelevant to most of us, and some of the articles I have read in the past around endurance were disappointing to be honest. Tech Report, however, decided to look at consumer grade SSDs. We are talking about SSDs like the Intel 335, Samsung 840 series, Kingston Hyper-X and the Corsair Neutron. All of the SSDs used had a capacity of around 250GB and are priced anywhere between $175 and $275. Now if you look at the guarantees given in terms of endurance, we are talking about anything ranging from “20GB of writes per day for the length of its three-year warranty” for the Intel (22TB in total) to three years and 192TB in total for the Kingston, and anything in between for the other SSDs.
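The Intel figure quoted above is easy to sanity-check: 20GB of writes per day over a three-year warranty does indeed work out to roughly 22TB in total.

```python
# Sanity check of the endurance guarantee quoted above:
# 20GB of writes per day for a three-year warranty.
gb_per_day = 20
warranty_years = 3
total_gb = gb_per_day * 365 * warranty_years  # 21900 GB
total_tb = total_gb / 1000                    # ~21.9 TB, i.e. roughly 22TB
print(total_gb, total_tb)
```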

Tech Report set their first checkpoint at 22TB. After running through a series of tests, which are described in the article, they compared the results between the various SSDs after 22TB of writes. Great to see that all SSDs did what they were supposed to do and what was promised: all of them passed the 22TB mark without any issues. They had another checkpoint at the 200TB mark, which showed the first signs of weakness; as expected, the lower-end SSDs dropped out first. The next checkpoint was set at the 300TB mark, where they also added an unpowered retention test to see how well the drives retain data when unplugged. So far impressive results, and a blog series I will follow with interest. The articles clearly show that from an endurance perspective the SSDs perform a lot better than most had assumed in the past years. It is fair to say that consumer grade SSDs are up to the challenge.

Considering the low price points of these flash devices, I can see how an All Flash Virtual SAN solution would be possible, leveraging these consumer grade SSDs as the capacity tier (reads) and using enterprise grade SSDs to provide write performance (write buffer). Hopefully we will see the capacity of these types of devices increase even further; today some of them go up to 500GB, others up to 800GB. Wouldn’t it be nice to have a 1TB (or more) version?

Anyway, I am excited and definitely planning on running some tests with an all flash Virtual SAN solution in the future… What about you?


How to calculate what your Virtual SAN datastore size should be

I have had this question so many times that I figured I would write an article about it: how do you calculate what your Virtual SAN datastore size should be? Ultimately this determines which kind of server hardware you can use, which disk controller you need and which disks… So it is important that you get it right. I know the VMware Technical Marketing team is developing collateral around this topic; when that has been published I will add a link here. Let's start with a quote by Christian Dickmann, one of our engineers, as it is the foundation of this article:

In Virtual SAN your whole cluster acts as a hot-spare

Personally I like to work top-down, meaning that I start with an average for virtual machines or a total combined number. Let's go through an exercise with an example, as that makes it a bit easier to digest.

Let's assume the average VM disk size is 50GB, and on average the VMs have 4GB of memory provisioned. We have 100 virtual machines in total that we want to run on a 4-host cluster. Based on that info the formula would look something like this:

(total number of VMs * average VM size) + (total number of VMs * average VM memory size) = total capacity required

In our case that would be:

(100 * 50GB) + (100 * 4GB) = 5400 GB

So that is it? Well not really; like every storage / file system there is some overhead, and we will need to take “failures to tolerate” into account. If I set my “failures to tolerate” to 1 then I would have 2 copies of my VMs, which means I need 5400 GB * 2 = 10800 GB. Personally I also add an additional 10% in disk capacity to ensure we have room for things like metadata, log files, vmx files and some small snapshots when required. Note that VSAN by default provisions all VMDKs as thin objects (swap files are thick, as Cormac explained here), so there should be room available regardless. Better safe than sorry though. This means that 10800 GB actually becomes 11880 GB, which I prefer to round up to 12TB. The formula I have been using thus looks as follows:

(((Number of VMs * Avg VM size) + (Number of VMs * Avg mem size)) * (FTT + 1)) + 10%
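The formula above is trivial to turn into a small calculator. A minimal sketch; the function and parameter names are my own, not from any VMware tool:

```python
# Sketch of the sizing formula above: (raw capacity * (FTT + 1)) + 10% overhead.
def vsan_datastore_size_gb(num_vms, avg_vm_disk_gb, avg_vm_mem_gb,
                           ftt=1, overhead_pct=10):
    raw = num_vms * avg_vm_disk_gb + num_vms * avg_vm_mem_gb
    protected = raw * (ftt + 1)   # FTT=1 means two copies of each object
    # add ~10% for metadata, log files, vmx files and small snapshots
    return protected * (100 + overhead_pct) / 100

# The worked example: 100 VMs, 50GB disks, 4GB memory, FTT=1
print(vsan_datastore_size_gb(100, 50, 4))  # 11880.0 GB, rounded up to 12TB
```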

Now the next step is to see how you divide that across your hosts. I mentioned we would have 4 hosts in our cluster. We have two options: we create a cluster that can re-protect itself after a full host failure, or we create a cluster that cannot. Just to clarify: in order to have 1 host of spare capacity available, we will need to divide the total capacity by 3 instead of 4. Let's look at those two options and what the impact is:

  • 12TB / 3 hosts = 4TB per host (for each of the 4 hosts)
    • Allows you to re-protect (sync/mirror) all virtual machine objects even when you lose a full host
    • All virtual machines will maintain availability levels when doing maintenance
    • Requires an additional 1TB per host!
  • 12TB / 4 hosts = 3TB per host (for each of the 4 hosts)
    • If all disk space is consumed, when a host fails virtual machines cannot be “re-protected” as there would be no capacity to sync/mirror the objects again
    • When entering maintenance mode data availability cannot be maintained as there would be no room to sync/mirror the objects to another disk
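The two options above boil down to whether you divide by the full host count or reserve one host's worth of capacity. A small sketch, with hypothetical names:

```python
# Sketch of the per-host split above: divide the total capacity over the
# hosts, optionally reserving one host's worth as "hot spare" capacity.
def per_host_capacity_tb(total_tb, num_hosts, reserve_spare_host=True):
    divisor = num_hosts - 1 if reserve_spare_host else num_hosts
    return total_tb / divisor

# The worked example: 12TB total across a 4-host cluster
print(per_host_capacity_tb(12, 4, reserve_spare_host=True))   # 4.0 TB per host
print(per_host_capacity_tb(12, 4, reserve_spare_host=False))  # 3.0 TB per host
```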

Now if you look at the numbers, we are talking about an additional 1TB per host. With 4 hosts, and assuming we are using 2.5″ SAS 900GB Hitachi drives, that would be 4 additional drives at a cost of around $1000 per drive. When using 3.5″ SATA drives the cost would be even lower. Although this is just a number I found on the internet, it does illustrate that the cost of providing additional availability can be small. Prices may differ depending on the server brand used. But even at double the cost, I would go for the additional drive and, as such, the additional “hot spare capacity”.

To make life a bit easier I created a calculator. I hope this helps everyone who is looking at configuring hosts for their Virtual SAN based infrastructure.