Virtual SAN and Data Locality/Gravity

I was reading this article by Michael Webster about the benefit of Jumbo Frames. Michael was testing what the impact was from both an IOps and latency perspective when you run Jumbo Frames vs non-Jumbo Frames. Michael saw a clear benefit:

  • Higher IOps
  • Lower latency
  • Lower CPU utilization

I would highly recommend reading Michael’s full article for the details, I don’t want to steal his thunder. Now what was most interesting is the following quote, I highly regard Michael he is a smart guy and typically spot-on:

I’ve heard reports that some people have been testing VSAN and seen no noticeable performance improvement when using Jumbo Frames on the 10G networks between the hosts. Although I don’t have VSAN in my lab just yet my theory as to the reason for this is that the network is not the bottleneck with VSAN. Most of the storage access in a VSAN environment will be local, it’s only the replication traffic and traffic when data needs to be moved around that will go over the network between VSAN hosts.

As I said, Michael is a smart guy and as I’ve seen various people asking questions around this and it isn’t a strange assumption to make that with VSAN most IO will be local, I guess this is kind of the Nutanix model. But VSAN is no Nutanix. VSAN takes a different approach, a completely different approach and this is important to realize.

I guess with a very small cluster of 3 nodes Michael chances of IO being local are bigger, but even then IO will not only be local at a minimum 50% (when failures to tolerate is set to 1) due to the data mirroring. So how does VSAN handle this, what are some of the things to keep in mind, lets starts with some VSAN principles:

  • Virtual SAN uses an “object model”, objects are stored on 1 or multiple magnetic disks and hosts.
  • Virtual SAN hosts can access “objects” remotely, both read and write.
  • Virtual SAN does not have the concept of data locality / gravity, meaning that the object does not follow the virtual machine, reason for this is that moving data around is expensive from a resource perspective.
  • Virtual SAN has the capability to read from multiple mirror copies, meaning that if you have 2 mirror copies IO will be distributed equally.

What does this mean? First of all, lets assume you have an 8 host VSAN cluster. You have a policy configured for availability: N+1. This means that the objects (virtual disks) will be on two hosts (at a minimum). What about your virtual machine from a memory and CPU point of view? Well it could be on any of those 8 hosts. With DRS being envoked every 5 minutes at a minimum I would say that chances are bigger that the virtual machine (from a CPU/Memory) resides on one of the 6 hosts that does not hold the objects (virtual disk). In other words, it is likely that I/O (both reads and writes) are being issued remote.

From an I/O path perspective I would like to re-iterate that both mirror copies can and will serve I/O, each would serve ~50% of the I/O. Note that each host has a read cache for that mirror copy, but blocks in read cache are only stored once, this means that each host will “own” a set of blocks and will serve data for those be it from cache or be it from spindles. Easy right?

Now just imagine you have configured your “host failures” policy set to 2. I/O can now come from 3 hosts, at a minimum. And why do I say at a minimum? Because when you have the stripe width configured or for whatever reason striping goes across hosts instead of disks (which is possible in certain scenarios) then I/O can come from even more hosts… VSAN is what I would call a true fully distributed solution! Below is an example of “number of failures” set to 1 and “stripe width” set to 2, as can be seen there are 3 hosts that are holding objects.

Lets reiterate that. When you define “host failures” as 1 and stripe width as 1 then VSAN can still, when needed, stripe across multiple disks and hosts. When needed, meaning when for instance the size of a VMDK is larger than a single disk etc.

Now lets get back to the original question Michael asked himself, does it make sense to use Jumbo Frames? Michael’s tests clearly showed that it does, in his specific scenario that is of course. I have to agree with him that when (!!) properly configured it will definitely not hurt, so the question is should you always implement this? I guess if you can guarantee implementation consistency, then conducts tests like Michael did. See if it benefits you, and if it lowers latency and increase IOps I can only recommend to go for it.


PS: Michael mentioned that even when mis-configured it can’t hurt, well there were issues in the past… although they are solved now, it is something to keep in mind.

Virtual SAN webinars, make sure to attend!

Interested in Virtual SAN? VMware is organizing various webinars about Virtual SAN in the upcoming weeks. Last week there was an introduction on VSAN, you can watch the recording here. The next one is by no one less than Cormac Hogan. Cormac will talk about how to install and configure Virtual SAN and will discuss various do’s and don’ts. If anyone has a vast experience with running Virtual SAN than it is Cormac, so make sure to attend this webinar upcoming Wednesday the 2nd of October at 08:30 PDT. Recording can be found here!

There is another great webinar scheduled for Wednesday October the 9th at 08:30 PDT, which is all about Monitoring Virtual SAN. This webinar is hosted by one of the lead engineers on the Virtual SAN product: Christian Dickmann. Christian was also responsible for developing the RVC extensions for VSAN and I am sure he will do a deepdive on how to monitor VSAN, needless to say: highly recommended. I will update this page when I know more around when it will be hosted!

I created a folder on my VSAN datastore, but how do I delete it?

I created a folder on my VSAN datastore using the vSphere Web Client, but when I wanted to deleted it I received this error message that that wasn’t possible. So how do I delete a VSAN folder when I don’t need it any longer? It is fairly straight forward, you open up an SSH session to your host and do the following:

  • change directory to /vmfs/volumes/vsanDatastore
  • run “ls -l” in /vmfs/volumes/vsanDatastore to identify the folder you want to delete
  • run “/usr/lib/vmware/osfs/bin/osfs-rmdir <name-of-the-folder>” to delete the folder

This is what it would look like:

/vmfs/volumes/vsan:5261f0c54e0c785a-81e199f6c9a23d73 # ls -lah
total 6144
drwxr-xr-x    1 root     root         512 Sep 27 03:17 .
drwxr-xr-x    1 root     root         512 Sep 27 03:17 ..
drwxr-xr-t    1 root     root        1.4K Sep 24 05:38 16254152-1469-2c18-3319-002590c0c254
drwxr-xr-t    1 root     root        1.2K Sep 26 01:21 85803a52-6858-ded5-b40b-00259088447a
lrwxr-xr-x    1 root     root          36 Sep 27 03:17 ISO -> e64d1b52-1828-04ca-95a8-00259088447e
lrwxr-xr-x    1 root     root          36 Sep 27 03:17 TestVM -> ed31d351-a222-83bf-bb70-002590884480
drwxr-xr-t    1 root     root        1.4K Sep 27 01:40 cc8ebe51-6881-7dc8-37f8-00259088447e
drwxr-xr-t    1 root     root        1.2K Sep 27 01:52 e64d1b52-1828-04ca-95a8-00259088447e
drwxr-xr-t    1 root     root        1.2K Jul  3 07:52 ed31d351-a222-83bf-bb70-002590884480
lrwxr-xr-x    1 root     root          36 Sep 27 03:17 iso -> 16254152-1469-2c18-3319-002590c0c254
lrwxr-xr-x    1 root     root          36 Sep 27 03:17 -> cc8ebe51-6881-7dc8-37f8-00259088447e
lrwxr-xr-x    1 root     root          36 Sep 27 03:17 vmw-iol-01 -> 85803a52-6858-ded5-b40b-00259088447a

/vmfs/volumes/vsan:5261f0c54e0c785a-81e199f6c9a23d73 # /usr/lib/vmware/osfs/bin/osfs-rmdir vmw-iol-01

Deleting directory 85803a52-6858-ded5-b40b-00259088447a in container id 5261f0c54e0c785a81e199f6c9a23d73 backed by vsan

Be careful though, cause when you delete it guess what… it is gone! Yes not being able to delete it using the Web Client is a known issue, and on the roadmap to be fixed.

vSphere HA advanced settings, the KB

I’ve posted about vSphere HA advanced settings various times in the past, and let me start by saying that you shouldn’t play around with them unless you have a requirement to do so. But if you do, there is a KB article which I can highly recommend as it lists all the known and lesser known advanced settings. I had the KB article updated with vSphere 5.5 advanced settings yesterday (Thanks KB team for being so responsive!) but it also applies to vSphere 5.0 and 5.1. Recommended read for those who want to get in to the nitty gritty details of vSphere HA.

Initialized disks to be used by VSAN task completed successfully, but no disks added?

I’ve seen various people reporting the following, they wanted to create a diskgroup in Virtual SAN / VSAN, the task completes successfully but you don’t see any disks. Strange right because it gives a green checkmark?! I had the exact same scenario today, but if you click the task more details are revealed as shown in the screenshot below:

There are a couple of reasons why this can happen:

  • Did not fill in the license
    • VSAN licenses are applied on a cluster level. Open the Webclient click on your VSAN enabled cluster, click the “Manage” tab followed by “Settings”. Under “Configuration” click “Virtual SAN Licensing” and then click “Assign License Key”.
  • Running nested and virtual disks used are too small
    • SSD/HDD needs to be 4GB at a minimum (I think, have not tried this extensively though)
  • Existing partition on disk found
    • Wipe the disk before using it, you can do this using partedUtil for instance.


If you can’t figure out why it happens, make sure to check the task details as it can give a pretty good hint even though it looks like it was successful errors could still be in there.

Something to know about vSphere Flash Read Cache

When I was looking in to vSphere Flash Read Cache (part of vSphere 5.5) there was one thing that had my interest, how does it interact with vSphere HA and DRS and more specifically, are there any caveats? It all started with the question what are the requirements for a virtual machine to be successfully restarted by vSphere HA?

The answer was simple, when you define a vSphere Flash Read Cache size for a virtual disk on a virtual machine, that amount of cache capacity defined for that virtual disk needs to be available on a local flash resource in order for the VM to be restarted / powered-on. So what does this mean? Well it means that when you set a flash read cache for a given virtual disk to 4GB, that 4GB needs to be available on your local host where the VM will be powered on. But what in the case of an HA initiated restart? Will HA ignore this requirement during restarts or will it try to guarantee the same performance? [Read more...]

The Compatibility Guides are now updated with VSAN and vFlash info!

For those wanting to play with Virtual SAN (VSAN) and vSphere Flash Read Cache (vFRC / vFlash), the compatibility guides are being updated at the moment. Hit the following URL to find out what is currently supported and what not:

  • For vSphere Flash Read Cache:
    • Select “VMware Flash Read Cache” from the drop down list titled “What are you looking for”.
    • Hit “update and view results”
  • For Virtual SAN:
    • Select “Virtual SAN (beta)” from the drop down list titled “What are you looking for”
    • Select “ESXi 5.5″ and click “Next”
    • Select a category (server, i/o controller, hdd, ssd), at the time of writing only server was available
    • Select the type of Server and click next
    • Now a list is presented of supported servers

I know both lists are short today, this is an on-going efforts and I know many vendors are now wrapping up and submitting their test reports, more to be added over the course of the next couple of weeks so keep on coming back to the compatibility guide.