• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Yellow Bricks

by Duncan Epping

  • Home
  • Unexplored Territory Podcast
  • HA Deepdive
  • ESXTOP
  • Stickers/Shirts
  • Privacy Policy
  • About
  • Show Search
Hide Search

vSAN

Virtual SAN news flash pt 1

Duncan Epping · Oct 3, 2013 ·

I had a couple of things I wanted to write about with regards to Virtual SAN which I felt weren’t beefy enough to dedicate a full article to so I figured I would combine a couple of news worthy items and create a Virtual SAN news flash article / series.

  • I was playing with Virtual SAN last week and I noticed something I hadn’t noticed yet… I was running vSphere with an Enterprise license and I added the Virtual SAN license for my cluster. After adding the Virtual SAN license all of a sudden I had the Distributed Switch capability on the cluster I had VSAN licensed. Now I am not sure what this will look like when VSAN will go GA, but for now those who want to test with VSAN and want to use the Distributed Switch you can. Use the Distributed Switch to guarantee bandwidth (leveraging Network IO Control) to Virtual SAN when combining different types of traffic like vMotion / Management / VM traffic on a 10GbE pair. I would highly recommend to start playing around with this and get experienced. Especially because vSphere HA traffic and VSAN traffic are combined on a single NIC pair and you do not want HA traffic to be impacted by replication traffic.
  • The Samsung SM1625 SSD series (eMLC) has been certified for Virtual SAN. It comes in sizes ranging between 100Gb and 800GB and can do up to 120k IOps random read… Nice to see the list of supported SSDs expanding, will try to get my hands on one of these at some point to see if I can do some testing.
  • Most people by now are aware of the challenges there were with the AHCI controller. I was just talking with one of the VSAN engineers who mentioned that they have managed to do a full root cause analysis and pinpoint the root of this problem. Currently there is a team working on solving it and things are looking good and hopefully soon a new driver will be released, when we do I will let you guys know as I realize that many use these controllers in their home-lab.

Virtual SAN and Data Locality/Gravity

Duncan Epping · Sep 30, 2013 ·

I was reading this article by Michael Webster about the benefit of Jumbo Frames. Michael was testing what the impact was from both an IOps and latency perspective when you run Jumbo Frames vs non-Jumbo Frames. Michael saw a clear benefit:

  • Higher IOps
  • Lower latency
  • Lower CPU utilization

I would highly recommend reading Michael’s full article for the details, I don’t want to steal his thunder. Now what was most interesting is the following quote, I highly regard Michael he is a smart guy and typically spot-on:

I’ve heard reports that some people have been testing VSAN and seen no noticeable performance improvement when using Jumbo Frames on the 10G networks between the hosts. Although I don’t have VSAN in my lab just yet my theory as to the reason for this is that the network is not the bottleneck with VSAN. Most of the storage access in a VSAN environment will be local, it’s only the replication traffic and traffic when data needs to be moved around that will go over the network between VSAN hosts.

As I said, Michael is a smart guy and as I’ve seen various people asking questions around this and it isn’t a strange assumption to make that with VSAN most IO will be local, I guess this is kind of the Nutanix model. But VSAN is no Nutanix. VSAN takes a different approach, a completely different approach and this is important to realize.

I guess with a very small cluster of 3 nodes Michael chances of IO being local are bigger, but even then IO will not only be local at a minimum 50% (when failures to tolerate is set to 1) due to the data mirroring. So how does VSAN handle this, what are some of the things to keep in mind, lets starts with some VSAN principles:

  • Virtual SAN uses an “object model”, objects are stored on 1 or multiple magnetic disks and hosts.
  • Virtual SAN hosts can access “objects” remotely, both read and write.
  • Virtual SAN does not have the concept of data locality / gravity, meaning that the object does not follow the virtual machine, reason for this is that moving data around is expensive from a resource perspective.
  • Virtual SAN has the capability to read from multiple mirror copies, meaning that if you have 2 mirror copies IO will be distributed equally.

What does this mean? First of all, lets assume you have an 8 host VSAN cluster. You have a policy configured for availability: N+1. This means that the objects (virtual disks) will be on two hosts (at a minimum). What about your virtual machine from a memory and CPU point of view? Well it could be on any of those 8 hosts. With DRS being envoked every 5 minutes at a minimum I would say that chances are bigger that the virtual machine (from a CPU/Memory) resides on one of the 6 hosts that does not hold the objects (virtual disk). In other words, it is likely that I/O (both reads and writes) are being issued remote.

From an I/O path perspective I would like to re-iterate that both mirror copies can and will serve I/O, each would serve ~50% of the I/O. Note that each host has a read cache for that mirror copy, but blocks in read cache are only stored once, this means that each host will “own” a set of blocks and will serve data for those be it from cache or be it from spindles. Easy right?

Now just imagine you have configured your “host failures” policy set to 2. I/O can now come from 3 hosts, at a minimum. And why do I say at a minimum? Because when you have the stripe width configured or for whatever reason striping goes across hosts instead of disks (which is possible in certain scenarios) then I/O can come from even more hosts… VSAN is what I would call a true fully distributed solution! Below is an example of “number of failures” set to 1 and “stripe width” set to 2, as can be seen there are 3 hosts that are holding objects.

Lets reiterate that. When you define “host failures” as 1 and stripe width as 1 then VSAN can still, when needed, stripe across multiple disks and hosts. When needed, meaning when for instance the size of a VMDK is larger than a single disk etc.

Now lets get back to the original question Michael asked himself, does it make sense to use Jumbo Frames? Michael’s tests clearly showed that it does, in his specific scenario that is of course. I have to agree with him that when (!!) properly configured it will definitely not hurt, so the question is should you always implement this? I guess if you can guarantee implementation consistency, then conducts tests like Michael did. See if it benefits you, and if it lowers latency and increase IOps I can only recommend to go for it.

 

PS: Michael mentioned that even when mis-configured it can’t hurt, well there were issues in the past… although they are solved now, it is something to keep in mind.

I created a folder on my VSAN datastore, but how do I delete it?

Duncan Epping · Sep 27, 2013 ·

I created a folder on my VSAN datastore using the vSphere Web Client, but when I wanted to deleted it I received this error message that that wasn’t possible. So how do I delete a VSAN folder when I don’t need it any longer? It is fairly straight forward, you open up an SSH session to your host and do the following:

  • change directory to /vmfs/volumes/vsanDatastore
  • run “ls -l” in /vmfs/volumes/vsanDatastore to identify the folder you want to delete
  • run “/usr/lib/vmware/osfs/bin/osfs-rmdir <name-of-the-folder>” to delete the folder

This is what it would look like:

/vmfs/volumes/vsan:5261f0c54e0c785a-81e199f6c9a23d73 # ls -lah
total 6144
drwxr-xr-x    1 root     root         512 Sep 27 03:17 .
drwxr-xr-x    1 root     root         512 Sep 27 03:17 ..
drwxr-xr-t    1 root     root        1.4K Sep 24 05:38 16254152-1469-2c18-3319-002590c0c254
drwxr-xr-t    1 root     root        1.2K Sep 26 01:21 85803a52-6858-ded5-b40b-00259088447a
lrwxr-xr-x    1 root     root          36 Sep 27 03:17 ISO -> e64d1b52-1828-04ca-95a8-00259088447e
lrwxr-xr-x    1 root     root          36 Sep 27 03:17 TestVM -> ed31d351-a222-83bf-bb70-002590884480
drwxr-xr-t    1 root     root        1.4K Sep 27 01:40 cc8ebe51-6881-7dc8-37f8-00259088447e
drwxr-xr-t    1 root     root        1.2K Sep 27 01:52 e64d1b52-1828-04ca-95a8-00259088447e
drwxr-xr-t    1 root     root        1.2K Jul  3 07:52 ed31d351-a222-83bf-bb70-002590884480
lrwxr-xr-x    1 root     root          36 Sep 27 03:17 iso -> 16254152-1469-2c18-3319-002590c0c254
lrwxr-xr-x    1 root     root          36 Sep 27 03:17 las-fg01-vc01.vmwcs.com -> cc8ebe51-6881-7dc8-37f8-00259088447e
lrwxr-xr-x    1 root     root          36 Sep 27 03:17 vmw-iol-01 -> 85803a52-6858-ded5-b40b-00259088447a

/vmfs/volumes/vsan:5261f0c54e0c785a-81e199f6c9a23d73 # /usr/lib/vmware/osfs/bin/osfs-rmdir vmw-iol-01

Deleting directory 85803a52-6858-ded5-b40b-00259088447a in container id 5261f0c54e0c785a81e199f6c9a23d73 backed by vsan

Be careful though, cause when you delete it guess what… it is gone! Yes not being able to delete it using the Web Client is a known issue, and on the roadmap to be fixed.

Initialized disks to be used by VSAN task completed successfully, but no disks added?

Duncan Epping · Sep 25, 2013 ·

I’ve seen various people reporting the following, they wanted to create a diskgroup in Virtual SAN / VSAN, the task completes successfully but you don’t see any disks. Strange right because it gives a green checkmark?! I had the exact same scenario today, but if you click the task more details are revealed as shown in the screenshot below:

There are a couple of reasons why this can happen:

  • Did not fill in the license
    • VSAN licenses are applied on a cluster level. Open the Webclient click on your VSAN enabled cluster, click the “Manage” tab followed by “Settings”. Under “Configuration” click “Virtual SAN Licensing” and then click “Assign License Key”.
  • Running nested and virtual disks used are too small
    • SSD/HDD needs to be 4GB at a minimum (I think, have not tried this extensively though)
  • Existing partition on disk found
    • Wipe the disk before using it, you can do this using partedUtil for instance.

     

If you can’t figure out why it happens, make sure to check the task details as it can give a pretty good hint even though it looks like it was successful errors could still be in there.

Be careful when defining a VM storage policy for VSAN

Duncan Epping · Sep 19, 2013 ·

I was defining a VM storage policy for VSAN and it resulted in something unexpected. You might have read that when no policy is defined within vCenter that VSAN defaults to the following for availability reasons:

  • Failures to tolerate = 1

So I figured I would define a new policy and include “stripe width” in this policy. I wanted to have a stripe width of 2 and “failures to tolerate” set to the default of 1. I figured as “failures to tolerate” is set to 1 anyway by default I would specify it, but would just specify stripe width. Why add rules which already have the correct value right?

VM storage policy for VSAN

Well that is what I figured, no point in adding it… and this was the result:

Do you notice something in the above screenshot? I do… I see no “RAID 1” mentioned and all components reside on the same host, esx014, in this case. So what does that mean? It means that when you create a profile and do not specify “failures to tolerate” that is default to 0 and no mirror copies are created. This is not the situation you want to find yourself in! So when you define stripe width, make sure you also define “failures to tolerate”. Even better, when you create a VM Storage Policy always include “failures to tolerate. Below is an example of what my policy should have looked like.

VM storage policy for VSAN

So remember this: When defining a new VSAN VM Storage Policy always include “Number of failures to tolerate”! If you did forget to specify it, the nice thing here is that you can change VM Storage Policies on the fly and apply them directly to your VMs. Cormac has a nice article on this subject!

  • « Go to Previous Page
  • Page 1
  • Interim pages omitted …
  • Page 67
  • Page 68
  • Page 69
  • Page 70
  • Page 71
  • Go to Next Page »

Primary Sidebar

About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.

Follow Us

  • X
  • Spotify
  • RSS Feed
  • LinkedIn

Recommended Book(s)

Also visit!

For the Dutch-speaking audience, make sure to visit RunNerd.nl to follow my running adventure, read shoe/gear/race reviews, and more!

Do you like Hardcore-Punk music? Follow my Spotify Playlist!

Do you like 80s music? I got you covered!

Copyright Yellow-Bricks.com © 2026 · Log in