Yellow Bricks

by Duncan Epping


Server

Does a vSAN IO Limit impact resync traffic?

Duncan Epping · Jun 12, 2019 ·

A question just came in, and I figured other people may have the same question, so I would share it. The question was whether a vSAN IO limit would impact resync traffic or, for instance, SvMotion. In this case the customer defines limits within each policy to ensure VMs do not interfere with other VMs or excessively use IO resources. This can be especially useful in cloud environments, or when running production and test/dev on the same cluster. The concern, of course, was whether this limit would impact recovery times after a failure, as you can imagine that a limit of 50 IOPS would be devastating when a VM (or multiple VMs) needs to have objects resynced.

The answer is simple: no, the IO limit specified within a policy does not impact resync traffic (or SvMotion, for that matter). It only applies to guest IO to a VMDK, namespace, or swap object, which means it is safe to set limits without impacting recovery times.
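To make this concrete, below is a minimal Python sketch of the idea (my own illustration, not VMware's implementation; the IopsLimiter class and submit_io function are hypothetical names). The limiter is only consulted on the guest IO path, so resync and SvMotion IO pass straight through:

    import time

    class IopsLimiter:
        """Simple token bucket allowing at most `limit` IOs per second."""

        def __init__(self, limit: int):
            self.limit = limit
            self.tokens = float(limit)
            self.last = time.monotonic()

        def admit(self) -> bool:
            now = time.monotonic()
            # Refill tokens proportionally to the elapsed time.
            self.tokens = min(self.limit, self.tokens + (now - self.last) * self.limit)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def submit_io(kind: str, limiter: IopsLimiter) -> bool:
        # Only guest IO to a VMDK, namespace, or swap object is throttled;
        # resync and SvMotion IO is admitted unconditionally.
        if kind == "guest":
            return limiter.admit()
        return True

    limiter = IopsLimiter(limit=50)
    print(submit_io("resync", limiter))  # True: the 50 IOPS limit is ignored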

Site locality in a vSAN Stretched Cluster?

Duncan Epping · May 28, 2019 ·

On the community forums, a question was asked about the use of site locality in a vSAN Stretched Cluster. When you create a stretched cluster in vSAN, you can define within a policy how the data needs to be protected. Do you want to replicate across datacenters? Do you want to protect the “site local data” with RAID-1 or RAID-5/6? All of these options are available within the UI.

What if you decide not to stretch your object across locations? Is it mandatory to specify which datacenter the object should reside in?

The answer is simple: no, it is not. The real question, of course, is: should you define the location? Most definitely! If you wonder how to do this, simply specify it within the policy you define for these objects as follows:

The above screenshot is taken from the H5 client; if you are still using the Web Client, it probably looks slightly different (thanks Seamus for the screenshot):

Why would you do this? Well, that is easy to explain. When the objects of a VM get provisioned, the decision on where to place each object is made per object. If you have multiple disks and you haven’t specified the location, you could find yourself in a situation where the disks of a single non-stretched VM are located in different datacenters. This is, first of all, terrible for performance, but, maybe more importantly, it would also impact availability when anything happens to the network between the datacenters. So when you use site locality for non-stretched VMs, make sure to also configure the location so that your VM and objects align, as demonstrated in the diagram below.
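To illustrate why, here is a tiny Python sketch (purely my own illustration, not how vSAN actually implements placement; place_objects and the site names are made up). Each object is placed independently unless a preferred site is set:

    import random

    SITES = ["Site-A", "Site-B"]

    def place_objects(objects, preferred_site=None):
        # Placement is decided per object: an affinity rule pins every
        # object to the same site, otherwise each pick is independent.
        return {obj: (preferred_site or random.choice(SITES)) for obj in objects}

    vm_objects = ["namespace", "disk-1", "disk-2", "swap"]
    print(place_objects(vm_objects))                           # disks may straddle sites
    print(place_objects(vm_objects, preferred_site="Site-A"))  # everything aligned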


Impact of adding Persistent Memory / Optane Memory devices to your VM

Duncan Epping · May 22, 2019 ·

I have had some questions about this in the past month, so I figured I would share some details. As persistent memory (Intel Optane Memory devices, for instance) is getting more affordable and readily available, more and more customers are looking to use it. Some are already using it for very specific use cases, usually in situations where the OS and the app actually understand the type of device being presented. What does that mean? At VMworld 2018 there was a great session on this topic, and I captured the session in a post. Let me copy/paste the important bit for you, which discusses the different modes in which a Persistent Memory device can be presented to a VM.

  • vPMEMDisk = exposed to the guest as a regular SCSI/NVMe device; VMDKs are stored on a PMEM datastore
  • vPMEM = exposes the NVDIMM device in a “passthrough” manner; the guest can use it as a block device or as a byte-addressable direct access device (DAX). This is the fastest mode, and most modern OSes support it
  • vPMEM-aware = similar to the mode above, but the difference is that the application understands how to take advantage of vPMEM

But what is the problem with this? What is the impact? Well, when you expose a Persistent Memory device to a VM, it is currently not protected by vSphere HA, even though HA may be enabled on your cluster. Say what? Yes, indeed: the VM which has the PMEM device presented to it will be disabled for vSphere HA! I had to dig deep to find this documented anywhere, and it is documented in this paper (page 47, at the bottom). So what works and what does not? Well, if I understand it correctly:

  • vSphere HA >> Not supported on vPMEM-enabled VMs, regardless of the mode
  • vSphere DRS >> Does not consider vPMEM-enabled VMs, regardless of the mode
  • Migration of a VM with vPMEM / vPMEM-aware >> Only possible when migrating to a host which has PMEM
  • Migration of a VM with vPMEMDisk >> Possible to a host without PMEM
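For those who like to reason about this programmatically, the matrix above can be captured in a few lines of Python (my own encoding of the list, not a vSphere API):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PmemMode:
        name: str
        ha_protected: bool          # restarted by vSphere HA after a host failure
        drs_considered: bool        # considered by DRS for load balancing
        needs_pmem_on_target: bool  # migration target must have PMEM

    MODES = {
        "vPMEMDisk":   PmemMode("vPMEMDisk",   False, False, False),
        "vPMEM":       PmemMode("vPMEM",       False, False, True),
        "vPMEM-aware": PmemMode("vPMEM-aware", False, False, True),
    }

    def can_migrate(mode: str, target_has_pmem: bool) -> bool:
        return target_has_pmem or not MODES[mode].needs_pmem_on_target

    print(can_migrate("vPMEMDisk", target_has_pmem=False))  # True
    print(can_migrate("vPMEM", target_has_pmem=False))      # False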

Also note that, as a result (the data is not replicated/mirrored), a failure could potentially lead to loss of data. Although Persistent Memory is a great mechanism to increase performance, this is something that should be taken into consideration when you are thinking about introducing it into your environment.

Oh, and if you are wondering why people take these risks in terms of availability: Niels Hagoort just posted a blog with a pointer to a new PMEM performance paper which is worth reading.


I want vSphere HA to use a specific Management VMkernel interface

Duncan Epping · Apr 30, 2019 ·

This comes up occasionally: customers have multiple Management VMkernel interfaces and see vSphere HA traffic on both interfaces, or on the incorrect interface. In some cases, customers use the IP address of the first management VMkernel interface to connect the host to vCenter and then set an isolation address on the network of the second management VMkernel, so that HA uses that network. That is, unfortunately, not how it works. I wrote about this 6 years ago, but it never hurts to reiterate, as it came up twice over the past couple of weeks. The “Management” tickbox is all about HA traffic. Whether “Management” is enabled or not makes no difference to vCenter or SSH, for instance. If you create a VMkernel interface without the Management tickbox enabled, you can still connect to it over SSH, and you can still use its IP to add the host to vCenter Server. Yes, it is confusing, but that is how it works.

If you do not want the interface to be used by vSphere HA, make sure to untick “Management”. Note that this is not the case for vSAN: with vSAN, the vSAN network is automatically used by HA. This only pertains to traditional, non-vSAN based infrastructures.
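Conceptually, the interface selection works something like this Python sketch (the function and field names are hypothetical; this is not the actual HA agent code). Which IP vCenter uses to manage the host plays no role at all:

    def ha_heartbeat_interfaces(vmkernel_nics, vsan_enabled=False):
        # vmkernel_nics: list of dicts with "name", "ip" and "tags".
        # HA uses the vSAN network when vSAN is enabled, otherwise every
        # interface tagged "Management".
        tag = "vSAN" if vsan_enabled else "Management"
        return [nic["name"] for nic in vmkernel_nics if tag in nic["tags"]]

    nics = [
        {"name": "vmk0", "ip": "10.0.1.10", "tags": {"Management"}},
        {"name": "vmk1", "ip": "10.0.2.10", "tags": set()},  # SSH/vCenter still work
    ]
    print(ha_heartbeat_interfaces(nics))  # ['vmk0'] -- vmk1 is never used by HA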

Mixing versions of ESXi in the same vSphere / vSAN cluster?

Duncan Epping · Apr 15, 2019 ·

I have seen this question asked a couple of times in the past months, and to be honest, I was a bit surprised people asked about it. Various customers were wondering if it is supported to mix versions of ESXi in the same vSphere or vSAN cluster. I can be short about whether this is supported or not: yes, it is, but only for short periods of time (72 hours max). Would I recommend it? No, I would not!

Why not? Well, mainly for operational reasons; it just makes life more complex. Just think about a troubleshooting scenario: you now need to remember which version you are running on which host and understand the “known issues” for each version. Also, for vSAN, things are even more complex, as you could have “components” running on different versions of ESXi. On top of that, it could even be the case that a certain command or esxcli namespace is not available on a particular version of ESXi.

Another concern is upgrades and updates: you need to take the current version into account when updating, or, more importantly, when upgrading! Also, remember that the supported firmware/driver combination may differ per version of vSphere/vSAN as well; this makes life more complex and definitely increases the chance of making mistakes!
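If you want to quickly verify whether a cluster is running mixed versions, a trivial check like the Python sketch below can help (gather the version strings with your tooling of choice, for instance PowerCLI or pyVmomi; the function name is mine):

    from collections import defaultdict

    def mixed_versions(host_versions):
        # host_versions: mapping of host name -> ESXi version string.
        # Returns the full breakdown when more than one version is found.
        by_version = defaultdict(list)
        for host, version in host_versions.items():
            by_version[version].append(host)
        return dict(by_version) if len(by_version) > 1 else {}

    cluster = {"esx01": "6.7.0", "esx02": "6.7.0", "esx03": "6.5.0"}
    print(mixed_versions(cluster))  # {'6.7.0': ['esx01', 'esx02'], '6.5.0': ['esx03']}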

Is this documented anywhere? Yes, check out the following KB:

  • https://kb.vmware.com/s/article/2146381
