
Yellow Bricks

by Duncan Epping


6.5

Changing advanced vSphere FT related settings, is that supported?

Duncan Epping · Feb 1, 2018 ·

This week I received a question about changing the values of vSphere FT related advanced settings. This customer is working on an environment where uptime is key. The application layer is of course one side of it, but they also want additional availability from an infrastructure perspective, which means vSphere HA and vSphere FT are key.

They have various VMs they need to enable FT on; these are vSMP VMs (in this case dual vCPU). Right now each host is limited to 4 FT VMs and at most 8 FT vCPUs, which is controlled by two advanced settings called “das.maxftvmsperhost” and “das.maxFtVCpusPerHost”. The default values for these are, obviously, 4 and 8. The question was: can I edit these and still have a supported configuration? And why 4 and 8 in the first place?

I spoke to the product team about this and the answer is: yes, you can safely edit these. The default values were chosen based on the bandwidth and resource constraints typical customers have. An FT VM easily consumes between 1 and 3 Gbps of bandwidth, meaning that if you dedicate a 10 Gbps link to FT traffic you will fit roughly 4 VMs. I say roughly because the workload of course matters: CPU, memory, and I/O pattern.

If you have a 40 Gbps NIC, and you have plenty of cores and memory, you could increase those maximums for FT VMs per host and FT vCPUs per host. It must be noted, however, that if you run into problems VMware GSS may request that you revert to the defaults, just to ensure the issues aren’t caused by this change, as VMware tests with the default values.
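For those who prefer to script this, here is a minimal pyVmomi sketch that sets both advanced options on an HA-enabled cluster. The vCenter address, credentials, cluster name, and the new values (8 FT VMs and 16 FT vCPUs per host) are placeholders of my own, not values from the product team.

```python
# Minimal pyVmomi sketch: raise the FT related HA advanced options on a cluster.
# The vCenter address, credentials, cluster name, and values are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    # Find the cluster object by name.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Cluster-01")
    view.DestroyView()

    # HA (das.*) advanced options are plain key/value pairs on the cluster config.
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(option=[
            vim.option.OptionValue(key="das.maxftvmsperhost", value="8"),
            vim.option.OptionValue(key="das.maxFtVCpusPerHost", value="16"),
        ]))
    task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
finally:
    Disconnect(si)
```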

UPDATE to this content can be found here: https://www.yellow-bricks.com/2022/11/18/can-you-exceed-the-number-of-ft-enabled-vcpus-per-host-or-number-of-ft-enabled-vcpus-per-vm/

Using a vSphere custom TCP/IP Stack for iSCSI

Duncan Epping · Jan 31, 2018 ·

For continued and updated guidance on iSCSI and routed traffic/custom TCP/IP stacks I would like to refer you to storagehub; all iSCSI best practices can be found here.

I noticed a question today on an internal Slack channel about the use of custom TCP/IP stacks for iSCSI storage environments. Cormac and I updated a bunch of Core Storage white papers recently, one of them being the iSCSI Best Practices white paper. It appears that the little section about routing and custom TCP/IP stacks is difficult to find, so I figured I would share it here as well. The executive summary is simple: using custom TCP/IP stacks for iSCSI storage in vSphere is not supported.

What is nice, though, is that with vSphere 6.5 you can now set a gateway per VMkernel interface. Anyway, here’s the blurb from the paper:

As mentioned before, for vSphere hosts, the management network is on a VMkernel port and therefore uses the default VMkernel gateway. Only one VMkernel default gateway can be configured on a vSphere host per TCP/IP Stack. You can, however, add static routes from the command line or configure a gateway for each individual VMkernel port.

Setting a gateway on a per VMkernel port granular level has been introduced in vSphere 6.5 and allows for a bit more flexibility. The gateway for a VMkernel port can simply be defined using the vSphere Web Client during the creation of the VMkernel interface. It is also possible to configure it using esxcli. Note: At the time of writing the use of a custom TCP/IP Stack is not supported for iSCSI!
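The paper points at the Web Client and esxcli; as an additional illustration, here is a rough pyVmomi sketch that sets a per-VMkernel-port gateway through the API (vSphere 6.5 or later). The device name and gateway address are placeholders, and the connection and host lookup are done the same way as in the earlier sketch.

```python
# Rough pyVmomi sketch: override the default gateway of one VMkernel interface
# (vSphere 6.5+). The device name and IP address below are placeholders.
from pyVmomi import vim

def set_vmk_gateway(host, device="vmk2", gateway="192.168.50.1"):
    """Set a per-VMkernel-port gateway on an ESXi HostSystem managed object."""
    net_sys = host.configManager.networkSystem
    # Find the current spec of the VMkernel interface we want to change.
    vnic = next(v for v in net_sys.networkInfo.vnic if v.device == device)
    spec = vnic.spec
    # Attach an interface-specific default gateway to the existing spec.
    spec.ipRouteSpec = vim.host.VirtualNic.IpRouteSpec(
        ipRouteConfig=vim.host.IpRouteConfig(defaultGateway=gateway))
    net_sys.UpdateVirtualNic(device, spec)
```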

I hope that clarifies things, and makes this support statement easier to find.

Insufficient configured resources to satisfy the desired vSphere HA failover level

Duncan Epping · Dec 9, 2017 ·

I was going over some of the VMTN threads and noticed an issue with Admission Control that was brought up a while ago. I had completely forgotten about it until it was raised again internally. With vSphere 6.5 and vSphere HA there seems to be a problem with some Admission Control policies. When you have, for instance, selected the Percentage Based Admission Control Policy and you have a low number of hosts, you could receive the following error:

Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster …

I managed to reproduce this in my lab, and this is what it looks like in the UI:

It seems to happen when there is a minor difference in resources between the hosts, but I am not 100% certain about that. I am trying to figure out internally whether it is a known issue, and will come back to this when I know in which patch it will be solved and/or if it is indeed a known issue.
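For context, this is roughly how the percentage-based check works (my own simplified illustration, not the actual HA code): HA computes the current failover capacity from the total cluster resources minus all powered-on reservations and compares it against the configured percentage, which is presumably where a small difference between hosts could tip the result below the threshold.

```python
# Simplified illustration (my own, not the actual HA implementation) of the
# percentage-based admission control check.

def current_failover_capacity(host_capacities, total_reserved):
    """host_capacities: per-host capacity (e.g. CPU in MHz or memory in MB);
    total_reserved: sum of reservations (+ overhead) of all powered-on VMs."""
    total = sum(host_capacities)
    return (total - total_reserved) / total * 100  # percent left for failover

# Example: three slightly uneven hosts with 30% reserved for failover.
configured_percentage = 30
capacity = current_failover_capacity([10000, 9800, 9600], total_reserved=21000)
if capacity < configured_percentage:
    print("Insufficient resources to satisfy the configured failover level")
```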

 

Using HA VM Component Protection in a mixed environment

Duncan Epping · Nov 29, 2017 ·

I have some customers who are running both traditional storage and vSAN in the same environment. As most of you are aware, vSAN and VMCP do not go together at this point. So what does that mean for traditional storage, given that with traditional storage you can benefit from VMCP in certain storage failure scenarios?

Well, the statement around vSAN and VMCP is actually a bit more delicate. vSAN does not propagate PDL or APD in a way that VMCP understands. So you can enable VMCP in your environment without it having an impact on VMs running on top of vSAN. The VMs which are running on traditional storage will be able to use the VMCP functionality, and if an APD or PDL is declared on the LUN they are running on, vSphere HA will take action. For vSAN, well, we don’t propagate the state of a disk that way and we have other mechanisms to provide availability / resiliency.

In summary: Yes, you can enable HA VMCP in a mixed storage environment (vSAN + Traditional Storage). It is fully supported.
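If you want to switch it on programmatically, below is a minimal pyVmomi sketch that enables VMCP on a cluster and sets the PDL/APD responses. The connection and cluster lookup are the same as in the first sketch, and the chosen response values are just an example, not a recommendation from this post.

```python
# Minimal pyVmomi sketch: enable HA VM Component Protection on a cluster and set
# the PDL/APD responses. Response values below are an example only.
from pyVmomi import vim

def enable_vmcp(cluster):
    vmcp = vim.cluster.VmComponentProtectionSettings(
        vmStorageProtectionForPDL="restartAggressive",    # restart VMs on PDL
        vmStorageProtectionForAPD="restartConservative",  # restart VMs after APD timeout
    )
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(
            vmComponentProtecting="enabled",
            defaultVmSettings=vim.cluster.DasVmSettings(
                vmComponentProtectionSettings=vmcp)))
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```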

vSphere HA heartbeat datastores, the isolation address and vSAN

Duncan Epping · Nov 8, 2017 ·

I’ve written about vSAN and vSphere HA various times, but I don’t think this has been explicitly called out before. Cormac and I were doing some tests this week and noticed something. When we were looking at the results I realized I had described it in my HA book a long time ago, but it is hidden away so deep that probably no one has noticed.

In a traditional environment when you enable HA you will automatically have HA heartbeat datastores selected. These heartbeat datastores are used by the HA primary host to determine what has happened to a host which is no longer reachable over the management network. In other words, when a host is isolated it will communicate this to the HA primary using the heartbeat datastores. It will also inform the HA primary which VMs were powered off as the result of this isolation event (or not powered off when the isolation response is not configured).

Now, with vSAN, the management network is not used for communication between the hosts; the vSAN network is used instead. Typically in a vSAN environment there’s only vSAN storage, so there are no heartbeat datastores. As such, when a host is isolated it is not possible to communicate this to the HA primary. Remember, the network is down and there is no access to the vSAN datastore, so the host cannot communicate through that either. HA will still function as expected though. You can set the isolation response to power-off and then the VMs will be killed and restarted. That is, if isolation is declared.

So when is isolation declared? A host declares itself isolated when:

  1. It is not receiving any communication from the primary
  2. It cannot ping the isolation address

Now, if you have not set any advanced settings, the default gateway of the management network will be the isolation address. Just imagine your vSAN network being isolated on a given host while, for whatever reason, the management network is not. In that scenario isolation is not declared, as the host can still ping the isolation address using the management network VMkernel interface. HOWEVER… vSphere HA will still restart the VMs. The VMs have lost access to disk, and as such the locks on the VMDKs are lost. HA notices the host is gone and the locks are lost, which must mean the VMs are dead, so let’s restart them.

That is when you could end up in the situation where the VMs are running on the isolated host and also somewhere else in the cluster, both with the same MAC address and the same name / IP address. Not a good situation. Now, if you had datastore heartbeats enabled, this would be prevented: the isolated host would inform the primary it is isolated, but it would also inform the primary about the state of the VMs, which would be powered on. The primary would then decide not to restart the VMs. However, the VMs which are running on the isolated host are more or less useless as they can no longer write to disk.

Let’s describe what we tested and what the outcome was in a way that is a bit easier to consume: a table.

Isolation Address | Datastore Heartbeats | Observed behavior
IP on vSAN network | Not configured | Isolated host cannot ping the isolation address, isolation is declared, VMs are killed and restarted
Management network | Not configured | Host can ping the isolation address, isolation is not declared, yet the rest of the cluster restarts the VMs even though they are still running on the isolated host
IP on vSAN network | Configured | Isolated host cannot ping the isolation address, isolation is declared, VMs are killed and restarted
Management network | Configured | VMs are not powered off and not restarted, as the “isolated host” can still ping the management network and the datastore heartbeat mechanism is used to inform the primary about the state. The primary knows the HA network is not working, but the VMs are not powered off.

So what did we learn, and what should you do when you have vSAN? Always use an isolation address that is in the same network as vSAN! This way, during an isolation event, the isolation is validated using the vSAN VMkernel interface. Always set the isolation response to power-off (my personal opinion, based on testing). This avoids the scenario of duplicate MAC addresses / IPs / names on the network when a single network is isolated for a specific host! And if you have traditional storage, you can enable heartbeat datastores. It doesn’t add much in terms of availability, but it will still allow the HA hosts to communicate state through the datastore.

PS1: For those who don’t know, HA is configured to automatically select heartbeat datastores. In a vSAN-only environment you can disable this by selecting “Use datastore from only the specified list” in the HA interface and then setting “das.ignoreInsufficientHbDatastore = true” in the advanced HA settings.

PS2: In a non-routable vSAN network environment you could create a Switch Virtual Interface on the physical switch. This will give you an IP on the vSAN segment which can then be used as the isolation address through the advanced setting das.isolationaddress0.
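To tie the recommendations together, here is a pyVmomi sketch (same connection and cluster lookup as the first example) that sets a vSAN-network isolation address, suppresses the heartbeat datastore warning, and sets the isolation response to power off. The IP address is a placeholder, and das.usedefaultisolationaddress is an additional setting, not mentioned above, that stops HA from also using the default gateway as an isolation address.

```python
# Sketch: apply the vSAN-related HA recommendations from this post to a cluster.
# The isolation address is a placeholder for an IP on the vSAN segment.
from pyVmomi import vim

def configure_ha_for_vsan(cluster, isolation_ip="172.16.10.1"):
    das = vim.cluster.DasConfigInfo(
        option=[
            vim.option.OptionValue(key="das.isolationaddress0", value=isolation_ip),
            # Extra setting (assumption, not from the post): don't also use the
            # management network default gateway as an isolation address.
            vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
            vim.option.OptionValue(key="das.ignoreInsufficientHbDatastore", value="true"),
        ],
        defaultVmSettings=vim.cluster.DasVmSettings(
            isolationResponse="powerOff"))  # kill VMs when a host declares isolation
    spec = vim.cluster.ConfigSpecEx(dasConfig=das)
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```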

