Not sure why, but the last couple of weeks I have had several questions about FT (Fault Tolerance). The questions where around the limits, what is the limit per VM, what is the limit per host, and can I somehow exceed these? All of this is documented by VMware, but somehow seems to be either difficult to find or difficult to understand. Let me write a short summary that hopefully clarifies things.
First of all, the license you use dictates the maximum number of vCPUs a VM can have when enabling FT on that VM:
- vSphere Standard and Enterprise. Allows up to 2 vCPUs
- vSphere Enterprise Plus. Allows up to 8 vCPUs
Now, there are also two other things that come into play. You can have a maximum of 4 FT enabled VMs per host, and a maximum of 8 FT enabled vCPUs per host. You can change these settings, this is fully supported as I already discussed in this blog post. There is however a caveat, while VMware has tested with a higher number of FT enabled VMs per host than 4, and with a high number of FT enabled vCPUs, there’s no guarantee that you will get acceptable performance. The more you increase these default values, the bigger the chance that there will be a performance impact.
When FT is enabled a significant amount of communication between hosts (Primary / Shadow VM) needs to occur to ensure the VMs are in lockstep. This overhead can cause a slowdown, and this is the reason why we have those limitations in place. If you have sufficient networking bandwidth and CPU capacity then you can increase these numbers. Note, typically VMware development does not test beyond the maximum specified numbers. If performance is impacted, or you receive unexpected errors/results, and you contact support then support may request to lower the numbers as that impact can unfortunately not be solved in a different way. I hope that clarifies it.