
Yellow Bricks

by Duncan Epping


How to disable ESXi firewall

Duncan Epping · Jan 23, 2013 ·

For a project I had to permanently disable the ESXi firewall on a host. To be honest, it isn’t something I would normally do, or even recommend. It wasn’t listed in “chkconfig”, which kind of makes sense, so I looked at the networking section of esxcli. What an awesome command, by the way! After quickly tabbing through esxcli I figured out how to disable it permanently:

esxcli network firewall set --enabled false

I figured I would write it down, because this is the stuff I tend to forget easily.

PS: If you ever need anything around esxcli, the vSphere Blog is a good place to check as most of the relevant posts are tagged with “esxcli”.

Percentage Based Admission Control gives lower VM restart guarantee?

Duncan Epping · Jan 9, 2013 ·

Those who have configured vSphere HA have all seen the section where it asks whether you want to use admission control or not. Of course, if you decide you want to use it, and you should, the next question is which policy you want to use. I have always preferred the “Percentage Based Admission Control” policy. For some reason, though, there are people who think that the percentage based admission control policy rules out large VMs from being restarted, or offers a lower guarantee.

The main perception people have is that the percentage based admission control policy gives a lower guarantee that virtual machines will be restarted than the “host failures” admission control policy. So let’s break it down, and I mean BREAK IT DOWN, by using an example.

Example

  • 5 hosts
  • 200GB of Memory in cluster
  • 20GHz of CPU in cluster

If no reservations are set:

Percentage Based will do the following:

  1. The Percentage Based policy takes the total amount of resources and subtracts the amount of resources reserved for fail-over. If that percentage is for instance 20%, then 40GB and 4GHz are subtracted, which means 160GB and 16GHz are left.
  2. The reserved resources for every virtual machine that is powered on are subtracted from the outcome of step 1. If no memory reservation is set, the memory overhead is subtracted instead; if the memory overhead is 200MB, then 200MB is subtracted from the 160GB that was left, resulting in 159.8GB being available. For CPU the default of 32MHz is used.
  3. You can power on virtual machines until the amount of available resources, according to HA Admission Control, is depleted; in this case that means many VMs.
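The steps above can be sketched as a small calculation. This is purely an illustration built from the example figures in this post, not VMware’s actual implementation; the function names are made up:

```python
# Sketch of Percentage Based admission control accounting (illustration
# only, using the example cluster: 200GB / 20GHz, 20% failover capacity).

def available_after_failover(total_mem_gb, total_cpu_ghz, failover_pct):
    """Step 1: subtract the failover percentage from the cluster totals."""
    mem = total_mem_gb * (1 - failover_pct / 100)
    cpu = total_cpu_ghz * (1 - failover_pct / 100)
    return mem, cpu

def subtract_vm(mem_gb, cpu_ghz, mem_res_gb=0.0, cpu_res_ghz=0.0,
                mem_overhead_gb=0.2, cpu_default_ghz=0.032):
    """Step 2: per powered-on VM, subtract its reservation; when no
    reservation is set, use the memory overhead (200MB here) and the
    32MHz CPU default."""
    mem_gb -= mem_res_gb if mem_res_gb else mem_overhead_gb
    cpu_ghz -= cpu_res_ghz if cpu_ghz and cpu_res_ghz else cpu_default_ghz
    return mem_gb, cpu_ghz

mem, cpu = available_after_failover(200, 20, 20)  # 160GB and 16GHz left
mem, cpu = subtract_vm(mem, cpu)                  # one VM, no reservation
print(round(mem, 1), round(cpu, 3))               # 159.8 15.968
```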

Host Failures will do the following:

  1. The Host Failures policy calculates the number of slots. A slot is formed out of two components: memory and CPU. As no reservation is used, the default for CPU is used, which is 32MHz with vSphere 5.0 and higher. For memory the largest memory overhead size is used; in this scenario there could be a variety of sizes, let’s say the smallest is 64MB and the largest 300MB. 300MB will then be used as the memory slot size.
  2. Now that the slot size is known, Admission Control will look for the host with the most slots (available resources / slot size) and subtract those slots from the total number of available slots (if one host failure is specified). Every time a VM is started, a slot is subtracted. If a VM is started with a higher memory reservation, we go back to step 1 and the math is done again.
  3. You can power on virtual machines until you are out of slots; again, many VMs.
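The slot math above can also be sketched in a few lines. Again an illustration under the example’s assumptions (five 40GB/4GHz hosts, one tolerated host failure), not the actual HA code:

```python
# Sketch of Host Failures slot math (illustration only).

def slot_size(vm_mem_mb, vm_cpu_mhz, cpu_default_mhz=32):
    """Slot size is the largest memory reservation/overhead and the
    largest CPU reservation (32MHz default when none is set)."""
    mem_slot = max(vm_mem_mb)
    cpu_slot = max(vm_cpu_mhz) if any(vm_cpu_mhz) else cpu_default_mhz
    return mem_slot, cpu_slot

def usable_slots(host_mem_mb, host_cpu_mhz, mem_slot, cpu_slot, failures=1):
    """Slots per host are limited by the most constrained resource;
    the slots of the largest host(s) are set aside for failover."""
    per_host = [min(m // mem_slot, c // cpu_slot)
                for m, c in zip(host_mem_mb, host_cpu_mhz)]
    return sum(per_host) - sum(sorted(per_host)[-failures:])

# No reservations; memory overheads between 64MB and 300MB:
mem_slot, cpu_slot = slot_size([64, 128, 300], [0, 0, 0])
print(mem_slot, cpu_slot)  # 300 32

# Five hosts of 40GB / 4GHz each, one host failure tolerated:
print(usable_slots([40960] * 5, [4000] * 5, mem_slot, cpu_slot))  # 500
```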

If reservations are set:

Percentage Based will do the following:

  1. The Percentage Based policy takes the total amount of resources and subtracts the amount of resources reserved for fail-over. If that percentage is for instance 20%, then 40GB and 4GHz are subtracted, which means 160GB and 16GHz are left.
  2. The reserved resources for every virtual machine that is powered on are subtracted from the outcome of step 1. So if 10GB of memory was reserved, then 10GB is subtracted, resulting in 150GB being available.
  3. You can power on virtual machines until the available resources are depleted (according to HA Admission Control), but as reservations are used you are “limited” in the number of VMs you can power on.

Host Failures will do the following:

  1. The Host Failures policy calculates the number of slots. A slot is formed out of two components: memory and CPU. As a reservation is used for memory but not for CPU, the default for CPU is used, which is 32MHz with vSphere 5.0 and higher. For memory there is a 10GB reservation set, so 10GB will be used as the memory slot size.
  2. Now that the slot size is known, Admission Control will look for the host with the most slots (available resources / slot size) and subtract those slots from the total number of available slots (if one host failure is specified). Every time a VM is started, a slot is subtracted; yes, that is a full 10GB memory slot, even if the VM only has, for instance, a 2GB reservation. If a VM is started with a higher memory reservation, we go back to step 1 and the math is done again.
  3. You can power on virtual machines until you are out of slots; as a high reservation is set, you will be severely limited!

Now you can imagine that “Host Failures” errs on the safe side… If you have a single reservation set, the math is done with that reservation. This means that a single 10GB reservation will impact how many VMs you can power on before HA screams that it is out of resources. But at least you are guaranteed you can power them on, right? Well yes, but realistically speaking people disable Admission Control at this point, as that single 10GB reservation allows you to power on just a handful of VMs. (16 to be precise.)
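The “16 to be precise” figure follows directly from the slot math, sketched here with the example cluster numbers (an illustration, not the HA implementation):

```python
# A single 10GB reservation makes the memory slot 10240MB.
# Five hosts of 40GB (40960MB) each, tolerating one host failure:
host_mem_mb = [40960] * 5
mem_slot_mb = 10240

slots_per_host = [m // mem_slot_mb for m in host_mem_mb]  # 4 slots per host
total = sum(slots_per_host) - max(slots_per_host)         # largest host set aside
print(total)  # 16 -> only 16 VMs can be powered on
```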

But doesn’t that beat Percentage Based… because if I have a lot of VMs, who says my VM with a 10GB reservation can be restarted? First of all, if there are no “unreserved resources” available on any given host to start this virtual machine, then vSphere HA will ask vSphere DRS to defragment the cluster. As HA Admission Control had already accepted this virtual machine to begin with, chances are fairly high that DRS can solve the fragmentation.

Also, as the percentage based admission control policy uses reservations AND memory overhead… how many virtual machines do you need to have powered on before your VM with a 10GB memory reservation is denied power-on? It would mean that none of the hosts has 10GB of unreserved memory available. That is not very likely, as it would mean you need to power on hundreds of VMs… probably way too many for your environment to ever perform properly. So the chances of hitting this scenario are extremely small.

Conclusion

Although theoretically possible, it is very unlikely you will end up in a situation where one or more virtual machines cannot be restarted when using the Percentage Based Admission Control policy. Even if you are using reservations on all virtual machines this is unlikely, as the virtual machines have been accepted at some point by HA Admission Control, and HA will leverage DRS to defragment resources at that point. Also keep in mind that when using reservations on all virtual machines, Host Failures is not an option, as it skews your numbers by doing the math with the “worst case scenario”; a single 10GB reservation can kill your ROI/TCO.

In short: Go Percentage Based!

ESXi host disconnected from vCenter?

Duncan Epping · Jan 7, 2013 ·

I noticed a couple of people reported this problem in the last two months, so I figured a blog post would be useful. This thread on VMTN triggered this article. If your ESXi host is disconnected from vCenter (even 5.0 and 5.1 appear to be impacted) and you see error messages in your log files about free space like these:

WARNING: VisorFSObj: xxxx: Cannot create file /var/spool/snmp/xxxxxxxx_x_x_xxxx.trp for process hostd-worker because the inode table of its ramdisk (root) is full.

VmkCtl Locking (/etc/vmware/esx.conf) : Unable to create or open a LOCK file. Failed with reason: No space left on device

This can be caused by ESXi running out of inodes. You can simply check this on the command line with the following command:

stat -f /

The outcome of this command will look as follows:

File: “/”
ID: 1        Namelen: 127     Type: visorfs
Block size: 4096
Blocks: Total: 449852     Free: 324368     Available: 324368
Inodes: Total: 8192       Free: 55

As you can see, the number of “free” inodes is low, and this is causing the issues experienced. In some cases it is reported (by vdsyn in this case) that “/var/spool/snmp/” is full and needs to be cleaned out. In this KB Article “/var/run/sfcb/” is explicitly called out, and it also explains what you can delete and how. So make sure to look at those two directories when an ESXi host is disconnected from vCenter.
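If you want to keep an eye on this programmatically, the same numbers that `stat -f /` prints (total and free inodes) can be read with Python’s standard `os.statvfs`. A minimal sketch, generic rather than ESXi-specific; the 5% threshold is an arbitrary example:

```python
# Sketch: warn when a filesystem is running low on free inodes,
# analogous to checking the "Inodes" line of `stat -f /`.
import os

def inode_usage(path="/"):
    st = os.statvfs(path)
    total, free = st.f_files, st.f_ffree   # total and free inodes
    pct_free = 100 * free / total if total else 100.0
    return total, free, pct_free

total, free, pct_free = inode_usage("/")
if pct_free < 5:  # e.g. 55 free out of 8192 is well under 5%
    print(f"warning: only {free} of {total} inodes free ({pct_free:.1f}%)")
```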

Isolation detection in vSphere 5.1 versus 5.0

Duncan Epping · Dec 31, 2012 ·

I received a question today from someone who wanted to know the difference for isolation detection between vSphere 5.0 and 5.1. I described this in our book, but I figured I would share it here as well. Note that this is an outtake from the book.

The isolation detection mechanism has changed substantially from previous versions of vSphere. The main difference is the fact that HA triggers a master election process before it declares a host isolated. In this timeline, “s” refers to seconds. The following is the timeline for a vSphere 5.0 host:

  • T0 – Isolation of the host (slave)
  • T10s – Slave enters “election state”
  • T25s – Slave elects itself as master
  • T25s – Slave pings “isolation addresses”
  • T30s – Slave declares itself isolated and “triggers” isolation response

For a vSphere 5.1 host this timeline differs slightly, due to the insertion of a minimum 30s delay after the host declares itself isolated before it applies the configured isolation response. This delay can be increased using the advanced option das.config.fdm.isolationPolicyDelaySec.

  • T0 – Isolation of the host (slave)
  • T10s – Slave enters “election state”
  • T25s – Slave elects itself as master
  • T25s – Slave pings “isolation addresses”
  • T30s – Slave declares itself isolated
  • T60s – Slave “triggers” isolation response

Or, as Frank would say, euuuh, show:

Isolation detection in vSphere 5.1 versus 5.0

When the isolation response is triggered, with both 5.0 and 5.1, HA creates a “power-off” file for any virtual machine it powers off whose home datastore is accessible. Next it powers off (or shuts down) the virtual machine and updates the host’s poweron file. The power-off file is used to record that HA powered off the virtual machine, and so HA should restart it. These power-off files are deleted when a virtual machine is powered back on or HA is disabled.

After the completion of this sequence, the master will learn the slave was isolated through the “poweron” file as mentioned earlier, and will restart virtual machines based on the information provided by the slave.

 

** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because it’s currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **

Using VAAI ATS capable array and VMFS-5?

Duncan Epping · Dec 27, 2012 ·

<update 21-Jan-2013>I have just been informed that this issue was fixed in vSphere 5.0 Update 1. The KB article and 5.0 U1 release notes will be updated shortly!</update>

If you are using a VAAI ATS capable array and VMFS-5, you might want to read this KB Article. The article describes a situation where it is impossible to mount VMFS volumes when they were formatted with VMFS-5 on a VAAI ATS (offloaded locking) capable array. These are the kind of problems that you won’t hit on a daily basis, but when you do, you will be scratching your head for a while. Note that this also applies to scenarios where, for instance, SRM is used. The error to look for in your vmkernel log is:

Failed to reserve volume

So anyone with a 5.0 environment and newly formatted VMFS-5 volumes might want to test this. Although the article states that so far the issue has only been encountered with EMC Clariion, NS and VNX storage, it also notes that it might not be restricted to them. The solution fortunately is fairly simple: just disable VAAI ATS for now.

esxcli system settings advanced set -i 0 -o /VMFS3/HardwareAcceleratedLocking

For more details read the KB, and if you hit this issue I would also suggest following it with an RSS reader; that way you get notified when there is an update.

