
Yellow Bricks

by Duncan Epping



Isolation detection in vSphere 5.1 versus 5.0

Duncan Epping · Dec 31, 2012 ·

I received a question today from someone who wanted to know the difference in isolation detection between vSphere 5.0 and 5.1. I described this in our book, but I figured I would share it here as well. Note that this is an outtake from the book.

The isolation detection mechanism has changed substantially compared to previous versions of vSphere. The main difference is that HA now triggers a master election process before it declares a host isolated. In the timelines below, “s” refers to seconds. This is the timeline for a vSphere 5.0 host:

  • T0 – Isolation of the host (slave)
  • T10s – Slave enters “election state”
  • T25s – Slave elects itself as master
  • T25s – Slave pings “isolation addresses”
  • T30s – Slave declares itself isolated and “triggers” isolation response

For a vSphere 5.1 host this timeline differs slightly due to the insertion of a minimum 30-second delay after the host declares itself isolated and before it applies the configured isolation response. This delay can be increased using the advanced option das.config.fdm.isolationPolicyDelaySec.

  • T0 – Isolation of the host (slave)
  • T10s – Slave enters “election state”
  • T25s – Slave elects itself as master
  • T25s – Slave pings “isolation addresses”
  • T30s – Slave declares itself isolated
  • T60s – Slave “triggers” isolation response
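For those who prefer code over timelines, here is a minimal Python sketch of the sequence described above. It is purely illustrative rather than actual FDM code, and the way it applies das.config.fdm.isolationPolicyDelaySec (enforcing the 30-second minimum) is my interpretation of the behavior described in this post.

```python
# Illustrative sketch of the isolation detection timelines described above.
# This is not FDM code; the event offsets come straight from this post.

def isolation_timeline(version="5.1", isolation_policy_delay_sec=30):
    """Return (seconds, event) tuples for an isolated slave host."""
    events = [
        (0, "Isolation of the host (slave)"),
        (10, "Slave enters 'election state'"),
        (25, "Slave elects itself as master"),
        (25, "Slave pings 'isolation addresses'"),
        (30, "Slave declares itself isolated"),
    ]
    if version == "5.0":
        # 5.0: the isolation response is triggered immediately.
        events.append((30, "Slave 'triggers' isolation response"))
    else:
        # 5.1: a minimum 30s delay before the isolation response is applied;
        # das.config.fdm.isolationPolicyDelaySec can only increase it.
        delay = max(30, isolation_policy_delay_sec)
        events.append((30 + delay, "Slave 'triggers' isolation response"))
    return events

for t, event in isolation_timeline("5.1"):
    print(f"T{t}s - {event}")
```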

Or, as Frank would say euuuh show:

[Figure: Isolation detection in vSphere 5.1 versus 5.0]

When the isolation response is triggered, with both 5.0 and 5.1, HA creates a “power-off” file for any virtual machine it powers off whose home datastore is accessible. Next, it powers off (or shuts down) the virtual machine and updates the host’s “poweron” file. The power-off file records that HA powered off the virtual machine and that HA should therefore restart it. These power-off files are deleted when a virtual machine is powered back on or when HA is disabled.

After the completion of this sequence, the master will learn that the slave was isolated through the “poweron” file, as mentioned earlier, and will restart virtual machines based on the information provided by the slave.
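As a rough mental model of this bookkeeping, consider the sketch below. It is a minimal in-memory approximation; the helper names and the idea of modeling the files as sets are mine, and the real on-disk format is not shown here.

```python
# Minimal in-memory model of the power-off / poweron bookkeeping described
# above. The names are illustrative; this is not the real on-disk format.

powered_on_vms = {"vm1", "vm2"}  # what this host's "poweron" file tracks
poweroff_files = set()           # "power-off" files created by HA

def trigger_isolation_response(vm, home_datastore_accessible):
    """Apply the isolation response to a single VM."""
    if home_datastore_accessible:
        # Record that HA (not an administrator) powered this VM off,
        # so the master knows it should be restarted.
        poweroff_files.add(vm)
    # Power off (or shut down) the VM and update the poweron file.
    powered_on_vms.discard(vm)

def on_vm_powered_on(vm):
    """Power-off files are deleted when the VM is powered back on."""
    poweroff_files.discard(vm)
    powered_on_vms.add(vm)
```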

 

** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because it’s currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **

TechPubs YouTube videos

Duncan Epping · Dec 24, 2012 ·

I just noticed these three cool TechPubs YouTube videos. The TechPubs channel has been around for a while, and I have been enjoying their videos a lot. Recently a couple of new videos were released that I hadn’t gotten around to watching yet, but these are definitely among my favorites. One is on vSphere HA by the lead engineer, Keith Farkas (also a reviewer on our book), and the other two are by Sachin Thakkar. Sachin is one of the leads on vSphere virtual networking features like VXLAN. I enjoyed watching these very much, as they give a nice overview of what each feature is about in just a couple of minutes. I also personally feel it is nice to “get to know” the people behind this cool feature/technology…

Make sure to follow the TechPubs channel for more cool videos. Now it is back to Christmas shopping again 😉

[Video: vSphere HA]

[Video: VXLAN]

Death to false myths: Admission Control lowers consolidation ratio

Duncan Epping · Dec 11, 2012 ·

“Death to false myths” probably sounds a bit, euuhm, well, Dutch, or “direct” as others would label it. Lately I have seen some statements floating around that are either false or misused. One of them is about Admission Control and how it impacts your consolidation ratio even if you are not using reservations. I have had multiple questions about this in the last couple of weeks and noticed this thread on VMTN.

The thread referred to is all about which Admission Control policy to use, as the selected policy potentially impacts the number of virtual machines you can run on a cluster. Now let’s take a look at the example in this VMTN thread; I have rounded some of the numbers to simplify things:

  • 7-host cluster
  • 512 GB of memory
  • 132 GHz of CPU resources
  • 217 MB of memory overhead per virtual machine (no reservations used)

So if you do the quick math, according to Admission Control (using the “host failures” policy from the example) you can power on about ~2500 virtual machines. That is without taking N-1 resiliency into account. When I take out the largest host, we are still talking about ~1800 virtual machines that can be powered on. Yes, that is 700 slots/virtual machines fewer due to N-1: admission control needs to be able to guarantee that even if the largest host fails, all virtual machines can be restarted.

Considering we have 512 GB in total, those 1800 virtual machines would get roughly 280 MB of physically backed memory each on average (512 GB / 1800 VMs); if they actively use more than that, we will see TPS / swapping / ballooning / compression. Clearly you want to avoid most of these (swapping / ballooning / compression, that is), especially considering most VMs are typically provisioned with 2 GB of memory or more.
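To show where these numbers come from, the quick math looks roughly like the sketch below. Note that I am assuming seven equally sized hosts here, so the outcome will not exactly match the rounded figures from the thread, but the order of magnitude is the same.

```python
# Back-of-the-envelope slot math for the example above ("host failures"
# policy, no reservations, so the memory slot size is the 217 MB overhead).
# Assumes 7 equally sized hosts; the thread's exact figures differ slightly.

total_mem_mb = 512 * 1024            # 512 GB cluster
largest_host_mb = total_mem_mb // 7  # biggest host to take out for N-1
slot_mb = 217                        # per-VM memory overhead, no reservations

print(total_mem_mb // slot_mb)                      # ~2400 slots, no N-1
print((total_mem_mb - largest_host_mb) // slot_mb)  # ~2000 slots with N-1
print(total_mem_mb / 1800)                          # ~290 MB per VM at 1800 VMs
```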

So what does that mean, and what did we learn? Two things:

  • Admission Control is about guaranteeing virtual machine restarts
  • If you set no reservations you can power on an insane number of virtual machines

Let me re-emphasize the last bullet: you can power on an INSANE number of virtual machines on just a couple of hosts when no reservations are used. In this case HA would allow 1800 virtual machines to be powered on before it starts screaming that it is out of resources. Is that going to work in real life? Would your virtual machines be happy with the amount of resources they are getting? I don’t think so… I don’t believe that 280 MB of physically backed memory is sufficient for most workloads. Yes, maybe TPS can help a bit, but the chances of hitting the swap file are substantial.

Let it be clear: admission control is not a resource management solution. It only guarantees that virtual machines can be restarted, and if you have no reservations set, the numbers you will see are probably not realistic, at least not from a user experience perspective. I bet your users / customers would like to have a bit more resources available than just the bare minimum required to power on a virtual machine! So don’t let these numbers fool you.

Insufficient resources to satisfy HA failover level on cluster

Duncan Epping · Dec 4, 2012 ·

I got a question yesterday about where the error “Insufficient resources to satisfy HA failover level on cluster” comes from. Although it is hopefully clear to all of my regular readers that this is caused by something called vSphere HA Admission Control, I figured I would re-emphasize it and make sure people can easily find it when they search my website.

When vSphere HA Admission Control is enabled, vCenter Server validates whether enough resources are available to guarantee that all virtual machines can be restarted. If this is not the case, the error about the HA failover level appears. So what could cause this to happen, and how do you solve it?

  • Are all hosts in your cluster still available (any hosts down)?
    • If a host is down, insufficient resources could be available to guarantee restarts
  • Check which admission control policy has been selected
    • Depending on the selected policy, a single large reservation can skew the admission control algorithm (primarily the “host failures” policy is impacted by this)
  • Admission Control was recently enabled
    • It could be that the cluster was overcommitted, or that various reservations are in use, causing the policy to be violated immediately when enabled

In most cases when this error pops up, it is caused by a large reservation on memory or CPU, and that should always be the first thing to check. There are probably a million scripts out there to check this, but I prefer to use either the CloudPhysics appliance (a flexible cloud-based solution with new reports weekly) or RVTools, a nice Windows-based utility that produces quick reports. If you are interested in more in-depth info on admission control, I suggest reading this section of my vSphere HA deepdive page.
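To illustrate the “single large reservation” point: with the “host failures” policy the memory slot size is driven by the largest reservation plus overhead, so one big reservation shrinks the number of available slots dramatically. A minimal sketch (my numbers, not from a real cluster):

```python
# Why one large reservation skews the "host failures" policy: the memory
# slot size is the largest reservation plus overhead, so every slot grows
# to fit the biggest VM. Numbers below are illustrative.

def memory_slot_size_mb(reservations_mb, overhead_mb=217):
    # With no reservations, the slot size is just the per-VM overhead.
    return max(reservations_mb, default=0) + overhead_mb

cluster_capacity_mb = 512 * 1024

for reservations in ([], [8192]):  # no reservations vs one 8 GB reservation
    slot = memory_slot_size_mb(reservations)
    print(f"slot size {slot} MB -> {cluster_capacity_mb // slot} slots")
```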

What is that poweron file in my .vSphere-HA folder?

Duncan Epping · Nov 23, 2012 ·

While I was answering some questions in the vSphere HA section of the VMTN forum, the “poweron” file came up. I have gotten some other questions about this file as well, so a public blog post makes the most sense.

Each host in a vSphere HA cluster keeps track of the power state of the virtual machines it is hosting. This set of powered-on virtual machines is stored in the “poweron” file. Note that this applies to both the master and the slave hosts in your cluster. This file is located on your VMFS volumes in the hidden directory “.vSphere-HA/<FDM cluster ID>“.

The naming scheme for this file is as follows:
host-<id>-poweron

Tracking virtual machine power-on state is not the only thing the “poweron” file is used for. This file is also used by a slave to inform the master that it is isolated from the management network: the top line of the file will contain either a “0” (zero) or a “1”. A “0” means not isolated and a “1” means isolated. The master will then inform vCenter about the isolation of the host.

This also means that if a host is not sending out any heartbeats to the master, the master will validate whether that host has been isolated by reading the “poweron” file. This can be considered an extra check on top of the “datastore heartbeating” mechanism.
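If you want to poke at this yourself, a quick sketch for reading such a file could look like the one below. The isolation flag on the first line is as described above; treating the remaining lines as the list of powered-on virtual machines is my assumption about the layout, not a documented format.

```python
# Hedged sketch: read a "poweron" file. The first line is the isolation
# flag ("0" = not isolated, "1" = isolated); treating the rest as the
# powered-on VM list is an assumption, not a documented format.

from pathlib import Path

def read_poweron_file(path):
    lines = Path(path).read_text().splitlines()
    isolated = lines[0].strip() == "1"
    vms = [line for line in lines[1:] if line.strip()]
    return isolated, vms

# e.g. read_poweron_file(
#     "/vmfs/volumes/<datastore>/.vSphere-HA/<FDM cluster ID>/host-<id>-poweron")
```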

