
Yellow Bricks

by Duncan Epping



Clearing up a misunderstanding around CPU throttling with vMotion

Duncan Epping · Jul 16, 2012 ·

I was reading a nice article by Michael Webster on multi-NIC vMotion. In the comments, Josh Attwell refers to a tweet by Eric Siebert about how CPUs are throttled when many VMs are vMotioned simultaneously. This is the tweet:

Heard interesting vMotion tidbit today, more simultaneous vMotions are made possible by throttling the clock speed of VMs to slow them down

— Eric Siebert (@ericsiebert) June 6, 2012

I want to make sure that everyone understands that this is not exactly the case. There is a vMotion enhancement in vSphere 5.0 called SDPS, aka “Slow Down During Page Send”. I wrote an article about this feature when vSphere 5.0 was released, but I guess it doesn’t hurt to repeat it, as the blogosphere was literally swamped with info around the 5.0 release.

SDPS kicks in when the rate at which pages are changed (dirtied) exceeds the rate at which the pages can be transferred to the other host. In other words, if your virtual machines are not extremely memory active, the chance of SDPS ever kicking in is small, very very small. If it does kick in, it kicks in to prevent the vMotion process from failing for this particular VM. Note that by default SDPS does not do anything: normally your VMs will not be throttled by vMotion, and they will only be throttled when there is a requirement to do so.

I quoted my original article on this subject below to provide you the details:

Simply said, vMotion will track the rate at which the guest pages are changed, or as the engineers prefer to call it, “dirtied”. The rate at which this occurs is compared to the vMotion transmission rate. If the rate at which the pages are dirtied exceeds the transmission rate, the source vCPUs will be placed in a sleep state to decrease the rate at which pages are dirtied and to allow the vMotion process to complete. It is good to know that the vCPUs will only be put to sleep for a few milliseconds at a time at most. SDPS injects frequent, tiny sleeps, disrupting the virtual machine’s workload just enough to guarantee vMotion can keep up with the memory page change rate to allow for a successful and non-disruptive completion of the process. You could say that, thanks to SDPS, you can vMotion any type of workload regardless of how aggressive it is.

It is important to realize that SDPS only slows down a virtual machine in the cases where the memory page change rate would have previously caused a vMotion to fail.
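To make the convergence condition concrete, here is a minimal Python sketch of the idea: run the vCPUs for only a fraction of each scheduling interval so that the effective dirty rate drops just below the transmit rate. This is purely illustrative; the function name, the linear-scaling assumption and the 95% headroom factor are mine, not how vMotion is actually implemented.

    # Illustrative model of the SDPS idea: throttle vCPU execution just enough
    # that the guest cannot dirty memory faster than vMotion can copy it.
    # This is NOT VMware's code; the names and numbers below are hypothetical.

    def sdps_sleep_fraction(dirty_rate_mbps: float, transmit_rate_mbps: float,
                            headroom: float = 0.95) -> float:
        """Fraction of each scheduling interval the vCPUs should sleep so the
        effective dirty rate drops below the vMotion transmit rate. Assumes the
        dirty rate scales roughly linearly with vCPU run time."""
        if dirty_rate_mbps <= transmit_rate_mbps * headroom:
            return 0.0  # pre-copy converges on its own, do nothing
        # Run only long enough that run_fraction * dirty_rate <= headroom * transmit_rate.
        return 1.0 - (transmit_rate_mbps * headroom) / dirty_rate_mbps

    # A quiet VM dirtying 200 MB/s against a 1,000 MB/s vMotion link: no throttling.
    print(sdps_sleep_fraction(200, 1000))            # 0.0
    # An aggressive VM dirtying 1,500 MB/s against the same link: sleep roughly
    # 37% of each interval (in practice a few milliseconds at a time).
    print(f"{sdps_sleep_fraction(1500, 1000):.2f}")  # 0.37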

This technology is also what enables the increase in accepted latency for long distance vMotion. Pre-vSphere 5.0, the maximum supported latency for vMotion was 5ms. As you can imagine, this restricted many customers from enabling cross-site clusters. As of vSphere 5.0, the maximum supported latency has been doubled to 10ms for environments using Enterprise Plus. This should allow more customers to enable DRS between sites when all the required infrastructure components, such as shared storage, are available.
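If you want to sanity-check whether a stretched-cluster link falls within these supported values, a trivial comparison is enough. A minimal sketch; the function name is mine, and the thresholds are simply the 5 ms and 10 ms figures mentioned above:

    # Compare a measured round-trip time against the supported vMotion maximum
    # (10 ms with vSphere 5.0 Enterprise Plus, 5 ms before that).

    def vmotion_latency_supported(measured_rtt_ms: float,
                                  vsphere5_enterprise_plus: bool) -> bool:
        limit_ms = 10.0 if vsphere5_enterprise_plus else 5.0
        return measured_rtt_ms <= limit_ms

    print(vmotion_latency_supported(7.5, vsphere5_enterprise_plus=True))   # True
    print(vmotion_latency_supported(7.5, vsphere5_enterprise_plus=False))  # False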

Maximum number of FT virtual machines per host?

Duncan Epping · Jun 29, 2012 ·

There was a discussion yesterday on our Socialcast system. The question was what the maximum number of FT virtual machines per host was, and what dictated it. Of course there are many things that can be a constraint when it comes to FT (memory reservations, bandwidth, etc.), but the one thing that stands out, and that not many realize, is that the number of FT virtual machines per host is limited to 4 by default.

This is currently controlled by a vSphere HA advanced setting called “das.maxftvmsperhost”, which is configured to 4 by default. This HA advanced setting (used in combination with vSphere DRS) defines the maximum number of FT virtual machines, primary, secondary, or a combination of both, that can run on a single host. So if for whatever reason you want a maximum of 6, you will need to add this advanced setting with a value of 6.
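For those who prefer to script it, below is a minimal pyVmomi sketch of adding the option to a cluster’s HA configuration. The vCenter address, credentials, cluster name and the value of 6 are placeholders; treat it as an illustration of the API call, not a drop-in tool.

    # Sketch: set the HA advanced option das.maxftvmsperhost on a cluster
    # using pyVmomi. Connection details and the cluster name are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.local",
                      user="administrator@vsphere.local",
                      pwd="******",
                      sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    # Find the cluster by name (example name: "Cluster-01").
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Cluster-01")

    # Reconfigure HA with the advanced option; modify=True merges with the
    # existing cluster configuration instead of replacing it.
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(
            option=[vim.option.OptionValue(key="das.maxftvmsperhost", value="6")]))
    cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

    Disconnect(si)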

I do not recommend changing this, however; FT is a fairly heavy process, and in most environments 4 is the recommended value.

vCenter Infrastructure Navigator throws the error: “an unknown discovery error has occurred”

Duncan Epping · May 10, 2012 ·

I was deploying vCenter Infrastructure Navigator (VIN) in my lab today, and the following error came up when I wanted to check the dependencies for a virtual machine:

Access failed, an unknown discovery error has occurred

I restarted several services but nothing seemed to solve it. Internally I bumped into a thread which had the fix for this problem: DNS. Yes I know, it is always DNS, right? Anyway, I used DHCP for my VIN appliance, and this DHCP server pointed to a DNS server which did not have the IPs/names of my ESXi hosts listed. Because of this the discovery didn’t work, as VIN tries to resolve the names of the hosts as they were added to vCenter Server. I configured VIN with a fixed IP and pointed the VIN appliance to the right DNS server. Problem solved.
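Since the root cause was name resolution, a quick way to catch this up front is to verify that every host name registered in vCenter resolves from the appliance’s point of view. A minimal sketch (the host names are examples; in practice you would pull the list from vCenter):

    # Check that ESXi host names resolve via the DNS server in use.
    # The host list is an example; pull the real list from vCenter.
    import socket

    esxi_hosts = ["esxi-01.lab.local", "esxi-02.lab.local"]

    for name in esxi_hosts:
        try:
            print(f"{name} -> {socket.gethostbyname(name)}")
        except socket.gaierror:
            print(f"{name} does NOT resolve; discovery tools like VIN will fail here")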

Cluster Sizes – vSphere 5 style!?

Duncan Epping · Apr 10, 2012 ·

At the end of 2010 I wrote an article about cluster sizes… ever since, it has been a popular article, and I figured it was time to update it. vSphere 5 changed the game when it comes to sizing/scaling your clusters, and this is an excellent opportunity to emphasize that. The key take-away of my 2010 article was the following:

I am not advocating to go big…. but neither am I advocating to have a limited cluster size for reasons that might not even apply to your environment. Write down the requirements of your customer or your environment and don’t limit yourself to design considerations around Compute alone. Think about storage, networking, update management, max config limits, DRS & DPM, HA, resource and operational overhead.

We all know that HA used to be a constraint for your cluster size… However, these times are long gone. I still occasionally see people referring to old “max config limits” around the number of VMs per cluster when exceeding 8 hosts… This is not a concern anymore. I also still see people referring to the max 5 primary node limit… Again, not a concern anymore. Taking the 2010 article and applying it to vSphere 5, I guess we can come to the following conclusions:

  • HA does not limit the number of hosts in a cluster anymore! Using more hosts in a cluster results in less overhead. (N+1 for 8 hosts vs N+1 for 32 hosts; see the sketch after this list.)
  • DRS loves big clusters! More hosts equals more scheduling opportunities.
  • SCSI Locking? Hopefully all of you are using VAAI capable arrays by now… This should not be a concern. Even if you are not using VAAI, optimistic locking should have relieved this for almost all environments!
  • Max number of hosts accessing a file = 8! This is a constraint in an environment using linked clones like View
  • Max values in general (256 LUNs, 1024 Paths, 512 VMs per host, 3000 VMs per cluster)
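To put a number on the first bullet: with N+1 admission control, the capacity you set aside for failover shrinks relative to the cluster as the cluster grows. A quick illustration (the host counts are arbitrary examples):

    # Overhead of reserving one failover host (N+1) as a fraction of total
    # cluster capacity, for a few example cluster sizes.

    def n_plus_one_overhead(hosts: int, failures_to_tolerate: int = 1) -> float:
        """Fraction of cluster capacity reserved for host failures."""
        return failures_to_tolerate / hosts

    for hosts in (4, 8, 16, 32):
        print(f"{hosts:>2} hosts: {n_plus_one_overhead(hosts):.1%} reserved for N+1")
    # 4 hosts: 25.0%, 8 hosts: 12.5%, 16 hosts: 6.2%, 32 hosts: 3.1%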

Once again, I am not advocating to scale up or scale out. I am merely showing that there are hardly any limiting factors anymore at this point in time. One of the few constraints that is still valid is the max of 8 hosts in a cluster using linked clones. Or better said, a max of 8 hosts accessing a file concurrently. (Yes, we are working on fixing this…)

I would like to know from you guys what cluster sizes you are using, and if you are constrained somehow… what those constraints are… chip in!

Blocking or allowing traffic when vShield App is down?

Duncan Epping · Mar 19, 2012 ·

I did a couple of articles about vShield App a couple of months back. One of them explained how to get around a situation where vShield App would be down, as in that case traffic would be blocked. Since then I have spoken to multiple customers who asked me if it was possible to configure vShield App in such a way that traffic would be allowed when an issue occurred with vShield App. Although this goes against best practices and I would not recommend it, I can understand why some customers would want to do this. Luckily for them, vShield App 5.0.1 now offers a setting that allows you to do this:

  1. Go to vShield within vCenter
  2. Click “Settings & Reports”
  3. Click the “vShield App” tab
  4. Click “Change” under “Failsafe”
  5. Click “Yes” when asked if you would like to change the setting

Together with the option to exclude VMs from being protected by vShield App, and the automatic restart of vShield App appliances in the case of a failure, it seems that my feature requests have been fulfilled.

