
Yellow Bricks

by Duncan Epping


drs

Using limits instead of downscaling….

Duncan Epping · Sep 25, 2009

I’ve seen this floating around the communities a couple of times, and someone also mentioned it during a VCDX panel: setting limits on VMs when you are not allowed to decrease the memory. For example, you want to P2V a server with 8GB of memory and an average utilization of 15%. According to normal sizing guidelines it would make sense to resize the VM to 2GB, but due to political reasons (“I paid for 8GB and I demand…”) this is not an option. This is when people start looking into using limits. I don’t recommend this approach, however, and there’s a good reason for it.

Using limits can lead to serious performance issues when the VM starts swapping. As many of you know, the first thing that happens when the VM reaches its limit is that the balloon driver kicks in. The balloon driver forces the guest OS to swap pages out. This will of course affect performance, but at least the OS gets to pick the pages and will do so in a relatively smart way. When the guest OS reaches its limits, the VMkernel starts swapping, and this is where it gets nasty, because the VMkernel does not take the guest’s use of those pages into account. It can easily swap out pages that are actively being used by your application or operating system, which will hurt the performance of your VM badly. (That’s a short summary of the process; if you want a more in-depth explanation, please read this excellent post by Scott “VMGuru” Herold.)

Swapping, whether by the VMkernel or the OS, is the reason I don’t recommend using limits. Just think about it for a minute. You probably convinced the application owner to virtualize their services with arguments like availability, flexibility and equal performance. Setting a limit will more than likely hurt performance once the threshold is within reach, and with it the owner’s trust in virtualization and the IT organization. Another side effect is that there is no way to recover from swapping without a reboot, which means availability decreases as well. In other words: avoid setting limits.

I do understand why admins take these drastic steps, but again, I don’t agree. If you want to convince an application owner that their VM needs to be resized, monitor it. Prove to them that the server is not utilizing the memory and claim it back. Claiming memory back is difficult, which is why I personally recommend investing more time and effort during the first phase of your P2V project: educate the application owner and convince them with the output of your capacity planning tools. Explain to them how easy it is to increase memory later, and make them feel more comfortable by adding a week of aftercare that includes resource monitoring. If you really want to convince them, change the cost model and make it more attractive to downsize, although that depends on the level of maturity within the organization…
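
If you want to put numbers behind that conversation, PowerCLI is a quick way to gather them. The snippet below is only a minimal sketch, not a prescribed method: it assumes an existing Connect-VIServer session and a hypothetical VM named “app01”, and it averages the guest’s memory usage over the last 30 days so you can show the application owner how little of the 8GB is actually used.

    # Minimal sketch, assuming an existing PowerCLI connection and a VM called "app01"
    $vm = Get-VM -Name "app01"

    # Average memory usage (% of configured memory) over the last 30 days
    Get-Stat -Entity $vm -Stat "mem.usage.average" -Start (Get-Date).AddDays(-30) |
        Measure-Object -Property Value -Average |
        Select-Object @{N="VM"; E={ $vm.Name }},
                      @{N="AvgMemUsagePct"; E={ [math]::Round($_.Average, 1) }}

A capacity planning tool will give you the same data with more context, but even a simple average like this is usually enough to start the downsizing discussion.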

Long Distance VMotion

Duncan Epping · Sep 21, 2009

As you might have noticed last week, I’m still digesting all the info from VMworld. One of the coolest newly supported technologies is Long Distance VMotion. A couple of people have already written full articles on this session, so I will not do that here. (Chad Sakac, Joep Piscaer) I do, however, want to stress some of the best practices and requirements to make this work.

Requirements:

  • An IP network with a minimum bandwidth of 622 Mbps is required.
  • The maximum latency between the two VMware vSphere servers cannot exceed 5 milliseconds (ms).
  • The source and destination VMware ESX servers must have a private VMware VMotion network on the same IP subnet and broadcast domain.
  • The IP subnet on which the virtual machine resides must be accessible from both the source and destination VMware ESX servers. This requirement is very important because a virtual machine retains its IP address when it moves to the destination VMware ESX server to help ensure that its communication with the outside world (for example, with TCP clients) continues smoothly after the move.
  • The data storage location including the boot device used by the virtual machine must be active and accessible by both the source and destination VMware ESX servers at all times.
  • Access from VMware vCenter, the VMware Virtual Infrastructure (VI) management GUI, to both the VMware ESX servers must be available to accomplish the migration.

Best practices:

  • Create HA/DRS clusters on a per-site basis. (Make sure I/O stays local!)
  • Use a single vDS (like the Cisco Nexus 1000v) across clusters and sites.
  • Network routing and policies need to be synchronized or adjusted accordingly.

Most of these are listed in this excellent whitepaper from VMware, Cisco and EMC by the way.

Combining the currently available technology with what Banjot discussed during his VMworld session on HA futures, I think the possibilities are endless. One of the most obvious ones is of course stretched HA clusters. When you add VMotion into the mix, a stretched HA/DRS cluster becomes a possibility. This would of course require different thresholds, but how cool would it be if DRS rebalanced your clusters based on specific pre-determined and configurable thresholds?!

Stretched HA/DRS clusters would, however, mean that the cluster needs to be carved into sub-clusters to make sure I/O stays local. You don’t want to run your VMs on site A while their VMDKs are stored on site B. This of course depends on the array technology being used. (Active/active, as in one virtual array, would solve this.) During Banjot’s session it was described as “tagged” hosts in a cross-site cluster, and during the Long Distance VMotion session it was described as “DRS being aware of WAN link and sidedness”. I would rather use the term “sub-cluster” or “host-group”. Although this all still seems far away, it is probably much closer than we expect. Long Distance VMotion is supported today. Sub-clusters aren’t available yet, but knowing VMware, and looking at the competition, they will go full steam ahead.
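
For what it’s worth, here is roughly how such a “sub-cluster” could be expressed if the concept surfaces as DRS host groups with a soft affinity rule. This is purely an illustrative sketch, not something available at the time of writing: it assumes a PowerCLI release that includes the DRS group cmdlets, and the cluster, host and VM names are made up.

    # Illustrative sketch only: keep the site A VMs on the site A hosts with a soft ("should") rule
    $cluster = Get-Cluster -Name "StretchedCluster"

    $siteAHosts = New-DrsClusterGroup -Name "SiteA-Hosts" -Cluster $cluster -VMHost (Get-VMHost -Name "esx-a1*", "esx-a2*")
    $siteAVMs   = New-DrsClusterGroup -Name "SiteA-VMs" -Cluster $cluster -VM (Get-VM -Name "app01", "app02")

    # A "ShouldRunOn" rule keeps I/O local, yet still allows HA to restart the VMs on site B
    New-DrsVMHostRule -Name "SiteA-Affinity" -Cluster $cluster -VMGroup $siteAVMs -VMHostGroup $siteAHosts -Type ShouldRunOn -Enabled $true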

MSCS VMs in an HA/DRS cluster

Duncan Epping · Jun 3, 2009

We (VMware PSO) had a discussion yesterday on whether it’s supported to have MSCS (Microsoft Cluster Service) VMs in an HA/DRS cluster with both HA and DRS set to disabled. I know many people struggle with this because, in a way, it doesn’t make sense. In short: no, this is not supported. MSCS VMs can’t be part of a VMware HA/DRS cluster, even if HA and DRS are set to disabled for those VMs.

I guess you would like to have proof:

For ESX 3.5:
http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_mscs.pdf

Page 16 – “Clustered virtual machines cannot be part of VMware clusters (DRS or HA).”

For vSphere:
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_mscs.pdf

Page 11 – “The following environments and functionality are not supported for MSCS setups with this release of vSphere:
Clustered virtual machines as part of VMware clusters (DRS or HA).”

As you can see, certain restrictions apply; make sure to read the above documents for all the details.

Max number of VMs per Host?

Duncan Epping · May 25, 2009

If I asked you what the maximum number of VMs per host is for vSphere, what would your answer be?

My bet is that your answer would be 320 VMs, based of course on the “virtual machines per host” number shown on page 5 of the Configuration Maximums for vSphere.

But is that actually the correct answer? No, it’s not. The correct answer is: it depends. Yes… it depends on whether you are using HA or not. The following restrictions apply to an HA cluster (page 7); the quick calculation after the list shows the effect:

  • Max 32 Hosts per HA Cluster.
  • Max 1280 VMs per Cluster.
  • Max 100 VMs per Host.
  • If the number of Hosts exceeds 8 in a cluster, the limit of VMs per host is 40.
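
To make the effect concrete, here is a quick, purely illustrative calculation based on the numbers above; as soon as a ninth host joins the cluster, the per-host limit drops from 100 to 40.

    # Quick sketch: effective HA cluster ceiling based on the vSphere 4.0 maximums listed above
    $hostCount      = 10
    $vmPerHostLimit = if ($hostCount -gt 8) { 40 } else { 100 }
    $clusterCeiling = [math]::Min($hostCount * $vmPerHostLimit, 1280)

    "{0} hosts x {1} VMs per host = {2} VMs max for this HA cluster" -f $hostCount, $vmPerHostLimit, $clusterCeiling
    # Output: 10 hosts x 40 VMs per host = 400 VMs max for this HA cluster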

These are serious restrictions that need to be taken into account when designing a virtual environment. They touch literally everything, from your cluster size down to the hardware you’ve selected. I know these configuration maximums get revised with every update, but this is most definitely something one needs to consider and discuss with the customer…

Just wondering what your thoughts are,

Export and import DRS affinity rules

Duncan Epping · Apr 23, 2009

I just noticed this awesome work by LucD. He developed two scripts that can export and import DRS affinity rules. Especially in large environments, or environments with multiple affinity rules, this is an excellent solution. Take a look at the link above for more details. Luc posted the script halfway down the topic as plain text, but he also added a modified version at the bottom. The VI Toolkit at its best… or should we call it PowerCLI these days?
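
To give you an idea of what such scripts boil down to, here is a minimal sketch (this is not Luc’s code): it assumes a connected PowerCLI session and a cluster named “Cluster01”, exports the affinity rules to an XML file, and recreates them from that file later.

    # Minimal sketch, not LucD's scripts: export DRS affinity rules and re-import them
    $cluster = Get-Cluster -Name "Cluster01"

    # Export: capture the name, state, type and member VMs of every rule
    Get-DrsRule -Cluster $cluster |
        Select-Object Name, Enabled, KeepTogether, @{N="VMs"; E={ (Get-VM -Id $_.VMIds).Name }} |
        Export-Clixml -Path "C:\Temp\drsrules.xml"

    # Import: recreate each rule (on the same or another cluster)
    Import-Clixml -Path "C:\Temp\drsrules.xml" | ForEach-Object {
        New-DrsRule -Cluster $cluster -Name $_.Name -Enabled $_.Enabled -KeepTogether $_.KeepTogether -VM (Get-VM -Name $_.VMs)
    }

Luc’s scripts handle more corner cases, so use those; this just shows the core of the approach.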

