Yellow Bricks

by Duncan Epping

ESX

Fixed: Memory alarms triggered with AMD RVI and Intel EPT?

Duncan Epping · Sep 25, 2009 ·

I wrote about this two weeks ago and back in March, but the issue with false memory alerts due to large pages being used has finally been solved.

Source

Fixes an issue where a guest operating system shows high memory usage on Nehalem based systems, which might trigger memory alarms in vCenter. These alarms are false positives and are triggered only when large pages are used. This fix selectively inhibits the promotion of large page regions with sampled small page files. This provides a specific estimate instead of assuming a large page is active when one small page within it is active.

BEFORE INSTALLING THIS PATCH: If you have set Mem.AllocGuestLargePage to 0 to workaround the high memory usage issue detailed in the Summaries and Symptoms section, undo the workaround by setting Mem.AllocGuestLargePage to 1.

Six patches have been released today, but this fix is probably the one people have been talking about the most, which is why I wanted to make everyone aware of it! Download the patches here.
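For those who applied the workaround on a larger number of hosts, reverting it by hand is tedious. Below is a minimal sketch, assuming a reachable vCenter (the hostname and credentials are placeholders) and the pyVmomi library, that queries Mem.AllocGuestLargePage on every host and sets it back to 1 where the workaround is still in place; verify the option and its value type in your own environment before running anything like this.

```python
# Sketch: revert the Mem.AllocGuestLargePage workaround on all hosts (illustrative only).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()                       # lab only; use proper certificates
si = SmartConnect(host="vcenter.local", user="administrator",  # placeholder vCenter/credentials
                  pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)

for host in view.view:
    opt_mgr = host.configManager.advancedOption              # per-host OptionManager
    current = opt_mgr.QueryOptions("Mem.AllocGuestLargePage")[0].value
    print(f"{host.name}: Mem.AllocGuestLargePage = {current}")
    if current == 0:                                          # workaround still active, undo it
        opt_mgr.UpdateOptions(changedValue=[
            vim.option.OptionValue(key="Mem.AllocGuestLargePage", value=1)])

view.Destroy()
Disconnect(si)
```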

Using limits instead of downscaling….

Duncan Epping · Sep 25, 2009 ·

I’ve seen this floating around the communities a couple of times and someone also mentioned it during a VCDX panel: setting limits on VMs when you are not allowed to decrease the amount of memory. For example, you want to P2V a server with 8GB of memory and an average utilization of 15%. According to normal guidelines it would make sense to resize the VM to 2GB; however, due to political reasons (I paid for 8GB and I demand…) this is not an option. This is when people start looking into using limits. I don’t recommend this approach, however, and there’s a good reason for it.

Using limits can lead to serious performance issues when the VM starts swapping. As many of you know, the first thing that happens when you reach the limit is that the balloon driver kicks in. The balloon driver forces the guest OS to swap out. Of course this affects performance, but at least the OS gets to pick which pages to swap and will do so in a smart way. When the OS reaches its limits, the VMkernel starts swapping, and this is where it gets nasty, because the VMkernel does not take anything into account. It could easily swap out pages actively being used by your application or operating system, which will heavily affect the performance of your VM. (That’s a short summary of the process; if you want a more in-depth explanation, please read this excellent post by Scott “VMGuru” Herold.)
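To see whether a VM is already paying this price, the ballooning and VMkernel swap counters are exposed per VM. Here is a minimal sketch, assuming pyVmomi and placeholder vCenter credentials, that lists every VM currently ballooning or being swapped by the VMkernel:

```python
# Sketch: flag VMs that are ballooning or being VMkernel-swapped (illustrative only).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()                       # lab only
si = SmartConnect(host="vcenter.local", user="administrator", pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)

for vm in view.view:
    qs = vm.summary.quickStats                                # both counters are reported in MB
    if qs.balloonedMemory or qs.swappedMemory:
        print(f"{vm.summary.config.name}: ballooned {qs.balloonedMemory} MB, "
              f"VMkernel-swapped {qs.swappedMemory} MB")

view.Destroy()
Disconnect(si)
```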

Swapping, whether by the VMkernel or the OS, is the reason I don’t recommend using limits. Just think about it for a minute. You probably convinced the application owner to virtualize their services with arguments like availability, flexibility and equal performance. Setting a limit will more than likely affect performance once the threshold is reached, and thus hurt their trust in virtualization and the IT organization. Another side effect is that there’s no way to recover from swapping without a reboot, which means availability will also be decreased. In other words: avoid setting limits.

I do, however, understand why admins take these drastic steps, but again, I don’t agree. If you want to convince your application owner that their VM needs to be resized, monitor it. Prove to them that the server is not utilizing the memory and claim it back. Claiming memory back is difficult, which is why I personally recommend investing more time and effort during the first phase of your P2V project: educate the application owners and convince them with the outcome of your capacity planning tools. Explain to them how easy it is to increase memory, and make them feel more comfortable by adding a week of aftercare which includes resource monitoring. If you really want to convince them (although this depends on the level of maturity within the organization), change the cost model and make it more attractive to downsize…
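As a starting point for that kind of evidence, here is a rough sketch (pyVmomi again, with a hypothetical VM name and placeholder credentials) that samples active guest memory for a single VM over an hour and compares the average to the configured size; a proper capacity planning tool will obviously do a much better job.

```python
# Sketch: sample active guest memory and compare it to the configured size (illustrative only).
import ssl
import time
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()                       # lab only
si = SmartConnect(host="vcenter.local", user="administrator", pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.summary.config.name == "bigapp01")   # hypothetical VM name
view.Destroy()

samples = []
for _ in range(12):                                           # one hour at 5-minute intervals
    samples.append(vm.summary.quickStats.guestMemoryUsage)    # active guest memory, in MB
    time.sleep(300)

configured = vm.summary.config.memorySizeMB
avg = sum(samples) / len(samples)
print(f"{vm.summary.config.name}: average active memory {avg:.0f} MB of {configured} MB "
      f"configured ({100 * avg / configured:.0f}%)")
Disconnect(si)
```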

Long Distance VMotion

Duncan Epping · Sep 21, 2009 ·

As you might have noticed last week, I’m still digesting all the info from VMworld. One of the coolest newly supported technologies is Long Distance VMotion. A couple of people have already written full articles on this session (Chad Sakac, Joep Piscaer), so I won’t do that here. However, I do want to stress some of the best practices and requirements to make this work.

Requirements:

  • An IP network with a minimum bandwidth of 622 Mbps is required.
  • The maximum latency between the two VMware vSphere servers cannot exceed 5 milliseconds (ms). (A quick latency check is sketched after this list.)
  • The source and destination VMware ESX servers must have a private VMware VMotion network on the same IP subnet and broadcast domain.
  • The IP subnet on which the virtual machine resides must be accessible from both the source and destination VMware ESX servers. This requirement is very important because a virtual machine retains its IP address when it moves to the destination VMware ESX server to help ensure that its communication with the outside world (for example, with TCP clients) continues smoothly after the move.
  • The data storage location including the boot device used by the virtual machine must be active and accessible by both the source and destination VMware ESX servers at all times.
  • Access from VMware vCenter, the VMware Virtual Infrastructure (VI) management GUI, to both the VMware ESX servers must be available to accomplish the migration.
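The latency requirement above is the one most often overlooked, and it is easy to sanity-check before attempting a migration. Below is a rough sketch, assuming the destination vMotion VMkernel IP (a placeholder address here) is reachable and that TCP port 8000, the port used for VMotion traffic, is open; the handshake time is only a proxy for round-trip latency, so treat the numbers as indicative.

```python
# Sketch: rough round-trip latency check against the 5 ms requirement (illustrative only).
import socket
import statistics
import time

VMOTION_IP = "10.0.1.20"        # hypothetical destination vMotion VMkernel IP
PORT = 8000                     # TCP port used for VMotion traffic
SAMPLES = 20
LIMIT_MS = 5.0                  # requirement from the list above

def connect_rtt_ms(ip, port, timeout=2.0):
    """Time a TCP handshake as a rough round-trip estimate."""
    start = time.perf_counter()
    with socket.create_connection((ip, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0

rtts = [connect_rtt_ms(VMOTION_IP, PORT) for _ in range(SAMPLES)]
print(f"avg {statistics.mean(rtts):.2f} ms, max {max(rtts):.2f} ms over {SAMPLES} samples")
print("within the 5 ms requirement" if max(rtts) <= LIMIT_MS
      else "latency exceeds the 5 ms requirement")
```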

Best practices:

  • Create HA/DRS Clusters on a per site basis. (Make sure I/O stays local!)
  • A single vDS (like the Cisco Nexus 1000v) across clusters and sites.
  • Network routing and policies need to be synchronized or adjusted accordingly.

Most of these are listed in this excellent whitepaper from VMware, Cisco and EMC by the way.

Combining the currently available technology with what Banjot discussed during his VMworld session on HA futures, I think the possibilities are endless. One of the most obvious ones is of course stretched HA clusters. When adding VMotion into the mix, a stretched HA/DRS cluster would be a possibility. This would require different thresholds of course, but how cool would it be if DRS would re-balance your clusters based on specific pre-determined and configurable thresholds?!

Stretched HA/DRS clusters would however mean that the cluster needs to be carved into sub-clusters to make sure I/O stays local. You don’t want to run your VMs on site A while their VMDKs are stored on site B. This of course depends on the array technology being used. (Active/Active, as in one virtual array, would solve this.) During Banjot’s session it was described as “tagged” hosts in a cross-site cluster, and during the Long Distance VMotion session it was described as “DRS being aware of WAN link and sidedness”. I would rather use the term “sub-cluster” or “host-group”. Although this all seems far away, it is probably much closer than we expect. Long Distance VMotion is supported today. Sub-clusters aren’t available yet, but knowing VMware, and looking at the competition, they will go full steam ahead.

HA: Did you know?

Duncan Epping · Sep 20, 2009 ·

Did you know that…

  • the best practice of increasing the failure detection time (das.failuredetectiontime) from 15000 to 60000 for an Active/Standby service console setup has been deprecated as of vSphere.
    (In other words, for Active/Standby leave it at the default of 15000 on vSphere.)
  • the limit of 100 VMs per host is actually “100 powered-on and HA-enabled VMs”. Of course this also goes for the 40 VM limit for clusters with more than 8 hosts.
  • the limit of 100 VMs per host in an HA cluster with fewer than 9 hosts is a soft limit.
  • das.isolationaddress[0-9] is one of the most underrated advanced settings.
    It should be used as an additional safety net to rule out false positives. (A configuration sketch follows below.)
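For that last one, the isolation addresses are just advanced options in the cluster’s HA (das) configuration. A minimal sketch, assuming pyVmomi, a hypothetical cluster name and example gateway addresses, of how they could be pushed through the API:

```python
# Sketch: add extra isolation addresses to a cluster's HA configuration (illustrative only).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()                       # lab only
si = SmartConnect(host="vcenter.local", user="administrator", pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Cluster01")   # hypothetical cluster name
view.Destroy()

spec = vim.cluster.ConfigSpecEx(
    dasConfig=vim.cluster.DasConfigInfo(option=[
        # extra addresses HA pings before declaring a host isolated (example IPs)
        vim.option.OptionValue(key="das.isolationaddress1", value="192.168.1.1"),
        vim.option.OptionValue(key="das.isolationaddress2", value="192.168.2.1"),
    ]))
cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)  # merges with existing config
Disconnect(si)
```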

Just four little things most people don’t seem to realize or know…

IO DRS – Providing Performance Isolation to VMs in Shared Storage Environments (TA3461)

Duncan Epping · Sep 16, 2009 ·

This was probably one of the coolest sessions of VMworld. Irfan Ahmad was the host of this session and some of you might know him from Project PARDA. The PARDA whitepaper describes the algorithm being used and how customers could benefit from it in terms of performance. As Irfan stated, this is still in a research phase. Although the results are above expectations, it’s still uncertain whether this will be included in a future release and, if so, when that will be. There are a couple of key takeaways that I want to share:

  • Congestion management at the datastore level -> set IOPS limits and shares per VM.
  • Check the proportional allocation of the VMs to be able to identify bottlenecks.
  • With I/O DRS, throughput for tier 1 VMs will increase when demanded (more IOPS, lower latency), based of course on the limits / shares specified.
  • CPU overhead is limited -> my take: with the new hardware of today I wouldn’t worry about an overhead of a couple of percent.
  • “If it’s not broken, don’t fix it” -> if latency is low for all workloads on a specific datastore, take no action; only act above a certain threshold! (See the sketch after this list.)
  • I/O DRS does not take SAN congestion into account, but the SAN is less likely to be the bottleneck.
  • Researching the use of Storage VMotion to move VMDKs around when there’s congestion at the array level.
  • Interacting with queue depth throttling.
  • Dealing with end-points, and how it would co-exist with PowerPath.
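To make the latency-threshold idea a bit more concrete, here is a toy simulation of a PARDA-style control loop: each host adjusts its issue queue depth based on the observed datastore latency versus a threshold, so nothing happens while latency stays low and throttling only kicks in above the threshold. The parameter values and the latency model are made up for illustration and are not taken from the paper.

```python
# Toy simulation of a PARDA-style latency-based control loop (parameters are made up).
GAMMA = 0.2          # smoothing factor for the window update
BETA = 1.0           # headroom term (proportional to aggregate shares in PARDA)
L_THRESHOLD = 25.0   # latency threshold in ms; below this, no throttling pressure builds
MAX_QDEPTH = 64.0

def observed_latency(window):
    """Fake device model: latency grows with the number of outstanding I/Os."""
    return 5.0 + 0.5 * window

def next_window(window, latency):
    """Move the issue queue depth toward the point where latency sits at the threshold."""
    target = (L_THRESHOLD / latency) * window + BETA
    return max(1.0, min(MAX_QDEPTH, (1 - GAMMA) * window + GAMMA * target))

window = 32.0
for step in range(15):
    latency = observed_latency(window)
    window = next_window(window, latency)
    print(f"step {step:2d}: latency {latency:5.1f} ms -> issue queue depth {window:5.1f}")
```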

That’s it for now… I just wanted to make a point. There’s a lot of cool stuff coming up. Don’t be fooled by the lack of announcements (according to some people, although I personally disagree) during the keynotes. Start watching the sessions, there’s a lot of knowledge to be gained!
