
Yellow Bricks

by Duncan Epping



Second vswp file when doing a VMotion with vSphere?

Duncan Epping · Jul 31, 2009 ·

I was just reading this topic on the VMTN community. In short, a second vswp file gets created during a VMotion. As the starter of the topic noticed, this can prevent you from VMotioning VMs if you don’t have enough free disk space on your VMFS volume.

One of my UK colleagues, David Burgess, jumped in and explained what is happening during the VMotion and why this temporary vswp file is being created. Read it, it’s useful info:

  1. It is only used if the target host is under memory pressure. It is thin provisioned, so even though it looks the size of the VM’s memory, it should have very little impact on the free space of the VMFS.
  2. The other thing is that the temporary swap file will only be used for activity while the machine transitions, so it should not grow to the size of the memory. If you “du” the file systems you should see the blocks being consumed. Engineers think this should be 400 MB at most, if it is used at all. By “pressured” we mean the amount of free memory is low. That will not prevent the VM from VMotioning unless we can’t allocate enough reserved memory (this is zero by default). Once the transition is complete the VM reverts to the original swap file and the temporary one is deleted.

Take a look at the screenshot David uploaded; the bottom two vswp files are the ones created during the VMotion and, as you can see, they consume 0 blocks.

Storage VMotion and moving to a Thin Provisioned disk

Duncan Epping · Jul 31, 2009 ·

I was just reading this article by Vladan about Storage VMotion. He explains how you can get your unused disk space back with Storage VMotion and moving to a Thin Provisioned disk at the same time. I agree that this is one of the best new features out there. It’s easy and effective.

However, keep in mind that even though disk space appears unused according to the Guest OS, it might have been used in the past. (An OS usually removes only the pointer to the data, not the data itself.) If you do not zero out your disk before you do the Storage VMotion and migration to a thin provisioned disk, you will be copying all the “filled” blocks. This is actually the same concept as, for instance, a VCB full image dump, which I addressed at the beginning of 2008.

So to optimize migrations to Thin Provisioned disks, either use sdelete from Microsoft/Sysinternals or use the “shrink” option within VMware Tools. Both work fine, but keep in mind they can be time consuming. You could script sdelete and actually zero out every disk once a week.
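A minimal sketch of such a script (the SDelete path and drive list are placeholders for your environment, and the `-z` zero-free-space switch should be verified against your SDelete version with `sdelete /?`):

```python
import subprocess

# Placeholders: adjust for your environment.
SDELETE = r"C:\Tools\sdelete.exe"
DRIVES = ["C:", "D:"]

def build_sdelete_cmd(sdelete_path, drive):
    """Build the SDelete command line that zeroes free space on a drive.

    -z zeroes free space; check the flag name on your SDelete version.
    """
    return [sdelete_path, "-z", drive]

def zero_free_space(drive, sdelete_path=SDELETE):
    """Zero out free space so a later Storage VMotion to a thin
    provisioned disk copies only blocks that are actually in use."""
    subprocess.run(build_sdelete_cmd(sdelete_path, drive), check=True)

if __name__ == "__main__":
    for drive in DRIVES:
        zero_free_space(drive)
```

Scheduled weekly inside the guest (for example via Windows Task Scheduler), this keeps disks “thin friendly” ahead of any migration.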

Re: RTFM “What I learned today – HA Split Brain”

Duncan Epping · Jul 22, 2009 ·

I’m going to start with a quote from Mike’s article “What I learned today…”:

Split brain is an HA situation where an ESX host becomes “orphaned” from the rest of the cluster because its primary service console network has failed. As you might know, the COS network is used in the process of checking whether an ESX host has suffered an untimely demise. If you fail to protect the COS network by giving vSwitch0 two NICs or by adding a 2nd COS network to, say, your VMotion switch, undesired consequences can occur. Anyway, the time for detecting split brain used to be 15 seconds; for some reason this has changed to 12 seconds. I’m not 100% sure why, or if in fact the underlying value has changed, or whether VMware has merely corrected its own documentation. You see, it’s possible in VI3.5 to get split brain happening if the network goes down for more than 12 seconds but comes back up on the 13th, 14th or 15th second. I guess I will have to do some research on this one. Of course, the duration can be changed, and split brain is a trivial matter if you take the necessary network redundancy steps…

I thought this issue was common knowledge, but if Mike doesn’t know about it, my guess is that most of you don’t either. Before we dive into Mike’s article: technically this is not a split brain; it is an “orphaned VM” scenario, not one where the disk files and the in-memory VM are split between hosts.

Before we start, this setting is key in Mike’s example:

das.failuredetectiontime = The period of time a host waits, after receiving no heartbeats from another host, before declaring that host dead.

The default value is 15 seconds. In other words, the host will be declared dead on the fifteenth second and a restart will be initiated by one of the primary hosts.

For now let’s assume the isolation response is “power off”. These VMs can only be restarted once they have been powered off. Here’s the catch: the “power off” isolation response is initiated by the isolated host 2 seconds before the das.failuredetectiontime expires.

Does this mean that you can end up with your VMs being down and HA not restarting them?
Yes. When the heartbeat returns between the 13th and 15th second, the shutdown may already have been initiated. The restart, however, will not be initiated because the heartbeat indicates that the host is not isolated.

How can you avoid this?
Pick “Leave VM powered on” as the isolation response. Increasing the das.failuredetectiontime will also decrease the chance of running into issues like these.

Did this change?
No, it has been like this since the feature was introduced.
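The timing described above can be sketched as a tiny model. The 15-second default and the 2-second offset come straight from the post; the outcome strings are just illustrative labels:

```python
def ha_outcome(heartbeat_restored_at, das_failuredetectiontime=15,
               isolation_response="power off"):
    """What happens to a VM when the COS network drops at t=0 and
    heartbeats come back at `heartbeat_restored_at` seconds.

    Times are in seconds. The isolation response fires 2 seconds
    before das.failuredetectiontime expires, per the post.
    """
    isolation_response_at = das_failuredetectiontime - 2
    if heartbeat_restored_at <= isolation_response_at:
        return "VM untouched"  # network back before the response fires
    if heartbeat_restored_at < das_failuredetectiontime:
        # The isolation response already ran, but the returning
        # heartbeat means no primary host considers this host dead,
        # so no restart is initiated -- the gap discussed above.
        if isolation_response == "power off":
            return "VM powered off, not restarted"
        return "VM left powered on"  # the safer isolation response
    return "host declared dead, VM restarted elsewhere"
```

With the defaults, a heartbeat returning at second 14 lands squarely in the problem window, while “Leave VM powered on” sidesteps it.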

Up to 80 virtual machines per host in an HA Cluster (3.5 vs vSphere)

Duncan Epping · Jul 16, 2009 ·

I was re-reading the KB article on how to improve HA scaling. Apparently, pre-vCenter 2.5 U5 there was a problem when the number of VMs that need to fail over exceeded 35. Keep in mind that it’s a soft limit; you can run more than 35 VMs on a single host in an HA cluster if you want to.

To increase scalability up to 80 VMs per host, vCenter needs to be upgraded to 2.5 U5 and the following configuration changes are recommended:

  1. Increase the maximum vCPU limit to 192.
  2. Increase the Service Console memory limit to 512 MB.
  3. Increase the memory resource reservation of the vim resource pool to 1024 MB.
  4. Include/edit the host agent memory configuration values (hostdStopMemInMB=380 and hostdWarnMemInMB=300).

A question I immediately had: what about vSphere? What are the values there, and do I need to increase them as well? Here are the vSphere default settings:

  1. 512
  2. 300 MB
  3. 0 MB
  4. hostdStopMemInMB=380 and hostdWarnMemInMB=300

As you can see, 1 and 4 are already the new defaults on vSphere. I would always recommend setting the Service Console memory to 800 MB; with most hosts having 32 GB or more, the cost of assigning an extra 500 MB to the Service Console is minimal. That leaves the recommendation to increase the memory reservation of the vim resource pool, which I would recommend leaving at the default value. vSphere scales up to 100 VMs per host in an HA cluster, and chances are this will be increased when U1 hits the streets. (These values usually change with every release.)
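For reference, the hostd values in item 4 live in the host agent configuration file. A sketch of the relevant fragment, assuming the classic ESX location /etc/vmware/hostd/config.xml (verify the exact element placement against your build, and restart the management agents after editing):

```xml
<!-- /etc/vmware/hostd/config.xml (fragment; placement is an assumption) -->
<config>
  <hostdWarnMemInMB>300</hostdWarnMemInMB>
  <hostdStopMemInMB>380</hostdStopMemInMB>
</config>
```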

NetApp’s vSphere best practices and EMC’s SRM in a can

Duncan Epping · Jul 15, 2009 ·

This week both NetApp and EMC released updated versions of documents I highly recommend to everyone interested in virtualization! Now some might think: why would I want to read a NetApp document when we are an EMC shop? Or why would I want to read an EMC document when my environment is hosted on a NetApp FAS3050? The answer is simple: although both documents contain vendor-specific information, there’s more to be learned from them because the focus is on VMware products. No marketing nonsense, just the good stuff!

NetApp’s guide dives into the basics of multipathing, for instance. The section on iSCSI/NFS is especially useful: how do I set up multiple VMkernels for load balancing, and what are the pros and cons? EMC’s SRM and Celerra guide includes a full how-to for setting this up, covering not only the EMC side but also the VMware SRM side. Like I said, both documents are highly recommended!

  • TR-3749: vSphere on NetApp Storage Best Practices
  • EMC Celerra VSA and VMware SRM setup and configurations guide
