
Yellow Bricks

by Duncan Epping



ESXi Management Network Resiliency

Duncan Epping · Mar 22, 2011 ·

When we wrote the HA/DRS book, both Frank and I were still very much in an “ESX Classic” mindset. Over the last few weeks I received questions around resilient network configurations for ESXi. I referred people back to the book, but the comments I got were that the examples were very much ESX Classic instead of ESXi. In my opinion the configuration looks very much the same, except that “Service Console” needs to be replaced with “Management Network”, but I figured I might as well document my preference for a resilient ESXi Management Network, as I needed to do it anyway as part of updating the book for a future version of vSphere.

In our book we give two examples: a simple version with a single “Service Console Network” and one with a dual “Service Console Network” configuration. I figured I could update both, but I would rather do just one and explain why I prefer it. The one I have picked is the single “Management Network” setup. The main reason is the reduced complexity it brings. On top of that, multiple Management Networks make sense in an environment with many NICs and switches, but with all these converged architectures flying around it no longer makes sense to have four virtual links when you only have two physical ones. Yes, I understand that something can happen to a subnet as well, but if that is the case you have far bigger problems than your HA heartbeat network failing. Another thing to keep in mind is that you can mitigate some of the risk of a false positive by selecting a different “Isolation Response”; typically we see this set to “Leave Powered On”.

What follows is an excerpt from the book.

Although many configurations are possible and supported, we recommend a simple but highly resilient configuration. We have included the vMotion (VMkernel) network in our example, as combining the Management Network and the vMotion network on a single vSwitch is the most commonly used configuration and an industry-accepted best practice.

Requirements:

  • 2 physical NICs
  • VLAN trunking

Recommended:

  • 2 physical switches

The vSwitch should be configured as follows:

  • vSwitch0: 2 Physical NICs (vmnic0 and vmnic1)
    • When multiple physical PCI devices are available, make sure to use a port of each to increase resiliency
  • 2 Portgroups (Management Network and vMotion VMkernel)
  • Management Network active on vmnic0 and standby on vmnic1
  • vMotion VMkernel active on vmnic1 and standby on vmnic0
  • Failback set to No

Each portgroup has a VLAN ID assigned and runs dedicated on its own physical NIC; only in the case of a failure is it switched over to the standby NIC. We highly recommend setting failback to “No” (on the NIC Teaming tab) to avoid the chance of a false positive, which can occur when a physical switch routes no traffic during boot while the ports are already reported as “up”.
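For completeness, and not from the book itself, the same configuration can be scripted from the ESXi shell. The sketch below uses the esxcli network namespaces introduced after vSphere 4.1 (on 4.x hosts you would configure the teaming policy through the vSphere Client instead), and the VLAN IDs (10 and 20) and the “vMotion” portgroup name are placeholders for your own values:

# create vSwitch0 and attach both uplinks
esxcli network vswitch standard add --vswitch-name=vSwitch0
esxcli network vswitch standard uplink add --vswitch-name=vSwitch0 --uplink-name=vmnic0
esxcli network vswitch standard uplink add --vswitch-name=vSwitch0 --uplink-name=vmnic1

# create both portgroups, each with its own VLAN ID
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch0 --portgroup-name="Management Network"
esxcli network vswitch standard portgroup set --portgroup-name="Management Network" --vlan-id=10
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch0 --portgroup-name="vMotion"
esxcli network vswitch standard portgroup set --portgroup-name="vMotion" --vlan-id=20

# Management Network active on vmnic0 / standby on vmnic1, vMotion reversed,
# and failback disabled on both
esxcli network vswitch standard portgroup policy failover set --portgroup-name="Management Network" --active-uplinks=vmnic0 --standby-uplinks=vmnic1 --failback=false
esxcli network vswitch standard portgroup policy failover set --portgroup-name="vMotion" --active-uplinks=vmnic1 --standby-uplinks=vmnic0 --failback=false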

Pros: Only 2 NICs in total are needed for the Management Network and vMotion VMkernel, which is especially useful in blade environments. This setup is also less complex.

Cons: Just a single active path for heartbeats.

The following diagram depicts the active/standby scenario:

To increase resiliency we also recommend implementing the following advanced settings, where the IP address for “das.isolationaddress” should be a “pingable” device reachable by the ESXi hosts, preferably on the same subnet and with as few hops as possible:

 das.isolationaddress = <ip-address>
 das.failuredetectiontime = 20000
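These advanced settings are configured on the HA cluster in vCenter. As a quick sanity check (my addition, not from the book excerpt) you can verify from the ESXi shell of each host that the chosen isolation address actually responds over the VMkernel network; the address below is just a placeholder:

vmkping 10.1.1.1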

Changing the PSP from Fixed to RR

Duncan Epping · Mar 21, 2011 ·

Today I was fooling around with my new lab environment when I noticed my Path Selection Policy (PSP) was set to Fixed, while the array (Clariion CX4-120) most definitely supports Round Robin (RR). I wrote about it in the past (1, 2), but as the commands changed slightly with vSphere 4.1 I figured it wouldn’t hurt to write it down again:

First I validated which Storage Array Type Plugin (SATP) was currently in use and which Path Selection Policy was applied:

esxcli nmp device list

(note that in later releases a “storage” namespace was added, making this “esxcli storage nmp device list”… yes, a minor but important change!)

Then I wanted to make sure that every LUN added in the future would get Round Robin as its default PSP:

esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA_CX --psp VMW_PSP_RR
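To confirm the new default took effect (a verification step I did not include originally), you can list all SATPs along with their default PSP:

esxcli nmp satp list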

Now I also needed to set the PSP per LUN, for which I used these two lines of “script”:

for i in `ls /vmfs/devices/disks | grep naa.600`;
do esxcli nmp device setpolicy --device $i --psp VMW_PSP_RR;done

And I figured: why not set the number of IOps down to 1 as well, just to see if it changes anything:

for i in `ls /vmfs/devices/disks/ | grep naa.600`;
do esxcli nmp roundrobin setconfig --device $i --type "iops" --iops=1;done

Setting “iops=1” didn’t make much difference for me, but it appears to be a general recommendation these days, so I figured it would be best to include it.
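Not part of my original set of commands, but if you want to double-check that the policy and the IOps value actually stuck, you can read the configuration back with the same loop construct:

for i in `ls /vmfs/devices/disks/ | grep naa.600`;
do esxcli nmp roundrobin getconfig --device $i;done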

Before I forget, I wanted to document this as well. For my testing I used the following command which lets you clone a VMDK and time it:

time vmkfstools -i source.vmdk destination.vmdk

And the result would look as follows:

Destination disk format: VMFS zeroedthick
Cloning disk 'destination.vmdk'...
Clone: 100% done.
real    2m 9.67s
user    0m 0.33s
sys     0m 0.00s

Something that might be useful as well: timing the creation of an eagerzeroedthick VMDK:

time vmkfstools -c 30G -d eagerzeroedthick newdisk.vmdk

I am using this to measure the difference between using and not using VAAI on a storage platform. It is a lot easier than constantly kicking off tasks through vCenter. (Yes Alan and Luc, I know it is way easier with PowerCLI.)

Best fling so far! Thinapped vSphere Client!

Duncan Epping · Mar 17, 2011 ·

Steve Herrod just announced, in my opinion, the best fling so far on Twitter.

Thinapped vSphere Client

Run the vSphere Client 4.1 in a snap. No install; just download the EXE and double-click. Place the ThinApped vSphere Client on any network share and it will automatically stream to any Windows PC, with no installation, agents, drivers, or specialized servers required. Carry the ThinApped vSphere Client and your customization on a USB stick, and your vSphere Client is available on the go!

This fling uses VMware ThinApp to package the vSphere Client into a single portable EXE, giving you instant access to your virtual infrastructure from any computer. ThinApp has been used by corporate administrators to deploy thousands of applications to millions of desktops. Use VMware ThinApp to create your own virtualized applications; for more information, visit the VMware ThinApp page and watch the ThinApp Blog.

MinFreePct 6% or should it be less?

Duncan Epping · Mar 17, 2011 ·

Back in the days when servers had 8GB or 16GB of memory at most, a setting was introduced that guaranteed the hypervisor had a certain amount of free memory at its disposal. The main purpose of this was, of course, stability of the system. As with any operating system, free memory is desirable to ensure it is available whenever a process requests it… or should we say a World, in the case of ESXi.

These days, however, we hardly see environments with 8GB or 16GB hosts. Most servers today have a minimum of 48GB, and I guess the standard is 72GB or 96GB. With those amounts of memory one can imagine that 6% might be going slightly overboard. Especially in high-density environments like VDI, every single MB of extra memory can and will be worth it. As such it might be beneficial to change that 6% down to 2%. This KB article has been around for a couple of weeks and describes just that: http://kb.vmware.com/kb/1033687
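The KB boils down to changing the Mem.MinFreePct advanced setting. A quick way to inspect and lower it from the ESXi shell, sketched here using the standard esxcfg-advcfg syntax, would be:

# show the current value (6 by default)
esxcfg-advcfg -g /Mem/MinFreePct
# lower it to 2%
esxcfg-advcfg -s 2 /Mem/MinFreePct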

Now, you might wonder what happens to the memory states if you change that 6% down to 2%, as they are closely related. This is what many have published in the past:

  • 6% – High
  • 4% – Soft
  • 2% – Hard
  • 1% – Low

But is that really the case? What happens if I change MinFreePct? I actually mentioned this in one of my previous articles: MinFreePct is defined as 6%, but the other memory states are not fixed; rather, they are a percentage of MinFreePct:

Free memory state thresholds {
soft:64 pct
hard:32 pct
low:16 pct
}

So that means that if you change the “High” watermark (6%) down to 2%, the percentages that trigger ballooning / compression / swapping automatically change with it: at the 6% default the soft, hard, and low thresholds work out to roughly 4%, 2%, and 1% (which lines up with the list above), while at 2% they drop to roughly 1.3%, 0.6%, and 0.3%. Would I recommend changing MinFreePct? Well, it depends. If you are running a high-density VDI workload this might just give you that little bit extra you need, but in most other cases I would leave it at the default. (For more on memory tuning for VDI, read Andre’s article, which he coincidentally published today.)

HA/DRS Deepdive now available on Amazon.co.uk and Amazon.de

Duncan Epping · Mar 13, 2011 ·

After 14 emails with absolutely no reply whatsoever, our book, vSphere 4.1 HA and DRS Technical Deepdive, popped up on both the German and UK versions of Amazon. For those who haven’t ordered it yet through comcol.nl, you can also get it here:

  • Amazon.de
  • Amazon.co.uk

Sorry about the delay, and I hope they will continue selling it for a very long time. (It seems they don’t have it in stock currently, so delivery might take a while.)

<edit – 15/03>

I just noticed it is available in France as well, through Amazon.

</edit>

