No Jumbo frames on your Management Network! (Updated!)

I was just reading some of the comments posted today and Marc Sevigny, one of the vSphere HA developers, pointed out something which I did not know. I figured this is probably something that many are not aware of so I copied and pasted his comment:

Another thing to check if you experience this error is to see if you have jumbo frames enabled on the management network, since this interferes with HA communication.

This is documented in a note here: http://kb.vmware.com/kb/2006729

To make it crystal clear: disable jumbo frames on your management network with vSphere 5.0 as there’s a problem with it! This problem is currently being investigated by the HA engineering team and will hopefully be resolved.

<Update> Just received an email that all the cases where we thought vSphere HA issues were caused by Jumbo Frames being enabled were actually caused by Jumbo Frames not being configured correctly end-to-end. Please validate the Jumbo Frame configuration at every level (physical switches, vSwitch, portgroup, VMkernel, etc.).</Update>
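One way to check the end-to-end part is to send a ping that is exactly as large as the MTU allows and that may not be fragmented. The sketch below only does the arithmetic and prints the matching `vmkping` command to run from the ESXi shell. It assumes an MTU of 9000, an IPv4 header without options, and a made-up target address (192.0.2.10). If any hop in the path is not configured for jumbo frames, that ping should fail.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP payload that still fits in a single frame of the given MTU."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(target_ip: str, mtu: int = 9000) -> str:
    """Build a vmkping call: -d sets don't-fragment, -s sets the payload size."""
    return f"vmkping -d -s {max_icmp_payload(mtu)} {target_ip}"


# 192.0.2.10 is a placeholder address, replace it with a VMkernel IP of another host.
print(vmkping_command("192.0.2.10"))
```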

Avoid changing your VM’s IP in a DR procedure…

I was thinking about one of the most challenging aspects of DR procedures: IP changes. This is a very common problem. Although changing the IP address of a VM is usually straightforward, that doesn’t mean the change is propagated to the application layer. Many applications use hardcoded IP addresses and changing these is usually a huge challenge.

But what about using vShield Edge? If you look at how vShield Edge is used in a vCloud Director environment, mainly for NAT’ing and firewall functionality, you could use it in exactly the same way for your VMs in a DR-enabled environment. I know there are many apps out there which don’t use hardcoded IP addresses and which are simple to re-IP. But for those that are not, why not just leverage vShield Edge… NAT the VMs and when there is a DR event just swap out the NAT pool and update DNS. On the “inside” nothing will change… and the application will continue to work fine. On the outside things will change, but this is an “easy” fix with a lot less risk than re-IP’ing that whole multi-tier application.
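To make the idea concrete, here is a minimal sketch of what “swapping out the NAT pool” boils down to. It is plain Python, not the vShield Edge API; the `NatRule` class, the `fail_over` helper and all addresses are made up for illustration. The point is that only the external side of each rule changes, while the internal addresses the application depends on stay untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NatRule:
    external_ip: str  # address clients (and DNS) point at
    internal_ip: str  # address the application itself uses, never changes


def fail_over(rules, dr_pool):
    """Return new rules that map the DR site's external pool to the same internal IPs."""
    if len(dr_pool) < len(rules):
        raise ValueError("DR NAT pool is too small for the protected VMs")
    return [NatRule(external_ip=new_ip, internal_ip=rule.internal_ip)
            for rule, new_ip in zip(rules, dr_pool)]


# Hypothetical example: two tiers of one application.
primary = [NatRule("198.51.100.10", "10.0.0.10"),
           NatRule("198.51.100.11", "10.0.0.11")]
recovery = fail_over(primary, ["203.0.113.10", "203.0.113.11"])

for rule in recovery:
    print(f"{rule.external_ip} -> {rule.internal_ip}")
```

After the swap, DNS still needs to be updated to the new external addresses, as mentioned above.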

I wonder how some of you out in the field do this today.

 

Enabling Hot-Add by default? /cc @gabvirtualworld

Gabe asked on one of my recent posts whether it makes sense to enable Hot-Add by default and whether there is an impact/overhead.

Let’s answer the impact/overhead portion first: yes, there is an overhead. It is in the range of a few percent. You might ask yourself where this overhead is coming from and if that is vSphere overhead or… When CPU and Memory Hot-Add is enabled the Guest OS, especially Windows, will accommodate all possible memory and CPU changes. For CPU it will take the maximum number of vCPUs into account, so with vSphere 5 that would be 32. For memory it will take 16 x the power-on memory into account, as that is the maximum you can provision. Does it have an impact? Again, a matter of percents. It could however also lead to problems when you don’t have sufficient memory provisioned, as described in this KB by Microsoft: http://support.microsoft.com/kb/913568.
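To get a feel for what the Guest OS prepares for, the numbers above can be turned into a quick back-of-the-envelope calculation. This is just a sketch using the two figures mentioned in this post (32 vCPUs for vSphere 5 and 16 x power-on memory); the function name is made up.

```python
MAX_VCPUS_VSPHERE_5 = 32        # vCPU maximum mentioned above for vSphere 5
HOT_ADD_MEMORY_MULTIPLIER = 16  # memory can grow to 16 x the power-on size


def hot_add_ceiling(power_on_memory_gb: int) -> dict:
    """Resources the Guest OS has to take into account when Hot-Add is enabled."""
    return {
        "max_vcpus": MAX_VCPUS_VSPHERE_5,
        "max_memory_gb": power_on_memory_gb * HOT_ADD_MEMORY_MULTIPLIER,
    }


# A VM powered on with 4 GB has to be prepared for up to 64 GB.
print(hot_add_ceiling(4))
```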

Another impact, mentioned by Valentin (VMware), is the fact that on ESXi 5.0 vNUMA would not be used if you had the HotAdd feature enabled for that VM.

What is our recommendation? Enable it only when you need it. Yes, the impact might be small, but if you don’t need it why would you incur it?!

Why selecting the correct OS when creating/upgrading a VM is important

I had a discussion yesterday about why one would care about changing the “OS” type for a VM when it is upgraded, or even during the provisioning of a VM. I guess the obvious one is that a VM is “customized / optimized” based on this information from a hardware perspective. Another one that many people don’t realize is that when you initiate a VMware Tools install or upgrade, the information provided in the “Guest Operating System” field (VM properties, Options, General Options) is used to mount the correct file. As you can see in the screenshot below, I selected “Windows 2008” but actually installed Ubuntu. When I wanted to install VMware Tools, the Windows binaries popped up. So make sure you update this info correctly.

Fiddling around with SRM’s Storage Replication Adapter – Part II

** Disclaimer: This is for educational purposes, please don’t implement this in your production environment as it is not supported! **

After my article this week about (ab)using the SRA provided through Site Recovery Manager to fail over any LUN, I expected some people to reach out to me with additional questions. One of the questions which came in more than once was “is it possible to do a test-failover of a LUN which is not managed by the SRM infra”? I guess the short answer is yes, it is. The long answer is: well, it depends on what your definition of a “test-failover” is. Of course booting up a physical machine from SAN while keeping the same IP etc. would cause conflicts. I am also not going to show you how to re-IP your physical machines as I expect you to know this. From an SRM perspective how exciting is this?

To be honest, not really. The same concept applies. For a test-failover SRM calls the SRA through a script called “command.pl” and feeds it XML. The following lines of XML are relevant for this exercise, but the critical one is “TestFailoverStartParameters”:

--> <TestFailoverStartParameters>
--> <ArrayId>BB005056AE32820000-server_2</ArrayId>
--> <AccessGroups>
--> <AccessGroup id="domain-c7">
--> <Initiator type="iSCSI" id="iqn.1998-01.com.vmware:localhost-11616041"/>
--> <Initiator type="iSCSI" id="iqn.1998-01.com.vmware:localhost-4a15366e"/>
--> <Initiator type="NFS" id="10.21.68.106"/>
--> <Initiator type="NFS" id="10.21.68.105"/>
--> </AccessGroup>
--> </AccessGroups>
--> <TargetDevices>
--> <TargetDevice key="fs14_T1_LUN1_BB005056AE32800000_fs10_T1_LUN1_BB005056AE32820000">
--> <AccessGroups>
--> <AccessGroup id="domain-c7"/>
--> </AccessGroups>
--> </TargetDevice>
--> </TargetDevices>
--> </TestFailoverStartParameters>
--> </Command>

Now in our case we want to fail over a random non-vSphere LUN. We will need the “initiator” (the server or servers) that needs to be able to see this LUN, and we will need the LUN identifier. All of this can be found either in the SRM log files (LUN identifiers) or on the physical server (initiator details). If you call command.pl and feed it the XML file, the SRA will request the array to create a snapshot and give the host access to that snapshot. Now it is up to you to take the next steps!
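If you would rather generate that XML than edit it by hand, something like the sketch below could do it. It only rebuilds the “TestFailoverStartParameters” fragment shown above; the surrounding “Command” element and the way command.pl is invoked are not covered, as those details should be taken from your own SRM log files. The access group id and the initiator IQN are made-up placeholders for a physical host, and the array id and device key are simply copied from the example above.

```python
import xml.etree.ElementTree as ET


def build_test_failover_params(array_id, group_id, initiators, device_key):
    """Build the TestFailoverStartParameters fragment shown in the log excerpt."""
    root = ET.Element("TestFailoverStartParameters")
    ET.SubElement(root, "ArrayId").text = array_id

    # Who should be given access: one access group holding the initiators.
    groups = ET.SubElement(root, "AccessGroups")
    group = ET.SubElement(groups, "AccessGroup", id=group_id)
    for initiator_type, initiator_id in initiators:
        ET.SubElement(group, "Initiator", type=initiator_type, id=initiator_id)

    # What they should get access to: the device, referring back to the group.
    devices = ET.SubElement(root, "TargetDevices")
    device = ET.SubElement(devices, "TargetDevice", key=device_key)
    device_groups = ET.SubElement(device, "AccessGroups")
    ET.SubElement(device_groups, "AccessGroup", id=group_id)
    return root


params = build_test_failover_params(
    array_id="BB005056AE32820000-server_2",
    group_id="physical-hosts",  # hypothetical group id
    initiators=[("iSCSI", "iqn.1998-01.com.example:physical-host-01")],  # hypothetical IQN
    device_key="fs14_T1_LUN1_BB005056AE32800000_fs10_T1_LUN1_BB005056AE32820000",
)
print(ET.tostring(params, encoding="unicode"))
```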

It is not rocket science. Anything SRM does with the SRA you can do from the command line using command.pl and a custom XML file. As mentioned in the comments on my previous article, I know people are interested in using this for physical hosts… I will discuss this internally, but for now stay away from it, it is not supported!