
Yellow Bricks

by Duncan Epping



VMware Technical papers

Duncan Epping · Dec 16, 2008 ·

VMware recently published a whole bunch of must-read technical papers:

Storage Design Options for VMware Virtual Desktop Infrastructure

Companies planning to deploy VDI face decisions about the use of both local and shared storage, and in the case of shared storage solutions, about choosing between the differing technologies available in today’s market. Selecting the appropriate storage model is important for both performance and cost reasons. Certain solutions require less overhead than others, as do different implementations of the same technology. Costs can vary greatly depending on which storage options are chosen. Fortunately, organizations can leverage a myriad of best practices to help drive these costs down while improving performance. This paper provides information on technical concepts related to storage implementations in a VMware® Virtual Desktop Infrastructure (VDI) environment.

VMware View Reference Architecture Kit

This reference architecture kit comprises four distinct papers written by VMware and our supporting partners to serve as a guide in the early phases of planning, design, and deployment of VMware View-based solutions. The building-block approach uses common components to minimize support costs and deployment risks during the planning of VMware View-based deployments.

SQL Server Workload Consolidation

Database workloads are very diverse. While most database servers are lightly loaded, larger database workloads can be resource-intensive, exhibiting high I/O rates or consuming large amounts of memory. With improvements in virtualization technology and hardware, even servers running large database workloads run well in virtual machines. Servers running Microsoft’s SQL Server, among the top database server platforms in the industry today, are no exception.

Using IP Multicast with VMware

IP multicast is a popular protocol implemented in many applications for simultaneously and efficiently delivering information to multiple destinations. Multicast sources send single copies of information over the network and let the network take responsibility for replicating and forwarding the information to multiple recipients.

UPDATE: Free ESXi and the RCLI

Duncan Epping · Dec 16, 2008 ·

I just had a chat with several people at VMware about the fact that the RCLI is read/write as of ESXi 3.5 Update 3. As it turns out, these new APIs were opened up unintentionally: the change happened while VMware was resolving an API-related bug.

My VMware contacts tell me that this bug will be fixed shortly, so do not get used to the open APIs, because they will become restricted again. This only applies to customers who downloaded the free version of ESXi; VirtualCenter and VI (Foundation, Standard, Enterprise) customers are not affected.

HA: who decides where a VM will be restarted?

Duncan Epping · Dec 15, 2008 ·

During the Dutch VMUG someone walked up to me and asked a question about High Availability. He had read my article on primary and secondary nodes and was wondering who decides where and when a VM will be restarted.

Let’s start with a short recap of the “primary/secondary” article: “The first five servers that join the cluster become primary nodes; the servers that join afterwards become secondary nodes. Secondary nodes send their state info to the primary nodes and also contact the primary nodes for their heartbeat notification. Primary nodes replicate their data with the other primary nodes and also send their heartbeats to the other primary nodes.”

The question was: when a fail-over needs to take place, because for instance an isolation event occurred, who decides on which host a specific VM will be restarted? The obvious answer is one of the primaries. One of the primaries is selected as the “fail-over coordinator”, and this coordinator coordinates the restart of virtual machines on the remaining hosts, taking restart priorities into account. Keep in mind that when two hosts fail at the same time, the coordinator handles the restarts sequentially: first it restarts the VMs of the first failed host (taking restart priorities into account), and then the VMs of the host that failed second (again taking restart priorities into account). If the fail-over coordinator itself fails, one of the remaining primaries takes over.

By the way, this is another reason why you can only account for four host failures: you need at least one primary left, because a primary acts as the fail-over coordinator. When the last primary dies, there is nothing left to coordinate the restarts…

VMs may unexpectedly reboot when using VMware HA with Virtual Machine Monitoring

Duncan Epping · Dec 12, 2008 ·

This KB article has just been published:

Virtual machines may unexpectedly reboot after a VMotion migration to an ESX 3.5 Update 3 host, or after a power-on operation on an ESX 3.5 Update 3 host, when the VMware HA feature with Virtual Machine Monitoring is active.

There’s a workaround for the problem, but I will not be posting it here because it might change over time. Just read the KB article for more info on how to fix this issue.

EnableResignature and/or DisallowSnapshotLUN

Duncan Epping · Dec 11, 2008 ·

I’ve spent a lot of time in the past trying to understand the settings EnableResignature and DisallowSnapshotLUN. They have had me confused and dazzled a couple of times, and every now and then I still seem to have trouble actually understanding them. After a quick scan through the VCDX Enterprise Study Guide by Peter, I decided to write this post and take the time to get to the bottom of it. I needed this settled once and for all, especially now that I am starting to focus more on BC/DR.

According to the SAN Config Guide (vi3_35_25_san_cfg.pdf) there are three states:

  1. EnableResignature=0, DisallowSnapshotLUN=1 (default)
    In this state, you cannot bring snapshots or replicas of VMFS volumes made by the array into the ESX Server host, regardless of whether or not the ESX Server has access to the original LUN. LUNs formatted with VMFS must have the same ID for each ESX Server host.
  2. EnableResignature=1 (DisallowSnapshotLUN is not relevant)
    In this state, you can safely bring snapshots or replicas of VMFS volumes into the same servers as the original, and they are automatically resignatured.
  3. EnableResignature=0, DisallowSnapshotLUN=0 (similar to ESX Server 2.x behavior)
    In this state, the ESX Server assumes that it sees only one replica or snapshot of a given LUN and never tries to resignature. This is ideal in a DR scenario where you are bringing a replica of a LUN to a new cluster of ESX Servers, possibly at another site that does not have access to the source LUN. In such a case, the ESX Server uses the replica as if it were the original.

The advanced LVM setting EnableResignature is used for resignaturing a VMFS volume that has been detected with a different LUN ID. So what does the LUN ID have to do with the VMFS volume? The LUN ID is stored in the LVM header of the volume, and it is used to check whether it is the same LUN that is being (re)discovered or a copy of the LUN that is being presented with a different ID. If it is a copy, the VMFS volume needs to be resignatured; in other words, the UUID will be renewed and the LUN ID will be updated in the LVM header.

UUID, what’s that? Chad Sakac from EMC described it as follows in his post on VMFS resignaturing:

It’s a VMware-generated number – the LVM signature aka the UUID (it’s a long hexadecimal number designed to be unique). The signature itself has little to do with anything presented by the storage subsystem (host LUN ID, SCSI device type), but a change in either will cause a VMFS volume to get resigned (the ESX server says “hey, I used to have a LUN with this signature, but its parameters were different, so I better resign this”).

As Chad says, the UUID has little to do with anything presented by the storage subsystem. A VMFS volume ID aka UUID looks like this:

42263200-74382e04-b9bf-009c06010000

1st part – the COS time when the file system was created or resignatured
2nd part – the TSC time, an internal time stamp counter kept by the CPU
3rd part – a random number
4th part – the MAC address of the COS NIC

As I said before, and this is a common misconception so I will say it again: the LUN ID and the storage system product ID are stored in the LVM header, not in the UUID itself. Not that it really matters for the way the process works, though.
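
By the way, the UUIDs in play are easy to spot from the service console: the friendly datastore labels in /vmfs/volumes are just symlinks to directories named after the UUID. A minimal check; the label “datastore1” and the sample output below are of course just placeholders, reusing the example UUID above:

ls -l /vmfs/volumes/
# lrwxr-xr-x  1 root root  35 Dec 11 14:02 datastore1 -> 42263200-74382e04-b9bf-009c06010000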

I think that makes it clear when to use EnableResignature and when not to use it. Use it when you want to access VMFS volumes whose LUN ID changed for whatever reason, for instance after a fail-over to a DR site with different LUN numbering, or after a SAN upgrade that caused changes in LUN numbering.

That leaves DisallowSnapshotLun. I had a hard time figuring out when to set it to “0” and when to leave it at the default setting “1”, but I found the following in a VMworld Europe 2008 presentation:

DisallowSnapshotLun: should be set to “0” if the SCSI Inquiry string differs between the two arrays, in order to allow access to datastores.

I googled for “SCSI Inquiry” and I found the following in a document by HP:

The storage system product ID retrieved from the SCSI Inquiry string (Example: HSV210)

In other words, when you’ve got an HP EVA 4000 and an HP EVA 8000 that are mirrored, you need to set DisallowSnapshotLun to 0 when a fail-over has occurred. The SCSI Inquiry string would differ because the controllers are of a different model. (The SCSI Inquiry string also contains the LUN ID, by the way.)

When both sites are exactly the same, including LUN IDs, you don’t need to change this setting; leave it set to 1. Be absolutely sure, when you set DisallowSnapshotLun to 0, that there is only one “version” of the VMFS volume presented to the host. If for some reason both are presented, data corruption can and probably will occur. If you need to present both LUNs at the same time, use EnableResignature instead of DisallowSnapshotLun.

Depending on the way your environment is set up and the method you chose to re-enable a set of LUNs, you may need to re-register your VMs. The only way to avoid this is to use DisallowSnapshotLun and pre-register all VMs on the secondary VirtualCenter server, or to use just one VirtualCenter server.

Re-registering can be done with a couple of lines of script on just one ESX box:

# Find every .vmx file on the mounted VMFS volumes and register it.
# A while/read loop is used so paths containing spaces are handled correctly.
find /vmfs/volumes/ -name "*.vmx" | while read -r vmx
do
  echo "Registering VM $vmx"
  vmware-cmd -s register "$vmx"
done
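
Afterwards you can double-check the result from the same service console; vmware-cmd with the -l option lists the configuration files of all VMs registered on the host:

vmware-cmd -l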

You can change the EnableResignature or DisallowSnapshotLun setting as follows:

Open the VI Client and connect to VirtualCenter
Click on a host
Click on the “Configuration” tab
Click on “Advanced Settings”
Go to “LVM”
Change the appropriate setting
Click “OK”
Rescan your HBAs (Storage Adapters, Rescan)

It’s also possible to use the service console command line to change DisallowSnapshotLUN or EnableResignature:

echo 0 > /proc/vmware/config/LVM/DisallowSnapshotLUN   # allow access to snapshot/replica LUNs
echo 1 > /proc/vmware/config/LVM/EnableResignature     # resignature detected snapshot volumes
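
Alternatively, the esxcfg-advcfg utility on the service console can query and set the same advanced options. A minimal sketch, assuming the /LVM option names mirror the /proc/vmware/config/LVM entries shown above:

# Show the current values first
esxcfg-advcfg -g /LVM/DisallowSnapshotLUN
esxcfg-advcfg -g /LVM/EnableResignature

# Allow snapshot LUNs, or enable resignaturing
esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLUN
esxcfg-advcfg -s 1 /LVM/EnableResignature

Either way, a rescan of your HBAs is still needed afterwards, just as with the GUI method above.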

I do want to stress that these options should only ever be set temporarily, considering the impact the changes can have! When you have set either of the two options, reset them to the default afterwards.

The big question still remains: would I prefer resignaturing my VMFS volumes, or setting “DisallowSnapshotLun” to “0” to be able to access the volumes? Well, the answer is: “It depends.” It heavily depends on the type of setup you have, and I can’t answer this question without knowing the background of an environment. The safest method definitely is resignaturing.

Before you decide, read this post again and read the articles/PDFs in the links below, which I used as references:

Updates for the VMFS volume resignaturing discussion
HP disaster tolerant solutions using Continuous Access for HP EVA in a VI 3 environment
Fibre Channel SAN Configuration Guide
VMFS Resignaturing by Chad Sakac

