Yellow Bricks

by Duncan Epping


Layer 2 Adjacency for vMotion (vmkernel)

Duncan Epping · Aug 19, 2010 ·

Recently I had a discussion around Layer 2 adjacency for the vMotion (vmkernel interface) network. The claim was that all vMotion interfaces, aka vmkernel interfaces, are required to be on the same subnet, as otherwise vMotion would not function correctly.

Now I remember when this used to be part of the VMware documentation, but that requirement is nowhere to be found today. I even recall documentation for previous versions stating that it was “recommended” to have layer-2 adjacency, but even that is nowhere to be found. The only reference I could find was an article by Scott Lowe in which Paul Pindell from F5 chips in and debunks the myth, but as Paul is not a VMware spokesperson it is not definitive in my opinion. Scott also just published a correction to his article after we discussed this myth a couple of times over the last week.

So what are the current Networking Requirements around vMotion according to VMware’s documentation?

  • On each host, configure a VMkernel port group for vMotion
  • Ensure that virtual machines have access to the same subnets on source and destination hosts
  • Ensure that the network labels used for virtual machine port groups are consistent across hosts

Now that got me thinking: why would it even be a requirement? As far as I know vMotion is all layer three today, and besides that, the vmkernel interface even has the option to specify a gateway. On top of that, vMotion does not check whether the source vmkernel interface is on the same subnet as the destination interface, so why would we care?
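Just to illustrate what “same subnet”, and thus layer 2 adjacency, boils down to: two vmkernel interfaces are only layer-2 adjacent when their network addresses match. A quick sketch in Python (the addresses and masks are made up for the example, and this is obviously not how vMotion itself checks anything):

# Minimal illustration of the "same subnet" check; IPs and netmasks are hypothetical.
from ipaddress import ip_interface

source_vmk = ip_interface("10.1.16.21/24")       # vmkernel interface on the source host
destination_vmk = ip_interface("10.1.32.21/24")  # vmkernel interface on the destination host

# Same subnet only when both interfaces share the same network address
print(source_vmk.network == destination_vmk.network)  # False: traffic between them has to be routed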

Now that makes me wonder where this myth is coming from… Have we all assumed L2 adjacency was a requirement? Have the requirements changed over time? Has the best practice changed?

Well, one of those is easy to answer: no, the best practice hasn’t changed. Minimizing the number of hops to reduce latency is, and always will be, a best practice. Will vMotion work when your vmkernel interfaces are in two different subnets? Yes, it will. Is it supported? No, it is not, as it has not explicitly gone through VMware’s QA process. However, I have had several discussions with engineering and they promised me a more conclusive statement will be added to our documentation and the KB in order to avoid any misunderstanding.

Hopefully this debunks, once and for all, the myth that has been floating around for long enough. As stated, it will work; it just hasn’t gone through QA and as such cannot be supported by VMware at this point in time. I am confident though that over time this statement will change to increase flexibility.

References:

  • Integrating VMs into the Cisco Datacenter Architecture (ESX 2.5)
  • Deploying BIG-IP to enable LD vMotion
  • vMotion Practicality
  • http://kb.vmware.com/kb/1002662

VCDX Application Form

Duncan Epping · Apr 21, 2010 ·

I had many questions around this and was always told I was not allowed to “talk” about the Application Form and the requirements for the VCDX defense. Apparently this has changed, as VMware just published it on MyLearn and it is available to anyone with a MyLearn account.

Now, I want to remind everyone that this Application Form is subject to change:

The final step in becoming a VMware Certified Design Expert is the VCDX Defense. In this step you will need to successfully complete an interview, project review, and design defense activity. The VCP3 certification, the VMware Enterprise Administration on VI3 Exam pass, and the VMware Design on VI3 Exam pass are all pre-requisites to the VCDX Application. Applications will not be accepted unless all pre-requisites have been met.

Anyone going through the process will benefit from reading the application form. The question I was asked most is what the minimal set of deliverables is. Those can be found in the application form, but for your convenience they have been listed below:

  1. VI Architecture Design. This document provides a complete design for the virtual infrastructure and dependencies. It provides detailed specifications for all architectural components, as well as the decisions and justifications behind design choices. It includes architectural schematic diagrams (a.k.a. Design Blueprints).
  2. VI Installation and Configuration Guidelines and Procedures. This document provides guidance on how the specified design will be put together. This document is sometimes referred to as an “assembly and configuration guide.”
  3. VI Operational Verification. This document provides a test plan to be used to validate an implementation using the design documents.
  4. VI Operational Procedures. This document provides steps and guidance on how to use an implemented architecture from an operator’s perspective. This document is sometimes referred to as “standard procedures.”
  5. Next Steps. This document summarizes design requirements, objectives and issues. It provides recommendations for proceeding with the implementation as a wrap-up to a typical design project.

If you are going through the process, read the application form! Keep in mind that filling it out will take time, especially the section on Design Decisions, where you will need to come up with a consideration, decision, justification, and impact for several areas, for instance Storage, Networking, Cluster Design, and VM Design.

For those who will be submitting their design: READ! If it says “a minimum of four”, stick to that, as anything less is a straight reject.

VCDX defense panels

Duncan Epping · Apr 21, 2010 ·

Just to let you guys know… VCDX defense panels are being scheduled. The invites for Melbourne went out this week and the ones for VMworld SF will follow soon, I guess.

Defense interviews will be held during the following days:

  • July 5-9, Melbourne, Australia
  • August 23-27, San Francisco, CA
  • October 11-15, Copenhagen, Denmark
  • November (dates TBD), Boston, MA

Completed applications will be scheduled in the order received.

source

Aligning your VMs virtual hard disks

Duncan Epping · Apr 8, 2010 ·

I receive a lot of hits on an old article regarding aligning your VMDKs. That article doesn’t actually explain why it is important, only how to do it. The how is actually not that important in my opinion. I do, however, want to take the opportunity to list some of the options you have today to align your VMs’ VMDKs. Keep in mind that some require a license (*) or a login:

  • UberAlign by Nick Weaver
  • mbralign by NetApp(*)
  • vOptimizer by Vizioncore(*)
  • GParted (Free tool, Thanks Ricky El-Qasem).

First let’s explain why alignment is important. Take a look at the following diagram:

In my opinion there is no need to discuss VMFS alignment. Everyone creates their VMFS via vCenter (and if you don’t, you should!), which means it is automatically aligned and you won’t need to worry about it. However, you will need to worry about the Guest OS. Take Windows 2003: by default, when you install the OS, your partition is misaligned. (Both Windows 7 and Windows 2008 create aligned partitions, by the way.) Even when you create a new partition it will be misaligned. As you can clearly see in the diagram above, every cluster will span multiple chunks. Well, actually, it depends. I guess that’s the next thing to discuss, but first let’s show what an aligned OS partition looks like:

I would recommend everyone to read this document. Although it states at the beginning that it is obsolete, it still contains relevant details! And I guess the following quote from the vSphere Performance Best Practices whitepaper says it all:

The degree of improvement from alignment is highly dependent on workloads and array types. You might want to refer to the alignment recommendations from your array vendor for further information.

Now you might wonder why some vendors are more affected by misalignment than others. The reason for this is the block size on the back end. For instance, NetApp uses a 4KB block size (correct me if I am wrong). If your filesystem uses a 4KB block size (or cluster size, as Microsoft calls it) as well, this basically means that every single IO will require the array to read or write two blocks instead of one when your VMDKs are misaligned, as the diagrams clearly show.

Now when you take, for instance, an EMC Clariion, it’s a different story. As explained in this article, which might be slightly outdated, Clariion arrays use a 64KB chunk size to write their data, which means that not every Guest OS cluster straddles a chunk boundary and thus the Clariion is less affected by misalignment. Now this doesn’t mean EMC is superior to NetApp (I don’t want to get Vaughn and Chad going again ;-)), but it does mean that the impact of misalignment is different for every vendor and array/filer. Keep this in mind when migrating and/or creating your design.
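To make that a bit more tangible, here is a quick back-of-the-envelope sketch in Python. It assumes the classic Windows 2003 partition offset of 63 sectors (63 × 512 = 32256 bytes), a 4KB guest cluster size, and the 4KB and 64KB back-end sizes used in the examples above; it simply counts how many guest clusters end up straddling a back-end block boundary:

def straddling_fraction(partition_offset, cluster_size, backend_block, clusters=16384):
    # Fraction of guest OS clusters whose IO touches two back-end blocks instead of one
    straddling = 0
    for i in range(clusters):
        start = partition_offset + i * cluster_size
        end = start + cluster_size - 1
        if start // backend_block != end // backend_block:
            straddling += 1
    return straddling / clusters

misaligned_offset = 63 * 512   # classic Windows 2003 default: partition starts at sector 63
for backend in (4096, 65536):  # 4KB blocks (NetApp example) vs 64KB chunks (Clariion example)
    print(backend, straddling_fraction(misaligned_offset, 4096, backend))
# Prints: 4096 -> 1.0 (every IO hits two blocks), 65536 -> 0.0625 (only 1 in 16 clusters affected)

In other words, with a 4KB back end every misaligned IO doubles the work on the array, while with 64KB chunks only one in sixteen clusters is affected. That is exactly why the impact differs per vendor.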

What’s the point of setting “–IOPS=1”?

Duncan Epping · Mar 30, 2010 ·

To be honest and completely frank, I really don’t have a clue why people recommend setting “–IOPS=1” by default. I have been reading all these so-called best practices around changing the default behaviour of “1000” to “1”, but none of them contain any justification. Just to give you an example, take a look at the following guide: Configuration best practices for HP StorageWorks Enterprise Virtual Array (EVA) family and VMware vSphere 4. The HP document states the following:

Secondly, for optimal default system performance with EVA, it is recommended to configure the round robin load balancing selection to IOPS with a value of 1.

Now please don’t get me wrong, I am not picking on HP here, as there are more vendors recommending this. I am, however, really curious how they measured “optimal performance” for the HP EVA. I have the following questions:

  • What was the workload exposed to the EVA?
  • How many LUNs/VMFS volumes were running this workload?
  • How many VMs per volume?
  • Was VMware’s thin provisioning used?
  • If so, what was the effect on the ESX host and the array? (was there an overhead?)

So far none of the vendors have published this info and I very much doubt (yes, call me sceptical) that these tests have been conducted with a real-life workload. Maybe I just don’t get it, but when consolidating workloads a threshold of 1000 IOs isn’t that high, is it? Why switch after every single IO? I can imagine that for a single VMFS volume this will boost performance, as all paths will be hit equally and load distribution on the array will be optimal. But in a real-life situation where you have multiple VMFS volumes this effect decreases. Are you following me? Hmmm, let me give you an example:

Test Scenario 1:

1 ESX 4.0 Host
1 VMFS volume
1 VM with IOMeter
HP EVA and IOPS set to 1 with Round Robin based on the ALUA SATP

Following HP’s best practices the host will have 4 paths to the VMFS volume. However, as the HP EVA is an asymmetric active-active array (ALUA), only two paths will be shown as “optimized”. (For more info on ALUA, read my article here and Frank’s excellent article here.) Clearly, when IOPS is set to 1 and there is a single VM pushing IOs to the EVA on a single VMFS volume, the “stress” produced by this VM will be divided equally over the optimized paths without causing any spiky behaviour, in contrast to what a change of paths every 1000 IOs might do. Although 1000 is not a gigantic number, it will cause spikes in your graphs.
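To visualise that difference, here is a toy sketch in Python (my own simplification, not actual NMP code) of how the IOPS threshold determines the pattern in which Round Robin hands IOs to the two optimized paths:

def assign_paths(num_ios, iops_limit, num_paths=2):
    # Return which path each IO is sent down, given the Round Robin IOPS threshold
    path, sent, used = 0, 0, []
    for _ in range(num_ios):
        if sent == iops_limit:            # threshold reached: move to the next path
            path = (path + 1) % num_paths
            sent = 0
        used.append(path)
        sent += 1
    return used

print(assign_paths(8, 1))                 # [0, 1, 0, 1, 0, 1, 0, 1] -> perfectly interleaved
print(assign_paths(4000, 1000).count(0))  # 2000 -> same total per path, but in 1000-IO bursts

The totals per path end up the same either way; with the default of 1000 the IOs just arrive in bursts, which is where the spiky graphs come from.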

Now let’s consider a different scenario, a more realistic one:

Test Scenario 2:

8 ESX 4.0 Hosts
10 VMFS volumes
16 VMs per volume with IOMeter
HP EVA and IOPS set to 1 with Round Robin based on the ALUA SATP

Again, each VMFS volume will have 4 paths, but only two of those will be “optimized” and thus be used. We will have 160 VMs in total on this 8-host cluster and 10 VMFS volumes, which means 16 VMs per VMFS volume. (Again following all best practices.) Now remember, we will only have two optimized paths per VMFS volume and we have 16 VMs driving traffic to a volume; and it is not only 16 VMs, this traffic is also coming from 8 different hosts to these storage processors. Potentially each host is sending traffic down every single path to every single controller…

Let’s assume the following:

  • Every VM produces 8 IOps on average
  • Every host runs 20 VMs of which 2 will be located on the same VMFS volume

This means that every ESX host changes the path to a specific VMFS volume roughly every 62 seconds (1000 / (2 × 8)); with 10 volumes, that is a path change every 6 seconds on average per host. With 8 hosts in a cluster and just two storage processors… You see where I am going? Now I would be very surprised if we saw a real performance improvement when IOPS is set to 1 instead of the default 1000, especially when you have multiple hosts running multiple VMs hosted on multiple VMFS volumes. If you feel I am wrong here, or work for a storage vendor and have access to the scenarios used, please don’t hesitate to join the discussion.
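For completeness, the back-of-the-envelope math from this scenario in a few lines of Python (all the numbers are the assumptions listed above, per host):

iops_limit = 1000              # default Round Robin switching threshold
vms_per_volume_per_host = 2    # VMs per host sharing one VMFS volume
iops_per_vm = 8                # average IOps per VM
volumes = 10                   # VMFS volumes seen by each host

io_per_volume = vms_per_volume_per_host * iops_per_vm    # 16 IOps towards one volume per host
switch_interval = iops_limit / io_per_volume             # 62.5 seconds between path changes per volume
per_host_interval = switch_interval / volumes            # ~6.25 seconds between path changes per host
print(switch_interval, per_host_interval)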

Update: Let me point out though that every situation is different. If you have had discussions with your storage vendor based on your specific requirements and configuration, and this recommendation was given… do not ignore it; ask why, and if it indeed fits, implement it! Your storage vendor has tested various configurations and knows when to implement what. This is just a reminder that implementing “best practices” blindly is not always the best option!
