
Yellow Bricks

by Duncan Epping



Cool Tool: opvizor

Duncan Epping · Dec 7, 2010 ·

Recently Dennis Zimmer, whom most of you probably know from Icomasoft or from the books he has authored, emailed me about a new tool his company is developing. I watched the video that is hosted on opvizor.com and must admit that it looks promising, especially as most solutions today are reactive or semi-proactive while opvizor aims to be proactive.

opvizor identifies in advance when the virtualized IT infrastructure is losing performance or might crash. Issues in VMware environments can be analyzed and corrected before they become dangerous. In addition, opvizor provides optimized logfiles and makes it possible to share the infrastructure data with internal and external partners, thus allowing more efficient problem solving. “Our goal is that opvizor anticipates 60 percent of issues from system behavior.”

The tool just entered the Beta stage and opvizor is looking for people willing to give it a test drive and provide feedback! Funnily enough the tool kind of reminds me of a great tool we use internally to take vm-support files apart and analyze them. I can assure you that with the right amount of work and commitment this can turn into a really powerful tool to monitor and health-check your environment on a regular basis.

vSphere 4.1 HA and DRS Technical Deepdive, the book!

Duncan Epping · Dec 6, 2010 ·

In August we announced that we were working on a secret project and let you guys in on it. The idea was to get it published through an official publisher, but due to several circumstances and a very tight deadline we decided to go the self-publishing route to make it available as soon as possible. So here it is, the moment both Frank Denneman and I have been waiting for… it is finally available: the HA and DRS Technical Deepdive.

As of today “vSphere 4.1 HA and DRS Technical Deepdive” is available on paper via CreateSpace and Amazon. We are also working on getting a digital copy up for sale but that will more than likely be early 2011.

There is something I want to make very clear here as I have heard multiple people referring to this book as “Duncan’s Book”. This book was very much a joint effort. Frank has invested at least as much time in this project as I have, and probably even more. I want to thank Frank for his hard work and hope everyone realizes that it is our book and not my book!

We want to take the opportunity to thank our Technical Reviewers for their very valuable feedback and for keeping us honest; fellow VCDX Panel Member Craig Risinger (VMware PSO), Marc Sevigny (VMware HA Engineering), Anne Holler (VMware DRS Engineering) and Bouke Groenescheij (Jume.nl). A very special thanks to Scott Herold for writing the foreword!

For those who can’t wait, order it via CreateSpace or Amazon now. (Please be so kind as to leave a review.)

This is the description of the book that is up on CreateSpace/Amazon:

About the authors:
Duncan Epping (VCDX 007) is a Consulting Architect working for VMware as part of the Cloud Practice. Duncan works primarily with Service Providers and large Enterprise customers. He is focused on designing Public Cloud Infrastructures and specializes in BC-DR, vCloud Director and VMware HA. Duncan is the owner of Yellow-Bricks.com, the leading VMware blog.
Frank Denneman (VCDX 029) is a Consulting Architect working for VMware as part of the Professional Services Organization. Frank works primarily with large Enterprise customers and Service Providers. He specializes in Resource Management, DRS and storage. Frank is the owner of frankdenneman.nl, which has recently been voted number 6 worldwide on vsphere-land.com.

VMware vSphere 4.1 HA and DRS Technical Deepdive zooms in on two key components of every VMware-based infrastructure and is by no means a “how to” guide. It covers the basic steps needed to create a VMware HA and DRS cluster, but more importantly it explains the concepts and mechanisms behind HA and DRS, which will enable you to make well-educated decisions. This book will take you into the trenches of HA and DRS and will give you the tools to understand and implement, for example, HA admission control policies, DRS resource pools and resource allocation settings. On top of that, each section contains basic design principles that can be used for designing, implementing or improving VMware infrastructures.
Coverage includes:

  • HA node types
  • HA isolation detection and response
  • HA admission control
  • VM Monitoring
  • HA and DRS integration
  • DRS imbalance algorithm
  • Resource Pools
  • Impact of reservations and limits
  • CPU Resource Scheduling
  • Memory Scheduler
  • DPM

We hope you will enjoy reading it as much as we did writing it. Thanks,

RE: Maximum Hosts Per Cluster (Scott Drummonds)

Duncan Epping · Nov 29, 2010 ·

I love blogging because of the discussions you sometimes get into. One of the bloggers I highly respect and closely follow is EMC’s vSpecialist Scott Drummonds (former VMware Performance Guru). Scott posted a question on his blog about what the size of a cluster should be. Scott discussed this with Dave Korsunsky and Dan Anderson, both VMware employees, and more or less came to the conclusion that 10 is probably a good number.

“So, have I given a recommendation? I am not sure. If anything I feel that Dave, Dan and I believe that a minimum cluster size should be set to guarantee that the CPU utilization target, and not the HA failover capacity, is defining the number of wasted resources. This means a minimum cluster of something like four or five hosts. While neither of us claims a specific problem that will occur with very large clusters, we cannot imagine the value of a 32-host cluster. So, we think the right cluster size is somewhere shy of 10.”

And of course they have a whole bunch of arguments for both large (12+) and small (8-) clusters… which I have summarized below for your convenience:

  • Pro Large: DRS efficiency.  This was my primary claim in favor of 32-host clusters.  My reasoning is simple: with more hosts in the cluster there are more CPU and memory resource holes into which DRS can place running virtual machines to optimize the cluster’s performance.  The more hosts, the more options to the scheduler.
  • Pro Small: DRS does not make scheduling decisions based on the performance characteristics of the server, so a new, powerful server in a cluster is just as likely to receive a mission-critical virtual machine as an older, slower host. This would be unfortunate if a cluster contained servers with radically different (although EVC-compatible) CPUs like the Intel Xeon 5400 and Xeon 5500 series.
  • Pro Small: By putting your mission-critical applications in a cluster of their own your “server huggers” will sleep better at night.  They will be able to keep one eye on the iron that can make or break their job.
  • Pro Small: The cumbersome nature of change control in large clusters. Clusters have to be managed to a consistent state and the complexity of this process is dependent on the number of items being managed. A very large cluster will present unique challenges when managing change.
  • Pro Small: To size a 4+1 cluster to 80% utilization after host failure, you will want to restrict CPU usage in the five hosts to 64%. Going to a 5+1 cluster results in a pre-failure CPU utilization target of 66%. The increases slowly approach 80% as the clusters get larger and larger, but you can see that the incremental resource utilization improvement is never more than 2%. So, growing a cluster slightly provides very little value in terms of resource utilization. (A quick sketch of this arithmetic follows right after this list.)
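
To make the arithmetic in that last argument easy to reproduce, here is a minimal sketch; the 80% post-failure target and the host counts are just example values.

  # Pre-failure CPU utilization target so that, after losing one host in an
  # N+1 cluster, the surviving hosts end up at roughly 80% utilization.
  def pre_failure_target(total_hosts: int, post_failure_target: float = 0.80) -> float:
      surviving = total_hosts - 1
      return post_failure_target * surviving / total_hosts

  for hosts in (5, 6, 8, 16, 32):  # 4+1, 5+1, 7+1, 15+1, 31+1
      print(f"{hosts:2d} hosts -> pre-failure CPU target of {pre_failure_target(hosts):.1%}")
  # 5 hosts -> 64.0%, 6 hosts -> 66.7%, 32 hosts -> 77.5%: slowly approaching 80%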

It is probably an endless debate and the arguments for both “Pro Large” and “Pro Small” are all very valid, although I seriously disagree with their conclusion of not seeing the value of a 32-host cluster. As always, it fully depends. On what, you might ask; why would you ever want a 32-host cluster? Well, for instance, when you are deploying vCloud Director. Clusters are currently the boundary for your vDC, and who wants to give a customer 6 vDCs instead of just 1 because the cluster size was limited to 6 hosts instead of leaving the option open to go to the max? This might just be an exception and nowhere near reality for some of you, but I wanted to use it as an example to show that you will need to take many factors into account.
Now I am not saying you should, but at least leave the option open.

One of the arguments I do want to debate is the change control argument. Again, this used to be valid in a lot of Enterprise environments where ESX was used. Now, I am deliberately using “ESX” and “Enterprise” here, as the reality is that many companies don’t even have a change control process in place. (I worked for a few large insurance companies which didn’t!) On top of that there is a large discrepancy when it comes to the amount of work associated with patching ESX vs ESXi. I have spent many weekends upgrading ESX but today literally spent minutes upgrading ESXi. The impact and risks associated with patching have most certainly decreased with ESXi in combination with VUM and the staging options. On top of that many organizations treat ESXi as an appliance, and with stateless ESXi and the Auto-Deploy appliance being around the corner I guess that notion will only grow to become a best practice.

A couple of arguments that I have often seen being used to restrict the size of a cluster are the following:

  • HA limits (different max amount of VMs when cluster are > 8 hosts)
  • SCSI Reservation Conflicts
  • HA Primary nodes

Let me start by saying that for every new design you create, challenge your design considerations and best practices… are they still valid?

The first one is obvious, as most of you know by now that there is no such thing anymore as an 8-host boundary with HA. The second one needs some explanation. Around the VI3 time frame cluster sizes were often limited because of possible storage performance issues. These alleged issues were mainly blamed on SCSI reservation conflicts. The conflicts were caused by having many VMs on a single LUN in a large cluster. Whenever a metadata update was required the LUN would be locked by a host, and this would/could increase overall latency. To avoid this, people would keep the number of VMs per VMFS volume low (10 to 15) and keep the number of VMFS volumes per cluster low… also resulting in a fairly low consolidation factor, but hey, 10:1 beats physical.

Those arguments used to be valid, however things have changed. vSphere 4.1 brought us VAAI, which is a serious game changer in terms of SCSI reservations. I understand that for many storage platforms VAAI is currently not supported… however, the original mechanism used for SCSI reservations has also improved significantly over time (optimistic locking), which in my opinion reduces the need to have many small LUNs, which eventually would limit you from a maximum-number-of-LUNs-per-host perspective. So with VAAI or optimistic locking, and of course NFS, the argument to have small clusters is not really valid anymore. (Yes, there are exceptions.)

The one design consideration, which is crucial, that is missing in my opinion though is HA node placement. Many have limited their cluster sizes because of hardware and HA primary node constraints. As hopefully everyone knows (if not, be ashamed), HA has a maximum of 5 primary nodes in a cluster and a primary is required for restarts to take place. In large clusters the chances of losing all primaries also increase if and when the placement of the hosts is not taken into account. The general consensus usually is: keep your cluster limited to 8 hosts and spread it across two racks or chassis so that each rack always has at least a single primary node to restart VMs. But why would you limit yourself to 8? Why, if you just bought 48 new blades, would you create 6 clusters of 8 hosts instead of 3 clusters of 16 hosts? By simply layering your design you can mitigate all risks associated with primary node placement while benefiting from additional DRS placement options. (Do note that if you “only” have two chassis, your options are limited.) A quick sketch of this placement risk follows below.

Which brings us to another thing I wanted to discuss… Scott’s argument against increased DRS placement options was that hundreds of VMs in an 8-host cluster already lead to many placement options. Indeed you will have many load balancing options in an 8-host cluster, but is it enough? In the field I also see a lot of DRS rules. DRS rules restrict the DRS load balancing algorithm when it is looking for suitable options; as such, more opportunities will more than likely result in a better balanced cluster. Heck, I have even seen cluster imbalances which could not be resolved due to DRS rules in a five-host cluster with 70 VMs.
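
To put a rough number on that primary node placement risk, here is a minimal sketch. It assumes the 5 primaries end up on an effectively arbitrary subset of hosts (in reality HA elects them as hosts join the cluster, so this is only an illustration) and that a whole chassis can fail at once; the chassis layouts are example values.

  from math import comb

  # Chance that all 5 HA primary nodes land in a single chassis, i.e. one
  # chassis failure could leave the cluster without any primary to
  # coordinate restarts. Purely combinatorial; not how HA actually elects.
  def p_all_primaries_in_one_chassis(chassis_sizes, primaries=5):
      total_hosts = sum(chassis_sizes)
      all_ways = comb(total_hosts, primaries)
      bad_ways = sum(comb(size, primaries) for size in chassis_sizes if size >= primaries)
      return bad_ways / all_ways

  print(f"{p_all_primaries_in_one_chassis([8, 8]):.1%}")        # 16 hosts over 2 chassis: ~2.6%
  print(f"{p_all_primaries_in_one_chassis([8, 8, 8, 8]):.1%}")  # 32 hosts over 4 chassis: ~0.1%

More, smaller failure domains per cluster shrink the risk, which is exactly the layering argument above.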

Don’t get me wrong, I am not advocating going big… but neither am I advocating limiting your cluster size for reasons that might not even apply to your environment. Write down the requirements of your customer or your environment and don’t limit yourself to design considerations around compute alone. Think about storage, networking, update management, configuration maximums, DRS & DPM, HA, resource and operational overhead.

vStorage APIs for Array Integration aka VAAI

Duncan Epping · Nov 23, 2010 ·

It seems that a lot of vendors are starting to update their firmware to enable virtualized workloads to benefit from the vStorage APIs for Array Integration, also known as VAAI. Not only are the vendors starting to show interest, the bloggers are picking up on it as well. Hence the reason I wanted to reiterate some of the excellent details out there and make sure everyone understands what VAAI brings. Although there are currently “only” three major improvements, they can and probably will make a huge difference:

  1. Hardware Offloaded Copy
    Up to 10x faster VM deployment, cloning, Storage vMotion, etc. VAAI offloads the copy task to the array, enabling the use of native storage-based mechanisms, which decreases deployment time and, equally important, reduces the amount of data flowing between the array and the server. Check this post by Bob Plankers and this one by Matt Liebowitz, which clearly demonstrate the power of hardware offloaded copies (reducing cloning from 19 minutes to 6 minutes!).
  2. Write Same/Zero
    10x less I/O for common tasks. Take for instance a zero-out process: the host typically has to send the same zero-filled SCSI write command over and over. By enabling this option the storage platform repeats the command internally, reducing the utilization of the server while decreasing the time span of the action. (See the sketch after this list.)
  3. Hardware Offloaded Locking
    SCSI reservation conflicts… How many times have I heard that during health checks, design reviews and while troubleshooting performance related issues. Well, VAAI solves those issues as well by offloading the locking mechanism to the array, using what is known as Atomic Test & Set (ATS). It will more than likely reduce latency in environments where thin-provisioned disks, linked clones or even VMware-based snapshots are used. ATS removes the need to lock the full VMFS volume and instead locks a block when an update needs to occur.
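
To give a feel for what the Write Same/Zero primitive (item 2) saves, here is a back-of-the-envelope sketch; the disk size and the transfer size are assumptions purely for illustration.

  # Zeroing out a VMDK without VAAI: the host issues one zero-filled WRITE per
  # block and all those zeroes cross the fabric. With Write Same the array
  # repeats the pattern internally, so only a small number of commands and a
  # tiny payload are sent. Example values only.
  disk_gb = 40
  write_size_kb = 1024  # assume 1 MB zeroing writes

  writes_without_vaai = disk_gb * 1024 * 1024 // write_size_kb
  print(f"Without Write Same: ~{writes_without_vaai:,} WRITE commands, ~{disk_gb} GB of zeroes on the wire")
  print("With Write Same: a comparatively tiny number of WRITE SAME commands, near-zero payload on the wire")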

One thing I wanted to point out here, which I haven’t seen mentioned yet, is that VAAI will actually allow you to have larger VMFS volumes. Now don’t get me wrong, I am not saying that you can go beyond 2TB-512b by enabling VAAI… My point is that by having VAAI enabled you will reduce the “load” on the array and on the servers. I placed quotes around load as it will not reduce the load from a VM perspective. What I am trying to get at is that many people have limited the number of VMs per VMFS volume because of “SCSI Reservation Conflicts”. With VAAI this will change. Now you can keep your calculations “simple” and base your VMFS size on the number of eggs you are willing to have in a single basket and the sum of all VMs’ IOPS requirements.
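
That “simple” calculation could look something like the sketch below; the failure-domain appetite, datastore IOPS capacity and per-VM IOPS figure are all made-up example inputs.

  # Once SCSI reservation pressure is out of the picture (VAAI/ATS), the VM
  # count per VMFS volume mostly comes down to how many eggs you accept in
  # one basket and the aggregate IOPS the backing LUN can deliver.
  max_vms_per_failure_domain = 25    # eggs-in-one-basket appetite
  datastore_iops_capacity = 3000     # what the backing LUN can sustain
  avg_iops_per_vm = 80               # averaged requirement per VM

  vms_by_iops = datastore_iops_capacity // avg_iops_per_vm
  vms_per_datastore = min(max_vms_per_failure_domain, vms_by_iops)
  print(f"IOPS allows {vms_by_iops} VMs; sizing for {vms_per_datastore} VMs per VMFS volume")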

After reading about all of this goodness I bet many of you want to use it straight away; of course, your array will need to support it first. Tomi Hakala created a nice list of all storage platforms that are currently supported and those that will be supported soon, including a time frame. If your array is supported, this KB explains perfectly how to enable/disable it.
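
Purely as a sketch (the KB mentioned above is the authoritative source), the three primitives map to host advanced settings that can be toggled with esxcfg-advcfg. The helper below only prints the commands; the option paths are my recollection of the vSphere 4.1 setting names, so verify them against the KB before touching a host.

  # Print the esxcfg-advcfg commands to enable (1) or disable (0) the three
  # VAAI-related advanced settings. Option paths are assumptions to be
  # verified against the KB; nothing is executed here.
  VAAI_OPTIONS = {
      "Hardware Offloaded Copy":    "/DataMover/HardwareAcceleratedMove",
      "Write Same/Zero":            "/DataMover/HardwareAcceleratedInit",
      "Hardware Offloaded Locking": "/VMFS3/HardwareAcceleratedLocking",
  }

  def vaai_commands(enable: bool = True):
      value = 1 if enable else 0
      return [f"esxcfg-advcfg -s {value} {path}" for path in VAAI_OPTIONS.values()]

  for cmd in vaai_commands(enable=False):
      print(cmd)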

I started out with saying that there are currently only three major enhancements…. that means indeed that there is more coming up in the future. Some of which I can’t discuss and others that I can as those were already mentioned at VMworld. (If you have access to TA7121 watch it!) I can’t say when they will be available or in which release, but I think it is great to know more enhancements are being worked on.

  • Dead Space Reclamation
    Dead space is previously written blocks that are no longer used by the VM. Currently, in order to reclaim disk space (for instance when you’ve deleted a lot of files) you need to zero out these blocks with a tool like sdelete and then Storage vMotion the VM. Dead Space Reclamation will enable the storage system to reclaim these dead blocks by providing block liveness information.
  • Out-of-space condition notifications
    This is very much an improvement for day-to-day operations. It will enable notification of possible “out-of-space” conditions both in the array vendor’s tool and within the vSphere Client!

Must reads:

Chad Sakac – What does VAAI mean to you?
Bob Plankers – If you ever needed convincing about VAAI
AndreTheGiant – VAAI
VMware KB – VAAI FAQ
VMware Support Blog – VAAI changes the way storage is handled
Matt Liebowitz – Exploring the performance benefits of VAAI
Bas Raayman – What is VAAI, and how does it add spice to my life as a VMware admin?

VMworld esxtop advanced session

Duncan Epping · Nov 8, 2010 ·

During my flight from Boston back to the Netherlands I listened to the VMworld esxtop session “Troubleshooting using ESXTOP for Advanced Users” (TA6720). As always an excellent session with a lot of in-depth info. Most of it was already documented, however there were a couple of key points that I hadn’t documented yet. I just added those to my esxtop page, and I wanted to call them out here as I personally believe it is very useful info. It seems pretty random, but it rolled up nicely into the esxtop page in my opinion.

  • %SYS should be less than 20, %SYS is the percentage of time spent by system services on behalf of the world. The possible system services are interrupt handlers, bottom halves, and system worlds.
  • -b = batch mode, adding “-a” will force all metrics to be gathered
  • Limit display to a single group (l)
    • enables you to focus on a specific VM
  • Limiting the number of entities (#)
    • this enables you, for instance, to watch only the top 5 worlds

I have also added thresholds for ZIP/s, UNZIP/s and CACHEUSD. These should of course be 0 from a performance perspective as anything larger than 0 means the host was overcommitted on memory and had to resort to memory compression.
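
For those capturing data in batch mode (esxtop -b -a, as per the bullets above), a small script like the sketch below can flag whether any compression-related counters ever went above 0. The batch CSV column names differ from the interactive ZIP/s, UNZIP/s and CACHEUSD labels, so the substring match is an assumption you may need to adjust to your capture.

  # Scan an esxtop batch capture for non-zero memory compression activity.
  import csv

  def compression_columns_with_activity(path):
      with open(path, newline="") as f:
          reader = csv.reader(f)
          header = next(reader)
          # Columns whose names hint at compression; adjust to your capture.
          cols = [i for i, name in enumerate(header)
                  if "zip" in name.lower() or "compress" in name.lower()]
          active = set()
          for row in reader:
              for i in cols:
                  try:
                      if float(row[i]) > 0:
                          active.add(header[i])
                  except (ValueError, IndexError):
                      pass
          return sorted(active)

  for counter in compression_columns_with_activity("capture.csv"):
      print("non-zero:", counter)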

If anyone has more metrics/thresholds to contribute which they used in the past to troubleshoot issues let me know!
