Yellow Bricks

by Duncan Epping

clustering

The State of vSphere Clustering by @virtualirfan

Duncan Epping · Oct 23, 2012 ·

The state of vSphere clustering
By Irfan Ahmad

Some of my colleagues at CloudPhysics and I spent years at VMware and were lucky to have participated in one of the most rapid transformations in enterprise technology history. A big part of that is VMware’s suite of clustering features. I worked alongside Carl Waldspurger in the resource management team at VMware that brought to the world the ESX VMkernel CPU and memory schedulers, DRS, DPM, Storage I/O Control and Storage DRS among other features. As a result, I am especially interested in analyzing and improving how IT organizations use clustering.

Over a series of blog posts, I’ll try to provide a snapshot of how IT teams are operationalizing vSphere. One of my co-founders, Xiaojun Liu, and I performed some initial analysis on the broad community dataset, which is continually expanding as more virtualization engineers securely connect us to their systems.

First, we segmented our analysis based on customer size. The idea was to isolate the effect of various deployment sizes, including test labs, SMBs, commercial and large enterprise. We segmented by the total number of VMs in each customer deployment: 1-50 VMs, 51-200, 201-500, and 501 and upwards. Please let us know if you believe an alternative segmentation would yield better analysis.
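The segmentation described above amounts to a simple bucketing rule. Here is a minimal sketch of it; the function name `segment_for` and the label strings are illustrative assumptions, not CloudPhysics code.

```python
def segment_for(total_vms: int) -> str:
    """Map a deployment's total VM count to one of the four segments in the post."""
    if total_vms < 1:
        raise ValueError("a deployment needs at least one VM")
    if total_vms <= 50:
        return "1-50"
    if total_vms <= 200:
        return "51-200"
    if total_vms <= 500:
        return "201-500"
    # Everything above 500 VMs falls into the open-ended top segment.
    return "501+"
```

For example, `segment_for(120)` returns `"51-200"`. Changing the boundaries in one place like this would make it easy to try an alternative segmentation.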

Initially we compared the various ESX versions deployed in the field. We found that ESXi 5.0 has already captured the majority of installations in large deployments. Versions 4.0 and 3.5 are still deployed in the field, but only in small numbers. Version 4.1, on the other hand, continues to be more broadly deployed. If you are still using 4.1, 4.0, or 3.5, we recommend upgrading to 5.0, which provides greatly improved HA clustering, among many other benefits. This data shows the 5.0 version has been broadly adopted by our peers and is user-verified production ready.

Next, we looked at cluster sizes. A key question for VMware product managers was often, “How many hosts are there in a typical cluster?” This was a topic of considerable debate, and it is critically important to know when prioritizing features. For example, how much emphasis should go into scalability work for DRS?

For the first time, CloudPhysics is able to leverage real customer data to provide answers. For customers with more than 500 VMs, the most frequent cluster size is two hosts per cluster. Refer to the histogram. This result is surprisingly low and we do not yet know all the contributing reasons, though we can speculate on some of the causes. These may be a combination of small training clusters, dedicated clusters for some critical applications, Oracle clustering license restrictions, or perhaps a forgotten pair of older servers. Please tell us why you may have been keeping your clusters small.
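To make "highest frequency cluster size" concrete, here is a small sketch of how the mode of a hosts-per-cluster histogram could be computed. The function name and the sample list are made up for illustration; they are not CloudPhysics data or code.

```python
from collections import Counter


def most_common_cluster_size(hosts_per_cluster):
    """Return (size, count) for the most frequent cluster size."""
    histogram = Counter(hosts_per_cluster)
    if not histogram:
        raise ValueError("no clusters supplied")
    # Highest count wins; ties go to the smaller size so the result is deterministic.
    size, count = min(histogram.items(), key=lambda item: (-item[1], item[0]))
    return size, count


# Made-up sample for illustration only, not real customer data.
sample = [2, 2, 2, 3, 4, 4, 8, 16]
print(most_common_cluster_size(sample))
```

With this invented sample, two-host clusters are the most frequent size even though most hosts live in larger clusters, which is why the histogram and the average cluster size can tell different stories.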

Despite the high frequency of two-host clusters, we see opportunities for virtualization architects to increase their resource pooling. By pooling hosts together into larger clusters, DRS can do a much better job at placement and resource management. That means real dollars in savings. It also allows for more efficient HA policy management, since the spare capacity needed to absorb infrequent host failures is spread out over a larger set of hosts. Additionally, having fewer clusters means fewer management objects to configure and keep in sync with changing policies. This reduces management complexity and makes for a safer and more optimized environment.
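The HA argument can be illustrated with some simple arithmetic. The sketch below assumes identical hosts and a policy that keeps enough spare capacity to absorb a fixed number of host failures; this is a deliberately simplified model, not how vSphere HA admission control actually calculates its reservations, and the function name is hypothetical.

```python
def reserved_fraction(hosts: int, failures_to_tolerate: int = 1) -> float:
    """Fraction of cluster capacity held back to absorb host failures.

    Simplified model: all hosts are identical, so tolerating k failures
    in an n-host cluster means reserving k/n of the total capacity.
    """
    if hosts < 2:
        raise ValueError("a cluster needs at least two hosts to tolerate a failure")
    if not 0 <= failures_to_tolerate < hosts:
        raise ValueError("failures_to_tolerate must be between 0 and hosts - 1")
    return failures_to_tolerate / hosts


for hosts in (2, 4, 8, 16):
    print(f"{hosts:>2} hosts: {reserved_fraction(hosts):.1%} of capacity held in reserve")
```

Under these assumptions, eight hosts run as four two-host clusters hold half of their capacity in reserve, while the same eight hosts in a single cluster hold back only one host's worth.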

Several caveats arise with regard to the above findings. First is potential sample bias. For instance, it might be the case that companies using CloudPhysics are more likely to be early adopters, and that early adopters are more inclined to upgrade to ESX 5.0 quickly. Another possible issue is imbalanced dataset composition. Admins may be setting up small training or beta labs, official test & development, and production environments all in the same environment, thus skewing the findings.

CloudPhysics is the first to provide a method of impartially determining answers based on real customer data, in order to dampen the controversy.

Xiaojun and I will continue to report back on these topics as the data evolves. In the meantime, the CloudPhysics site is growing with new cards being added weekly. Each card solves daily problems that virtualization engineers have described to us in our Community Cards section. I hope you will take the time to send us your feedback on the CloudPhysics site.

Database clustering support for vCloud Director added in version 5.1!

Duncan Epping · Oct 18, 2012 ·

Those who have been architecting vCloud Director environments from the early days know that this has always been a pain point. I personally have had many discussions with product management and engineering to get support for database clustering like Oracle RAC or Microsoft clustering services for MS SQL. Unfortunately, neither 1.0 nor 1.5 supported it. So the big question always was: when will database clustering support for vCloud Director be added?

I had a couple of discussions around this again last week and noticed it was still not listed until someone pointed me to the vCAT 3.0 documents. Hidden on page 110 of document “3a Architecting a VMware vCloud.pdf” I found the following statement:

VMware vCloud component database resiliency is provided through database clustering. Microsoft Cluster Service for SQL and Oracle RAC are supported.

Yes, I do realize that this is not a KB article, nor is it mentioned in the vCloud Director documentation. I have requested the docs to be revised and a KB to be created. Hopefully those will follow soon; for now, this statement is all we need! When the docs are revised or a KB is published I will add the references to this article.

<update – 18/Oct/2012> KB just got added – http://kb.vmware.com/kb/2037802 </update>

Can I get your book for free?

Duncan Epping · Oct 11, 2012 ·

Well not from me, but CloudPhysics has a nice book give-away going on at the moment for the VMworld Barcelona attendees! So what do you need to do?

How To Win

  • Email us at info@cloudphysics.com with a subject of “Book”. No message is needed.
  • Register at http://www.cloudphysics.com/ by clicking “SIGN UP”.
  • Install the CloudPhysics Observer vApp to activate your dashboard.

Eligibility

  • You are attending VMworld Barcelona.
  • You are a new CloudPhysics user.
  • You fully install the CloudPhysics ‘Observer’ vApp in your vSphere environment.

That is an easy way of getting the book for free, right? So I suggest you head over and sign up to make sure you are among the first 150 users, who each get a free book!

Out on iBooks finally – vSphere 5.1 Clustering Deepdive

Duncan Epping · Oct 1, 2012 ·

It took about a month to get this published, but here it finally is: vSphere 5.1 Clustering Deepdive on iBooks.

Yeah yeah, we know… you also want Nook. lulu.com says it is pending, which probably means it will take a couple of days before it is up on Barnes & Noble as well.

Why is my pathing policy limited to “fixed” or “MRU” with things like MSCS cluster?

Duncan Epping · May 17, 2012 ·

Yesterday I received an email from someone. He wanted to know why he was limited to using either the “fixed” or “MRU” pathing policy for the LUNs attached to his MSCS cluster. In his environment they used round-robin for everything, and not being able to configure all LUNs with the same pathing policy went against their internal standards. The thing is that if round-robin were used and the path switched (by default every 1000 I/Os), the SCSI-2 reservation would need to be re-acquired on this LUN. (MSCS uses SCSI-2 reservations for its cluster devices.) As you can imagine, that could cause a lot of stress on your array and could lead to all sorts of problems. So please do not ignore this recommendation! Some extra details can be found in the following KB articles:

  • http://kb.vmware.com/kb/1033678
  • http://kb.vmware.com/kb/1037959
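To get a feel for how often a reservation would have to be re-acquired, here is a rough back-of-the-envelope sketch. It only models the behaviour described above (a path switch after every 1000 I/Os, each switch requiring the reservation to be re-acquired); the function name and the example workload are assumptions for illustration, not measurements.

```python
def path_switches(total_ios: int, ios_per_path: int = 1000) -> int:
    """Count how often round-robin moves to the next path while issuing total_ios I/Os."""
    if total_ios < 0:
        raise ValueError("total_ios cannot be negative")
    if ios_per_path < 1:
        raise ValueError("ios_per_path must be at least 1")
    if total_ios == 0:
        return 0
    # The first batch of I/Os uses the initial path; a switch happens
    # before each subsequent batch of ios_per_path I/Os.
    return (total_ios - 1) // ios_per_path


# Hypothetical workload: 20,000 IOPS sustained for 60 seconds.
ios_in_one_minute = 20_000 * 60
print(path_switches(ios_in_one_minute))
```

For that hypothetical workload the model yields roughly 1,200 path switches per minute, each of which would mean another reservation re-acquisition on the LUN.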

About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.

Copyright Yellow-Bricks.com © 2025