
Yellow Bricks

by Duncan Epping


clustering

Black Friday Gift: Free copy of the vSphere 6.7 Clustering Deep Dive, thanks Rubrik (ebook)

Duncan Epping · Nov 23, 2018 ·

Many asked us if the ebook would be made available for free again. Today I have the pleasure of announcing that Frank, Niels, and I have worked once again with Rubrik and the VMUG organization to make the vSphere 6.7 Clustering Deep Dive book available for free! Yes, that is 0 USD / EURO, or whatever your currency is. The book signing at VMworld was wildly popular, which resulted in the follow-up discussion about the ebook.

Ready to up your vSphere game? Join us at #VMworld booth #P305 for a complimentary copy of @ClusterDeepDive + the chance to meet authors @DuncanYB @FrankDenneman @NHagoort! More info: https://t.co/0DQ7nI1wzX pic.twitter.com/7nIGEvjdBF

— Rubrik, Inc. (@rubrikInc) November 2, 2018

You want a copy? All we expect you to do is register on Rubrik’s website using your own email address. So register, start your download engines, and pick up a fresh copy of the vSphere Clustering Deep Dive here!

Books linked: buy the paper Clustering Deep Dive, get the ebook for 2.95!

Duncan Epping · Oct 1, 2018 ·

We just managed to link the paper and electronic versions of the Clustering Deep Dive. This means that if you buy the paper book today, you can get the e-book at a discount. This was something a lot of you had asked for, so we pushed it through. Unfortunately, it did mean we had to re-upload the book to a different back-end system and the purchase “history” was lost, so those who already bought the paper version of the book, unfortunately, can’t get the same deal. If you are interested in getting both versions, go here, or click below on the book, or one of the other books I recommend reading 🙂


VMworld Video: vSphere 6.7 Clustering Deep Dive

Duncan Epping · Sep 3, 2018 ·

As all videos for VMworld are now posted (and nicely listed by William), I figured I would share the session Frank Denneman and I presented. It ended up in the Top 10 Sessions on Monday, which is always a great honor. We had a lot of positive feedback and comments, thanks for that! Most importantly, it was a lot of fun to be up on stage at VMworld again, talking about this content after roughly 6 years of absence. For those who missed it, watch it here:

https://s3-us-west-1.amazonaws.com/vmworld-usa-2018/VIN1249BU.mp4

I also very much enjoyed the book signing session at the Rubrik booth with Niels and Frank. I believe Rubrik gave away around 1000 copies of the book. I am hoping we can repeat this huge success in EMEA, but more on that later. If you haven’t picked up the book yet and won’t be at VMworld Europe, consider picking it up through Amazon; the e-book is only 14.95 USD.


Change in Permanent Device Loss (PDL) behavior for vSphere 5.1 and up?

Duncan Epping · Aug 1, 2013 ·

Yesterday someone asked me a question on Twitter about a whitepaper by EMC on stretched clusters and Permanent Device Loss (PDL) behavior. For those who don’t know what a PDL is, make sure to read this article first. This EMC whitepaper states the following on page 40:

Note:

In a full WAN partition that includes cross-connect, VPLEX can only send SCSI sense code (2/4/3+5) across 50% of the paths since the cross-connected paths are effectively dead. When using ESXi version 5.1 and above, ESXi servers at the non-preferred site will declare PDL and kill VM’s causing them to restart elsewhere (assuming advanced settings are in place); however ESXi 5.0 update 1 and below will only declare APD (even though VPLEX is sending sense code 2/4/3+5). This will result in a VM zombie state. Please see the section Path loss handling semantics (PDL and APD)

Now, as far as I understood (and I tested this with 5.0 U1), the VMs would indeed not be killed when half of the paths were declared APD and the other half PDL. But I guess something has changed with vSphere 5.1. I knew about one change that isn’t clearly documented, so I figured I would do some digging and write a short article on this topic. So here are the changes in behavior:

Virtual Machine using multiple Datastores:

  • vSphere 5.0 U1 and lower: When a virtual machine’s files are spread across multiple datastores, it might not be restarted when a Permanent Device Loss scenario occurs.
  • vSphere 5.1 and higher: When a virtual machine’s files are spread across multiple datastores and a Permanent Device Loss scenario occurs, vSphere HA will restart the virtual machine, taking the availability of those datastores on the various hosts in your cluster into account.

Half of the paths in APD state:

  • vSphere 5.0 U1 and lower: When the datastore on which your virtual machine resides is not 100% declared in a PDL state (assume half of the paths are in APD), the virtual machine will not be killed and restarted.
  • vSphere 5.1 and higher: When the datastore on which your virtual machine resides is not 100% declared in a PDL state (assume half of the paths are in APD), the virtual machine will be killed and restarted. This is a huge change compared to 5.0 U1 and lower.
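
For reference, the “advanced settings” the whitepaper alludes to are the PDL-handling settings commonly used in vSphere Metro Storage Cluster guidance of that era. A minimal sketch, assuming a 5.0 U1 / 5.1 environment (verify against the current vMSC documentation before applying):

    # On each ESXi host, in /etc/vmware/settings:
    disk.terminateVMOnPDLDefault = True

    # As a vSphere HA advanced option on the cluster:
    das.maskCleanShutdownEnabled = True

The first setting causes the host to kill the affected virtual machines when a PDL is declared; the second allows vSphere HA to treat those killed VMs as candidates for a restart elsewhere. In 5.0 U1 both need to be set explicitly; as of 5.1, das.maskCleanShutdownEnabled is enabled by default.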

These are the changes in behavior I know about for vSphere 5.1. I have asked engineering to confirm them for vSphere Metro Storage Cluster environments, and when I receive an answer I will update this blog.

The State of vSphere Clustering by @virtualirfan

Duncan Epping · Oct 23, 2012 ·

The state of vSphere clustering
By Irfan Ahmad

Some of my colleagues at CloudPhysics and I spent years at VMware and were lucky to have participated in one of the most rapid transformations in enterprise technology history. A big part of that is VMware’s suite of clustering features. I worked alongside Carl Waldspurger in the resource management team at VMware that brought to the world the ESX VMkernel CPU and memory schedulers, DRS, DPM, Storage I/O Control and Storage DRS among other features. As a result, I am especially interested in analyzing and improving how IT organizations use clustering.

Over a series of blog posts, I’ll try to provide a snapshot of how IT teams are operationalizing vSphere. One of my co-founders, Xiaojun Liu, and I performed some initial analysis on the broad community dataset that is continually expanding as more virtualization engineers securely connect us to their systems.

First, we segmented our analysis based on customer size. The idea was to isolate the effect of various deployment sizes, including test labs, SMBs, commercial, and large enterprise. Our segmentation was in terms of total VMs in a customer deployment, divided up as: 1-50 VMs, 51-200, 201-500, and 501 upwards. Please let us know if you believe an alternative segmentation would yield better analysis.
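
To make the segmentation concrete, here is a minimal sketch in Python of the bucketing described above (the function name and sample counts are illustrative, not CloudPhysics code):

    def segment_deployment(total_vms: int) -> str:
        """Bucket a deployment by total VM count: 1-50, 51-200, 201-500, 501+."""
        if total_vms <= 50:
            return "1-50"
        if total_vms <= 200:
            return "51-200"
        if total_vms <= 500:
            return "201-500"
        return "501+"

    # Tally illustrative deployments per segment.
    deployments = [12, 75, 340, 1200, 48]
    counts: dict[str, int] = {}
    for vms in deployments:
        seg = segment_deployment(vms)
        counts[seg] = counts.get(seg, 0) + 1
    print(counts)  # {'1-50': 2, '51-200': 1, '201-500': 1, '501+': 1}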

Initially, we compared the various ESX versions deployed in the field. We found that ESXi 5.0 had already captured the majority of installations in large deployments. However, versions 4.0 and 3.5 continue to be deployed in the field in small numbers. Version 4.1, on the other hand, continues to be more broadly deployed. If you are still using 4.1, 4.0, or 3.5, we recommend upgrading to 5.0, which provides greatly improved HA clustering, among many other benefits. This data shows that the 5.0 version has been broadly adopted by our peers and is user-verified production ready.

Next, we looked at cluster sizes. A key question for VMware product managers was often, “How many hosts are there in a typical cluster?” This was a topic of considerable debate, and it is critically important to know when prioritizing features; for example, how much emphasis should go into scalability work for DRS.

For the first time, CloudPhysics is able to leverage real customer data to provide answers. The highest frequency cluster size is two hosts per cluster for customers with greater than 500 VMs. Refer to the histogram. This result is surprisingly low and we do not yet know all the contributing reasons, though we can speculate on some of the causes. These may be a combination of small training clusters, dedicated clusters for some critical applications, Oracle clustering license restrictions, or perhaps a forgotten pair of older servers. Please tell us why you may have been keeping your clusters small.

Despite the high frequency of two-host clusters, we see opportunities for virtualization architects to increase their resource pooling. By pooling hosts together into larger clusters, DRS can do a much better job at placement and resource management. That means real dollars in savings. It also allows for more efficient HA policy management, since the spare capacity needed to absorb infrequent host failures is now spread out over a larger set of hosts; for example, a two-host cluster that must tolerate one host failure has to reserve 50% of its capacity, while an eight-host cluster reserves only 12.5%. Additionally, having fewer clusters means fewer management objects to configure and keep in sync with changing policies. This reduces management complexity and makes for a safer and more optimized environment.

Several caveats arise with regard to the above findings. First is potential sample bias. For instance, it might be the case that companies using CloudPhysics are more likely to be early adopters, and that early adopters might be more inclined to upgrade to ESX 5.0 faster. Another possible issue is imbalanced dataset composition: it might be that admins are setting up small training or beta labs, official test and development, and production environments mixed together, thus skewing the findings.

CloudPhysics is the first to provide a method of impartially determining answers to such questions based on real customer data, which should help settle the debate.

Xiaojun and I will continue to report back on these topics as the data evolves. In the meantime, the CloudPhysics site is growing with new cards being added weekly. Each card solves daily problems that virtualization engineers have described to us in our Community Cards section. I hope you will take the time to send us your feedback on the CloudPhysics site.

