das.maskCleanShutdownEnabled is set to true by default

I had a couple of questions on the topic of das.maskCleanShutdownEnabled today. For those who have not read the other articles I wrote about this topic, here is in short what it does and why it was introduced, as I explained it in an email today:

When a virtual machine is powered off (or shut down) by a user, a property named “runtime.cleanPowerOff” is set to true. To vSphere HA this indicates that the virtual machine was powered off by a user, and as such it knows that it doesn’t need to take action for this virtual machine when a host fails. By default this property is set to true. If for whatever reason the virtual machine is killed by ESXi, then this property is set to false.

vSphere HA provides the ability to respond to a storage failure (PDL). When a PDL occurs it can kill a virtual machine and then restart the virtual machine. However, “runtime.cleanPowerOff” defaults to “true” and vSphere HA cannot access the datastore (PDL, remember) to change the property! This means that if the VM is killed after the PDL, it won’t be restarted, as HA assumes it was cleanly powered off.

This is where das.maskCleanShutdownEnabled comes into play. By setting this to “true”, vSphere HA assumes that the VM was not cleanly powered off. The property is only set when you cleanly power it off. In other words, in a PDL situation it will now restart the VM even though the datastore was unavailable when the VM was killed!
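To make the reasoning above a bit more tangible, here is a tiny toy model of the decision in Python. This is purely my own illustration of the logic as described in this post, not how vSphere HA is actually implemented; the function name and its parameters are made up for the example.

```python
def ha_will_restart(user_powered_off, datastore_accessible, mask_enabled):
    """Toy model (assumption, not VMware code) of the restart decision.

    Returns True when HA would treat the VM as "not cleanly powered off"
    and therefore restart it.
    """
    # Without masking, the clean power-off flag starts out as "true".
    # With das.maskCleanShutdownEnabled, HA assumes "not clean" instead.
    clean = not mask_enabled

    if user_powered_off:
        # A clean power off by a user always marks the VM as clean.
        clean = True
    elif datastore_accessible:
        # The VM was killed and the property can be updated to "false".
        clean = False
    # Otherwise (killed during a PDL) the flag cannot be changed,
    # so whatever was assumed initially is what HA sees.

    return not clean


# VM killed during a PDL, masking disabled: HA thinks it was a clean power off
print(ha_will_restart(False, False, mask_enabled=False))
# VM killed during a PDL, masking enabled: HA restarts the VM
print(ha_will_restart(False, False, mask_enabled=True))
```

The only case where the setting changes the outcome in this model is the one this post is about: the VM is killed while the datastore is unreachable.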

Back to the original question: what is das.maskCleanShutdownEnabled set to in 5.1 and later? Do you need to set it manually? No, you do not; by default it is set to true! So when you configure a cluster, be aware of this… Especially in a stretched cluster environment, where a PDL scenario is not unlikely.

** do not forget to also set terminateVMonPDL described in this blog post if you want VMs to be automatically killed when a PDL occurs! **

Disk Controllers removed from VSAN HCL

For those who are running VSAN in their environment, I would urge you to have a look at this KB article: Storage Controllers previously supported for VSAN that are no longer supported (2081431). This KB article lists the disk controllers which have been removed from the VSAN HCL because of their shallow queue depth. I described in an article a while back “Why Queue Depth matters” and this is also reiterated by Rakesh from VSAN product management in this blog article on the vSphere blog.

If you have purchased Virtual SAN for use with these controllers, please contact VMware customer care for next steps.

Public vSphere Beta, sign up and provide feedback now!

I am very pleased to see VMware just announced the beta of vSphere. I think it is great that everyone has the chance to sign up, download it, test it and provide feedback on such a critical part of your environment! Who doesn’t love to play with cutting edge technology? I know I do! Especially for all the bloggers and book authors out there this is an excellent opportunity to already start working on articles (or a book) for the launch time frame, whenever that will be. I have my engines fired up, downloading the bits as I write this…

How do you join?

  • Navigate to https://communities.vmware.com/community/vmtn/vsphere-beta and click the “JOIN NOW!” button on the right hand side!
  • Log in with your My VMware account.  (Please register for an account if you don’t have one).
  • Once you have an account and are logged in, please accept the Master Software Beta Test Agreement (MSBTA) and Program Rules screens if you have not already done so in the past.
  • After doing this you should be in the vSphere Beta 2 community.

There are 2 webinars coming up, which I would recommend attending:

  • Introduction / Overview – Tuesday, July 8, 2014
  • Installation & Upgrade – Thursday, July 10, 2014

One of the features, which is part of the beta, that I am excited about is Virtual Volumes. I have written about this concept a bunch of times (here and here) and I hope you folks will appreciate this feature as much as I do. If you are interested, look at this VVOL Beta page. You may wonder, why a separate page for VVOL beta? Well that is because you will need a VVOL capable storage solution…

Reminder: Before anyone forgets, the vSphere Beta is open to the public but it is NOT a public beta. It is still a private beta and the NDA applies!

vSwitch Traffic Shaping, what is what?

I was troubleshooting an issue where vMotion would constantly time out. I had no clue where it was coming from, so I started digging. In this case the environment was using a regular vSwitch and 10GbE networking, as unfortunately the Distributed vSwitch was not an option for this environment. When I took a closer look I noticed that some form of traffic shaping was applied. Traffic shaping was enabled and the peak value was specified, but the rest was left at the default values… and unfortunately this is exactly what caused the problem.

So when it comes to vSwitch Traffic Shaping, what is what? There are 3 settings you can set per portgroup:

  • Average Bandwidth – specified in Kbps
  • Peak Bandwidth – specified in Kbps
  • Burst Size – specified in KB

So if you have a 10Gbps NIC port for your traffic this means you have a total of 10,485,760 Kbps. When you enable vSwitch Traffic Shaping, by default “Average Bandwidth” is set to 100,000 Kbps, “Peak Bandwidth” to 100,000 Kbps and “Burst Size” to 102,400 KB. So what does that mean? Well, it means that if you enable it and do not change the values, the traffic is limited to 100,000 Kbps. 100,000 Kbps is… yes, roughly 100Mbps, or even less to be more precise: 97.6Mbps. That is not a lot indeed, and not even a supported configuration for vMotion.

So what if I simply bump up the Peak Bandwidth to let’s say 5Gbps, as I do not want vMotion to ever consume more than half of the NIC port? (Note: vSwitch traffic shaping only applies to egress, aka outbound, traffic.) Well, setting the peak bandwidth sounds like it may do something, but probably not what you would hope for, as this is how the settings are applied:

By default the traffic stream will get what is specified by “Average Bandwidth”. However, it is possible to exceed this when needed by specifying a higher “Peak Bandwidth” value. Your traffic will be allowed to burst until the value of “Burst Size” has been exceeded. In other words, in the above example, when only Peak Bandwidth is increased this would lead to the following: by default the traffic is limited to 100Mbps; it can peak to 5Gbps, but only for 100MB worth of data traffic. As you can imagine, in the case of vMotion, where the full memory content of a VM is transferred, that 100MB is hit within a second, after which the vMotion process is throttled back to 100Mbps. The remainder of the VM memory then takes ages to copy and the vMotion eventually times out.

So if you apply traffic shaping using your vSwitch, make sure to think through the numbers. In the above scenario for instance, specifying a 5Gbps Average and Peak would be what was desired.
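To get a feel for the numbers, here is a rough back-of-the-envelope calculation in Python. It is a simplified model that I am assuming for illustration only (burst at the peak rate until “Burst Size” is used up, then fall back to the average rate), not an exact description of how the vSwitch shaper behaves. The 16GB VM is a hypothetical example, and I use the same 1024-based conversion as above.

```python
def transfer_seconds(data_kb, average_kbps, peak_kbps, burst_kb):
    """Estimate the transfer time under a simplified shaping model.

    Assumption: the first burst_kb of data goes out at peak_kbps,
    everything after that at average_kbps.
    """
    burst_part_kb = min(data_kb, burst_kb)
    rest_kb = data_kb - burst_part_kb
    # KB to Kb is a factor of 8 (bytes to bits)
    return burst_part_kb * 8 / peak_kbps + rest_kb * 8 / average_kbps


GBPS_IN_KBPS = 1024 * 1024            # 1Gbps expressed in Kbps (1024-based)
vm_memory_kb = 16 * 1024 * 1024       # hypothetical VM with 16GB of memory

# Only Peak Bandwidth raised to 5Gbps, Average and Burst Size left at default
only_peak = transfer_seconds(vm_memory_kb, 100_000, 5 * GBPS_IN_KBPS, 102_400)

# Both Average and Peak Bandwidth set to 5Gbps
avg_and_peak = transfer_seconds(vm_memory_kb, 5 * GBPS_IN_KBPS,
                                5 * GBPS_IN_KBPS, 102_400)

print(f"Peak only:        {only_peak:.0f} seconds")
print(f"Average and peak: {avg_and_peak:.1f} seconds")
```

In this simplified model the 16GB copy takes roughly 22 minutes when only the peak is raised, versus about 26 seconds when both average and peak are set to 5Gbps. A real vMotion will behave differently (changed memory pages need to be copied again, for instance), but the order of magnitude explains the time-outs.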

Result of the Vietnam volunteering experience…

Before I forget, once again I would like to thank everyone who has made all of this possible. All the individuals and corporations who stepped up and made a donation, thank you on behalf of Orphan Impact and of course all of the children! (Donations are always welcome and help is always needed, look here for more details.) Some of you reached out to me personally and have asked me what the result was of the volunteering and their donations to Orphan Impact. Well the result was huge if I say so myself. With the money raised and the help provided Orphan Impact is on its way to provide computer classes to multiple additional orphanages! I just received two cool videos that I wanted to share with all of you. In these videos the results of the trip are explained both from the Orphan Impact side and from the VMware side in terms of volunteering experience.

Before I do, for those who missed the original blog posts on my volunteering experience:

Orphan Impact Story:

VMware Foundation Members share experience: