Yellow Bricks

by Duncan Epping

Server

vSphere 5.0 Hardening Guide public draft available

Duncan Epping · Apr 18, 2012 ·

One of the things my team is responsible for is security of the cloud infrastructure suite. They have worked really hard the last couple of months on overhauling the vSphere Hardening Guide. Today the public draft was published. (Thanks Charu, Grant and Kyle!)

One of the major changes is the format of the guide. It has been poured into an Excel spreadsheet, making it easier to filter, sort and edit. Please take a look at the guide, and if you have any feedback don’t hesitate to comment on the community forum thread! The final version of the document should be published mid-May.

Scripts release for Storage vMotion / HA problem

Duncan Epping · Apr 17, 2012 ·

Last week when the Storage vMotion / HA problem went public I asked both William Lam and Alan Renouf if they could write a script to detect the problem. I want to thank both of them for their quick response and turnaround; they cranked the scripts out in literally hours. The scripts were validated multiple times in a VDS environment and worked flawlessly. Note that these scripts can detect the problem in environments using either a regular Distributed vSwitch or a Nexus 1000v, but they can only mitigate the problem in a Distributed vSwitch environment. Here are the links to the scripts:

  • Perl: Identifying & Fixing Virtual Machines Affected By SvMotion / VDS Issue (William Lam)
  • PowerCLI – Identifying and fixing VMs Affected By SvMotion / VDS Issue (Alan Renouf)

Once again thanks guys!

Limiting stress on storage caused by HA restarts by lowering restart concurrency?

Duncan Epping · Apr 16, 2012 ·

I had a question last week, and it had me going for a while. The question was whether “das.perHostConcurrentFailoversLimit” could be used to lower the hit on storage during a boot storm. By default this advanced option is set to 32, meaning that HA will restart a maximum of 32 VMs concurrently on a single host. The question was whether lowering this value to, for instance, 16 would help reduce the stress on storage when multiple hosts fail, or for instance in a blade environment when a chassis fails.

At first you would probably say “Yes of course it will”. Having only 16 restarts concurrently vs 32 should cut the stress in half… Well not exactly. The point here is that this setting is:

  1. A per host setting and not cluster wide
  2. Addressing power on attempts

So what is the problem with that exactly? Well, in the case of the per-host setting, if you have a 32-node cluster and 8 hosts fail, there would still be a maximum of 384 concurrent VM power-on attempts: (32 - 8 failed hosts) * 16 VM restarts max per host. Yes, it is a lot better than the 768 you would get with the default of 32, but that is still a lot of VMs hitting your storage.
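The arithmetic above can be written down as a tiny sketch. The function name and parameters are made up for illustration; it only models the upper bound on concurrent power-on attempts, assuming every surviving host is handed its full per-host limit at the same moment.

```python
def max_concurrent_power_ons(total_hosts, failed_hosts, per_host_limit):
    """Upper bound on concurrent HA power-on attempts across the cluster.

    Hypothetical helper: the limit applies per surviving host, so the
    cluster-wide ceiling is simply surviving hosts times that limit.
    """
    surviving_hosts = total_hosts - failed_hosts
    return surviving_hosts * per_host_limit


# The example from the text: a 32-node cluster in which 8 hosts fail.
print(max_concurrent_power_ons(32, 8, 16))  # 384 with the limit lowered to 16
print(max_concurrent_power_ons(32, 8, 32))  # 768 with the default of 32
```

Halving the per-host limit halves the ceiling, but the ceiling still scales with the number of surviving hosts.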

But more importantly, we are talking about power-on attempts here! A power-on attempt does not equal the boot process of the virtual machine! It is just the initial process that flips the switch of the VM from “off” to “on”. Check vCenter when you power on a VM and you will see the task marked as completed while your VM is still booting. Reducing this number will reduce the stress on hostd, but that is about it. In other words, if you lower it to 16 you will have fewer concurrent power-on attempts, but hostd will handle them quickly, and before you know it 16 new power-on attempts will follow, nearly simultaneously!

The only way you can really limit the hit on storage, and on the virtual machines sharing this storage, is by enabling Storage IO Control. SIOC will ensure that all VMs that need storage resources get them in a fair manner. The other option is to ensure that you are not overloading your datastores with a massive number of VMs without the IOPS to back up a boot storm. I guess there is no real need to be overly concerned here though… How often does it happen that 50% of your environment fails? If it does, are you worried about that 15-minute performance hit, or about those 50% of the VMs being down?

Clarifying the SvMotion / VDS problem

Duncan Epping · Apr 14, 2012 ·

<Update>I asked William Lam if he could write a script to detect this problem and possibly even mitigate it. William worked on it over the weekend and just posted the result! Head over to his blog for the script! Thanks William for cranking it out this quick! For those who prefer PowerCLI… Alan Renouf just posted his version of the script! Both scripts provide the same functionality though!</Update>

I think there is some confusion around the SvMotion / VDS problem I described a couple of days back. Let me try to clarify it in a couple of simple steps.

First of all, this only applies to virtual machines that have been Storage vMotioned by vCenter 5.0 and are connected to a Distributed vSwitch. The Storage vMotion could have been initiated either manually or by Storage DRS. So what is the exact problem?

  • When a VM is attached to a dvPortgroup it is connected to a port. This information is stored locally on the host and on the VMFS volume this VM is stored on.
  • This volume will contain a file which is named after the port number of this VM.
  • When the VM is Storage vMotioned to a different datastore this file is not created on the destination datastore.
  • When the host on which the Storage vMotioned VM resides fails, HA will attempt to restart that VM.
  • In order for HA to restart it and connect it to the dvPortgroup this file is required.
  • As the file is not available the restart fails.
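To make the failure condition concrete, here is a minimal sketch that checks whether a dvport file exists for a given port. It is not the detection script mentioned above. The function names are hypothetical, and the directory layout (a `.dvsData` folder on the datastore, then a per-switch folder, then a file named after the port number) is an assumption taken from the path in the HA error message quoted in the Apr 11 post on this page.

```python
import tempfile
from pathlib import Path


def dvport_path(datastore_root, dvs_id, port_key):
    """Build the assumed location of the dvport file for one port.

    Assumed layout: <datastore>/.dvsData/<switch id>/<port number>
    """
    return Path(datastore_root) / ".dvsData" / dvs_id / str(port_key)


def dvport_file_present(datastore_root, dvs_id, port_key):
    """Return True if the dvport file HA needs is on this datastore."""
    return dvport_path(datastore_root, dvs_id, port_key).is_file()


if __name__ == "__main__":
    # Simulate a destination datastore with a temporary directory.
    with tempfile.TemporaryDirectory() as datastore:
        dvs_id = "example-dvs-id"  # hypothetical switch identifier
        # Right after the Storage vMotion the file is missing.
        print(dvport_file_present(datastore, dvs_id, 139))
        # Reconnecting the VM to its portgroup recreates the file.
        port_file = dvport_path(datastore, dvs_id, 139)
        port_file.parent.mkdir(parents=True)
        port_file.touch()
        print(dvport_file_present(datastore, dvs_id, 139))
```

A real check would also need the port number and switch identifier of each VM from vCenter, which this sketch leaves out.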

You can simply resolve this by temporarily connecting the impacted VMs to a different dvPortgroup and then reconnecting them to the original portgroup. As soon as you’ve done that the file will be created on the datastore. For now this is a manual task, but I am sure some of my team members are working on a scripted solution as we speak… right Alan / William? 🙂

HA fails to initiate restart when a VM is SvMotioned and on a VDS!

Duncan Epping · Apr 11, 2012 ·

<Update>I asked William Lam if he could write a script to detect this problem and possibly even mitigate it. William worked on it over the weekend and just posted the result! Head over to his blog for the script! Thanks William for cranking it out this quick! For those who prefer PowerCLI… Alan Renouf just posted his version of the script! Both scripts provide the same functionality though!</Update>

A couple of weeks back Craig S. commented on my blog about an issue he ran into in his environment. He was using a Distributed vSwitch and testing certain failure scenarios. One of those scenarios was failing a host in the middle of a Storage vMotion of a virtual machine. After he had failed the host he expected HA to restart the virtual machine, but unfortunately this did not happen. He also could not get the virtual machine up and running again himself. Unfortunately the virtual machine he used to test this scenario was the vCenter Server itself, which put him in a difficult position. This was the exact error:

Operation failed, diagnostics report: Failed to open file /vmfs/volumes/4f64a5db-b539e3b0-afed-001b214558a5/.dvsData/71 9e 0d 50 c8 40 d1 c3-87 03 7b ac f8 0b 6a 2d/1241 Status (bad0003)= Not found

Today I spotted a KB article which describes this scenario. The error mentioned in this KB article reveals a bit more about what is going wrong, I guess:

2012-01-18T16:23:17.827Z [FFE3BB90 error 'Execution' opID=host-6627:6-0] [FailoverAction::ReconfigureCompletionCallback]
Failed to load Dv ports for /vmfs/volumes/UUID/VM/VM.vmx: N3Vim5Fault19PlatformConfigFault9ExceptionE(vim.fault.PlatformConfigFault)
2012-01-18T16:23:17.827Z [FFE3BB90 verbose 'Execution' opID=host-6627:6-0] [FailoverAction::ErrorHandler]
Got fault while failing over vm. /vmfs/volumes/UUID/VM/VM.vmx: [N3Vim5Fault19PlatformConfigFaultE:0xecba148] (state = reconfiguring)

It seems that at the time of fail-over HA cannot load the “dvport” information, because the dvport file is not created on the destination datastore after the Storage vMotion process. Now please note that this applies to all VMs attached to a VDS which have been Storage vMotioned using vCenter 5.0. However, the problem will only be witnessed at the time of an HA fail-over.
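If you want to check whether a fail-over already hit this, a small sketch like the one below can pull the affected .vmx paths out of log text. The function name is made up, and the pattern is based only on the "Failed to load Dv ports for" message quoted above; where that log lives and how you collect it is left out.

```python
import re

# Matches the message from the KB excerpt and captures the .vmx path.
DVPORT_FAILURE = re.compile(r"Failed to load Dv ports for (.+?\.vmx):")


def vms_with_dvport_failures(log_text):
    """Return the .vmx paths HA could not load dvport info for, in order seen."""
    seen = []
    for match in DVPORT_FAILURE.finditer(log_text):
        vmx_path = match.group(1)
        if vmx_path not in seen:  # report each VM only once
            seen.append(vmx_path)
    return seen


sample = (
    "2012-01-18T16:23:17.827Z [FFE3BB90 error 'Execution' opID=host-6627:6-0] "
    "[FailoverAction::ReconfigureCompletionCallback]\n"
    "Failed to load Dv ports for /vmfs/volumes/UUID/VM/VM.vmx: "
    "N3Vim5Fault19PlatformConfigFault9ExceptionE(vim.fault.PlatformConfigFault)\n"
)
print(vms_with_dvport_failures(sample))
```

This only tells you which restarts already failed. It does not find VMs that are at risk but have not yet gone through a fail-over.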

This dvport info is what I mentioned in my “digging deeper into the VDS construct” article. I already mentioned there that this is what HA uses to reconnect the virtual machine to the Distributed vSwitch… And when files are moving around you can imagine it is difficult to power-on a virtual machine.

I reproduced the problem as shown in the following screenshot. The VM has port 139 assigned by the VDS, but on the datastore there is only a dvport file for 106. This is what happened after I simply Storage vMotioned the VM from Datastore-A to Datastore-B.

For now, if you are using a Distributed vSwitch, running a virtual vCenter Server and have Storage DRS enabled… I would recommend disabling Storage DRS for the vCenter Server specifically, just to avoid ending up in this scenario.

Go to the Datastores and Datastore Clusters view, edit the properties of the Datastore Cluster and change the automation level.

The problem itself can be mitigated, as described by Michael Webster here, by simply selecting a different dvPortgroup or vSwitch for the virtual machine. After the reconfiguration has completed you can select the original portgroup again. This will recreate the dvport info on the datastore.

About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.

Copyright Yellow-Bricks.com © 2026