
Yellow Bricks

by Duncan Epping


Archives for 2009

Re: RTFM “What I learned today – HA Split Brain”

Duncan Epping · Jul 22, 2009 ·

I’m going to start with a quote from Mike’s article “What I learned today…“:

Split brain is HA situation where an ESX host becomes “orphaned” from the rest of the cluster because its primary service console network has failed. As you might know the COS network is used in the process of checking if an ESX host has suffered an untimely demise. If you fail to protect the COS network by giving vSwitch0 two NICs or by adding a 2nd COS network to say your VMotion switch, under-desired consequences can occour. Anyway, the time for detecting split brain used to be 15 seconds, for some reason this has changed to 12 seconds. I’m not 100% why, or if in fact the underlying value has changed – or that VMware has merely corrected its own documentation. You see its possible to get split brain in Vi3.5 happening if the network goes down for more than 12 seconds, but comes back up on the 13th, 14th or 15th second. I guess I will have to do some research on this one. Of course, the duration can be changed – and split brain is trivial matter if you take the neccessary network redundency steps…

I thought this was common knowledge, but if Mike doesn’t know about it, my guess is that most of you don’t either. Before we dive into Mike’s article: technically this is not a split brain. It is an “orphaned VM” situation, not a scenario where the disk files and the in-memory VM are split between hosts.

Before we start, this setting is key in Mike’s example:

das.failuredetectiontime = The period a host waits, after it stops receiving heartbeats from another host, before declaring that host dead.

The default value is 15 seconds. In other words the host will be declared dead on the fifteenth second and a restart will be initiated by one of the primary hosts.

For now let’s assume the isolation response is “power off”. The VMs can only be restarted on another host once the running instances have been powered off. Here’s the key point: the isolated host initiates the “power off” (isolation response) 2 seconds before das.failuredetectiontime expires.

Does this mean that you can end up with your VMs being down and HA not restarting them?
Yes. When the heartbeat returns between the 13th and 15th second, the power off may already have been initiated. The restart, however, will not be initiated, because the returning heartbeat indicates that the host is not isolated.

How can you avoid this?
Pick “Leave VM powered on” as the isolation response. Increasing das.failuredetectiontime will also decrease the chances of running into issues like these.
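To make the timing easier to follow, here is a minimal Python sketch of the timeline described above. It is not how HA is implemented. The function name, the outcome labels and the exact boundary handling (whether the 13th and 15th second are inclusive) are assumptions made for illustration only. The 15-second default and the 2-second lead come from the text.

```python
from enum import Enum


class Outcome(Enum):
    NO_ACTION = "heartbeat returned in time, VMs keep running"
    DOWN_NOT_RESTARTED = "VMs powered off by the isolated host, no restart"
    RESTARTED = "VMs powered off and restarted by another host"
    LEFT_RUNNING = "isolation response leaves the VMs running"


def ha_outcome(outage_s, failure_detection_s=15.0, isolation_lead_s=2.0,
               isolation_response="power off"):
    """Hypothetical model of what happens to the VMs on a host whose
    heartbeat network is down for outage_s seconds."""
    # The isolated host acts 2 seconds before das.failuredetectiontime.
    isolation_at = failure_detection_s - isolation_lead_s
    if outage_s < isolation_at:
        return Outcome.NO_ACTION
    if isolation_response != "power off":
        # "Leave VM powered on": the isolated host does not touch its VMs.
        return Outcome.LEFT_RUNNING
    if outage_s < failure_detection_s:
        # Power off already started, but the heartbeat came back before
        # the other hosts declared this host dead: nobody restarts the VMs.
        return Outcome.DOWN_NOT_RESTARTED
    return Outcome.RESTARTED


if __name__ == "__main__":
    for outage in (10, 14, 20):
        print(outage, "->", ha_outcome(outage).value)
    # A longer failure detection time makes a 14-second blip harmless.
    print(14, "->", ha_outcome(14, failure_detection_s=60).value)
```

With the defaults, only an outage that ends in the window between the isolation response and the failure detection time leaves the VMs down without a restart. Both suggested workarounds show up in the sketch: a different isolation response removes the power off, and a larger das.failuredetectiontime makes it less likely that a short outage reaches that window at all.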

Did this change?
No, it has worked like this since it was introduced.

Whitepaper: VMware vNetwork Distributed Switch

Duncan Epping · Jul 22, 2009 ·

I just noticed this great whitepaper on the Distributed Switch (vDS) and thought it might also be useful for you guys:

http://vmware.com/files/pdf/vsphere-vnetwork-ds-migration-configuration-wp.pdf

This guide is intended to help users understand the various scenarios and considerations for migration to the vNetwork Distributed Switch (vDS). It also includes a step-by-step guide on migration from a Standard Switch environment to a vDS environment.

SRDF SRA and the SPC-2 bit

Duncan Epping · Jul 21, 2009 ·

I was at a customer site yesterday, helping to configure SRM. During the configuration of the EMC SRDF SRA (Storage Replication Adapter) we ran into a weird issue. Although we could see the paired arrays with a green “okay” sign, we did not see any replicated LUNs. The first things I usually check in these cases are:

  1. Is the LUN already formatted as VMFS?
  2. Does it hold any VMs?

In this case we met both requirements. After checking all the configuration settings on the SRA side, the SRM side and the SAN, we noticed that the SPC-2 bit was not enabled. Of course we knew that it was a required setting according to the FC SAN Config Guide (page 57), but this is definitely not the kind of behavior I would expect to see when it’s disabled. Anyway, I did a quick search on our internal mailing list, and as it turns out we were not the first to encounter these issues.

The SPC-2 bit is something that comes up every now and then, so if you’ve got EMC Symmetrix storage and you are not sure whether you have applied VMware’s recommendations, please read the FC SAN Config Guide and avoid future problems. Please bear in mind, though, that when you set the SPC-2 bit you might, and probably will, need to re-signature the disk.

VMworld, just be there!

Duncan Epping · Jul 19, 2009 ·

Usually, about a month and a half before VMworld, all the “VMworld, why you should attend” posts start appearing. My title says it all: just be there! But if you are still not sure why, here are a bunch of arguments you can use to convince yourself and your manager…

As most of you know, Rick Scherer, Scott Lowe, Chad Sakac, Tom Howarth and I will be hosting the Ask the Experts Panel Session (TA2259). This will be a round-table-style session. You get the chance to ask questions about virtualization and designing virtual infrastructures or, as we like to call them these days, (internal) clouds. If you’ve got a question you can already ask it here. You can even provide us with diagrams if you feel that would help us better understand your environment!

Now of course this is not the only reason why you should attend VMworld 2009 in San Francisco. Besides the fact that San Francisco is one of the coolest cities in the US, these are the reasons why I think you should visit VMworld 2009, in no particular order:

  • VMworld Party. Do I even need to explain this? The reputation this party has says enough. It’s more than worth it. Besides the VMworld party, companies like Veeam, Vizioncore etc. will organize parties, which are excellent for meeting people! And what about the Warm Up Party? Be there!
  • VMware Technology Exchange Developer Day
    Something different for a change: a whole day based on developing tools, scripts etc.! Make sure you attend this extra day. It will be worth it!
  • Networking. No better place to do networking than VMworld. I’ve met so many great people at VMworld, heck I even had job interviews at VMworld. (Don’t tell your manager this :-))
  • Knowledge. The amount of info to be gathered at VMworld is just insane. I know the sessions will be available online, but that will not give you the chance to discuss your experiences and ask your questions (Genius Booth, visit it!). Isn’t it awesome that all the top virtualization experts are walking around and you can actually walk up to them and ask them that one thing you always wanted to know!?! I highly recommend the following sessions:
    • BC1500 – Lee Dilworth – vCenter Site Recovery Manager “Up and Running” – Best Practices & Avoiding the Pitfalls
    • BC3197 – Marc Sevigny – High Availability – Internals and Best Practices
    • TA1394 – Mostafa Khalil – vSphere 4.0 Advanced Storage Log Analysis
    • TA2525 – Srinivas Neginhal – vSphere 4 Networking Deep Dive
    • TA2627 – Kit Colbert – Understanding “Host” and “Guest” Memory Usage and Other Memory Management Concepts
    • TA2963 – Krishna Raj Raja – esxtop for advanced users
    • TA4341 – Boon Seong Ang – Virtual Network Performance
    • TA2942 – Bhavjit Walha – Performance Best Practices
    • TA2650 – Hal Rottenberg/Luc Dekens – Take PowerCLI to the Next Level
    • TA2647 – Chad Sakac – Best Practices to Increase Availability and Throughput for the Future of VMware
    • VM2241 – Carter Shanklin/Scott Herold – Managing vSphere with VMware PowerCLI

Drop your manager an email and ask for his approval and start booking! Don’t forget to visit the VMTN/Community Booth. Most of the bloggers, including myself, can be found there. Come by and say hi!

Up to 80 virtual machines per host in an HA Cluster (3.5 vs vSphere)

Duncan Epping · Jul 16, 2009 ·

I was re-reading the KB article on how to improve HA scaling. Apparently, before vCenter 2.5 U5 there was a problem when the number of VMs that needed to fail over exceeded 35. Keep in mind that this is a soft limit; you can run more than 35 VMs on a single host in an HA cluster if you want to.

To increase scalability up to 80 VMs per host, vCenter needs to be upgraded to 2.5 U5 and the following configuration changes are recommended:

  1. To increase the maximum vCPU limit to 192
  2. To increase the Service Console memory limit to 512 MB.
  3. To increase the memory resource reservation of the vim resource pool to 1024 MB.
  4. To include/edit the host agent memory configuration values. (hostdStopMemInMB=380 and hostdWarnMemInMB=300)

A question that I immediately had was: what about vSphere? What are the values for vSphere, and do I need to increase them as well? Here are the vSphere default settings, in the same order:

  1. Maximum vCPU limit: 512
  2. Service Console memory: 300 MB
  3. Memory reservation of the vim resource pool: 0 MB
  4. Host agent memory configuration: hostdStopMemInMB=380 and hostdWarnMemInMB=300

As you can see, the vSphere defaults for 1 and 4 already meet or exceed the new recommendations. I would always recommend setting the Service Console memory to 800MB. With most hosts having 32GB or more, the cost of assigning an extra 500MB to the Service Console is minimal. That leaves the recommendation to increase the memory reservation for the vim resource pool. I would recommend leaving it set to the default value. vSphere scales up to 100 VMs per host in an HA cluster, and chances are that this will be increased when U1 hits the streets. (These values usually change with every release.)
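The comparison between the two lists can be written out as a small Python sketch. The numbers are the ones quoted above. The dictionary keys are hypothetical labels chosen for this example, not real configuration option names, and the helper function is an assumption for illustration only.

```python
# Values recommended for VI 3.5 with vCenter 2.5 U5, as listed above.
# The keys are made-up labels for this sketch, not real option names.
RECOMMENDED_VI35 = {
    "max_vcpus": 192,
    "service_console_mem_mb": 512,
    "vim_pool_reservation_mb": 1024,
    "hostd_stop_mem_mb": 380,
    "hostd_warn_mem_mb": 300,
}

# vSphere default values, in the same order as the list above.
VSPHERE_DEFAULTS = {
    "max_vcpus": 512,
    "service_console_mem_mb": 300,
    "vim_pool_reservation_mb": 0,
    "hostd_stop_mem_mb": 380,
    "hostd_warn_mem_mb": 300,
}


def below_recommendation(defaults, recommended):
    """Return the labels whose default is lower than the recommended value."""
    return sorted(name for name, value in recommended.items()
                  if defaults[name] < value)


if __name__ == "__main__":
    for name in below_recommendation(VSPHERE_DEFAULTS, RECOMMENDED_VI35):
        print(name, VSPHERE_DEFAULTS[name], "<", RECOMMENDED_VI35[name])
```

Only the Service Console memory and the vim resource pool reservation come out below the VI 3.5 recommendations, which matches items 2 and 3 in the lists. The sketch only compares numbers. It does not capture the advice above to raise the Service Console memory and leave the vim reservation at its default.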


About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.

Copyright Yellow-Bricks.com © 2026 · Log in