Yellow Bricks

by Duncan Epping

Cool new HA feature coming up to prevent a split brain situation!

Duncan Epping · Mar 29, 2010

I already knew this was coming up but wasn’t allowed to talk about it. Now that it is out in the open on the VMTN community, I guess I can talk about it as well.

One of the most common issues experienced with VMware HA is a split brain situation. Although currently undocumented, vSphere has a detection mechanism for these situations. Even more importantly, the upcoming release, ESX 4.0 Update 2, will also automatically prevent it!

First, let me explain what a split brain scenario is. Let’s start by describing the situation that is most commonly encountered:

4 Hosts – iSCSI / NFS based storage – Isolation response: leave powered on

When one of the hosts is completely isolated, including the Storage Network, the following will happen:

Host ESX001 is completely isolated, including from the storage network (remember, iSCSI/NFS-based storage!), but its VMs will not be powered off because the isolation response is set to “leave powered on”. After 15 seconds the remaining, non-isolated hosts will try to restart the VMs. Because the iSCSI/NFS network is isolated as well, ESX001 can no longer reach the storage, so the lock on the VMDK will time out and the remaining hosts will be able to boot up the VMs. When ESX001 returns from isolation it will still have the VMX processes running in memory. This is when you will see a “ping-pong” effect within vCenter: VMs flipping back and forth between ESX001 and any of the other hosts.
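To make the sequence above a bit more tangible, here is a minimal Python sketch of it. This is a toy model only: the `VM` class and the functions `isolate_host`, `ha_restart` and `is_split_brain` are made up for illustration and have nothing to do with how HA is actually implemented. It only assumes what is described above: a restart succeeds when the VMDK lock is free, and the lock frees up when the isolated host can no longer reach the storage.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class VM:
    name: str
    lock_owner: Optional[str]  # host currently holding the VMDK lock
    running_on: Set[str] = field(default_factory=set)  # hosts with a live VMX process


def isolate_host(vm: VM, host: str, storage_reachable: bool,
                 isolation_response: str) -> None:
    """Apply the effect of `host` becoming isolated to a VM it is running."""
    if isolation_response == "power off":
        # The VM is stopped, so its lock goes away as well.
        vm.running_on.discard(host)
        vm.lock_owner = None
    elif not storage_reachable:
        # "leave powered on": the VMX process keeps running, but the host
        # cannot reach the storage any more, so the lock times out.
        vm.lock_owner = None
    # "leave powered on" with storage still reachable: the lock stays held.


def ha_restart(vm: VM, new_host: str) -> bool:
    """The remaining hosts try a restart; it only works if the lock is free."""
    if vm.lock_owner is not None:
        return False
    vm.lock_owner = new_host
    vm.running_on.add(new_host)
    return True


def is_split_brain(vm: VM) -> bool:
    """Split brain in this model: the same VM is alive on more than one host."""
    return len(vm.running_on) > 1


if __name__ == "__main__":
    vm = VM("vm01", lock_owner="ESX001", running_on={"ESX001"})
    isolate_host(vm, "ESX001", storage_reachable=False,
                 isolation_response="leave powered on")
    ha_restart(vm, "ESX002")
    print(is_split_brain(vm))
```

With the storage network isolated and “leave powered on”, the model ends up with the VM alive on both ESX001 and ESX002. Call `isolate_host` with `storage_reachable=True` instead and `ha_restart` returns `False`, because the lock is still held.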

As of version 4.0, ESX(i) detects that the lock on the VMDK has been lost and raises a question asking whether the VM should be powered off. Please note that you will (currently) only see this question if you connect directly to the ESX host. Below you can find a screenshot of this question.

With ESX 4.0 Update 2, however, the question will be auto-answered and the VM will be powered off to avoid the ping-pong effect and a split brain scenario! How cool is that…
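The difference between the two behaviours can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions: `handle_lost_lock` and its `auto_answer` parameter are hypothetical names used to contrast “raise a question and wait” (4.0) with “answer it automatically and power off” (Update 2). They are not real ESX settings or APIs.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class VM:
    name: str
    lock_owner: Optional[str]  # host currently holding the VMDK lock
    running_on: Set[str] = field(default_factory=set)  # hosts with a live VMX process


def handle_lost_lock(vm: VM, host: str, auto_answer: bool) -> str:
    """Model what `host` does once it is back from isolation.

    `auto_answer` is a hypothetical switch for this sketch only:
    False models the 4.0 behaviour, True the Update 2 behaviour.
    """
    if host not in vm.running_on or vm.lock_owner == host:
        # Either the VM is not running here, or this host still owns the lock.
        return "no action"
    if auto_answer:
        # Update 2 behaviour: the question is answered and the stale copy stops.
        vm.running_on.discard(host)
        return "powered off"
    # 4.0 behaviour: a question is raised on the host and waits for an answer.
    return "question pending"


if __name__ == "__main__":
    # ESX002 restarted the VM and now owns the lock; ESX001 still runs a stale copy.
    vm = VM("vm01", lock_owner="ESX002", running_on={"ESX001", "ESX002"})
    print(handle_lost_lock(vm, "ESX001", auto_answer=False))
    print(handle_lost_lock(vm, "ESX001", auto_answer=True))
    print(sorted(vm.running_on))
```

In the first call the stale copy on ESX001 keeps running until somebody answers the question; in the second call it is powered off and only the copy on ESX002 remains.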

BC-DR, Server ESX, esxi, ha, Storage, vSphere

Comments

  1. Jason Boche says

    29 March, 2010 at 14:35

    “With ESX 4 update 2 the question will be auto-answered though and the VM will be powered off to avoid the ping-pong effect and a split brain scenario!”

    Is it a configurable option to auto-answer the question? In other words, if a customer does not want the question auto-answered, can that behavior be toggled?

  2. Rob Mokkink says

    29 March, 2010 at 15:03

    I always leave the isolation response at “power off VM”, because I use FC-based storage.

  3. Arnim van Lieshout says

    29 March, 2010 at 15:19

    What will happen to the automatically powered off vm?
    Will it be automatically deregistered from the host that failed?

  4. David Owen says

    29 March, 2010 at 15:34

    Great to see this feature. It’s not the most common issue in the world, but it was always something in the back of my mind when deploying HA.

  5. Frank Brix Pedersen says

    29 March, 2010 at 16:01

    How do you directly connect to the ESX host if the network is lost and the host is isolated? 😉 I think it is great it auto answers yes.

  6. Arkadiusz Krowczynski says

    29 March, 2010 at 18:53

    Any timetable when Update 2 will arrive for us?

  7. Johan says

    30 March, 2010 at 07:03

    If you are on iSCSI/NFS, why not set the isolation address to the storage and let them power off? What would be even better is if HA checked the isolation address first and then the other hosts (in the case of iSCSI/NFS), so that if iSCSI/NFS is dead it just powers off the VMs and another host can power them on without the risk of duplicate VMs.

  8. Johan says

    30 March, 2010 at 07:05

    Frank: By having several SC/management ports, one on the iSCSI/NFS network and one on the normal management network.

  9. Frank Brix Pedersen says

    30 March, 2010 at 09:45

    Johan: If that were the case, your host would never be isolated in the first place.

    It was purely a rhetorical question.

  10. Johan says

    30 March, 2010 at 10:09

    Frank: well it depends what isolation network you choose…

  11. Paul Geerlings says

    30 March, 2010 at 13:02

    Is that why they set the default isolation response on ESX4 HA back to “Shutdown”?
    Will they be changing the default again with ESX4 Update 2?

  12. rotary laser levels says

    7 October, 2010 at 17:39

    I wasn’t sure I would like this site since it was about Cool new HA feature coming up to prevent a split brain situation! » Yellow Bricks but I was wrong and thought it was cool and found it on AOL. Thanks and I’ll be back as you update.

  13. Craig says

    20 October, 2011 at 16:05

    How do the hosts handle a split brain when the underlying storage is FC? Or in the case where the storage network is still available?

    • Duncan Epping says

      20 October, 2011 at 17:55

      Then the VMDK would still be locked and the VM wouldn’t be restarted, so a split brain can’t occur.

About the author

Duncan Epping is a Chief Technologist in the Office of CTO of the Cloud Platform BU at VMware. He is a VCDX (# 007), the author of the "vSAN Deep Dive", the “vSphere Clustering Technical Deep Dive” series, and the host of the "Unexplored Territory" podcast.

Copyright Yellow-Bricks.com © 2022