Yellow Bricks

by Duncan Epping

vSphere HA Waiting for cluster election to complete Operation timed out?

Duncan Epping · Jan 4, 2012

I noticed this thread on the VMTN community which discussed a time-out during the cluster election process. The one thing all scenarios described in the thread have in common is an upgrade, either from 4.1 to 5.0 or from base 5.0 to a higher patch level. Marc Sevigny posted in the same thread that it is a known issue which the HA team is currently investigating…

After an upgrade, under conditions we’re still investigating, an error is occurring when issuing a start request of the HA service on the upgraded host.  When that fails, HA then tries to re-install HA, and the re-install does nothing because the service is already there (and the right version) but we’re left without an HA service running.

If you are experiencing this issue, the steps below are the way to fix it. If you do run into it, please also report it to VMware and submit log files, as that will help the HA team fix the problem.

  1. Place the host into Maintenance Mode
  2. Take a copy of /opt/vmware/uninstallers/VMware-fdm-uninstall.sh (we copied it to /tmp)
  3. From the location where you copied the file, run the command (./VMware-fdm-uninstall.sh)
  4. You should see a short pause before it gets back to the prompt (you’ll see why I mention this below)
  5. Take the host out of Maintenance Mode; within the “Recent Tasks” area you should see the client being pulled from vCenter and installed
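
Steps 2 to 4 could be wrapped in a small script along these lines. This is only a sketch, to be run in the ESXi shell (for example over SSH) once the host is already in Maintenance Mode. The uninstaller path and the /tmp location come from the steps above; the function name `run_fdm_uninstall` and the overridable variables `FDM_UNINSTALLER` and `WORK_DIR` are made up for illustration. It also stops early if the uninstaller is not at the expected path.

```shell
#!/bin/sh
# Sketch only: copy the FDM (HA agent) uninstaller and run the copy.
# Assumes the host has ALREADY been placed into Maintenance Mode.
# FDM_UNINSTALLER, WORK_DIR and run_fdm_uninstall are illustrative names.
FDM_UNINSTALLER="${FDM_UNINSTALLER:-/opt/vmware/uninstallers/VMware-fdm-uninstall.sh}"
WORK_DIR="${WORK_DIR:-/tmp}"

run_fdm_uninstall() {
    # Stop early if the uninstaller is not where the steps expect it.
    if [ ! -f "$FDM_UNINSTALLER" ]; then
        echo "uninstaller not found: $FDM_UNINSTALLER" >&2
        return 1
    fi

    script_name=$(basename "$FDM_UNINSTALLER")

    # Step 2: take a copy of the uninstaller.
    cp "$FDM_UNINSTALLER" "$WORK_DIR/" || return 1

    # Step 3: run the copy from the directory it was copied to.
    # Step 4: expect a short pause before the prompt returns.
    ( cd "$WORK_DIR" && "./$script_name" )
}
```

After defining the function, call `run_fdm_uninstall` and then take the host out of Maintenance Mode so vCenter can push the agent again.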

Server 5.0, ha, troubleshooting, VMware, vSphere

Reader Interactions

Comments

  1. Kevin says

    5 January, 2012 at 00:14

    I am experiencing this same issue, but am very new to VMware. I’m assuming that I can do these steps using the PowerCLI but I do not know how. Could you please provide some quick guidance? I can connect to the host, but cannot figure out how to access the file system. The prompt is on my local computer. Do I use a cmdlet to do this?

    Thanks for any help.

  2. Kevin says

    5 January, 2012 at 00:20

    Got it. Used Putty instead of PowerCLI.

  3. TBailey says

    6 January, 2012 at 21:04

    It has definitely been an odd occurrence for us…

  4. MarcS says

    19 January, 2012 at 16:11

    If anyone is seeing this issue, it would be helpful to look at the ESXi host logs after the failure to see if the issue is the same as the known problem. The known problem has the following signature in the /var/run/log/hostd.log (or it may have been zipped into a hostd.[x].gz if too much time has elapsed):

    2011-12-30T17:12:16.511Z [25B03B90 info 'TaskManager' opID=701D770F-000013A5-75-76] Task Completed : haTask-ha-host-vim.host.ServiceSystem.updatePolicy-19463 Status error
    2011-12-30T17:12:16.512Z [25F10B90 info 'TaskManager' opID=SWI-ec15c64b] Task Completed : haTask-ha-host-vim.host.FirewallSystem.disableRuleset-19465 Status success
    2011-12-30T17:12:16.512Z [25B03B90 info 'Vmomi' opID=701D770F-000013A5-75-76] Activation [N5Vmomi10ActivationE:0x70a3978] : Invoke done [updatePolicy] on [vim.host.ServiceSystem:serviceSystem]
    2011-12-30T17:12:16.512Z [25B03B90 verbose 'Vmomi' opID=701D770F-000013A5-75-76] Arg id:
    --> "vmware-fdm"
    2011-12-30T17:12:16.512Z [25B03B90 verbose 'Vmomi' opID=701D770F-000013A5-75-76] Arg policy:
    --> "off"
    2011-12-30T17:12:16.512Z [25B03B90 info 'Vmomi' opID=701D770F-000013A5-75-76] Throw vmodl.fault.SystemError
    2011-12-30T17:12:16.512Z [25B03B90 info 'Vmomi' opID=701D770F-000013A5-75-76] Result:
    --> (vmodl.fault.SystemError) {
    --> dynamicType = ,
    --> faultCause = (vmodl.MethodFault) null,
    --> reason = "",
    --> msg = "",
    --> }

    If you don’t see anything like this in your host’s /var/run/log directory, then you may have a different issue and should open a support request so we can look into it.

  5. MarcS says

    19 January, 2012 at 16:14

    In the above log entry, I should have removed

    2011-12-30T17:12:16.512Z [25F10B90 info 'TaskManager' opID=SWI-ec15c64b] Task Completed : haTask-ha-host-vim.host.FirewallSystem.disableRuleset-19465 Status success

    That is another task that may or may not be in your log file around the time of the error, and has no relevance to the issue.
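
A rough way to look for the signature MarcS describes is to grep a hostd log for its three distinctive pieces: the `ServiceSystem.updatePolicy` task ending in `Status error`, the thrown `vmodl.fault.SystemError`, and the `vmware-fdm` service id. The helper below is only a sketch under that assumption. The function name `has_fdm_start_failure` is made up, and it does not correlate the lines by opID or timestamp, so a match is a hint to read the log by hand, not proof of the known problem. It ignores the unrelated `disableRuleset` line.

```shell
#!/bin/sh
# Sketch only: succeed (exit status 0) if a hostd log contains the three
# pieces of the signature quoted in the comments. has_fdm_start_failure is an
# illustrative name, not a VMware tool.
has_fdm_start_failure() {
    log="$1"

    # Rotated logs may be gzip-compressed; read them through gzip -dc.
    case "$log" in
        *.gz) reader="gzip -dc" ;;
        *)    reader="cat" ;;
    esac

    # 1) The ServiceSystem.updatePolicy task completed with "Status error".
    $reader "$log" | grep 'ServiceSystem.updatePolicy' | grep -q 'Status error' || return 1
    # 2) A vmodl.fault.SystemError was thrown.
    $reader "$log" | grep -q 'Throw vmodl.fault.SystemError' || return 1
    # 3) The log mentions the vmware-fdm service id.
    $reader "$log" | grep -q 'vmware-fdm' || return 1
    return 0
}

# Example use on the host (current log plus any rotated copies):
#   for f in /var/run/log/hostd*; do
#       has_fdm_start_failure "$f" && echo "possible match: $f"
#   done
```

If nothing matches, the earlier advice still applies: it may be a different issue, and a support request is the way to go.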

  6. MarcS says

    19 January, 2012 at 17:22

    Another thing to check if you experience this error is to see if you have jumbo frames enabled on the management network, since this interferes with HA communication.

  7. cwjking says

    19 January, 2012 at 18:51

    This is going to be something I am going to have to bookmark for sure. Moving to v5 over here soon..

  8. Maximiliano says

    23 January, 2012 at 15:27

    Hi!! I had the same issue, but i can’t find the path /opt/vmware/uninstallers/VMware-fdm-uninstall.sh
    In the /opt/vmware i only have the folder vpxa.

    I reinstalled a new vsphere 5 in a test environment, searched the folder and wasn’t there.

    Anyone know why???

  9. txolson says

    6 April, 2012 at 18:21

    The “fix” did not work for me, but disabling HA at a cluster level, then re-enabling did work.

  10. marcel says

    21 May, 2013 at 05:38

    +1 The “fix” did not work for me, but disabling HA at a cluster level, then re-enabling did work.

  11. Ashley Smoot says

    3 July, 2013 at 17:26

    At different times I got this exact same issue with two ESX 5.0 U2 hosts that were nuke and pave new installs with all updates and patches. This procedure didn’t work in my case. Removing the host from the reconfiguring for HA, removing and re-adding to cluster, restarting mgmt services and a reboot did not work. Removing from vCenter, then re-adding to vCenter did fix the issue. All is well with both now.

  12. Ashley Smoot says

    3 July, 2013 at 20:06

    Correcting typo — Above comment in sentence 2 should read: “Removing the host from cluster, reconfiguring for HA,…”

  13. akın akalan says

    25 December, 2013 at 00:37

    for this error please disable anti-ddos protection from your switch or router.

  14. hiney says

    12 August, 2014 at 09:03

    needed to remove the host from the VC and readd it before it worked, but it finally did. tried lots of things before this. thanks very much.

    My issue appeared after recreating the rui.key and rui.crt. so i’m not sure if i cause it or not. HostReconnect.pl didn’t fix it, so maybe i didn’t.

    P

About the author

Duncan Epping is a Chief Technologist in the Office of CTO of the Cloud Platform BU at VMware. He is a VCDX (# 007), the author of the "vSAN Deep Dive", the “vSphere Clustering Technical Deep Dive” series, and the host of the "Unexplored Territory" podcast.
