Seva, a VMware Technical Account Manager, put together a cool table with the implications of a VirtualCenter crash. This is a follow-up to my blog post about VirtualCenter getting more important by the minute. I think the most important thing to remember is that the VMs keep running whatever happens to your VC server, and HA will still work if VC fails (except for adding hosts to the cluster, of course). So reinstalling the VirtualCenter server and re-adding the hosts is still possible, but in my opinion not recommended, especially when you’ve got complex Resource Pools and folder structures set up.
Open this link to the PDF or click on the picture below.
Is the same true for ESX3i? My understanding is that the HA agent lives inside the Service Console, which of course does not exist in ESX3i.
My understanding (which may very well be wrong) was that ALL HA functionality is lost in an ESX3i environment without VC being available.
AFAIK that’s not true. HA runs within ESX and doesn’t need VC to function correctly. It does need VC to get its initial config info.
Actually, the new version of HA relies on VC for name resolution instead of DNS. But HA works for ESX and ESXi. I don’t know about HA completely failing if VC is lost, but THAT would stink, especially if VC is a VM on the failed host.
That’s what I meant. And it doesn’t really rely on VC; it just uses VC once to populate the /etc/ft_hosts file.
Duncan – I hope you don’t take this the wrong way, but this list seems to say “Don’t panic if VC dies.”
Any thoughts on what the HA possibilities are for VC in the future or how to get some redundancy with it right now?
I have a second VC in one location because of latency issues, but I would love to set it up as a child VC (even as a new datacenter) and have it visible within my one main authentication structure.
Well, I guess it is more of a reality check.
I can’t elaborate on what the future holds because of an NDA.
Just to clarify some facts:
1. No, this list is not a message saying “Don’t panic if VC dies”. Those who understand it that way will be cursed forever without any hope of salvation. 😉 This list should allow the VI administrator to design and implement recovery scenarios that include VirtualCenter downtime. For example, since Update 2 we can clone a VM without powering it off. Thus we can clone the VirtualCenter VM once daily (or even twice or thrice) with a simple script and, in case of a VC outage, restart the clone manually. Look at the table, compare it with your SLAs and decide whether this scenario is feasible for you, or whether you need to cluster VC with Neverfail software. (Yes, Neverfail is the preferred solution.) The only bad thing that may guarantee you eternity in hell on the coast of broken SLAs is the absence of any recovery scenario for VC. If you have one, even one that involves VC downtime, you have hope of paradise.
2. When HA computes where to start the new VM, it gets the performance data from VirtualCenter. The VMap process is used for the communication between the HA agent and vpxa (and from vpxa to vpxd). When HA tries to start a VM, it first asks vpxd about the node resources, then chooses the appropriate node to restart the VM, and then lets the local HA agent on that node contact the local vpxa to start the VM. When VC is down, HA is blind. It can only contact vpxa and let it start the VM. The placement happens according to the latest (now possibly obsolete) performance data. Theoretically, someone can connect directly to the hosts with the VIC and start some VMs or shut them down, which changes node resources. This is the reason why I marked this function as degraded.
(VPX is the maiden name of the VirtualCenter Server, vpxa is agent, vpxd is daemon)
3. I don’t know how it is implemented in ESXi. Maybe all the mentioned agents run in the context of the primitive shell incorporated in ESXi, or maybe they are now something like helper worlds. Sorry, I don’t know.
=Seva
That table is a great resource! Just one question… Suppose there were a situation with power where all the ESX hosts and VirtualCenter went offline. And when they come back up, the VirtualCenter service doesn’t start properly… This table seems to indicate that ESX should start with a 14-day trial license, but what about the VMs? Do they start automatically, or do you have to connect to the ESX server manually to start them? Does it depend on how HA is configured?
Thanks,
Your ESX hosts need the licensing service, not the VirtualCenter service. And the VMs only start automatically if this is set. You’ll need to configure this on the ESX host (Configuration -> Virtual Machine Startup/Shutdown); by default this is disabled.
Great, thanks for the quick reply. Just to clarify, when does the 14-day trial license come into play? If the licensing service is unavailable? Will that prevent ESX server from starting?
Thanks,
Indeed, when the license server fails. ESX will start, but the VMs won’t.
Ok, thanks, that helps clear up the process. 🙂
Nice article, congratulations,
But!
I have my VC in a VM (ESX 3.5 U1) and the whole network went down for 5 minutes.
All VMs on all nodes shut down (isolation response), but none came back up when the network returned until I turned on the VC VM manually!
So why?
Best regards,
Thanks,
Cyril
Why? Well, because all VMs were shut down by HA. The isolation response kicking in means that the restart will not occur automatically if all servers go down. They are all waiting on each other to start up the VMs they shut down…
Thanks,
Your answer makes sense, but why don’t they organise an election of a ‘Master’ when the LAN is back on?
What is the difference when the VC is physical? When I started the VC VM, it took control of everything and everything ‘came back’ automatically.
It looks like the HA agent on each ESX host gave up after a few minutes of trying to reach its peers by heartbeat.
Then it was the VC that reinitialized the agents on every ESX host one by one (as logged in VC).
Am I right?
Best regards,
Cyril
Does this list apply to vCenter 4.0 also?
Yes, I think it applies to vCenter 4.0 also. Am I correct?
Duncan, do VMware maintain a list of functions you can/can’t do when vCenter is down?
Basically, how did you create your list and will you be updating it in the future?
I haven’t seen an update, to be honest, and I posted this but did not create it…
Can I have the steps for rebuilding my vCenter Server database if it crashes? Please help me with this; I am a beginner in the field of VMware.
Thanks in advance…
Waiting eagerly….
I would suggest contacting support as this is not easy to fix.
What happens to vMotion and DRS if the vCenter Server is down?
Sorry, I got the answer after checking the table. vMotion and DRS do not work if vCenter fails.
If VC goes down, HA will still fail over VMs to other ESX hosts (if a host has failed). How will HA then decide which host it should fail the VMs over to?
It picks the host with the largest amount of unreserved capacity
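To make that rule concrete, here is a minimal sketch in Python. It is only an illustration of "pick the host with the most unreserved capacity", not VMware's actual HA code. The `Host` class, the single capacity number (real hosts track CPU and memory separately) and the function name `pick_failover_host` are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical view of an ESX host as the HA agent last saw it."""
    name: str
    total_capacity: int     # assumed single capacity figure, e.g. MB of memory
    reserved_capacity: int  # capacity already reserved by running VMs
    alive: bool = True      # False for the failed host

    @property
    def unreserved(self) -> int:
        return self.total_capacity - self.reserved_capacity


def pick_failover_host(hosts: List[Host], vm_reservation: int) -> Optional[Host]:
    """Return the live host with the most unreserved capacity that can
    still fit the VM's reservation, or None if no host qualifies."""
    candidates = [h for h in hosts if h.alive and h.unreserved >= vm_reservation]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.unreserved)


# Example data (invented): esx03 is the failed host.
hosts = [
    Host("esx01", total_capacity=16000, reserved_capacity=12000),
    Host("esx02", total_capacity=16000, reserved_capacity=4000),
    Host("esx03", total_capacity=16000, reserved_capacity=2000, alive=False),
]

target = pick_failover_host(hosts, vm_reservation=2000)
print(target.name if target else "no host available")
```

Note that the numbers fed into such a choice are whatever the agents last knew. As Seva points out above, with VC down that data may be stale.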
How do we know that HA is still working if VC is down?
Is there any method to check whether HA is working or not? Please explain.
Duncan,
This was posted in 2008. Has anything changed?
I think one additional consideration for the diagram above and the what-if-vCenter-Server-crashes discussion is whether the distributed switch is affected. Will hosts with distributed switches continue to operate, with the only (obvious) limitation being that I can’t add additional hosts to my vCenter/distributed switch while the vCenter Server is down? Thanks!