During one of my group discussions at VMworld, the question came up whether to use vSphere HA or vCenter Heartbeat to protect the vCenter Server. Coincidentally, it is something we recently discussed internally on Socialcast, and I figured I would give my thoughts on this topic. My answer was short and simple: It depends.
Yes, I bet some of you saw that coming… But let me elaborate. vCenter availability is crucial, in my opinion, when it comes to operating your environment. However, your environment is not about vSphere. Your environment is not really about virtual machines. Your environment is about the services that you offer!
Your service level agreement is typically based on the uptime of the service, which makes sense, right? No one really cares about the management platform. Well, I do and you do, but your customers probably do not. Your customers care about the availability of their service.
The question you need to ask yourself is: will their service be interrupted when vCenter is down? In most cases the answer will probably be no, and in those cases you need to ask yourself how much downtime you can afford from a management perspective. Is a minute or two okay? Then vSphere HA can help you and there is no need for Heartbeat or other complex clustering solutions. If a couple of minutes is not acceptable, then Heartbeat is an option.
If there is a service interruption for the customer when vCenter is down (for instance in a test / dev cloud where provisioning processes are key, or with vCloud Director or View), you should consider using vCenter Heartbeat. Again, it all depends on your service level agreement. In some cases vCenter availability is crucial; in other cases a downtime of minutes is within the defined boundaries. The answer remains: it depends… on your use case and service level agreement.
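The reasoning above can be written down as a small decision helper. This is only a rough sketch: the function name, its parameters, and the two-minute default for an HA restart are assumptions made for illustration (the post only says "a minute or two"), not figures from VMware, and your own SLA should drive the real numbers.

```python
def recommend_vcenter_protection(
    customer_service_depends_on_vcenter: bool,
    tolerable_management_downtime_minutes: float,
    assumed_ha_restart_minutes: float = 2.0,  # assumption: "a minute or two"
) -> str:
    """Return a suggested protection mechanism for vCenter Server.

    Hypothetical helper that mirrors the two questions in the post:
    1. Is the customer's service interrupted when vCenter is down?
    2. If not, how much management downtime can you afford?
    """
    # Provisioning-driven services (test/dev cloud, vCloud Director, View):
    # vCenter downtime is customer-facing, so consider Heartbeat.
    if customer_service_depends_on_vcenter:
        return "consider vCenter Heartbeat"

    # Management-only impact: an HA restart is enough if the SLA
    # tolerates the time it takes to restart the vCenter VM.
    if tolerable_management_downtime_minutes >= assumed_ha_restart_minutes:
        return "vSphere HA"

    # Even a short management outage is not acceptable.
    return "consider vCenter Heartbeat"


if __name__ == "__main__":
    # Server infrastructure where a few minutes without vCenter is fine.
    print(recommend_vcenter_protection(False, 10))
    # VDI or cloud environment where provisioning must keep working.
    print(recommend_vcenter_protection(True, 10))
```

The output is a suggestion, not a rule: cost and operational complexity still need to be weighed against the SLA.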
Steve Morris says
I completely agree. I have never deployed Heartbeat for server infrastructures, as the availability benefit over using HA has thus far not been worth the additional cost. However, I have deployed it for VDI infrastructures where vCenter is critical for provisioning and management of desktops (the last deployment was XenDesktop with MCS, but the same applies to View).
I have been using Heartbeat for a couple of years now. In the beginning I was quite happy with it, but in the meantime I have come to the point where I want to get rid of it soon. Reasons:
– overall, it’s a pretty complicated product and troubleshooting for non-experts is nearly impossible
– major updates in combination with a new vCenter version can get very complex and time consuming (going from identical to non-identical mode and upgrading to vCenter 5 took us 2 days)
– sometimes the system just reacts weirdly to disturbances and you have a hard time getting it up and running again.
Its main advantage is the monitoring of the services and the automated failover. That is hard to do with HA or otherwise. The service interruption during a failover can still be quite long though: all services have to be started on the secondary node, and this can take a while.
Somehow reminds me of the very first Windows Cluster solutions, where you ended up having more outages _because_ you had a cluster 🙂 Just my personal experience of course…
Guess I will build something with a dedicated cluster, local storage and replication, to be independent from the rest of the environment and the shared storage.
Cristiano Schmidt says
I believe that vCenter Heartbeat is very important in specific scenarios, for example when the customer has vCenter installed on a physical machine and the SLA is considered important to the business. Based on this, we must design the solution that best fits the SLA.
Recently I did a project where vCenter protected by HA did not meet the SLA, because the operating system was the single point of failure.
I think the best solution must fit the business SLA and how much the customer is willing to spend on that solution.
There are three points to check that are always connected: how much we need to spend on availability, security and integrity.
Luke Huckaba says
I use both. To protect vCenter within the 4-wall datacenter, it’s HA. For DR purposes, it’s Heartbeat.
I run active/active datacenters where a single production vCenter running in DC-1 manages both physical sites, and a second vCenter for SRM runs in DC-2 managing the other half of both physical sites. I need both vCenters accessible in the event of an outage at either datacenter.
This is where Heartbeat comes in: if I lose DC-1, I can activate the secondary Heartbeat vCenter in DC-2 so it can manage the production space at DC-2, while SRM talks to the second vCenter to bring up the failed DC-1’s production VMs.
I’m intrigued by your design. I too run active/active data centers, but I have separate vCenters for each site running both management and SRM. It’s less than ideal. Could we talk offline about your design, or is there an article on your site discussing how you set this up?
I too have a similar setup and was looking for ways to do something similar.
Mike Moran says
In my scenario Heartbeat is deployed in two of my datacenters to provide redundancy for host-level backups. We once had an issue when vCenter went down during the backups. NetBackup snapshots were running at the time of the vCenter failure and the VMDK got corrupted.
I have a question related to a lost vCenter and distributed virtual switch configuration. We lost our vCenter and everything associated with its configuration. I know we should keep backups and use other methods to protect vCenter, considering that it is critical.
Can we recover the existing distributed switch configuration if we build a new vCenter?
If we join a new host to an existing cluster, it automatically gets the distributed switch information from vCenter. What if we want to do the reverse? Build a new vCenter and point it at an existing host that was previously managed by the failed vCenter. If the host still has the distributed switch information, there should be a process to pull it back into the new vCenter.
I may be thinking out loud, or the process may already exist. I just need your advice.