Someone asked this question last week when I posted my “back to basics” vSphere Replication blog. I guess protecting vCenter Server isn’t too difficult but how about recovering it after a failure?
Those who have used vSphere Replication know that you need vCenter Server to click “Recover”. In a dual vCenter Server configuration that is not a problem. But what if you just want to protect your vCenter Server virtual machine and replicate it to a second piece of storage? I tested this and then “killed” my vCenter Server. How do I get my vCenter Server up and running again from this replica?
Let me start by saying that this is unsupported as far as I know. So let’s start by checking the folder in which the replica of the vCenter Server resides:
8.5K Sep 21 09:46 hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.nvram.18
3.4K Sep 21 09:46 hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.vmx.16
267 Sep 21 09:46 hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.vmxf.17
124.0K Sep 21 09:46 hbrdisk.RDID-9786ae39-cd3a-4773-be63-cd1bc3641d59.14.175750085646519-delta.vmdk
379 Sep 21 09:46 hbrdisk.RDID-9786ae39-cd3a-4773-be63-cd1bc3641d59.14.175750085646519.vmdk
52.0K Sep 21 09:46 hbrdisk.RDID-ae17cfad-c8d8-460c-99a1-8f26ff1133b9.13.43820857661344-delta.vmdk
375 Sep 21 09:46 hbrdisk.RDID-ae17cfad-c8d8-460c-99a1-8f26ff1133b9.13.43820857661344.vmdk
4.1K Sep 21 09:46 hbrgrp.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.txt
25.0G Sep 21 09:46 vcenter-tm01-flat.vmdk
473 Sep 21 09:46 vcenter-tm01.vmdk
60.0G Sep 21 09:46 vcenter-tm01_1-flat.vmdk
476 Sep 21 09:46 vcenter-tm01_1.vmdk
As you can see, the folder contains a lot of files we are familiar with. The vmdk files and the vmx file in particular are something we can work with. So how do we get this vCenter Server up and running? Let’s look at the vmxf file first, as that will reveal the original name of the vmx file:
<vmxPathName type="string">vcenter-tm01.vmx</vmxPathName></VM></Foundry>
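Rather than reading the file by eye, the name could be pulled out with sed. This is only a sketch: the function name original_vmx_name is made up for illustration, and it assumes the vmxPathName element sits on a single line, as it does in the excerpt above.

```shell
# Print the original .vmx file name recorded in a replica's .vmxf file.
# Assumption: the <vmxPathName> element is on one line, as in the excerpt above.
# original_vmx_name is a hypothetical helper name, not part of any VMware tool.
original_vmx_name() {
    sed -n 's/.*<vmxPathName[^>]*>\([^<]*\)<\/vmxPathName>.*/\1/p' "$1"
}
```

It would be called with the path to the replica's vmxf file, for example original_vmx_name hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.vmxf.17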
Next I am going to copy the “.nvram”, “.vmx” and “.vmxf” file and give them the name “vcenter-tm01.nvram” etc.
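cp hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.nvram.18 vcenter-tm01.nvram
cp hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.vmx.16 vcenter-tm01.vmx
cp hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.vmxf.17 vcenter-tm01.vmxf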
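The copy step can also be sketched as a small function that derives each target extension from the hbrcfg file name, so the GID and sequence numbers do not have to be typed by hand. The function name copy_replica_config is hypothetical, and the sketch assumes the folder holds exactly one nvram, one vmx and one vmxf config file named like the ones in the listing above.

```shell
# Copy the replica's hbrcfg.* config files to the names the VM expects.
# $1 = replica folder, $2 = base name (for example vcenter-tm01).
# Assumption: one hbrcfg.*.nvram.N, one hbrcfg.*.vmx.N and one
# hbrcfg.*.vmxf.N file exist, named as in the listing above.
copy_replica_config() {
    dir=$1
    base=$2
    for f in "$dir"/hbrcfg.*; do
        [ -e "$f" ] || continue      # glob matched nothing
        noseq=${f%.*}                # drop the trailing sequence number
        ext=${noseq##*.}             # nvram, vmx or vmxf
        case $ext in
            nvram|vmx|vmxf) cp "$f" "$dir/$base.$ext" ;;
        esac
    done
}
```

From inside the replica folder this would be run as copy_replica_config . vcenter-tm01 and the originals are left in place, since the files are copied rather than renamed.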
So now I have all the files I need with the right names. Next I will “unregister” the original vCenter Server virtual machine, just to avoid any weird issues. First I list all the virtual machines registered against this host:
vim-cmd /vmsvc/getallvms
Now that I have the “vmid” I can unregister the original virtual machine:
vim-cmd /vmsvc/unregister <vmid>
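Looking up the vmid can be scripted by filtering the getallvms output on the Name column. This is a sketch only: vmid_for_name is a made-up helper, and it assumes the first two columns of the output are Vmid and Name and that the virtual machine name contains no spaces.

```shell
# Read `vim-cmd /vmsvc/getallvms` output on stdin and print the vmid of the
# VM whose Name column equals $1.
# Assumptions: a header line comes first, columns 1 and 2 are Vmid and Name,
# and the VM name has no spaces. vmid_for_name is a hypothetical helper.
vmid_for_name() {
    awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}
```

Used as vim-cmd /vmsvc/getallvms | vmid_for_name vcenter-tm01 and the result can then be passed to the unregister command.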
Now that the original virtual machine is unregistered from the host, I should be able to register the “new” vCenter Server virtual machine… aka the replica.
vim-cmd /solo/register /vmfs/volumes/4f228789-84f6b84c-e17e-984be1047b16/vcenter-tm01/vcenter-tm01.vmx
Let’s break that one down just to be clear:
vim-cmd /solo/register /path/to/vmxfile/filename.vmx
This command will return the “vmid” of the virtual machine we just registered. Now we can power it on…
vim-cmd /vmsvc/power.on <vmid>
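Registering and powering on can be chained by capturing the vmid that the register command returns. A minimal sketch, assuming the command prints nothing but the numeric vmid on success; register_and_power_on is a hypothetical wrapper, and only the two vim-cmd calls shown above are used.

```shell
# Register a .vmx file and power on the resulting virtual machine.
# Assumption: `vim-cmd /solo/register` prints only the numeric vmid on success.
# register_and_power_on is a hypothetical wrapper name.
register_and_power_on() {
    vmx_path=$1
    vmid=$(vim-cmd /solo/register "$vmx_path") || return 1
    case $vmid in
        ''|*[!0-9]*)
            echo "unexpected output from register: $vmid" >&2
            return 1
            ;;
    esac
    vim-cmd /vmsvc/power.on "$vmid"
}
```

It would be called with the full path to the vmx file, for example register_and_power_on /vmfs/volumes/4f228789-84f6b84c-e17e-984be1047b16/vcenter-tm01/vcenter-tm01.vmx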
Now it sits there for a while. When I log in with the vSphere Client and check the host it is running on, I see a message that says “the virtual machine might have been moved or copied…”. I answer that it was copied, and now the vCenter virtual machine boots up and I can log in again. Yes, there is an orphaned vCenter Server instance there, and you will need to clean that up. There might also be some obsolete files in the folder of this replica, and you might want to clean those up as well. Anyway, the vCenter Server virtual machine is up and running again, and that was the goal of this exercise, right? 🙂
Andreas Peetz says
Good write-up, Duncan.
Just one question: Wouldn’t it be better to answer the “moved or copied” question with “I moved it” to leave its UUID unchanged? Or will this cause any problems?
Thanks, Andreas
Karl Eveleigh says
Duncan – As always, great write up, thanks.
Tim says
I was wondering this exact question yesterday. I’m pretty annoyed that VMware did not anticipate this exact scenario. I don’t understand why they don’t just have the VM ready to be registered to the inventory and started up as a failsafe. In any case thank you for this.
Jeremy says
Hi, could anyone please provide detailed commands to recover a VM from a replica with vSphere Replication 5.1? I am having problems reproducing the instructions provided by Duncan Epping.
Serge Meeuwsen says
Nice write up and exactly what I was looking for for a 2 host vSphere 5.1 installation with VSA (with a 3rd small host running the Tie Breaker code). I understand this is unsupported, but it nicely solves the problem of making sure that you can recover vCenter when the host it runs on dies.
I think you can improve this slightly by incorporating the answering of the “Did you move this virtual machine” message on the commandline with the instructions given here:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1026835
I’m gonna see if I can also do all of the commandline stuff with a PowerCLI script. Again Thanks for the pointers!
Cheers
Eric Smith says
Great write-up. The question I’m struggling with: how do you test a recovery while leaving the source VM online? My team has a monthly checklist item where we recover and power up all replicated VMs to ensure integrity, etc.
Would the following work, assuming two separate vCenter environments where VMs cannot talk to each other?
1. Disable replication
2. Rename files as noted above
3. Register on second vCenter
4. Snapshot
5. Power-on to test
6. Revert to snapshot
7. Unregister
8. Re-enable replication with pre-seed (????)
Jack says
The simple answer to this question Eric, is SRM – it’s designed to allow you to test your fail-over process.
SRM has a test option that brings up your replicas either in a disconnected virtual network (or a separate one if you have a need for physical connectivity in DR) all without impacting the running replication – so if you have a real disaster whilst you’re testing, you’re still covered.
It also supports “fail-back” – so you can rebuild “Live” and replicate to it, then fail-back when everything is ready.
It also gives you a report you can print / export at the end and hand to your boss / auditor / insurer as proof of the test and its success 🙂
Hope this helps!!
Jack
Aram says
Hi,
I have tried this. It works, but the problem is that there is no replication site after vCenter boots. This makes recovery of other VMs impossible through the web console. Any ideas how to fix this?
Thanks.