Can I protect my vCenter Server with vSphere Replication?

Someone asked this question last week when I posted my “back to basics” vSphere Replication blog post. Protecting vCenter Server isn’t too difficult, I guess, but how about recovering it after a failure?

Those who have used vSphere Replication know that you need vCenter Server to click “Recover”. In a dual vCenter Server configuration that is not a problem. But what if you just want to protect your vCenter Server virtual machine and replicate it to a second piece of storage? I tested this and then “killed” my vCenter Server. How do I get my vCenter Server up and running again from this replica?

Let me start by saying that, as far as I know, this is unsupported. Let's begin by checking the folder in which the replica of the vCenter Server resides:

  8.5K Sep 21 09:46 hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.nvram.18
  3.4K Sep 21 09:46 hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.vmx.16
   267 Sep 21 09:46 hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.vmxf.17
124.0K Sep 21 09:46 hbrdisk.RDID-9786ae39-cd3a-4773-be63-cd1bc3641d59.14.175750085646519-delta.vmdk
   379 Sep 21 09:46 hbrdisk.RDID-9786ae39-cd3a-4773-be63-cd1bc3641d59.14.175750085646519.vmdk
 52.0K Sep 21 09:46 hbrdisk.RDID-ae17cfad-c8d8-460c-99a1-8f26ff1133b9.13.43820857661344-delta.vmdk
   375 Sep 21 09:46 hbrdisk.RDID-ae17cfad-c8d8-460c-99a1-8f26ff1133b9.13.43820857661344.vmdk
  4.1K Sep 21 09:46 hbrgrp.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.txt
 25.0G Sep 21 09:46 vcenter-tm01-flat.vmdk
   473 Sep 21 09:46 vcenter-tm01.vmdk
 60.0G Sep 21 09:46 vcenter-tm01_1-flat.vmdk
   476 Sep 21 09:46 vcenter-tm01_1.vmdk

As you can see, the folder contains a lot of files we are familiar with, especially the vmdk and vmx files, which are something we can work with. So how do we get this vCenter Server up and running? Let's look at the vmxf file first, as that reveals the original name of the vmx file:

<vmxPathName type="string">vcenter-tm01.vmx</vmxPathName></VM></Foundry>

Next I am going to copy the “.nvram”, “.vmx” and “.vmxf” files and give them the names “vcenter-tm01.nvram”, “vcenter-tm01.vmx” and “vcenter-tm01.vmxf”:

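cp hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.nvram.18 vcenter-tm01.nvram
cp hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.vmx.16 vcenter-tm01.vmx
cp hbrcfg.GID-d69c6cad-42a5-474a-86c4-c3158d1a3b42.6.vmxf.17 vcenter-tm01.vmxf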
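If you want to script the renaming, here is a minimal POSIX sh sketch (I stuck to plain sh because the ESXi shell is BusyBox based). It assumes the replica folder holds exactly one hbrcfg.*.nvram.*, hbrcfg.*.vmx.* and hbrcfg.*.vmxf.* file, as in the listing above, and it reads the target name from the vmxPathName element of the vmxf file. The function names vmx_base_name, find_hbrcfg and restore_replica_config are hypothetical; this is not a VMware tool and, like the rest of this procedure, it is unsupported.

```shell
# Print the VM base name (e.g. vcenter-tm01) from the vmxPathName element of a vmxf file.
vmx_base_name() {
    sed -n 's|.*<vmxPathName type="string">\([^<]*\)\.vmx</vmxPathName>.*|\1|p' "$1" | head -n 1
}

# Print the single file in directory $1 matching hbrcfg.*.$2.*; fail if there is not exactly one.
find_hbrcfg() {
    _dir=$1; _ext=$2
    set -- "$_dir"/hbrcfg.*."$_ext".*
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one hbrcfg.*.$_ext.* file in $_dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Copy the replicated nvram/vmx/vmxf files to <name>.nvram/.vmx/.vmxf and print <name>.
restore_replica_config() {
    dir=$1
    vmxf=$(find_hbrcfg "$dir" vmxf) || return 1
    name=$(vmx_base_name "$vmxf")
    if [ -z "$name" ]; then
        echo "no vmxPathName found in $vmxf" >&2
        return 1
    fi
    for ext in nvram vmx vmxf; do
        src=$(find_hbrcfg "$dir" "$ext") || return 1
        dst="$dir/$name.$ext"
        # Never overwrite an existing file with the target name.
        if [ -e "$dst" ]; then
            echo "refusing to overwrite $dst" >&2
            return 1
        fi
        cp "$src" "$dst" || return 1
    done
    printf '%s\n' "$name"
}
```

Usage: restore_replica_config /vmfs/volumes/<datastore>/vcenter-tm01 copies the three files and prints vcenter-tm01. It copies rather than renames, so the original hbrcfg files stay in place.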

Now I have all the files I need with the right names. Next I will “unregister” the original vCenter Server virtual machine, just to avoid any weird issues. First I list all the virtual machines registered on this host:

vim-cmd /vmsvc/getallvms

Now that I have the “vmid” I can unregister the original virtual machine:

vim-cmd /vmsvc/unregister <vmid>
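Looking up the vmid by hand is fine, but it can also be sketched in sh. This assumes the getallvms output starts with a header line, has the Vmid in the first column and the display name in the second, and that the name contains no spaces. vmid_by_name and unregister_by_name are hypothetical helpers, and the vim_cmd wrapper only exists so the commands can be stubbed out for testing.

```shell
# Thin wrapper around vim-cmd (hypothetical; makes the helpers easy to stub).
vim_cmd() { vim-cmd "$@"; }

# Print the Vmid of every registered VM whose display name equals $1.
vmid_by_name() {
    vim_cmd /vmsvc/getallvms | awk -v n="$1" 'NR > 1 && $2 == n { print $1 }'
}

# Unregister the VM named $1, but only if exactly one numeric Vmid matches.
unregister_by_name() {
    id=$(vmid_by_name "$1")
    case $id in
        ''|*[!0-9]*) echo "no unique VM named $1" >&2; return 1 ;;
    esac
    vim_cmd /vmsvc/unregister "$id"
}
```

If two virtual machines share the name, the lookup returns two ids, the numeric check fails and nothing is unregistered.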

Now that the original virtual machine is unregistered from the host, I should be able to register the “new” vCenter Server virtual machine, aka the replica.

vim-cmd /solo/registervm /vmfs/volumes/4f228789-84f6b84c-e17e-984be1047b16/vcenter-tm01/vcenter-tm01.vmx

Let's break that one down just to be clear:

vim-cmd /solo/registervm /path/to/vmxfile/filename.vmx

This command returns the “vmid” of the virtual machine we just registered. Now we can power it on:

vim-cmd /vmsvc/power.on <vmid>
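The register and power-on steps can be chained, since the register command returns the vmid. A minimal sketch, assuming the vim-cmd register command is /solo/registervm and that the vmid is the last line it prints; register_and_power_on and the vim_cmd wrapper are hypothetical names. As described below, the virtual machine still waits for the “moved or copied” question to be answered in the vSphere Client.

```shell
# Thin wrapper around vim-cmd (hypothetical; makes the helper easy to stub).
vim_cmd() { vim-cmd "$@"; }

# Register the .vmx given as $1, power the VM on and print its vmid.
register_and_power_on() {
    vmx=$1
    if [ ! -f "$vmx" ]; then
        echo "no such file: $vmx" >&2
        return 1
    fi
    id=$(vim_cmd /solo/registervm "$vmx") || return 1
    # Keep only the last line in case anything else was printed.
    id=$(printf '%s\n' "$id" | tail -n 1)
    case $id in
        ''|*[!0-9]*) echo "unexpected register output: $id" >&2; return 1 ;;
    esac
    # Send power-on chatter to stderr so stdout only carries the vmid.
    vim_cmd /vmsvc/power.on "$id" >&2 || return 1
    printf '%s\n' "$id"
}
```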

The virtual machine sits there for a while. When I log in to the host it is running on with the vSphere Client, I see a message saying “the virtual machine might have been moved or copied…”. I answer that it was copied, the vCenter Server virtual machine boots up and I can log in again. Yes, there is an orphaned vCenter Server virtual machine left in the inventory that you will need to clean up, and there might also be some obsolete files in the folder of this replica that you want to remove. Anyway, the vCenter Server virtual machine is up and running again, and that was the goal of this exercise, right? :-)

Say goodbye to the “Transfer LUN” aka “Swing LUN” aka “Stepping Stone”

Every once in a while I go through some articles to see if they need to be revised. With over 1400 articles on yellow-bricks.com that is not an easy task, I can tell you. Today I stumbled on an article I wrote in early 2010, which discussed the use of a “swing LUN” to limit the number of LUNs masked to a single host. Let me copy/paste the part that I want to revise:

In my design I usually propose a so-called “Transfer Volume”. This volume (NFS or VMFS) can be used to transfer VMs to a different cluster. Yes, there’s a slight operational overhead here, but it also reduces overhead in terms of traffic to a LUN and decreases the chance of SCSI reservation conflicts etc.

Here’s the process:

  1. Storage VMotion the VM from LUN on Array 1 to Transfer LUN
  2. VMotion VM from Cluster A to Cluster B
  3. Storage VMotion the VM from Transfer LUN to LUN on Array 2

Of course these don’t necessarily need to be two separate arrays, it could just as easily be a single array with a group of LUNs masked to a particular cluster. For the people who have a hard time visualizing it:

I guess this is a great example of why you need to revise your design with every release. This used to be a valid workaround to limit the number of LUNs attached to a cluster while maintaining the flexibility to move VMs between clusters using Storage vMotion and vMotion. With vSphere 5.1 that is no longer needed, as the enhanced vMotion functionality can move a virtual machine to a different host and a different datastore in a single operation, without shared storage. (Frank has an awesome vMotion deep dive… read it.) Make sure to update your design and make the needed changes to your infrastructure if and when required.

A host has failed, which VMs were impacted and restarted by HA?

Someone asked me a question a while back and I figured it was time to write it down… or in this case, to record a video. The vSphere Web Client is a powerful tool when it comes to finding events and problems. This video shows how you can use the vSphere Web Client to figure out which virtual machines were impacted by a host failure and restarted by HA. On top of that, I also show how you can use PowerCLI to list all virtual machines that were recently restarted by HA. No, I didn’t write that PowerCLI snippet myself; I elegantly stole it from the infamous PowerCLI guru Jonathan Medd. So if you need it, head over to his article and check the “update 2” section, as it contains the code for vSphere 5.0 and up. (I tested it on 5.1 and it works, as you can see in the video.)

Enabling PDL enhancements in a non-stretched environment?

I received two questions on the same topic last week. The question was about using the PDL enhancements in a non-stretched environment: does it make sense? It was linked to a scenario where, for instance, a storage admin makes a mistake and removes a specific host's access to a LUN. For those who don’t know what a PDL (Permanent Device Loss) is, read this article; in short, it is a SCSI sense code issued by an array when it believes the storage will be permanently unavailable.

First of all, the vSphere HA advanced option “das.maskCleanShutdownEnabled” is enabled by default as of vSphere 5.1. In other words, HA is going to assume a virtual machine needs to be restarted when it is powered off and its config files could not be updated. (Normally the config files record the shutdown state, for instance whether it was an admin-initiated shutdown.)

Now, one thing to note is that “disk.terminateVMOnPDLDefault” is not enabled by default. If this setting is not explicitly enabled, the virtual machine will not be killed and HA won’t be able to take action. In other words, if your storage admin changes the presentation of your LUNs and accidentally removes a host, the virtual machine will just sit there without access to its disks. The OS might fail at some point and your application will definitely not be happy, but that is it.

To answer the question: yes, even in a non-stretched environment it makes sense to enable both disk.terminateVMOnPDLDefault and das.maskCleanShutdownEnabled. Virtual machines will then be restarted automatically by HA when they are killed by the VMkernel after a PDL has been detected.
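On the host side, a minimal sketch for setting disk.terminateVMOnPDLDefault, assuming the vSphere 5.x approach of a “key = value” line in /etc/vmware/settings on every host (verify the exact file and value for your release in the VMware documentation before using it). das.maskCleanShutdownEnabled is an HA advanced option configured on the cluster, so it is not covered here. The helper name ensure_setting and the SETTINGS_FILE variable are hypothetical.

```shell
# Assumed location of the host settings file; override SETTINGS_FILE if needed.
SETTINGS_FILE=${SETTINGS_FILE:-/etc/vmware/settings}

# Append "key = value" to the settings file unless the key is already present.
# An existing line is left untouched (even with another value), only reported.
ensure_setting() {
    key=$1; value=$2; file=${3:-$SETTINGS_FILE}
    if grep -qi "^[[:space:]]*$key[[:space:]]*=" "$file" 2>/dev/null; then
        echo "$key already set in $file; leaving it unchanged" >&2
        return 0
    fi
    printf '%s = %s\n' "$key" "$value" >> "$file"
}
```

Usage on each host: ensure_setting disk.terminateVMOnPDLDefault TRUE. Running it twice does not add a second line.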

Protecting vCenter Server – HA or Heartbeat?

At VMworld, one of my group discussions touched on using vSphere HA or vCenter Heartbeat to protect the vCenter Server. Coincidentally, it is something we recently discussed internally on Socialcast, so I figured I would give my thoughts on this topic. My answer was short and simple: it depends.

Yes, I bet some of you saw that coming… but let me elaborate. In my opinion, vCenter availability is crucial when it comes to operating your environment. However, your environment is not about vSphere. Your environment is not really about virtual machines either. Your environment is about the services that you offer!

Your service level agreement is typically based on the uptime of the service, which makes sense, right? No one really cares about the management platform. Well, I do and you do, but your customers probably do not. Your customers care about the availability of their service.

The question you will need to ask yourself is: will their service be interrupted when vCenter is down? In most cases the answer will probably be no, and in those cases you will need to ask yourself how much downtime you can afford from a management perspective. Is a minute or two okay? Then vSphere HA can help you and there is no need for Heartbeat or other complex clustering solutions. If a couple of minutes is not acceptable, then Heartbeat is an option.

If there is a service interruption for the customer when vCenter is down (for instance in a test/dev cloud where provisioning processes are key, or with vCloud Director or View), you should consider using vCenter Heartbeat. Again, it all depends on your service level agreement. In some cases vCenter availability is crucial; in other cases a downtime of minutes is within the defined boundaries. The answer remains: it depends… on your use case and service level agreement.

My vCenter Server 5.1 appliance crashed and I was using VDP… now what?

I got this question this week: what if I am using vSphere Data Protection (VDP) and my vCenter Server appliance (VCVA) crashes? Well, let's just test it.

I just killed my vCenter Server appliance and deleted it from disk. The next step is to get a brand new vCenter Server appliance up and running, so I deploy a new VCVA first. As I have pointed my vSphere Client directly at a host, I need to log in to the command line of the appliance to configure its networking. You can use vami_config_net, but YaST works as well:

/opt/vmware/share/vami/vami_config_net

Next I go through the regular setup and configuration steps: create a datacenter and a cluster and add some hosts. Now I see my VDP appliance in my inventory again… but I don’t see those nice shiny VDP icons. So how do I get those back? Well, that is simple: just register the appliance with the new vCenter Server:

  • Point your browser to the VDP configuration web page
    https://<ip address or name of vdp appliance>:8543/vdp-configure/
  • Click on the “configuration” tab
  • Click on the lock to unlock the config
  • Now enter your appliance password
  • Provide the new vCenter Server details (in my case they are the same as the old ones, so I just provide the password of the vCenter Server appliance)
  • Reboot the VDP appliance
  • Reboot the vCenter Server appliance
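After the reboots in the list above it can take a while before the VDP configuration page answers again. A minimal polling sketch, assuming curl is available on your admin machine and that the appliance uses a self-signed certificate (hence -k, which skips certificate verification). vdp_config_url and wait_for_vdp are hypothetical helper names; the URL pattern is the one from the list above.

```shell
# Build the VDP configuration URL for an appliance name or IP address.
vdp_config_url() {
    printf 'https://%s:8543/vdp-configure/\n' "$1"
}

# Poll the configuration page until it answers with any HTTP response.
# $1 = appliance name or IP, $2 = number of attempts (default 30, 10 s apart).
wait_for_vdp() {
    url=$(vdp_config_url "$1"); tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if curl -ks -o /dev/null --max-time 10 "$url"; then
            echo "$url is reachable"
            return 0
        fi
        tries=$((tries - 1))
        sleep 10
    done
    echo "$url did not answer" >&2
    return 1
}
```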

Now open up the Web Client and …

  • Click the “vSphere Data Protection” option in the left pane of your Web Client
  • If you see the “Not Connected” status, click “Connect”
  • That is it… now you can restore VMs again

 

Back to Basics: Install, configure and use vSphere Data Protection

Installing vSphere Data Protection takes just a couple of steps. I downloaded the vSphere Data Protection virtual appliance. Note that there are three different versions available, and you will need to select one depending on how large your environment is. I selected the 0.5TB version as I have a limited number of virtual machines. This is how you import and configure it, but before you begin I recommend creating the DNS records for the appliance!

  • Open the Web Client
  • Go to your cluster under “vCenter” —> “Hosts and Clusters”.
  • Right click the cluster object and click “All vCenter Actions” —> “Deploy OVF Template”
  • As the source, select the OVA file you downloaded and click “Next”
  • Validate the details and click “Next”
  • If you agree, accept the EULA and click “Next”
  • Enter the “Name” of the virtual machine, select the “Folder” this virtual machine needs to be placed in and click “Next”
  • Select the “Datastore” it needs to be provisioned to and click “Next”
  • Select the “Network” it needs to be connected to and click “Next”
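Because the DNS records need to exist before the appliance is deployed, a quick pre-flight check from a Linux admin machine can save some head scratching. A minimal sketch, assuming getent is available and that you pass the name exactly as the reverse record returns it (usually the FQDN). check_dns and lookup_host are hypothetical helpers, and the name and IP in the usage line are placeholders.

```shell
# Resolve a name or address via the system resolver (hypothetical wrapper).
lookup_host() { getent hosts "$1"; }

to_lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# Succeed only if NAME resolves to IP and IP resolves back to NAME.
check_dns() {
    name=$1; ip=$2
    fwd=$(lookup_host "$name" | awk '{ print $1; exit }')
    if [ "$fwd" != "$ip" ]; then
        echo "forward lookup of $name gave '${fwd:-nothing}', expected $ip" >&2
        return 1
    fi
    rev=$(lookup_host "$ip" | awk '{ print $2; exit }')
    # Host names are case-insensitive, so compare in lower case.
    if [ "$(to_lower "$rev")" != "$(to_lower "$name")" ]; then
        echo "reverse lookup of $ip gave '${rev:-nothing}', expected $name" >&2
        return 1
    fi
    echo "DNS OK for $name ($ip)"
}
```

Usage: check_dns vdp01.lab.local 192.168.1.50. If the name also has an IPv6 record, getent may return that address first, so the forward check can fail even though the IPv4 record is correct.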