Stretched vCloud Director infrastructure

A while back I wrote about design considerations when designing or building a stretched vCloud Director infrastructure. Since then I have been working on a document in collaboration with Lee Dilworth, and this document should be out soon hopefully. As various people have asked for the document I decided to throw it in to this blog post so that the details are already out there.

** Disclaimer: this article has not been reviewed by the technical marketing team yet, this is a preview of what will possibly be published. When the official document is published I will add a link to this article **

Introduction

VMware vCloud® Director™ 5.1 (vCloud Director) gives enterprise organizations the ability to build secure private clouds that dramatically increase datacenter efficiency and business agility. Coupled with VMware vSphere® (vSphere), vCloud Director delivers cloud computing for existing datacenters by pooling vSphere virtual resources and delivering them to users as catalog-based services. vCloud Director helps you build agile infrastructure-as-a-service (IaaS) cloud environments that greatly accelerate the time-to-market for applications and responsiveness of IT organizations.

Resiliency is a key aspect of any infrastructure but is even more important in “Infrastructure as a Service” (IaaS) solutions. This solution overview was developed to provide additional insight and information in how to architect and implement a vCloud Director based solution on a vSphere Metro Storage Cluster infrastructure.

Architecture Introduction

This architecture consists of two major components. The first component is the geographically separated vSphere infrastructure based on stretched storage solution, here after referred to as the vSphere Metro Storage Cluster (vMSC) infrastructure. The second component is vCloud Director.

Note –  Before we dive in to the details of the solution we would like to call out the fact that vCloud Director is not site aware. If incorrectly configured availability could be negatively impacted in certain failure scenarios.

[Read more...]

Some questions about Stretched Clusters with regards to power outages

Today I received an email about the vSphere Metro Storage Cluster paper I wrote, or better said about stretched clusters in general. I figured I would answer the questions in a blog post so that everyone can chip in / read etc. So lets show the environment first so that the questions are clear. Below is an image of the scenario.

Below are the questions I received:

If a power outage occurs at Frimley the 2 hosts get a message by the UPS that there is a power outage. After 5 minutes (or any other configured value) the next action should start. But what will be the next action? If a scripted migration to a host at Bluefin starts, will DRS move some VMs back to Frimley? Or could the VMs get a mark to stick at Bluefin? Should the hosts at Frimley placed into Maintenance mode so the migration will be done automatically? And what happens if there is a total power outage both at Frimley and Bluefin? How a controlled shutdown across hosts could be arranged?

Lets start breaking it down and answer where possible. The main question is how do we handle power outages. As in any datacenter this is fairly complex. Well the powering-off part is easy, powering everything on in the right order isn’t. So where do we start? First of all:

  1. If you have a stretched cluster environment and, in this case, Frimley data center has a power outage, it is recommended to place the hosts in maintenance mode. This way all VMs will be migrated to the Bluefin data center without disruption. Also, when power returns it allows you to do check on the host before introducing them to the cluster again.
  2. If maintenance mode is not used and a scripted migration is done virtual machines will be migrated back probably by DRS. DRS is triggered every 5 minutes (at a minimum). Avoid this, use maintenance mode!
  3. If there is an expected power outage and the environment is brought down it will need to be manually powered on in the right order. You can also script this, but a stretched cluster solution doesn’t cater for this type of failure unfortunately.
  4. If there is an unexpected power outage and the environment is not brought down then vSphere HA will start restarting virtual machines when the hosts come back up again. This will be done using the “restart priority” that you can set with vSphere HA. It should be noted that the “restart priority” is only about the completion of the power-on task, not about the full boot of the virtual machine itself.

I hope that clarifies things.

What is das.maskCleanShutdownEnabled about?

I had a question today around what the vSphere HA option advanced setting das.maskCleanShutdownEnabled is about. I described why it was introduced for Stretched Clusters  but will give a short summary here:

Two advanced settings have been introduced in vSphere 5.0 Update 1 to enable HA to fail-over virtual machines which are located on datastores which are in a Permanent Device Loss state. This is very specific to stretchec cluster environments. The first setting is configured on a host level and is “disk.terminateVMOnPDLDefault”. This setting can be configured in /etc/vmware/settings and should be set to “True”. This setting ensures that a virtual machine is killed when the datastore it resides on is in a PDL state.

The second setting is a vSphere HA advanced setting called “das.maskCleanShutdownEnabled“. This setting is also not enabled by default and it will need to be set to “True”. This settings allows HA to trigger a restart response for a virtual machine which has been killed automatically due to a PDL condition. This setting allows HA to differentiate between a virtual machine which was killed due to the PDL state or a virtual machine which has been powered off by an administrator.

But why is “das.maskCleanShutdownEnabled” needed for HA? From a vSphere HA perspective there are two different types of “operations”. The first is a user initiated power-off (clean) and the other is a kill. When a virtual machine is powered off by a user part of the process is setting the property “runtime.cleanPowerOff” to true.

Remember that when “disk.terminateVMOnPDLDefault” is configured your VMs will be killed when they issue I/O. This is where the  problem arises, in a PDL scenario it is impossible to set “runtime.cleanPowerOff” as the datastore, and as such the vmx, is unreachable. As the property defaults to “true” vSphere HA will assume the VMs were cleanly powered off. This would result in vSphere HA not taking any action in a PDL scenario. By setting ”das.maskCleanShutdownEnabled” to true, a scenario where all VMs are killed but never restarted can be avoided.

If you have a stretched cluster environment, make sure to configure these settings accordingly!

Stretched Clusters and Site Recovery Manager

My colleague Ken Werneburg, also known as “@vmKen“, just published a new white paper. (Follow him if you aren’t yet!) This white paper talks about both SRM and Stretched Cluster solutions and explains the advantages and disadvantages of either. It provides a great overview in my opinion on when a stretched cluster should be implemented or when SRM makes more sense. Various goals and concepts are discussed and I think this is a must read for everyone exploring implementing a Stretched Clusters or SRM.

http://www.vmware.com/resources/techresources/10262

This paper is intended to clarify concepts involved with choosing solutions for vSphere site availability, and to help understand the use cases for availability solutions for the virtualized infrastructure. Specific guidance is given around the intended use of DR solutions like VMware vCenter Site Recovery Manager and contrasted with the intended use of geographically stretched clusters spanning multiple datacenters. While both solutions excel at their primary use case, their strengths lie in different areas which are explored within.

Permanent Device Loss (PDL) enhancements in vSphere 5.0 Update 1 for Stretched Clusters

In the just released vSphere 5.0 Update 1 some welcome enhancements were added around vSphere HA and how a Permanent Device Loss (PDL) condition is handled. The PDL condition is a condition that is communicated by the array to ESXi via a SCSI sense code and indicates that a device (LUN) is unavailable and more than likely permanently unavailable. This is a condition which is useful for “stretched storage cluster” configurations where in the case of a failure in Datacenter-A the configuration in Datacenter-B can take over. An example of when a condition like this would be communicated by the array would be when a LUN “detached” in a site isolation. PDL is probably most common in non-uniform stretched solutions like EMC VPLEX. With VPLEX site affinity is defined per LUN. If your VM resides in Datacenter-A while the LUN it is stored on has affinity to Datacenter-B in case of failure this VM could lose access to the LUN. These enhancements will ensure the VM is killed and restarted on the other side.

Please note that action will only be taken when a PDL sense code is issued. When your storage completely fails for instance it is impossible to reach the PDL condition as there is no communication possible anymore from the array to the ESXi host and the state will be identified by the ESXi host as an All Paths Down (APD) condition. APD is a more common scenario in most environments. If you are testing these enhancements please check the log files to validate which problem has been identified.

With vSphere 5.0 and prior HA did not have a response in a PDL condition, meaning that when a virtual machine was residing on a datastore which had a PDL condition the virtual machine would just sit there. This virtual machine would be unable to read or write from disk however. As of vSphere 5.0 Update 1 a new mechanism has been introduced which allows vSphere HA to take action when a datastore has reached a PDL state. Two advanced settings make this possible. The first setting is configured on a host level and is “disk.terminateVMOnPDLDefault”. This setting can be configured in /etc/vmware/settings and should be set to “True”. This setting ensures that a virtual machine is killed when the datastore it resides on is in a PDL state. The virtual machine is killed as soon as it initiates disk I/O on a datastore which is in a PDL condition and all of the virtual machine files reside on this datastore. Note that if a virtual machine does not initiate any I/O it will not be killed!

The second setting is a vSphere HA advanced setting called das.maskCleanShutdownEnabled. This setting is also not enabled by default and it will need to be set to “True”. This settings allows HA to trigger a restart response for a virtual machine which has been killed automatically due to a PDL condition. This setting allows HA to differentiate between a virtual machine which was killed due to the PDL state or a virtual machine which has been powered off by an administrator.

As soon as “disaster strikes” and the PDL sense code is sent you will see the following popping up in the vmkernel.log that indicates the PDL condition and the kill of the VM:

2012-03-14T13:39:25.085Z cpu7:4499)WARNING: VSCSI: 4055: handle 8198(vscsi4:0):opened by wid 4499 (vmm0:fri-iscsi-02) has Permanent Device Loss. Killing world group leader 4491
2012-03-14T13:39:25.085Z cpu7:4499)WARNING: World: vm 4491: 3173: VMMWorld group leader = 4499, members = 1

As mentioned earlier, this is a welcome enhancement which especially in non-uniform stretched storage environment can help in specific failure scenarios.