
Yellow Bricks

by Duncan Epping


stretched cluster

SRM vs Stretched Cluster solution /cc @sakacc

Duncan Epping · Feb 11, 2013 ·

I was reading this article by Chad Sakac on vSphere DR / HA, or in other words SRM versus Stretched (vMSC) solutions. I have presented on vSphere Metro Storage Cluster solutions at VMworld together with Lee Dilworth, and also wrote a white paper on this topic a while back and various blog posts since. I agree with Chad that too many people are misinformed about the benefits of both solutions. I have been on calls with customers where people were indeed saying that SRM is a legacy solution and the next big thing is “Active / Active”. The funny thing is that, in a way, I agree: SRM has been around for a long time and the world is slowly changing. I do not agree with the term “legacy”, though.

I guess it depends on how you look at it. Yes, SRM has been around for a long time, but it is also a proven solution that does what it says it does: it is an orchestration solution for disaster recovery. Think about a disaster recovery scenario for a second and then read that last sentence again. When you are planning for DR, isn’t it nice to use a solution that does what it says it does? Although I am a big believer in “active / active” solutions, there is a time and place for them; in many of the discussions I have had, a stretched cluster solution was just not what people were looking for. On top of that, stretched cluster solutions aren’t always easy to operate, which I guess is what Chad was also referring to in his post. Don’t get me wrong, a stretched cluster is a perfectly viable solution when your organization is mature enough and you are looking for a disaster avoidance and workload mobility solution.

If you are at the point of making a decision around SRM vs a Stretched Cluster, make sure to think about your requirements and goals first. Hopefully all of you have read this excellent white paper by Ken Werneburg. Ken describes the pros and cons of each of these solutions perfectly; read it carefully and then make your decision based on your business requirements.

So, a short recap for those who are interested but don’t have time to read the full paper (do make time for it though… you really should!):

Where SRM shines:

  • Disaster Recovery
  • Orchestration
  • Testing
  • Reporting
  • Disaster Avoidance (will incur downtime when VMs fail over to the other site)

Where a Stretched Cluster solution shines:

  • Workload mobility
  • Cross-site automated load balancing
  • Enhanced downtime avoidance
  • Disaster Avoidance (VMs can be vMotioned, no downtime incurred!)


Thinking about a stretched vCloud Director deployment

Duncan Epping · Nov 20, 2012 ·

Lately I have been thinking about what it would take to deploy a stretched vCloud Director (vCD) infrastructure. “The problem” with a vCloud Director infrastructure is that there are so many moving components, which makes it difficult to figure out how to protect each of them. Let me point out that I do not have all the definitive answers yet; I am writing this article to get a better understanding of the problem myself. If you do not agree with my reasoning, please feel free to comment, as I need YOUR help defining the recommended practices around vCD on a stretched infrastructure.

I listed the components I used in my lab:

  • vCenter Server Management
  • vCenter Server Cloud Resources
  • vCloud Director Cells
  • vShield Manager
  • Database Server

That would be 5 moving components, but in reality we are talking about more like 8, because vCenter Server itself also consists of multiple components:

  • Single Sign On
  • Inventory Service
  • Web Client
  • vCenter Server

How do I protect these 8 components? The first 5 listed will be individual VMs, and vCloud Director itself will even consist of multiple cells. What would this look like?

As you can see, there are multiple vCenter Servers: one manages the Management Cluster and its components, while the other manages the “Cloud Resource Cluster”. Let’s list all the components and discuss what the options are and whether we can protect them in a special way or not.

vCenter Server (cloud resources and management)

vCenter Server can be protected through various methods. There is vCenter Heartbeat, and of course we have vSphere HA (including VM Monitoring). First of all, it is key to realize that neither of these solutions is fully “non-disruptive”. Both vSphere HA and vCenter Heartbeat will cause a slight disruption. vSphere HA will simply restart your VM when a host has failed, and vSphere HA – VM Monitoring can restart the Guest OS when the VM has failed. vCenter Heartbeat is a more intelligent solution: it can detect outages using a heartbeat mechanism and respond to them.

I guess the question is availability vs. operational simplicity. How important is vCenter Server availability in your environment? Setting up vSphere HA and VM Monitoring is a matter of seconds. Installing and configuring vCenter Heartbeat probably takes hours… and think about upgrade processes, etc. I personally prefer not using vCenter Heartbeat but going for vSphere HA and VM Monitoring in this scenario. How about you?
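To illustrate how little is involved, below is a minimal pyVmomi sketch that enables vSphere HA with VM Monitoring on a cluster. The vCenter address, credentials and cluster name (“Management”) are placeholders, and this is just one way of doing it; the same settings are only a few clicks away in the client.

```python
# Minimal sketch (pyVmomi): enable vSphere HA with VM Monitoring on a cluster.
# The connection details and the cluster name "Management" are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the cluster object by name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Management")

# Build a reconfiguration spec that turns on HA and VM Monitoring.
das = vim.cluster.DasConfigInfo()
das.enabled = True
das.vmMonitoring = "vmMonitoringOnly"   # restart the guest OS on VM failure

spec = vim.cluster.ConfigSpecEx(dasConfig=das)
task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
print("Reconfigure task started:", task.info.key)

Disconnect(si)
```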

What about the vCenter services like SSO, the Inventory Service and the Web Client? Although from a scalability/performance perspective it might make sense to split things out, it also makes your environment more vulnerable to failures. What if one VM in your “vCenter service chain” is down? That might render your whole solution unusable. I would personally prefer to have vCenter Server, the Inventory Service and the Web Client installed in a single VM. I can imagine that you would want to split out SSO, so that when you have multiple vCenter Server instances you can link them all to the same SSO instance.

As mentioned, SSO could potentially be deployed in an HA fashion. HA with regards to SSO is an active/standby solution, but I have been told there are other ways of deploying it and that more info will be released soon.

Recommended Practice: I am a big fan of keeping things simple. Keep vCenter Server and, at a minimum, the Inventory Service together, and potentially the Web Client as well. Although Heartbeat has the potential to decrease vCenter Server downtime, in many cloud environments the SLAs are about vCloud workload availability and not about vCenter itself. One component that I would recommend configuring in an HA fashion is SSO. Without SSO you cannot log in, which is critical for operations.

vCloud Director

Hopefully all of you are aware that vCloud Director can easily scale by deploying new “cells”, as we call them. Simply put, a cell is a virtual machine running the vCD software. These cells are all connected to the same database and share the load. Not only can they share the load, they can also continue where another cell stopped. So from an availability perspective this is ideal. I already depicted this in the diagram above, by the way.

Recommended Practice: Deploy multiple vCloud Director cells in your management cluster. Ensure that, at a minimum, two cells reside on each of the “sites” of your stretched cluster. In order to achieve this, vSphere DRS VM-Host affinity groups should be used!
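For those who prefer to script this, the sketch below shows roughly what such a VM-Host “should run” rule looks like through the vSphere API (pyVmomi). The group names, rule name and the way the cluster, host and VM objects are obtained are all placeholders; adjust them to your own environment.

```python
# Minimal sketch (pyVmomi): tie two vCD cells to the hosts of one site with a
# DRS VM-Host "should run on" rule. All names are placeholders; "cluster",
# the host objects and the VM objects are assumed to be looked up already.
from pyVmomi import vim

def site_affinity_spec(site_hosts, site_vms):
    """Build a ClusterConfigSpecEx that adds a host group, a VM group and a
    should-run rule tying the VMs to the hosts of one site."""
    host_group = vim.cluster.HostGroup(name="SiteA-Hosts", host=site_hosts)
    vm_group = vim.cluster.VmGroup(name="SiteA-vCD-Cells", vm=site_vms)

    rule = vim.cluster.VmHostRuleInfo(
        name="vCD-Cells-should-run-SiteA",
        enabled=True,
        mandatory=False,                      # "should" rule, not "must"
        vmGroupName="SiteA-vCD-Cells",
        affineHostGroupName="SiteA-Hosts")

    return vim.cluster.ConfigSpecEx(
        groupSpec=[vim.cluster.GroupSpec(info=host_group, operation="add"),
                   vim.cluster.GroupSpec(info=vm_group, operation="add")],
        rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")])

# Usage, assuming the objects have been retrieved from vCenter already:
# spec = site_affinity_spec([host_a1, host_a2], [vcd_cell1, vcd_cell2])
# cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```

Repeat the same for the other site with its own host group, VM group and rule.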

vShield Manager

vShield Manager is one of the more difficult components. It is a single virtual machine. You can protect it using vSphere HA, but that is about it, as the VM has multiple vCPUs, which rules out FT. So what would make sense in this case? I would try to ensure that the vShield Manager is in the same site as vCenter Server. In case there is a network failure between the sites, at least the vShield Manager and vCenter Server can still communicate when needed.

Recommended Practice: Keep the vShield Manager virtual appliance in the same site as the vCenter Server; in other words, it is a recommended practice to have both be part of the same vSphere DRS VM-Host affinity group. It is also recommended to leverage vSphere HA – VM Monitoring to allow automatic restarts to occur in the case of a host or guest failure.

Database

This is the challenging one… As of vCloud Director 5.1 it is supported to cluster your database, so you could potentially cluster the vCD database. However, this database server will host more than just vCD; it will probably also host the vCenter Server database and potentially other bits and pieces like Chargeback, Orchestrator, etc. Unfortunately, not all of these support a clustered database solution today. It is difficult to define a recommended practice in this case. Although database clustering will theoretically increase availability, it will also complicate operations. From an operational perspective the difficult part is how to manage site isolations. Just imagine the network between Site-A and Site-B is down but all components are still running. What will you do with the database?

This is definitely one I am not sure about what to do with…

Summary

As you can see, this is not a fully worked-out set of recommended practices yet; there is still stuff to be figured out, and I am going through the exercise as we speak. If you have an opinion about this, and I am sure many of you do, don’t hesitate to leave a comment!

vSphere Metro Storage Cluster – Uniform vs Non-Uniform

Duncan Epping · Nov 13, 2012 ·

Last week I presented in Belgium at the quarterly VMUG event in Brussels. We did a Q&A and got some excellent questions. One of them was about vSphere Metro Storage Cluster (vMSC) solutions, and more specifically about Uniform vs Non-Uniform architectures. I have written extensively about this in the vSphere Metro Storage Cluster whitepaper but realized I never blogged that part. So although this is largely a repeat of what I wrote in the white paper, I hope it is still useful for some of you.

<update>As of 2013 the official required bandwidth is 250Mbps per concurrent vMotion</update>

Uniform Versus Nonuniform Configurations

VMware vMSC solutions are classified into two distinct categories, based on a fundamental difference in how hosts access storage. It is important to understand the different types of stretched storage solutions because this will impact your design and operational considerations. Most storage vendors have a preference for one of these solutions, so depending on your preferred vendor you may not have a choice. The two main categories are described on the VMware Hardware Compatibility List as follows:

  • Uniform host access configuration – When ESXi hosts from both sites are all connected to a storage node in the storage cluster across all sites. Paths presented to ESXi hosts are stretched across distance.
  • Nonuniform host access configuration – ESXi hosts in each site are connected only to storage node(s) in the same site. Paths presented to ESXi hosts from storage nodes are limited to the local site.

We will describe the two categories in depth to fully clarify what both mean from an architecture/implementation perspective.

With the Uniform Configuration, hosts in Datacenter A and Datacenter B have access to the storage systems in both datacenters. In effect, the storage-area network is stretched between the sites, and all hosts can access all LUNs. NetApp MetroCluster is an example of this. In this configuration, read/write access to a LUN takes place on one of the two arrays, and a synchronous mirror is maintained in a hidden, read-only state on the second array. For example, if a LUN containing a datastore is read/write on the array at Datacenter A, all ESXi hosts access that datastore via the array in Datacenter A. For ESXi hosts in Datacenter A, this is local access. ESXi hosts in Datacenter B that are running virtual machines hosted on this datastore send read/write traffic across the network between datacenters. In case of an outage, or operator-controlled shift of control of the LUN to Datacenter B, all ESXi hosts continue to detect the identical LUN being presented, except that it is now accessed via the array in Datacenter B.

The notion of “site affinity”—sometimes referred to as “site bias” or “LUN locality”—for a virtual machine is dictated by the read/write copy of the datastore. For example, when a virtual machine has site affinity with Datacenter A, its read/write copy of the datastore is located in Datacenter A.

The ideal situation is one in which virtual machines access a datastore that is controlled (read/write) by the array in the same datacenter. This minimizes traffic between datacenters and avoids the performance impact of reads going across the interconnect. It also minimizes unnecessary downtime in case of a network outage between sites: if your virtual machine is hosted in Datacenter B but its storage is in Datacenter A, you can imagine the virtual machine won’t be able to do I/O when there is a site partition.

With the Non-uniform Configuration, hosts in Datacenter A have access only to the array in Datacenter A. Nonuniform configurations typically leverage the concept of a “virtual LUN.” This enables ESXi hosts in each datacenter to read and write to the same datastore/LUN. The clustering solution maintains the cache state on each array, so an ESXi host in either datacenter detects the LUN as local. Even when two virtual machines reside on the same datastore but are located in different datacenters, they write locally without any performance impact on either of them.

Note that even in this configuration each of the LUNs/datastores has “site affinity” defined. In other words, if anything happens to the link between the sites, the storage system on the preferred site for a given datastore is the only remaining one that has read/write access to it, thereby preventing any data corruption in the case of a failure scenario. This also means that it is recommended to align virtual machine – host affinity with datastore affinity to avoid any unnecessary disruption caused by a site isolation.
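To make that alignment recommendation a bit more concrete, here is a small, purely illustrative Python sketch that flags virtual machines whose DRS host-group site does not match the site affinity of their datastore. The inventory dictionaries are made up; in a real environment you would populate them from vCenter and from the preferred-site configuration of your array.

```python
# Illustrative sketch: report VMs whose host-group site does not match the
# site affinity of the datastore they live on. All data below is hypothetical.

datastore_site = {          # datastore -> preferred (read/write) site
    "DS-01": "Datacenter-A",
    "DS-02": "Datacenter-B",
}

vm_placement = {            # vm -> (datastore, DRS host-group site)
    "web-01": ("DS-01", "Datacenter-A"),
    "db-01":  ("DS-02", "Datacenter-A"),   # misaligned on purpose
}

def misaligned_vms(vm_placement, datastore_site):
    """Return the VMs that would lose storage access during a site partition."""
    return [vm for vm, (ds, site) in vm_placement.items()
            if datastore_site.get(ds) != site]

for vm in misaligned_vms(vm_placement, datastore_site):
    print(f"{vm}: host-group site does not match datastore site affinity")
```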

I hope this helps clarify the differences between Uniform and Non-Uniform configurations. Many more details about vSphere Metro Storage Cluster solutions, including design and operational considerations, can be found in the vSphere Metro Storage Cluster whitepaper. Make sure to read it if you are considering, or have implemented, a stretched storage solution!

Bandwidth requirements for long distance vMotion

Duncan Epping · Oct 31, 2012 ·

I received a question a while back about the bandwidth requirements for long distance vMotion, aka live migration across distance. I was digging through some of the KBs around stretched clusters and must say they weren’t really clear, or at least not consistently clear…

Thanks everyone. Is Long Distance vMotion still requiring a minimum of 622 (1Gb) in current versions? /cc @duncanyb

— Kurt Bales (@networkjanitor) October 3, 2012

I contacted support and asked them for a statement but have had no clear response yet. The following is what I have been able to validate when it comes to “long distance vMotion”. So this is not a VMware support statement, but my own observations:

  • Maximum latency of 5 milliseconds (ms) RTT (round trip time) between hosts participating in vMotion, or 10 ms RTT between hosts when Enterprise Plus licensing (Metro vMotion feature) is used.
  • <update>As of 2013 the official required bandwidth is 250Mbps per concurrent vMotion</update>
  • Source and destination vSphere hosts must have a network interface on the same IP subnet and broadcast domain.

There are no longer any direct bandwidth requirements as far as I have been able to validate. The only requirements VMware seems to have are the ones mentioned above around maximum tolerated latency and Layer 2 adjacency. If this statement changes, I will update this blog post accordingly.
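As a quick sanity check of the numbers, the snippet below simply multiplies the 250 Mbps figure from the update above by the number of concurrent vMotions. The concurrency limits used in the comments (4 on a 1GbE vMotion network, 8 on 10GbE) are the vSphere 5.x defaults as far as I am aware; verify them against your own version.

```python
# Back-of-the-envelope bandwidth check for concurrent vMotions.
MBPS_PER_VMOTION = 250   # per the 2013 update above

def required_bandwidth_mbps(concurrent_vmotions):
    return concurrent_vmotions * MBPS_PER_VMOTION

print(required_bandwidth_mbps(4))   # 1000 Mbps -> saturates a 1GbE vMotion link
print(required_bandwidth_mbps(8))   # 2000 Mbps -> leaves headroom on 10GbE
```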

PS: There are various KBs that mention 622Mbps, but there are also several that don’t list it. I have asked our KB team to clear this up.

Some questions about Stretched Clusters with regards to power outages

Duncan Epping · Oct 9, 2012 ·

Today I received an email about the vSphere Metro Storage Cluster paper I wrote, or rather about stretched clusters in general. I figured I would answer the questions in a blog post so that everyone can read along and chip in. So let’s show the environment first so that the questions are clear. Below is an image of the scenario.

Below are the questions I received:

If a power outage occurs at Frimley the 2 hosts get a message by the UPS that there is a power outage. After 5 minutes (or any other configured value) the next action should start. But what will be the next action? If a scripted migration to a host at Bluefin starts, will DRS move some VMs back to Frimley? Or could the VMs get a mark to stick at Bluefin? Should the hosts at Frimley placed into Maintenance mode so the migration will be done automatically? And what happens if there is a total power outage both at Frimley and Bluefin? How a controlled shutdown across hosts could be arranged?

Let’s start breaking it down and answer where possible. The main question is how we handle power outages. As in any datacenter, this is fairly complex. Well, the powering-off part is easy; powering everything on in the right order isn’t. So where do we start? First of all:

  1. If you have a stretched cluster environment and, in this case, the Frimley data center has a power outage, it is recommended to place the hosts in maintenance mode. This way all VMs will be migrated to the Bluefin data center without disruption. Also, when power returns, it allows you to run checks on the hosts before introducing them to the cluster again. (See the sketch after this list.)
  2. If maintenance mode is not used and a scripted migration is done, virtual machines will most likely be migrated back by DRS. DRS is invoked every 5 minutes (at a minimum). Avoid this; use maintenance mode!
  3. If there is an expected power outage and the environment is brought down, it will need to be powered on manually in the right order. You can also script this, but unfortunately a stretched cluster solution doesn’t cater for this type of scenario.
  4. If there is an unexpected power outage and the environment is not brought down cleanly, vSphere HA will start restarting virtual machines when the hosts come back up again. This will be done using the “restart priority” that you can set with vSphere HA. It should be noted that the “restart priority” only concerns the completion of the power-on task, not the full boot of the guest OS inside the virtual machine.
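For completeness, here is a minimal pyVmomi sketch of the maintenance mode approach from point 1: it places the hosts of the affected site into maintenance mode so that DRS (in fully automated mode) migrates the running VMs to the other site. The host names are placeholders and an existing service instance connection (“si”) is assumed.

```python
# Minimal sketch (pyVmomi): put the hosts of one site into maintenance mode so
# DRS evacuates the running VMs to the other site before a planned power-down.
# Assumes "si" is an existing connection and the host names are placeholders.
from pyVmomi import vim
from pyVim.task import WaitForTask

def enter_maintenance(si, host_names):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    hosts = [h for h in view.view if h.name in host_names]

    for host in hosts:
        # timeout=0 means no timeout; DRS (fully automated) live-migrates the
        # running VMs, and evacuatePoweredOffVms also moves powered-off VMs.
        task = host.EnterMaintenanceMode_Task(timeout=0,
                                              evacuatePoweredOffVms=True)
        WaitForTask(task)
        print(f"{host.name} is now in maintenance mode")

# enter_maintenance(si, {"esxi-frimley-01.local", "esxi-frimley-02.local"})
```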

I hope that clarifies things.

