Different tiers of storage in a single Storage DRS datastore cluster?

This question about mixing different tiers of storage in a single Storage DRS datastore cluster keeps popping up every once in a while. I can understand where it comes from, as one would think that VM Storage Profiles combined with Storage DRS would allow you to have all tiers in one cluster and then balance each VM within its own tier inside that pool.

Truth is that this does not work with vSphere 5.1 and lower, unfortunately. Storage DRS and VM Storage Profiles (Profile Driven Storage) are not tightly integrated. Meaning that when you provision a virtual machine into a datastore cluster and Storage DRS needs to rebalance the cluster at some point, it will consider ANY datastore within that datastore cluster as a possible placement destination. Yes, I agree, it is not what you hoped for… it is what it is. (Feature request filed.) Frank visualized this nicely in his article a while back.
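To make that concrete, here is a tiny toy model in Python. It is purely illustrative (the names and the scoring are made up, and this is obviously not how Storage DRS is actually implemented): the point is simply that the destination pick has no notion of a tier.

```python
from dataclasses import dataclass

@dataclass
class Datastore:
    name: str
    tier: str            # known to you, but invisible to Storage DRS in 5.1 and lower
    free_gb: float
    latency_ms: float

def pick_destination(cluster, vm_size_gb):
    # No tier filter here: ANY datastore in the datastore cluster that
    # fits the VM is a candidate, exactly as described above.
    candidates = [ds for ds in cluster if ds.free_gb >= vm_size_gb]
    return min(candidates, key=lambda ds: ds.latency_ms)

cluster = [
    Datastore("gold-01", "gold", free_gb=500, latency_ms=4.0),
    Datastore("silver-01", "silver", free_gb=900, latency_ms=2.5),
]
print(pick_destination(cluster, 100).name)  # "silver-01": the tier is ignored
```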

So when you architect your datastore clusters, there are a couple of things you will need to keep in mind. If you ask me, these are the design rules at a minimum (the sketch after the list turns them into a simple checklist):

  • LUNs of the same storage tier
    • See above
  • More LUNs = more balancing options
    • Do note that size matters: a single LUN will need to be able to fit your largest VM!
  • Preferably LUNs from the same array (so VAAI offload works properly)
    • VAAI XCOPY (used by SvMotion for instance) doesn’t work when going from Array-A to Array-B
  • When replication is used, LUNs that are part of the same consistency group
    • You will want to make sure that VMs that need to be consistent from a replication perspective are not moved to a LUN that is outside of the consistency group
  • Similar availability characteristics and performance characteristics
    • You don’t want potential performance or availability to degrade when a VM is moved
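If you want to sanity-check a design against these rules, a small script along the lines of the sketch below will do. It is a hypothetical checker: the field names are assumptions made up for this example, not anything vSphere ships with, so feed it whatever your own inventory tooling produces.

```python
def validate_datastore_cluster(luns, largest_vm_gb):
    """Return a list of design-rule violations for a proposed datastore cluster."""
    issues = []
    if len({lun["tier"] for lun in luns}) > 1:
        issues.append("mixed storage tiers in one datastore cluster")
    if len({lun["array"] for lun in luns}) > 1:
        issues.append("LUNs span arrays, so VAAI XCOPY offload will not work across them")
    if len({lun["consistency_group"] for lun in luns}) > 1:
        issues.append("LUNs span replication consistency groups")
    if max(lun["size_gb"] for lun in luns) < largest_vm_gb:
        issues.append("no LUN in the cluster can fit your largest VM")
    return issues

# Example inventory with deliberate violations of the rules above.
luns = [
    {"tier": "gold", "array": "A", "consistency_group": "cg1", "size_gb": 1024},
    {"tier": "silver", "array": "B", "consistency_group": "cg2", "size_gb": 1024},
]
for issue in validate_datastore_cluster(luns, largest_vm_gb=1500):
    print("WARNING:", issue)
```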

Hope this helps,

Vendors to check out at VMworld

Cormac just released his article about storage vendors to check out at VMworld, right when I was typing up this article. Make sure to read that one as well, as it contains some great suggestions… I was looking at the list of vendors who have a booth at VMworld, and there are a whole bunch I am going to try to check out this round. Of course some of the obvious ones are my friends over at Tintri, Nutanix and Pure Storage, but let’s try to list a few lesser-known vendors. These are not all storage vendors by the way, but a mix of various types of startups from the VMware ecosystem. I have added my own one-liner to each, so you know what to expect.

  • Actifio – Business Continuity / Disaster Recovery solution that seems to be gaining traction, maybe I should say “Copy Data Management” solution instead, as that is ultimately what it is they do.
  • CloudPhysics – Monitoring / Analytics, the power of many! Or as I stated a while back: Where most monitoring solutions stop CloudPhysics continues.
  • Cumulus Networks – Linux Network Operating System is how they describe themselves, decoupling software from hardware is another way of looking at it… interesting company!
  • Infinio – Downloadable NFS performance enhancer! AKA a memory caching solution for NFS-based infrastructures; check the intro article I wrote a while back…
  • Maxta – Software Defined Storage solution, virtual appliance based and hypervisor agnostic… Not spoken with them, or seen their solution yet
  • Panzura – A name that keeps popping up more and more often, a global distributed cloud storage solution. Haven’t dug in to it yet, but when I get the chance at VMworld I will…
  • PernixData – Came out of stealth this year, and as you all know is working on a write back flash caching solution… One of the few offering a clustered write back solution within the hypervisor
  • Plexxi – Networking done in a different way, SDN I would say.
  • SolidFire – SolidFire is definitely one cool scale-out storage solution to watch out for, one of the few which actually has a good answer to the question: do you offer Quality of Service? More details about what it is they do here… Not on the show floor, but outside of the expo.

Just a couple of companies which I feel are interesting and worth talking with.

Change in Permanent Device Loss (PDL) behavior for vSphere 5.1 and up?

Yesterday someone asked me a question on Twitter about a whitepaper by EMC on stretched clusters and Permanent Device Loss (PDL) behavior. For those who don’t know what a PDL is, make sure to read this article first. The EMC whitepaper states the following on page 40:

Note:

In a full WAN partition that includes cross-connect, VPLEX can only send SCSI sense code (2/4/3+5) across 50% of the paths since the cross-connected paths are effectively dead. When using ESXi version 5.1 and above, ESXi servers at the non-preferred site will declare PDL and kill VM’s causing them to restart elsewhere (assuming advanced settings are in place); however ESXi 5.0 update 1 and below will only declare APD (even though VPLEX is sending sense code 2/4/3+5). This will result in a VM zombie state. Please see the section Path loss handling semantics (PDL and APD)

Now, as far as I understood it (and I tested this with 5.0 U1), the VMs would indeed not be killed when half of the paths were declared APD and the other half PDL. But I guess something has changed with vSphere 5.1. I knew about one thing that changed which isn’t clearly documented, so I figured I would do some digging and write a short article on this topic. So here are the changes in behavior:

Virtual Machine using multiple Datastores:

  • vSphere 5.0 U1 and lower: When a Virtual Machine’s files are spread across multiple Datastores, it might not be restarted when a Permanent Device Loss scenario occurs.
  • vSphere 5.1 and higher: When a Virtual Machine’s files are spread across multiple Datastores and a Permanent Device Loss scenario occurs, vSphere HA will restart the virtual machine, taking the availability of those datastores on the various hosts in your cluster into account (see the sketch below).
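The 5.1 behavior boils down to a per-host accessibility check. Here is a toy illustration in Python (made-up names, not an actual vSphere HA interface): only hosts that can see all of the VM’s datastores are considered restart candidates.

```python
def restart_candidates(vm_datastores, host_visible_datastores):
    """Hosts on which every datastore backing the VM is still accessible."""
    return [host for host, visible in host_visible_datastores.items()
            if vm_datastores <= visible]    # subset test: host sees all of them

hosts = {
    "esx01": {"ds-01", "ds-02"},
    "esx02": {"ds-01"},                     # esx02 lost access to ds-02
}
print(restart_candidates({"ds-01", "ds-02"}, hosts))  # ['esx01']
```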

Half of the paths in APD state:

  • vSphere 5.0 U1 and lower: When the datastore on which your virtual machine resides is not 100% declared in a PDL state (assume half of the paths are in an APD state), the virtual machine will not be killed and restarted.
  • vSphere 5.1 and higher: When the datastore on which your virtual machine resides is not 100% declared in a PDL state (assume half of the paths are in an APD state), the virtual machine will be killed and restarted. This is a huge change compared to 5.0 U1 and lower; see the sketch below.
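To spell out the difference, here is a minimal toy model of the decision described above. It is purely illustrative: the state names and the function are made up for this example, and this is not how the VMkernel is actually implemented.

```python
def should_kill_vm(path_states, esxi_51_or_later):
    """Toy model of the kill decision for a datastore with mixed path states."""
    pdl = sum(1 for p in path_states if p == "PDL")
    if esxi_51_or_later:
        return pdl > 0                      # 5.1: a partial PDL is enough
    return pdl == len(path_states)          # 5.0 U1: every path must be PDL

paths = ["PDL", "PDL", "APD", "APD"]        # half the paths dead, half partitioned
print(should_kill_vm(paths, esxi_51_or_later=False))  # False -> "zombie" VM
print(should_kill_vm(paths, esxi_51_or_later=True))   # True  -> killed and restarted
```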

These are the changes in behavior I know about for vSphere 5.1. I have asked engineering to confirm these changes for vSphere Metro Storage Cluster environments; when I have received an answer, I will update this blog.

CloudPhysics KB Advisor, how cool is that?

Just imagine: you have 3-8 hosts, an EMC array, Dell hardware, some FibreChannel cards, specific versions of firmware, and specific versions of ESXi and vCenter… How do you know what works and what does not? Well, you go to kb.vmware.com, do a search, and try to figure out what applies to you and what does not. In this depicted environment of only 3-8 hosts that should be simple, right? Well, with thousands of KB articles I can assure you that it is not… Just imagine now that you have 2 arrays and 2 clusters of 8 hosts… or you add iSCSI to the mix? Yes, it gets extremely complicated really, really quickly; in fact, I would say it is impossible to figure out what does and does not apply to your environment. How do you solve that?

Well, you don’t solve that yourself; it requires a big database and an analytics engine behind it… a big data platform, even. Luckily, the smart folks at CloudPhysics have solved it for you. Sign up, download the appliance, and let them do the work for you… It doesn’t get any easier than that, if you ask me. Some more details can be found in the press release.
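Conceptually, what they are doing is matching thousands of predicates against your inventory. A trivial sketch of the idea (the KB numbers, fields and rules below are all made up for this example, and the actual CloudPhysics engine is obviously far more sophisticated):

```python
# Hypothetical KB rules: each article is modeled as a predicate over the environment.
kb_rules = {
    "KB-0001 (hypothetical)": lambda e: e["esxi"] == "5.1" and e["array"].startswith("EMC"),
    "KB-0002 (hypothetical)": lambda e: e["hba_firmware"] < "2.2",
}

def applicable_articles(env):
    """Return the articles whose conditions match this environment."""
    return [kb for kb, matches in kb_rules.items() if matches(env)]

env = {"esxi": "5.1", "array": "EMC VNX", "hba_firmware": "2.1.3"}
print(applicable_articles(env))  # both hypothetical articles match this setup
```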

I knew the CPhy guys were working on this; it surprises me that no one else has done this so far, to be honest. What an elegant / simple / awesome solution! Thanks CloudPhysics for making my life once again a whole lot easier.