I received a bunch of questions around storage masking over the last couple of weeks. One of them was about VMware’s best practice to mask LUNs on a per-cluster basis. This best practice has been around for years and exists primarily to reduce conflicts. More hosts accessing the same LUNs means more overhead; to give you an example, every 5 minutes a rescan of both HBAs takes place automatically to check for dead storage paths. You can imagine there’s a difference between 64 hosts accessing your storage and limiting it to, for instance, 16 hosts. Also think about the failure domain you are introducing: if an APD (All Paths Down) condition occurs, it doesn’t just impact one cluster… it could impact all of them.
For vSphere 5.1 read this revision…
The obvious next question is: won’t I lose a lot of flexibility? Well, in a way you do, as a simple VMotion to another cluster will no longer work. But of course there’s always a way to move a VM to a different cluster. In my designs I usually propose a so-called “Transfer Volume”. This volume (NFS or VMFS) can be used to transfer VMs to a different cluster. Yes, there’s a slight operational overhead here, but it also reduces traffic to a LUN and decreases the chance of SCSI reservation conflicts.
Here’s the process:
- Storage VMotion the VM from LUN on Array 1 to Transfer LUN
- VMotion VM from Cluster A to Cluster B
- Storage VMotion the VM from Transfer LUN to LUN on Array 2
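To make the visibility rules behind these three steps concrete, here is a minimal sketch in plain Python. It is not PowerCLI or pyVmomi; it just models clusters as sets of masked LUNs (the names `Cluster-A`, `array1-lun1`, `transfer-lun`, etc. are invented for illustration) and enforces the constraint that each migration step only works when the relevant cluster can actually see the datastore involved.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    datastores: set  # LUNs masked (presented) to this cluster's hosts

@dataclass
class VM:
    name: str
    cluster: Cluster
    datastore: str

def storage_vmotion(vm: VM, target_ds: str) -> None:
    # A Storage VMotion only works if the VM's current cluster sees the target LUN.
    if target_ds not in vm.cluster.datastores:
        raise ValueError(f"{vm.cluster.name} cannot see {target_ds}")
    vm.datastore = target_ds

def vmotion(vm: VM, target: Cluster) -> None:
    # A VMotion only works if the destination cluster sees the VM's current LUN.
    if vm.datastore not in target.datastores:
        raise ValueError(f"{target.name} cannot see {vm.datastore}")
    vm.cluster = target

# LUNs are masked per cluster; only the transfer LUN is presented to both.
transfer = "transfer-lun"
cluster_a = Cluster("Cluster-A", {"array1-lun1", "array1-lun2", transfer})
cluster_b = Cluster("Cluster-B", {"array2-lun1", "array2-lun2", transfer})

vm = VM("web01", cluster_a, "array1-lun1")
storage_vmotion(vm, transfer)        # step 1: move disks onto the transfer LUN
vmotion(vm, cluster_b)               # step 2: move the VM between clusters
storage_vmotion(vm, "array2-lun1")   # step 3: move disks onto the destination LUN

print(vm.cluster.name, vm.datastore)  # Cluster-B array2-lun1
```

Note that skipping step 1 and calling `vmotion` directly would raise an error, since Cluster-B never sees `array1-lun1` — which is exactly the flexibility trade-off the per-cluster masking introduces.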
Of course these don’t necessarily need to be two separate arrays, it could just as easily be a single array with a group of LUNs masked to a particular cluster. For the people who have a hard time visualizing it:
Jason Boche says
We call them swing volumes and they come in handy.
PiroNet says
I used to set up such a thing to store templates and various tools and scripts. We call it a Step Stone VMDK 🙂
Ed Marshall says
Very interesting.
Presumably there is no reason why this volume couldn’t itself be a VMFS stored on an available SAN volume?
Duncan Epping says
VMFS or NFS, both are fine. It needs to be one of those, otherwise Storage VMotion will not work.
Duncan says
and of course it can be on either of the two arrays.
Rick Vanover says
I call it LUN exchange, but I like your Transfer Volume term.
For big clusters this is definitely an overhead saver. For smaller environments, it becomes more of a design decision.
VMFS’s ability to provide connectivity between management ‘zones’ or dimensions is really underrated. There is no central transaction coordinator (only the one on each host), and it can even include free ESXi.
Doug says
I, too, use this sort of utility datastore and find the flexibility it provides most useful. However, as with anything reasonably powerful, there is a dark side. I’ve encountered an environment that had an entirely separate ESX cluster for PCI compliance. It was firewalled off and secured from a network and login perspective. Unfortunately, for provisioning simplicity, they had presented their templates datastore from the production environment to the PCI environment.
No point in locking the door if the windows are open… 🙂
Hany Michael says
Very nice trick! I remember I saw something similar in a VMworld session, but it was much more complicated than this. The presenters called it “Gateways”, if I recall correctly.
Great illustration by the way Duncan, loving your Visio work!
Scott Sauer says
I like this concept and will be embracing it, thanks for writing this up. What do you guys do about ISOs and templates? Do you separate these out per cluster? It seems like a waste to maintain multiple copies of this data type.
Duncan says
It’s not only a waste of time but also means you need to maintain multiple templates! Only if the clusters are geographically dispersed would I create multiple templates, and then just to keep the traffic local. Otherwise one set would be fine.
dmVI says
If the Transfer LUN is zoned to ALL of the ESX Hosts on both arrays (or group of LUNs as it were) and each array/group of LUNs has 20-30 ESX Hosts – that would result in the transfer LUN being “seen” by 40-60 ESX Hosts total.
Would this be a problem? Would a large number of ESX Hosts seeing the Transfer LUN be fine as long as all of them (or at least a large number, say more than 32) aren’t actively running VM guests on it?
Duncan Epping says
Correct. This normally shouldn’t be an issue at all. None of the hosts have VMs running on it for an extended period of time, and any slight performance degradation is acceptable for this type of LUN.
Aran says
So you leave the Transfer LUN masked to all ESX clusters and only have a VM hosted on that LUN if it is in the process of being transferred from one cluster to another?
Having one LUN masked to separate clusters does not introduce any arbitration issues?