I was talking to someone before the start of the holiday season about running the Vendor Provider (VP) for vVols as a VM and what the best practices are around that. I was thinking about the implication of the VP not being available and came to the conclusion that when the VP is unavailable a bunch of things stop working out of which “bind” is probably most important.
The “bind” operation is what allows vSphere to access a given Virtual Volume (vVol), and this operation is issued during a power-on of a VM. This is how the vVols FAQ describes it:
When a vVol is created, it is not immediately accessible for IO. To Access vVol, vSphere needs to issue a “Bind” operation to a VASA Provider (VP), which creates IO access point for a vVol on a Protocol Endpoint (PE) chosen by a VP. A single PE can be the IO access point for multiple vVolss. “Unbind” Operation will remove this IO access point for a given vVol.
This means that when the VP is unavailable, you can’t power-on VMs at that particular time. For many storage systems that problem is mitigated by having the VP as part of their storage system itself, and of course there is the option to have multiple VPs as part of your solution, either in active/active or in active/standby configuration. In the case of VSAN for instance, each host has a VASA provider out of which one is active and others are standby, if the active fails the standby will take over automatically. So to be clear, it is up to the vendor to decide what type of availability to provide for the VP, some have decided to go for a single instance and rely on vSphere HA to restart the appliance, others have created active/standby etc.
But back to vVols, what if you own a storage system that requires an external VASA VP as a VM?
- Run your VP VMs in a management cluster, if the hosts in the “production” cluster are impacted and VMs are restarted then at least the VP VMs should be up and running in your management cluster
- Use multiple VP VMs if and when possible, if active/active or active / standby is supported make sure to run your VPs in that configuration
- Do not use vVols for the VP itself, you don’t want to have any (circular) dependency between the availability of the VP and being able to power-on the VP itself
- If there is no availability story for the VP, depending on the configuration of the appliance vSphere FT should be considered.
One more thing, if you are considering buying new storage, I think one question you definitely need to ask your vendor is what their story is around the VP. Is it a VM or is it part of the storage system itself? Is there an availability story for the VP, and if so is this “active/active” or “active/standby”? If not, what do they have on their roadmap around this? You are probably also asking yourself what VMware has planned to solve this problem, well there are a couple of things cooking and I can’t say too much about it. One important effort though is the inclusion of bind/unbind in the T10 SCSI standard, but as you can imagine, those things take time. (Which would allow us to power-on new VMs even when the VP is unavailable as it would be a SCSI command.) Until then, when you design a vVol environment, take the above into account when it comes to your Vendor Provider aka VP!