Over the past couple of months I have had more and more discussions with customers and partners about VVols. It seems that Policy Based Management and the VVol granular capabilities are really starting to sink in, and more and more customers are starting to see the benefit of using vSphere as the management plane. The other option, of course, is pre-defining what is enabled on a datastore/LUN level and using spreadsheets and complex naming schemes to determine where a VM should land, which is far from optimal. I am not going to discuss the VVols basics at this point; if you need to know more about that, simply do a search on VVol.
When having these discussions, a bunch of things typically come up, and these all have to do with design and procurement considerations for VVol capable storage. VMware provided a framework and an API, and based on this each vendor has developed their own implementation. These vary from vendor to vendor, as not all storage systems are created equal. So what do you have to think about when designing a VVols environment or when you are procuring new VVol capable storage? Below you will find a list of questions to ask, with a short explanation of why each may be important. I will try to add new questions and considerations as I come up with them.
- What level of software is needed for my storage system to support VVol?
In many cases, especially with existing legacy storage systems, a software upgrade is needed to support VVols. Ask:
- What does this upgrade entail?
- What is the risk?
When it is clear what you need to support VVols from a software point of view, ask:
- What are the constraints and limits?
- How many Protocol Endpoints can I have per storage system?
- Do you support all protocols? (FC, NFS, iSCSI, etc.)
- Is the IO proxied via the Protocol Endpoint? If it is, is there an impact with a large number of VMs?
- Some systems can make a distinction between traffic types, and normal IO will not go through the PE, which means you don't hit any PE limitations (queue depth being one); a quick host-side check of the PEs and storage containers is shown after this list.
- How many Storage Pools can you have per storage system?
- In some cases (legacy storage systems) the storage pool equals an existing physical construct on the array; what is it, and what is the impact of this?
- What kind of options do I select during the creation of the pool? Anything you select on a per-pool level means that when you change a policy, VVols may have to migrate to other pools, and I prefer to avoid data movement. In some cases, for instance, "replication" is enabled on a storage pool level; I prefer to have this as a policy option.
- How many VVols can I have per storage system? (How many VMs do you have, and how many VVols do you expect to have per VM?)
- In some cases, usually legacy storage systems, the number of VVols per array is limited. I have seen as "low" as 2000; with 3 VVols per VM at a minimum (typically 5), you can imagine this restricts the number of VMs you can run on a single storage system.
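To get a feel for how a particular array surfaces these constructs, you can also look at them from the host side. A minimal sketch, assuming an ESXi 6.x host where the VASA Provider has been registered and a VVol datastore has been mounted (output and column names may vary slightly per release):
```
# List the Protocol Endpoints this host has discovered,
# handy when validating how many PEs the array exposes and over which protocol
esxcli storage vvol protocolendpoint list

# List the storage containers (the "pools" backing your VVol datastores)
esxcli storage vvol storagecontainer list
```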
And then there is the control / management plane:
- How is the VASA (vSphere APIs for Storage Awareness) Provider implemented?
- There are two options here: either it comes as part of the storage system or it is provided as a virtual machine.
- Then as part of that there’s also the decision around the availability model of the VASA Provider:
- Is it a single instance?
- Active/Standby?
- Active/Active?
- Scale-out?
Note that, as it stands today, in order to power on or create a VM the VASA Provider needs to be available. Hence the availability model is probably of importance, depending on the type of environment you are designing. Also, some prefer to avoid having it implemented on the storage system, as any update means touching the storage system. Others prefer to have it as part of the storage system, as it removes the need for a separate VM that needs to be managed and maintained.
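Whatever availability model your vendor offers, it is worth checking from the host side whether the VASA Provider is actually registered and reachable. A minimal sketch, again assuming an ESXi 6.x host:
```
# Show the VASA Providers known to this host; check the status
# reported for each provider and the arrays it is serving
esxcli storage vvol vasaprovider list
```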
Last but not least, policy capabilities:
- What is exposed through policy?
- Availability? (RAID type / number of copies of object)
- QoS?
- Reservations?
- Limits?
- Replication?
- Snapshot (scheduling)?
- Encryption?
- Application type?
- Thin provisioning?
I hope this helps you have the conversation with your storage vendor, develop your design, or guide the discussion during the procurement process. If anyone has additional considerations, please leave a comment so I can add it to the list when applicable.
tronar says
Hi,
do you have any pointer to docs about policy granularity? E.g. can you have a policy that differentiates the swap file of a VM from the rest of the VM files?
Also, regarding reservations, what "pool" does the reservation act on? I.e. where does the total capacity come from? Can it actually prevent a machine from starting?
Thanks!
Duncan Epping says
The documentation explains what you can set and what not: https://pubs.vmware.com/vsphere-65/index.jsp?topic=%2Fcom.vmware.vsphere.storage.doc%2FGUID-9368E15D-3493-4B71-B1FF-AFA53C92662E.html I have not seen anything else so far that goes into any level of depth, to be honest.
With regards to reservations, that depends on the implementation of the vendor, so there is not much I can say about that. It will be different for SolidFire than for vendor X. You will need to check their documentation. Also, that is not VVol specific; it relates more to the storage system logic.
Dennis Gerolymatos says
Another important design consideration is the capability of your HBAs to support VVOLs – specifically the Second Level LUN ID (SLLID) feature. You can find out by running esxcli storage core adapter list on a host.
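For example, something along these lines (exact output columns vary per ESXi build):
```
# List the host's storage adapters; adapters that are VVol-ready
# should report "Second Level Lun ID" in the Capabilities column
esxcli storage core adapter list
```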
One gotcha for many Cisco UCS users is that the VIC 1225T-1227T & 1225-1227 cards currently do not support this feature.
Here’s a good blog on the topic: https://blogs.vmware.com/virtualblocks/2015/11/12/getting-ready-for-virtual-volume/
MrTaliz says
Actually Cisco hasn’t supported VVOL at all, the VMware HCL has been incorrectly listing some Cisco HBAs as compliant yet they’re not.
I spoke to their EMEA Dev-boss about this last year, and it seemed to me like Cisco, even though they’re a VMware partner, hadn’t really noticed VVOL.
It was like he was surprised that VVOL was released, even though VMware announced it in 2012.
About a week ago Cisco released FNIC 1.6.0.33 which, finally, is supposed to support VVOL. Haven’t tested it yet though.
MrTaliz says
I wonder how many are actually using VVOL in production.
It's been close to 2 years now since 6.0 was launched with it.
It's very interesting technology, but I'm starting to feel like it was a bit too late.
The storage arrays it's meant for just aren't as popular anymore; people invest in hyper-converged or "VM-aware" solutions and the like instead.
I’ve also seen a lot of people having big issues when trying to implement VVOL, even data corruption.
It feels a bit too complex, honestly.
Like a difficult way of working around the drawbacks of antique storage arrays.
Just buy modern, simple, storage solutions designed for virtualization from the beginning instead.
peteflecha says
@MrTaliz (VMware employee here)
We have seen steady growth in VVol adoption this year. IMHO VVol technology is right on time. There are many organizations that are not using HCI but are interested in the simplified operations and Storage Policy Based Management found in VVols. I'm interested in your comment around complexity. VVols is actually pretty straightforward and significantly reduces the complexity of operations for both storage and virtualization teams. Please feel free to elaborate here or DM on twitter @vPedroArrow.
MrTaliz says
Don’t get me wrong, I love the idea of VVOL.
It's just that implementing it… well, how many large organisations have done it?
For example, my company has a storage team where a lot of the daily tasks are done offshore; they basically only know how to follow routines.
Then you have the onshore storage guys who mostly do design and troubleshooting.
Then the organisation has had to be slimmed down, and onshore guys have been let go. In turn the storage vendors now do more of what the onshore guys did before.
So basically we have three different levels of storage teams.
Now add the VMware teams to this…
Look at for example Hitachi and their implementation of VVOL. Their first generation required clustered HCS servers dedicated to VVOL.
Forget about trying to get the offshore storage guys to understand it. And the few onshore guys don’t have time.
Then you have things like compatibility. We use Cisco UCS, and Cisco had apparently not noticed VVOL, so that was impossible.
Now Hitachi has improved their implementation, and it's now directly in the VASA appliance.
And just last week Cisco has finally released a driver supporting VVOL.
But now all-flash storage solutions built for VMware from the ground up, like vSAN and Tintri, are much cheaper than the enterprise spinning disk from Hitachi and EMC, so the VMware teams don't want to use what the storage team provides anymore.
And therefore VVOL is basically dead in the water.
peteflecha says
A lot of larger organizations are taking advantage of VVols for a few key reasons: simplified operations, policy based management with SPBM, and storage optimization or space reclamation.
I can’t really speak to price because there are too many variables. I will say however that VMware teams love VVols for the fact that they don’t have to constantly go to the storage teams anymore. The storage team is involved only on day 0. From that point fwd the VMware teams manage policies and space. No more going back to the offshore\onshore team to request a LUN.
The storage teams love this as well because they move from a "retail" to a "wholesale" approach to management. Do it once on day 0, set the guidelines for the virtual infrastructure, and let the VI admins manage and consume space as they see fit. No more LUN sprawl and doing tedious repeatable tasks. One datastore (tied to a storage container) that can granularly assign class of service as dictated by the storage policies. It's actually quite beautiful. Storage guys also love it because of the space reclamation. No more orphaned space wasting away on the array. When a file is deleted (even in guest), VVols handles the UNMAP for you and optimizes your storage consumption.
Point taken on UCS; that was a real pain point for Cisco shops for some time. Glad to see that is finally going away.
If your teams decide to go to vSAN you won't hear me complain. I love vSAN for some of the same reasons I love VVols: SPBM and simplicity. My only point is you can also achieve this with VVols.
MrTaliz says
Yeah VVOL is probably great once you get it running. The problem is getting there.
It's been possible with vCenter plugins for a long time to have the VMware teams provision storage by themselves, but there are good reasons for it not being used.
For example capacity planning. It's great in theory that the VMware teams don't have to order LUNs anymore. But what about the storage team? They'd have to change their routines on capacity planning, big time.
What if the VMware teams deploy/migrate 1000 VMs in a few days and fill up the disk arrays (trust me, this will happen)?
Today with the manual routines the storage teams have a way to “stall” and prevent that from happening.
Now if the VMware teams get their own storage they will be responsible for their own capacity planning.
And again, if they get their own storage why would they get something antique?
peteflecha says
With VVols storage admins have the ability to “stall” as well if they desire. They can either set a storage limit on the storage container or just let it consume space as needed, and the good news is when they delete 500 of those 1,000 VMs, VVols will do real space reclamation.
As far as “antiques” go I am definitely a fan of vSAN but in the use cases where HCI doesn’t make sense, there are some really cutting edge arrays out there these days mixed among the antiques. 🙂
MrTaliz says
Maybe it's time for some articles on who has the "best" VVOL implementation?
I've been reading up on Hitachi lately, who've been championed as one of the leaders on VMware's own blogs.
When reading the documentation, however, it's obvious it still needs a lot of work.
For example, when upgrading the HDS VASA provider from 3.2 to 3.3 you need to remove all VVOL VMs using the provider first. What if you have, say, 5000 VMs? Just removing them isn't a viable solution.
Will you have to do that again when it's time for 3.4?
Then you need a dedicated HCS for the storage array, where you do some basic VVOL stuff. But then you also need a "VVOL with HCS", which is included in the VASA VM and which is the same software (confusing!), where you also do some basic VVOL stuff.
Again, you need both. Can't use only HCS or only "VVOL with HCS".
Then if you lose the VVOL database, or it gets corrupt or whatever, in the VASA VM, you’re screwed. You’ll have potentially thousands of VMs that cannot be migrated/restarted.
You can’t have multiple VASA provider VMs to the same storage array. So lets say you want to add more vCenters to the VASA provider after you’ve been running for a while… then you have to remove the VASA provider, log in as root via CLI, edit a file to make it “multi vCenter” and restart it. Maybe also remove all VMs using it first(?). Not very pretty.
Then they advise against cancelling Storage vMotions, snapshots, etc., as it may lead to issues.
It just doesn’t feel very mature yet.
MrTaliz says
I looked at Nimble now and it looks much better. Integrated directly into the array management, no separate VM needed.
They also already support VVOL 2.0 with replication.
Netapp doesn’t look too good though. They had support for VVOL 1.0 in Ontap 8, then it disappeared in Ontap 9?!