virtual volumes

VVols design and procurement considerations

Duncan Epping · Feb 21, 2017 ·

Over the past couple of months I have had more and more discussions with customers and partners about VVols. It seems that Policy Based Management and the VVol granular capabilities are really starting to sink in, and more and more customers are starting to see the benefit of using vSphere as the management plane. The other option of course is pre-defining what is enabled on a datastore/LUN level and using spreadsheets and complex naming schemes to determine where a VM should land, which is far from optimal. I am not going to discuss the VVols basics at this point; if you need to know more about that, simply do a search on VVol.

When having these discussions a bunch of things typically come up, all of which have to do with design and procurement considerations for VVol capable storage. VMware provided a framework and an API, and based on this each vendor has developed their own implementation. These vary from vendor to vendor, as not all storage systems are created equal. So what do you have to think about when designing a VVols environment or when you are procuring new VVol capable storage? Below you will find a list of questions to ask, with a short explanation of why each may be important. I will try to add new questions and considerations when I come up with them.

What level of software is needed for my storage system to support VVol?

In many cases, especially with existing legacy storage systems, a software upgrade is needed to support VVols. Ask:

What does this upgrade entail?
What is the risk?

When it is clear what you need to support VVols from a software point of view, ask:

What are the constraints and limits?
- How many Protocol Endpoints can I have per storage system?
  - Do you support all protocols? (FC, NFS, iSCSI etc)
  - Is the IO proxied via the Protocol Endpoint? If it is, is there an impact with a large number of VMs?
    - Some systems can make a distinction between traffic type and for normal IO will not go through the PE, which means you don’t hit any PE limitations (queue depth being one)
- How many Storage Pools can you have per storage system?
  - In some cases (legacy storage systems) the storage pool equals an existing physical construct on the array, what is it and what is the impact of this?
    - What kind of options do I select during the creation of the pool? Anything you select on a per pool level means that when you change policy, VVols may have to migrate to other pools. I prefer to avoid data movement. In some cases, for instance, “replication” is enabled on a storage pool level; I prefer to have this as a policy option
- How many VVols can I have per storage system? (How many VMs do you have, and how many VVols do you expect to have per VM?)
  - In some cases, usually legacy storage systems, the number of VVols per array is limited. I have seen as “low” as 2000. With 3 VVols per VM at a minimum (typically 5), you can imagine this restricts the number of VMs you can run on a single storage system
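
To make the arithmetic behind that last limit concrete, here is a quick back-of-the-envelope sketch. The 2000 VVol limit and the 3 and 5 VVols per VM figures come from the list above; the function name is only for illustration, and anything that adds VVols per VM lowers the result further.

```python
def max_vms(vvol_limit: int, vvols_per_vm: int) -> int:
    """Upper bound on the number of VMs for an array with a hard VVol limit."""
    if vvols_per_vm <= 0:
        raise ValueError("vvols_per_vm must be positive")
    # Integer division: a VM that cannot get all of its VVols does not fit.
    return vvol_limit // vvols_per_vm


for per_vm in (3, 5):
    print(f"{per_vm} VVols per VM -> at most {max_vms(2000, per_vm)} VMs")
```

With the minimum of 3 VVols per VM that is 666 VMs at best, and with the more typical 5 it drops to 400.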

And then there is the control / management plane:

How is the VASA (vSphere APIs for Storage Awareness) Provider implemented?
- There are two options here, either it comes as part of the storage system or it is provided as a virtual machine.
Then as part of that there’s also the decision around the availability model of the VASA Provider:
- Is it a single instance?
- Active/Standby?
- Active/Active?
- Scale-out?

Note, as it stands today, in order to power-on a VM or create a VM the VASA Provider needs to be available. Hence the availability model is probably of importance, depending on the type of environment you are designing. Also, some prefer to avoid having it implemented on the storage system, as any update means touching the storage system. Others prefer to have it as part of the storage system as it removes the need to have a separate VM that needs to be managed and maintained.

Last but not least, policy capabilities:

What is exposed through policy?
- Availability? (RAID type / number of copies of object)
- QoS?
  - Reservations
  - Limits
- Replication?
- Snapshot (scheduling)?
- Encryption?
- Application type?
- Thin provisioning?

I hope this helps when having the conversation with your storage vendor, developing your design, or guiding the conversation during the procurement process. If anyone has additional considerations please leave a comment so I can add them to the list when applicable.

Virtually Speaking Podcast episode 32 – VVol 2.0

Duncan Epping · Dec 6, 2016 ·

Just wanted to share the Virtually Speaking Podcast with you. This episode (32) is on the topic of VVol 2.0 and features Pete Flecha, Ben Meadowcroft (PM for VVol) and me. Make sure to listen to it, it has some good info on where VVol is today and where it may be going in the near future!

Goodbye SAN Huggers!

Duncan Epping · Jun 20, 2016 ·

This week I presented at the German VMUG and a day after the event someone commented on my session. Well not really on my session, but more on my title. My title was “Goodbye SAN Huggers”. Mildly provocative indeed. “SAN Huggers” is a play on the term “Server Hugger”. That term has been around for the longest time and refers to people who prefer to be able to point out their servers. People who prefer the physical world, where every application ran on one server and every server was equal to one physical box.

SAN Huggers are pretty much the same type of people. They prefer the physical world. A world where they define RAID Groups, Storage Pools and LUNs. A world where a bunch of servers end up on the LUN they created. Those LUNs have certain data services enabled and if you need other data services, well then you simply move your servers around! SAN Huggers like to maintain strict control, and to me personally they are in the same position the Server Huggers were 12-15 years ago. It is time to let go however!

Now let it be clear: 12-15 years ago, when virtualization changed the world of IT, VMware literally exploded and server huggers felt threatened by the rise of virtualization, servers did not go away. Server Administrators did not disappear. Server Administrators evolved. Many took on additional responsibilities, in most cases the responsibility for VMware ESX / Virtual Infrastructure. The same applies to storage.

When I say goodbye SAN Huggers, I am not referring to “Virtual SAN” taking over the world. (Although I do think that Hyper-Converged will eat the traditional storage system’s lunch for a large portion.) I am talking about how the world of storage is (and has been) evolving. Literally my next slide typically has a quote on it that states the following: “Hardware evolution started the storage revolution”. The story around this slide makes it clear what I mean when I say Goodbye SAN Huggers.

The hardware evolution has literally changed the storage landscape. Software Defined Storage is quickly taking over the world, but in my opinion the key reason for this is the evolution from a hardware perspective. In the past we had to group hard disks to provide a single unit that could deliver sufficient capacity and performance and increase availability at the same time. That was achieved using RAID constructs, and with the introduction of virtualization and highly demanding workloads, storage systems had to resort to wide striping, larger caches, disk pools etc. to deliver the capabilities required.

In today’s world a lot of these constructs are no longer needed. The evolution in the world of hardware allowed for the introduction of Software Defined Storage. First and foremost there is flash, whether PCIe, NVMe or SAS/SATA based. These devices removed the need for complex constructs to provide thousands of IOPS. A single flash device today, even consumer grade, can provide higher performance than many of the storage systems we have all managed over the years. Not even talking about enterprise grade flash devices where 100k IOPS (with sub millisecond latency) is more or less the norm. Then there is the network: 10GbE, 25GbE, 40GbE and even higher. Low latency and high bandwidth come at (relatively) low cost. Add to that the ever growing CPU capabilities, cores and speed combined with faster bus speeds and high (and fast) memory configurations. Hardware is no longer a constraint, the revolution is now, enter the world of Software Defined Storage.

And this, this is where I typically introduce: Virtual SAN, Virtual Volumes and the vSphere APIs for IO Filtering (vSphere Data Services delivered through filter drivers). Functionality provided by VMware which enables efficient operations, simplicity and flexibility. All through the use of policy, which can simply be attached to your workloads, be it a virtual machine or virtual disk even. The days of creating LUNs, RAID groups and needing wide striping or huge amounts of devices to get a decent user experience are gone. We can say goodbye to the physical world, we can say goodbye to the SAN Hugger. We can move forward and evolve, not just our datacenters but also our personal growth and areas of interest and expertise as a result.

Where do I run my VASA Vendor Provider for vVols?

Duncan Epping · Jan 6, 2016 ·

I was talking to someone before the start of the holiday season about running the Vendor Provider (VP) for vVols as a VM and what the best practices are around that. I was thinking about the implication of the VP not being available and came to the conclusion that when the VP is unavailable a bunch of things stop working out of which “bind” is probably most important.

The “bind” operation is what allows vSphere to access a given Virtual Volume (vVol), and this operation is issued during a power-on of a VM. This is how the vVols FAQ describes it:

When a vVol is created, it is not immediately accessible for IO. To Access vVol, vSphere needs to issue a “Bind” operation to a VASA Provider (VP), which creates IO access point for a vVol on a Protocol Endpoint (PE) chosen by a VP. A single PE can be the IO access point for multiple vVols. “Unbind” Operation will remove this IO access point for a given vVol.

This means that when the VP is unavailable, you can’t power-on VMs at that particular time. For many storage systems that problem is mitigated by having the VP as part of their storage system itself, and of course there is the option to have multiple VPs as part of your solution, either in active/active or in active/standby configuration. In the case of VSAN for instance, each host has a VASA provider out of which one is active and others are standby, if the active fails the standby will take over automatically. So to be clear, it is up to the vendor to decide what type of availability to provide for the VP, some have decided to go for a single instance and rely on vSphere HA to restart the appliance, others have created active/standby etc.
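
To illustrate why an unavailable VP blocks power-on, here is a toy model of the bind flow described above. This is only a sketch and not the VASA API: the class, the method names and the "least used PE" selection are assumptions made up for illustration. Keeping existing bindings while the VP is down is also a modelling choice.

```python
class VendorProviderUnavailable(Exception):
    """Raised when the (hypothetical) Vendor Provider cannot be reached."""


class ToyVendorProvider:
    """Toy stand-in for a VP; real providers are vendor specific."""

    def __init__(self, protocol_endpoints):
        self.protocol_endpoints = list(protocol_endpoints)
        self.available = True
        self.bindings = {}  # vVol id -> Protocol Endpoint

    def bind(self, vvol_id):
        if not self.available:
            raise VendorProviderUnavailable(f"cannot bind {vvol_id}: VP is down")
        # The VP chooses the PE. Assumption: pick the PE with the fewest bindings.
        in_use = list(self.bindings.values())
        pe = min(self.protocol_endpoints, key=in_use.count)
        self.bindings[vvol_id] = pe
        return pe

    def unbind(self, vvol_id):
        if not self.available:
            raise VendorProviderUnavailable(f"cannot unbind {vvol_id}: VP is down")
        self.bindings.pop(vvol_id, None)


def power_on(vp, vm_vvols):
    """Bind every vVol of a VM; fails as a whole when the VP is unavailable."""
    return {vvol: vp.bind(vvol) for vvol in vm_vvols}


vp = ToyVendorProvider(["pe-1", "pe-2"])
print(power_on(vp, ["vm1-config", "vm1-disk"]))

vp.available = False  # simulate a VP outage
try:
    power_on(vp, ["vm2-config", "vm2-disk"])
except VendorProviderUnavailable as err:
    print("power-on failed:", err)
```

In this model the vVols of the first VM stay bound during the outage, while the second VM cannot be powered on until the VP is back.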

But back to vVols, what if you own a storage system that requires an external VASA VP as a VM?

- Run your VP VMs in a management cluster. If the hosts in the “production” cluster are impacted and VMs are restarted, then at least the VP VMs should be up and running in your management cluster.
- Use multiple VP VMs if and when possible. If active/active or active/standby is supported, make sure to run your VPs in that configuration.
- Do not use vVols for the VP itself. You don’t want to have any (circular) dependency between the availability of the VP and being able to power-on the VP itself.
- If there is no availability story for the VP, vSphere FT should be considered, depending on the configuration of the appliance.

One more thing, if you are considering buying new storage, I think one question you definitely need to ask your vendor is what their story is around the VP. Is it a VM or is it part of the storage system itself? Is there an availability story for the VP, and if so is this “active/active” or “active/standby”? If not, what do they have on their roadmap around this? You are probably also asking yourself what VMware has planned to solve this problem. Well, there are a couple of things cooking and I can’t say too much about it. One important effort though is the inclusion of bind/unbind in the T10 SCSI standard, which would allow us to power-on new VMs even when the VP is unavailable as it would be a SCSI command. As you can imagine, those things take time. Until then, when you design a vVol environment, take the above into account when it comes to your Vendor Provider aka VP!

vVols primer

Duncan Epping · Mar 9, 2015 ·

I was digging through my blog for a link to a vVols primer article and I realized I never wrote one. In 2012 I did an article which described what Virtual Volumes (VVol) is, but that is it. I am certain that Virtual Volumes is a feature that will be heavily used with vSphere 6.0 and beyond, so it was time to write a primer. What is vVols about? What will they bring to the table?

First and foremost, vVols was developed to make your life (vSphere admin) and that of the storage administrator easier. This is done by providing a framework that enables the vSphere administrator to assign policies to virtual machines or virtual disks. In these policies capabilities of the storage array can be defined. These capabilities can be things like snapshotting, deduplication, RAID level, thin / thick provisioning etc. What is offered to the vSphere administrator is up to the storage administrator, and of course up to what the storage system can offer to begin with. When a virtual machine is deployed and a policy is assigned, the storage system will enable certain functionality of the array based on what was specified in the policy. So there is no longer a need to assign capabilities to a LUN which holds many VMs; instead you get control at a per VM or even per VMDK level. So how does this work? Well, let’s take a look at an architectural diagram first.

The diagram shows a couple of components which are important in the VVol architecture. Let’s list them out:

Protocol Endpoints aka PE
Virtual Datastore and a Storage Container
Vendor Provider / VASA
Policies
vVols

Let’s take a look at each of these in the above order. Protocol Endpoints, what are they?

Protocol Endpoints are literally the access point to your storage system. All IO to vVols is proxied through a Protocol Endpoint and you can have 1 or more of these per storage system, if your storage system supports having multiple of course. (Implementations of different vendors will vary.) PEs are compatible with different protocols (FC, FCoE, iSCSI, NFS) and if you ask me that whole discussion with vVols will come to an end. You could see a Protocol Endpoint as a “mount point” or a device, and yes they will count towards your maximum number of devices per host (256). (Virtual Volumes themselves won’t count towards that!)

Next up is the Storage Container. This is the place where you store your virtual machines, or better said where your vVols end up. The Storage Container is a storage system logical construct and is represented within vSphere as a “virtual datastore”. You need at least 1 per storage system, but you can have many when desired. To this Storage Container you can apply capabilities. So if you would like your virtual volumes to be able to use array based snapshots, then the storage administrator will need to assign that capability to the storage container. Note that a storage administrator can grow a storage container without even informing you. A storage container isn’t formatted with VMFS or anything like that, so you don’t need to increase the volume in order to use the space.

But how does vSphere know which container is capable of doing what? In order to discover a storage container and its capabilities we need to be able to talk to the storage system first. This is done through the vSphere APIs for Storage Awareness. You simply point vSphere to the Vendor Provider and the vendor provider will report to vSphere what’s available, this includes both the storage containers as well as the capabilities they possess. Note that a single Vendor Provider can be managing multiple storage systems which in its turn can have multiple storage containers with many capabilities. These vendor providers can also come in different flavours, for some storage systems it is part of their software but for others it will come as a virtual appliance that sits on top of vSphere.

Now that vSphere knows which systems there are, what containers are available with which capabilities you can start creating policies. These policies can be a combination of capabilities and will ultimately be assigned to virtual machines or virtual disks even. You can imagine that in some cases you would like Quality of Service enabled to ensure performance for a VM while in other cases it isn’t as relevant but you need to have a snapshot every hour. All of this is enabled through these policies. No longer will you be maintaining that spreadsheet with all your LUNs and which data service were enabled and what not, no you simply assign a policy. (Yes, a proper naming scheme will be helpful when defining policies.) When requirements change for a VM you don’t move the VM around, no you change the policy and the storage system will do what is required in order to make the VM (and its disks) compliant again with the policy. Not the VM really, but the vVols.
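
The idea of matching a policy against what the Vendor Provider reports can be sketched in a few lines. This is a simplified illustration and not how vSphere implements it: the container names, capability strings and function name are all made up, and a policy is reduced to a plain set of required capabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageContainer:
    """A storage container as it might be reported by a Vendor Provider."""
    name: str
    capabilities: frozenset


def compatible_containers(policy, containers):
    """Return the names of containers offering every capability in the policy."""
    required = frozenset(policy)
    return [c.name for c in containers if required <= c.capabilities]


# Hypothetical inventory, as if reported through the Vendor Provider.
containers = [
    StorageContainer("gold", frozenset({"qos", "snapshots", "replication"})),
    StorageContainer("silver", frozenset({"snapshots", "thin-provisioning"})),
]

# Hypothetical policies: each one is a combination of capabilities.
policies = {
    "latency-sensitive": {"qos"},
    "hourly-snapshot": {"snapshots"},
    "protected-and-fast": {"qos", "replication"},
}

for name, required in policies.items():
    print(name, "->", compatible_containers(required, containers))
```

When the requirements for a VM change you would swap the policy, and the set of compatible containers (and what the storage system has to do to make the vVols compliant) follows from that.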

The great thing about vVols is the fact that you now have granular control over your workloads. Some storage systems will even allow you to assign IO profiles to your VM to ensure optimal performance. Also, when you delete a VM the vVols will be deleted and the space will automatically be reclaimed by the storage system, no more fiddling with vmkfstools. Another great thing about virtual volumes is that even when you delete something within your VM this space can also be reclaimed by the storage system, provided your storage system supports T10 UNMAP.

That is in short how vVols work and what they bring. You as the vSphere administrator create policies and assign those to VMs, while the storage administrator manages capacity and capabilities. Easy right?!