Virtual Volumes and queueing

I was reading an article last week by Ray Lucchesi on Virtual Volumes and queueing. In that article (and podcast) Ray (and friends on the podcast) describe Virtual Volumes and the benefits they bring, but also a potential danger. I have written about Virtual Volumes before, and if you don’t know what it is or does then I recommend reading those articles. I have also been wondering how all of this works, as I felt that there could easily be a bottleneck. I had some conversations over the last couple of weeks and figured I would share what I learned with you instead of just leaving a comment on Ray’s blog. Let’s look at an architectural diagram first:

In the diagram above (which I borrowed from the vSphere Storage blog, thanks Rolo) you see two important constructs which are part of the overall VVOL architecture, namely the Storage Container (aka Virtual Datastore) and the Protocol Endpoint (PE). The Storage Container is where the VVOLs will be stored. The IO, though, is proxied through the Protocol Endpoint. You can imagine that if we did not do this and instead exposed every single VVOL directly to vSphere, you would have 1000s of devices connected to vSphere, and as you know vSphere has a 256 device limit at the moment. This would never scale, and as such the Protocol Endpoint is used as an access point to a VVOL capable storage system.

Now think about a VMFS volume and look at the VVOL architectural diagram again. Yes, there is a potential bottleneck indeed. However, what the diagram does not show is that you can have multiple Protocol Endpoints. Ray mentions the following in his post: “I am also not aware of any VASA 2.0 requirement that restricts the number of PEs for a storage system’s support of a single vSphere cluster”. And I can confirm that VMware did not limit the number of Protocol Endpoints in any shape or form. I read the specifications and it literally states 1 PE at a minimum and preferably more. Note that vendor implementations of VVOL may differ. I have seen implementations that describe many PEs per storage system, but also implementations which have 1 PE per storage system. In the case of 1 PE per storage system, can that be a bottleneck?

The queue depth of the Protocol Endpoint is not limited to 32, like a regular LUN when multiple VMs are contending for IO (“disk.schednumreqoutstanding”), or to 64 (the typical device queue depth); it is set to 128 by default. This can be increased when required. Before you do, please consult your storage vendor. There are a couple of variables that need to be taken into account, like the max device queue depth and the HBA max queue depth. (For NFS, queue depth is typically no concern.) The potential constraint in the (uncommon) case where there is only a single PE can be mitigated. What is important here is that VVOL itself does not impose any constraints.
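To make the queue depth reasoning a bit more tangible, here is a minimal Python sketch. It is not a VMware tool and it reads no real settings: the function names and the HBA limit of 1024 are assumptions I made up for illustration. It only models two ideas: the smallest limit in the path decides how many IOs can be outstanding, and Little's law (throughput = outstanding IOs / latency) gives a rough IOPS ceiling for a given latency.

```python
# Illustrative model only. Function names and the HBA limit are hypothetical,
# they are not VMware settings or APIs.


def effective_queue_depth(pe_queue_depth: int, hba_max_queue_depth: int) -> int:
    """IOs a host can keep in flight to one device: the smaller limit wins."""
    return min(pe_queue_depth, hba_max_queue_depth)


def iops_ceiling(outstanding_ios: int, avg_latency_ms: float) -> float:
    """Little's law: throughput = concurrency / latency (latency given in ms)."""
    return outstanding_ios * 1000.0 / avg_latency_ms


if __name__ == "__main__":
    assumed_hba_limit = 1024  # assumption, check your own HBA documentation
    depths = [
        ("regular LUN under contention", 32),
        ("typical device queue depth", 64),
        ("Protocol Endpoint default", 128),
    ]
    for label, depth in depths:
        qd = effective_queue_depth(depth, assumed_hba_limit)
        print(f"{label}: depth {qd}, ceiling {iops_ceiling(qd, 1.0):,.0f} IOPS at 1 ms")
```

The point of the sketch is simply that raising the PE queue depth only helps as long as no other limit in the path (like the HBA) is lower.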

I am hoping that clears up some of the misunderstandings out there.

HP ConvergedSystem 200–HC EVO:RAIL available now!

Yesterday I was informed by the EVO:RAIL team that the HP ConvergedSystem 200–HC EVO:RAIL is available (shipping) as of this week. I haven’t seen much about the additional pieces HP is including, but I was told that they are planning to integrate HP OneView. HP OneView is a management/monitoring solution that gives you a great high level overview of the state of your systems but at the same time enables you to dive deep when required. Depending on the version included, HP OneView can also do things like Firmware Management, which is very useful in a Virtual SAN environment if you ask me. I know that many people have been waiting for HP to start shipping, as it appears to be a preferred vendor for many customers. In terms of configuration, the HP solution is very similar to what we have already seen out there:

  • 4 nodes in 2U each containing:
    • 2 x Intel® E5-2620 v2 six-core CPUs
    • 192 GB memory
    • 1 x SAS 300 GB 10k rpm drive (ESXi boot device)
    • 3 x SAS 1.2 TB 10k rpm drive (VSAN capacity tier)
    • 1 x 400 GB MLC enterprise-grade SSD (VSAN performance tier)
    • 1 x H220 host bus adapter (HBA) pass-through controller
    • 2 x 10GbE NIC ports
    • 1 x 1GbE IPMI port for remote (out-of-band) management

As soon as I find out more around integration of other components I will let you folks know.

What is new for Storage DRS in vSphere 6.0?

Storage DRS must be one of the most under-appreciated features that is part of vSphere. For whatever reason it doesn’t get the airtime it deserves, not even from VMware folks, which is a shame if you ask me. I was reading the What’s New material for vSphere 6.0 and I noticed that the “What is new for Storage DRS in vSphere 6.0” section was completely missing. I figured I would do a quick write up of what has been improved and introduced for SDRS in 6.0, as some of the enhancements are quite significant! Let’s start with a list and then look at these enhancements in more detail:

  • Deep integration with vSphere APIs for Storage Awareness (VASA)
  • Site Recovery Manager integration
  • vSphere Replication integration
  • Integration with Storage Policy Based Management

Let’s start with the top one, deep integration with vSphere APIs for Storage Awareness (VASA), as that is the biggest improvement if you ask me. What the integration with VASA results in is fairly straightforward: when the VASA plugin for your storage system is configured, Storage DRS will understand what capabilities are enabled on your storage system and, more specifically, on your datastores. For example: when using Storage DRS previously on a deduplicated datastore it could happen that the migration initiated by Storage DRS had a negative result on the total available capacity on your storage system. This would be caused by the fact that the deduplication ratio was lower on the destination than it was on the source. Not a very pleasant surprise, you can imagine. Also, when for instance VMs are snapshotted from a storage system point of view or datastores are replicated, you can imagine that there would be an impact when moving a VM around in that scenario. With 6.0 Storage DRS is capable of understanding:

  • Array-based thin-provisioning
  • Array-based deduplication
  • Array-based auto-tiering
  • Array-based snapshot
  • Array-based replication

I guess you get the drill: SDRS is now fully capable of understanding the array capabilities and will make balancing decisions taking these capabilities into consideration. For instance in the case of replication, when replication is enabled and your datastore is part of a consistency group then SDRS will ensure that the VM is only migrated to a datastore which belongs to the same consistency group! For deduplication this is the opposite by the way. In this case SDRS will be informed about which datastores belong to which deduplication domains, and when datastores belong to the same domain it will know that moving between those datastores will have little to no effect on capacity. Depending on the level of detail the storage vendor provides through VASA, SDRS will even be aware of how efficient the deduplication process is for a given datastore. (This is not a VASA requirement but rather a recommendation, so results may vary per vendor implementation.) Auto-tiering is also an interesting one as this is something that comes up regularly. In this scenario with previous versions of SDRS it could happen that SDRS was moving VMs while the auto-tiering array was just promoting or demoting blocks to a higher or lower tier. As you can imagine this is not a desired scenario, and with the VASA integration this can be prevented from happening.

Second big thing is Site Recovery Manager and vSphere Replication integration. I already mentioned the consistency group awareness, of course this is also part of the SRM integration and when VMs are protected by SRM then SDRS will make sure that those VMs are only moved within their consistency group. If for whatever reason there is no way to move within a consistency group then SDRS as a second option can move VMs between datastores which are part of the same SRM Protection Group. Note that this could have an impact though on your workloads! SDRS of course will never automatically move a VM from a replicated to a non-replicated datastore. In fact, there is a strict hierarchy of what type of moves can be recommended:

  1. Moves within the same consistency group
  2. Moves across consistency groups, but within the same protection group
  3. Moves across protection groups
  4. Moves from a replicated datastore to non-replicated

Note that SDRS will try option 1 first, if it fails, will try option 2, if that fails will try option 3, and so on. Under no circumstances is a recommendation in the category of 2, 3 or 4 executed automatically. You will receive a warning after which you can manually apply the recommendation. This is done to ensure the administrator has full control and full awareness of the migration and can apply it during maintenance or during non-peak hours.
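The hierarchy above can be sketched in a few lines of Python. To be clear, this is my own illustration and not how SDRS is implemented: the `Datastore` class, its fields and both functions are hypothetical names, and I assume the source datastore is replicated. The sketch classifies a candidate move into category 1 to 4 and marks only category 1 as something that may be applied automatically.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Datastore:
    """Hypothetical model of a datastore; None means 'not replicated / not protected'."""
    name: str
    consistency_group: Optional[str] = None
    protection_group: Optional[str] = None


def move_category(src: Datastore, dst: Datastore) -> int:
    """Return 1-4 following the hierarchy in the list above (src assumed replicated)."""
    if dst.consistency_group is None:
        return 4  # replicated to non-replicated
    if dst.consistency_group == src.consistency_group:
        return 1  # same consistency group
    if src.protection_group is not None and dst.protection_group == src.protection_group:
        return 2  # other consistency group, same protection group
    return 3      # across protection groups


def best_recommendation(
    src: Datastore, candidates: List[Datastore]
) -> Optional[Tuple[Datastore, int, bool]]:
    """Pick the lowest category; only category 1 may be applied automatically."""
    if not candidates:
        return None
    dst = min(candidates, key=lambda d: move_category(src, d))
    category = move_category(src, dst)
    return dst, category, category == 1


if __name__ == "__main__":
    src = Datastore("ds-src", "cg1", "pg1")
    options = [Datastore("ds-plain"), Datastore("ds-other-cg", "cg2", "pg1")]
    print(best_recommendation(src, options))
```

In this example no datastore in the same consistency group is available, so the best option is a category 2 move and it is flagged as manual only.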

With regards to vSphere Replication a lot has changed as well. So far there was no support for vSphere Replication enabled VMs to be part of an SDRS datastore cluster, but with 6.0 it is fully supported. As of 6.0 Storage DRS will recognize replica VMs (which are replicated using vSphere Replication), and when thresholds have been exceeded SDRS will query vSphere Replication and will be able to migrate replicas to solve the resource constraint.

Up next the integration with Storage Policy Based Management. In the past when you had different tiers of datastores as part of the same Datastore Cluster then SDRS could potentially move a VM which was assigned policy “gold” to a datastore which was associated with a “silver” policy. With vSphere 6.0, SDRS is aware of storage policies in SPBM and will only move or place VMs to a datastore that can satisfy that VM’s storage policy.
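Conceptually you can think of this as a filter on candidate datastores. The sketch below is only an illustration of that idea: modelling a policy as a set of required capabilities is my own simplification, and the function and capability names are hypothetical, not SPBM API names.

```python
from typing import Dict, FrozenSet, List, Set


def compatible_datastores(
    policy_requirements: FrozenSet[str],
    datastore_capabilities: Dict[str, Set[str]],
) -> List[str]:
    """Return the datastores whose capabilities cover everything the policy requires."""
    return sorted(
        name
        for name, capabilities in datastore_capabilities.items()
        if policy_requirements <= capabilities  # subset test
    )


if __name__ == "__main__":
    # Made-up capabilities for a datastore cluster with mixed tiers.
    cluster = {
        "gold-01": {"ssd", "replication"},
        "gold-02": {"ssd", "replication", "dedupe"},
        "silver-01": {"replication"},
    }
    gold_policy = frozenset({"ssd", "replication"})
    print(compatible_datastores(gold_policy, cluster))
```

A VM with the “gold” policy in this example would only ever be considered for the two gold datastores.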

Oh and before I forget, there is also the introduction of IOPS reservations on a per virtual disk level. This isn’t really part of Storage DRS but a function of the mClock scheduler and integrated with Storage IO Control and SDRS where needed. It isn’t available in the UI even in this release, only exposed through the VIM API so I doubt many of you will use it… figured though I would mention it already, and I will do a deeper write up later this week probably.

What is new for vMotion in vSphere 6.0?

vMotion is probably my favourite VMware feature ever. It is one of those features which revolutionized the world and just when you think they can’t really innovate anymore they take it to a whole new level. So what is new?

  • Cross vSwitch vMotion
  • Cross vCenter vMotion
  • Long Distance vMotion
  • vMotion Network improvements
    • No requirement for L2 adjacency any longer!
  • vMotion support for Microsoft Clusters using physical RDMs

That is a nice long list indeed. Let’s discuss each of these new features one by one, starting at the top with Cross vSwitch vMotion. Cross vSwitch vMotion basically allows you to do what the name tells you. It allows you to migrate virtual machines between different vSwitches. Not just from vSS to vSS but also from vSS to vDS and vDS to vDS. Note that vDS to vSS is not supported. This is because when migrating from a vDS the metadata of the VM is transferred as well, and the vSS does not have this logic and cannot handle the metadata. Note that the IP address of the VM that you are migrating will not magically change, so you will need to make sure both the source and the destination portgroup belong to the same layer 2 network. All of this is very useful during, for instance, datacenter migrations, when you are moving VMs between clusters, or even when you are migrating to a new vCenter instance.
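The rules from this paragraph fit in a small pre-check. Again, this is just a sketch of the rules as described above, with hypothetical function and parameter names. It does not talk to vCenter.

```python
from typing import List

# Supported (source, destination) switch type combinations, as listed above.
SUPPORTED_SWITCH_MOVES = {("vSS", "vSS"), ("vSS", "vDS"), ("vDS", "vDS")}


def cross_vswitch_precheck(
    src_switch: str, dst_switch: str, src_l2_network: str, dst_l2_network: str
) -> List[str]:
    """Return a list of problems; an empty list means the move looks fine."""
    problems = []
    if (src_switch, dst_switch) not in SUPPORTED_SWITCH_MOVES:
        problems.append(f"{src_switch} to {dst_switch} is not supported")
    if src_l2_network != dst_l2_network:
        # The VM keeps its IP address, so both portgroups must be on the same L2 network.
        problems.append("source and destination portgroup are on different L2 networks")
    return problems


if __name__ == "__main__":
    print(cross_vswitch_precheck("vSS", "vDS", "vlan-100", "vlan-100"))
    print(cross_vswitch_precheck("vDS", "vSS", "vlan-100", "vlan-200"))
```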

Next on the list is Cross vCenter vMotion. This is something that came up fairly frequently when talking about vMotion: will we ever have the ability to move a VM to a new vCenter Server instance? Well, as of vSphere 6.0 this is indeed possible. Not only can you move between vCenter Servers, but you can do this with all the different migration types there are: change compute / storage / network. You can even do it without having a shared datastore between the source and destination vCenter, aka “shared nothing migration”. This functionality will come in handy when you are migrating to a different vCenter instance or even when you are migrating workloads to a different location. Note that it is a requirement for the source and destination vCenter Server to belong to the same SSO domain. What I love about this feature is that when the VM is migrated, things like alarms, events, HA and DRS settings are all migrated with it. So if you have affinity rules or changed the host isolation response or set a limit or reservation, it will follow the VM!

My personal favourite is Long Distance vMotion. When I say long distance, I do mean long distance. Remember that the max tolerated latency was 10ms for vMotion? With this new feature that just went up to 100ms. Long distance vMotion uses socket buffer resizing techniques to ensure that migrations succeed when latency is high. Note that this will work with any storage system; both VMFS and NFS based solutions are fully supported. I have been told that this feature is still being tested and that we may even see the maximum tolerated latency increase. When I get an official statement I will make sure to let you know.
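Why would socket buffers matter at higher latency? A quick way to reason about it is the bandwidth-delay product: the amount of data that must be “in flight” to keep a link full. The sketch below is generic networking arithmetic, not a description of how vMotion sizes its buffers. The 10 Gbit/s link speed is an assumption, and I treat the latency numbers as round-trip times, which is also an assumption.

```python
def bandwidth_delay_product_bytes(bandwidth_gbit_s: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a link of this speed busy at this RTT."""
    bits_in_flight = bandwidth_gbit_s * rtt_ms * 1e6  # Gbit/s * ms = 1e9 * 1e-3 bits
    return bits_in_flight / 8


if __name__ == "__main__":
    link_gbit_s = 10.0  # assumed vMotion link speed
    for rtt_ms in (10.0, 100.0):
        mb = bandwidth_delay_product_bytes(link_gbit_s, rtt_ms) / 1e6
        print(f"RTT {rtt_ms:.0f} ms on {link_gbit_s:.0f} Gbit/s: about {mb:.1f} MB in flight")
```

With these assumed numbers, going from 10ms to 100ms means ten times as much data needs to be in flight (roughly 12.5 MB versus 125 MB) to keep the same link busy, which is why buffer sizes that work fine at low latency can become the limiting factor.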

Then there are the network enhancements. First and foremost, vMotion traffic is now fully supported over an L3 connection. So there is no longer a need for L2 adjacency for your vMotion network. I know a lot of you have asked for this and I am happy to be able to announce it. On top of that, you can now also specify which VMkernel interface should be used for migration of cold data. It is not something many people are aware of, but depending on the type of migration you are doing and the type of VM you are migrating, it could be that in previous versions the Management Network was used to transfer data. (Frank Denneman described this scenario in this post.) For this specific scenario it is now possible to define a VMkernel interface for “Provisioning traffic” as shown in the screenshot below. This interface will be used for, and let me quote the documentation here, “Supports the traffic for virtual machine cold migration, cloning, and snapshot creation. You can use the provisioning TCP/IP stack to handle NFC (network file copy) traffic during long-distance vMotion. NFC provides a file-type aware FTP service for vSphere, ESXi uses NFC for copying and moving data between datastores.”

Full support for vMotion of Microsoft Cluster virtual machines is also newly introduced in vSphere 6.0. Note that these VMs will need to use physical RDMs and that this is only supported with Windows 2008, 2008 R2, 2012 and 2012 R2. Very useful if you ask me when you need to do maintenance or you have resource contention of some kind.

That was it for now… There is some more stuff coming with regards to vMotion but I cannot disclose that yet unfortunately.

What’s new for HA in vSphere 6.0?

Instead of one generic post with a bunch of data I picked a couple of features and dug a little bit deeper. Today I will be discussing what is new for HA in vSphere 6.0. Let’s start with a list and then look at the features / enhancements individually:

  • Support for Virtual Volumes – With Virtual Volumes a new type of storage entity is introduced in vSphere 6.0.
  • VM Component Protection – This allows HA to respond to a scenario where the connection to the virtual machine’s datastore is impacted temporarily or permanently.
    • “Response for Datastore with All Paths Down”
    • “Response for Datastore with Permanent Device Loss”
  • Increased scale – Cluster limit has grown from 32 to 64 hosts and to a max of 8000 VMs per cluster
  • Registration of “HA Disabled” VMs on hosts after failure

Lets start with support for Virtual Volumes. It may sound like this is a given but as the whole concept of a VMFS volume no longer exists with Virtual Volumes and VMs have “virtual volumes” instead of VMDKs you can imagine that some work was needed to allow for HA to restart virtual machines stored on a VVOL enabled storage system.

VM Component Protection (VMCP) is in my opinion THE big thing that got added to vSphere HA. What this feature basically allows you to do is protect yourself against storage failures. There are two types of failures VMCP will respond to and those are PDL and APD. Before we look at some of the details, I want to point out that configuring is extremely simple… Just one tickbox to enable it.

In the case of a PDL (permanent device loss), which is something HA was already capable of handling when configured through the command line, a VM will be restarted instantly when a PDL signal is issued by the storage system. For an APD (all paths down) this is a bit different. A PDL more or less indicates that the storage system does not expect the device to return any time soon. An APD is more of an unknown situation: it may return, it may not, and there is no clue how long it will take. With vSphere 5.1 some changes were introduced to the way APD is handled by the hypervisor, and this mechanism is leveraged by HA to allow for a response. (Cormac wrote an excellent post about this APD handling here.) When an APD occurs a timer starts. After 140 seconds the APD is declared and the device is marked as APD timed out. When the 140 seconds have passed HA will start counting. The HA timeout is 3 minutes. When the 3 minutes have passed HA can restart the virtual machine, but you can configure VMCP to respond differently if you want it to. You could for instance specify that events are issued when a PDL or APD has occurred. You can also specify how aggressively HA needs to try to restart VMs that are impacted by an APD. Note that aggressive / conservative refers to the likelihood of HA being able to restart VMs. When set to “conservative” HA will only restart the VM that is impacted by the APD if it knows another host can restart it. In the case of “aggressive” HA will try to restart the VM even if it doesn’t know the state of the other hosts, which could lead to a situation where your VM is not restarted as there is no host that has access to the datastore the VM is located on. It is also good to know that if the APD is lifted and access to the storage is restored during the total of roughly 5 minutes and 20 seconds it would take to restart the VM, HA will not do anything unless you explicitly configure it to do so. This is where the “Response for APD recovery after APD timeout” comes into play.
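The APD timeline is easy to lose track of, so here is a small sketch of it. The two timer values come straight from the text above (140 seconds plus 3 minutes, together 320 seconds, which is the 5 minutes and 20 seconds mentioned). Everything else, including the function names, the phase labels and the boolean flag, is made up for illustration and is not HA's actual logic.

```python
APD_TIMEOUT_S = 140   # from the text: the APD is declared after 140 seconds
HA_DELAY_S = 3 * 60   # from the text: HA then counts for another 3 minutes


def apd_phase(seconds_since_apd_start: float) -> str:
    """Return a (hypothetical) label for where we are in the APD timeline."""
    if seconds_since_apd_start < APD_TIMEOUT_S:
        return "apd-timer-running"
    if seconds_since_apd_start < APD_TIMEOUT_S + HA_DELAY_S:
        return "apd-timeout-declared-ha-counting"
    return "ha-may-restart-vm"


def should_attempt_restart(policy: str, knows_other_host_can_restart: bool) -> bool:
    """Conservative: restart only when HA knows another host can restart the VM.
    Aggressive: try anyway, even when the state of the other hosts is unknown."""
    if policy == "conservative":
        return knows_other_host_can_restart
    if policy == "aggressive":
        return True
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    total = APD_TIMEOUT_S + HA_DELAY_S
    print(f"Earliest restart: {total} s ({total // 60} min {total % 60} s)")
    for t in (60, 200, 320):
        print(t, apd_phase(t))
```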

Increased scale is pretty straightforward: from 32 to 64 hosts and a total of 8000 VMs per cluster. I don’t know too many customers hitting these boundaries, but I do come across a request like this occasionally. So if you want to grow your cluster, you can now do so. Do note that you may hit other limits like the LUN limit or the VM limit or…

Registration of HA Disabled VMs after a failure is a feature I requested a long time ago. I am glad to see this made it into the release. Basically, when you have HA disabled on a specific VM this feature will make sure that the VM gets registered on another host after a failure. This will allow you to easily power on that VM when needed without needing to manually re-register it yourself. Note that HA will not power on the VM; it will just register it for you.

That was it for now…