VMworld 2015: Site Recovery Manager 6.1 announced

This week Site Recovery Manager 6.1 was announced. There are many enhancements in SRM 6.1, like the integration with NSX and policy-driven protection, but personally I feel that support for stretched storage is huge. When I say stretched storage I am referring to solutions like EMC VPLEX, Hitachi Virtual Storage Platform and IBM SAN Volume Controller (etc.). In the past, when you had these solutions deployed, you would have a single vCenter Server with a single cluster and you would move VMs around manually when needed, or let HA take care of restarts in failure scenarios. You can still do that today.

As of SRM 6.1, running these types of stretched configurations is now also supported with SRM. So how does that work, what does it allow you to do, and what does it look like? In contrast to a vSphere Metro Storage Cluster solution, with SRM 6.1 you will be using two vCenter Server instances. Each of these vCenter Server instances will have an SRM server attached to it, which will use a storage replication adapter to communicate with the array.

But why would you want this? Why not just stretch the compute cluster also? Many have deployed these stretched configurations for disaster avoidance purposes. The problem, however, is that there is no form of orchestration whatsoever. This means that after a failure all workloads will typically come up in a random order. In some cases the application knows how to recover from situations like that, in most cases it does not, leaving you with a lot of work, as you will now need to restart services, or VMs, in the right order. This is where SRM comes in. Orchestration is the strength of SRM.
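
To make the "right order" problem a bit more concrete, here is a minimal sketch of how a restart order can be derived from dependencies between VMs. This is not how SRM is implemented and it does not use any SRM API. The VM names and the `restart_waves` helper are made up for illustration, and the ordering is done with the topological sorter from Python's standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def restart_waves(depends_on):
    """Group VMs into waves; every VM in a wave only depends on earlier waves.

    depends_on maps a VM name to the set of VMs that must be up before it.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves

# Hypothetical three-tier application: database first, then app, then web.
example = {
    "db-01": set(),
    "app-01": {"db-01"},
    "web-01": {"app-01"},
    "web-02": {"app-01"},
}

for number, wave in enumerate(restart_waves(example), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Without orchestration you would have to work out and execute these waves by hand after every failure, which is exactly the work a recovery plan takes off your hands.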

Besides orchestrating a full failover, what SRM can also do in the 6.1 release is evacuate a datacenter using vMotion in an orchestrated and automated way. If there is a disaster about to happen, you can now use the SRM interface to move virtual machines from one datacenter to another with just a couple of clicks. This is called a planned migration, as can be seen in the screenshot above.

Personally I think this is a great step forward for stretched storage and SRM. I am very excited about this release!

What is new for Virtual SAN 6.1?

It is VMworld, and of course there are many announcements being made, one of which is Virtual SAN 6.1, which will come as part of vSphere 6.0 Update 1. Many new features have been added, but there are a couple which stand out if you ask me. In this post I am going to talk about what are, in my opinion, the key new features. Let's list them first and then discuss some of them individually.

  • Support for stretched clustering
  • Support for 2 node ROBO configurations
  • Enhanced Replication
  • Support for SMP-FT
  • New hardware options
    • Intel NVMe
    • Diablo Ultra Dimm
  • Usability enhancements
    • Disk Group Bulk Claiming
    • Disk Claiming per Tier
    • On-Disk Format Upgrade from UI
  • Health Check Plug-in shipped with vCenter Server
  • Virtual SAN Management Pack for VR Ops

When explaining the Virtual SAN architecture and concepts there is always one question that comes up: what about stretched clustering? I guess the key reason is the way Virtual SAN distributes objects across multiple hosts for availability, as people can easily see how that would work across datacenters. With Virtual SAN 6.1 we now fully support stretched clustering. But what does that mean, and what does it look like?

As you can see in the diagram above it starts with 3 failure domains, two of which will be “hosting data” and one of which will be a “witness site”. All of this is based on the Failure Domains technology that was introduced with 6.0, and those who have used it know how easy it is. Of course there are requirements when it comes to deploying in a stretched fashion, and the key requirements for Virtual SAN are:

  • 5ms RTT latency max between data sites
  • 100ms RTT latency at most from data sites to witness site

Worth noting from a networking point of view is that from the data sites to the witness site there is no requirement for multicast routing and it can be across L3. On top of that the Witness can be nested ESXi, so no need to dedicate a full physical host just for witness purposes. Of course the data sites can also connect to each other over L3 if that is desired, but personally I suspect that VSAN over L2 will be a more common deployment and it is also what I would recommend. Note that between the data sites there is still a requirement for multicast.

When it comes to deploying virtual machines on a stretched cluster not much has changed. Deploy a VM, and VSAN will ensure that there is one copy of your data in Fault Domain A and one copy in Fault Domain B, with your witness in Fault Domain C. Makes sense, right? If one of the data sites fails then the other can take over. If the VM is impacted by a site failure then HA can take action… It is not rocket science and it is dead simple to set up. I will have a follow-up post with some more specifics in a couple of weeks.
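
The placement described above can be sketched as a small model. This is only an illustration and not VSAN code. It assumes the simple rule that an object stays accessible as long as more than half of its three components (two data copies and one witness) are reachable and at least one of the reachable components is a data copy. The `object_accessible` function and the `COMPONENTS` layout are names made up for this sketch.

```python
# One component per fault domain, as described in the post:
# data copies in A and B, the witness in C.
COMPONENTS = {"A": "replica", "B": "replica", "C": "witness"}

def object_accessible(failed_domains):
    """Return True if the object survives the given set of failed fault domains.

    Assumption for this sketch: more than 50% of the components must be
    reachable, and at least one of them must be a full data copy.
    """
    surviving = [fd for fd in COMPONENTS if fd not in failed_domains]
    has_quorum = len(surviving) > len(COMPONENTS) / 2
    has_data = any(COMPONENTS[fd] == "replica" for fd in surviving)
    return has_quorum and has_data

for failure in [{"A"}, {"B"}, {"C"}, {"A", "C"}, {"A", "B"}]:
    state = "accessible" if object_accessible(failure) else "NOT accessible"
    print(f"failed: {sorted(failure)} -> {state}")
```

The point of the witness is visible in the model: losing any single site still leaves two out of three components, while losing a data site together with the witness does not.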

Besides stretched clustering, Virtual SAN 6.1 also brings a 2 node ROBO option. This is based on the same technique as the stretched clustering feature. It basically allows you to have 2 nodes in your ROBO location and a witness in a central location. The max latency (RTT) in this scenario is 500ms, which should accommodate almost every ROBO deployment out there. Considering the low number of VMs typically found in these scenarios, you are usually okay with 1GbE networking in the ROBO location as well, which further reduces the cost.
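
If you want to sanity check a design against the latency limits quoted above (5ms between data sites, 100ms from data sites to the witness, 500ms for the 2 node ROBO witness), a few lines of Python are enough. This is a minimal sketch, not a VMware tool. The function names, link type names and the measured values are made up, and you would feed in RTT numbers from your own network measurements.

```python
# Maximum round-trip times in milliseconds, as quoted in this post.
MAX_RTT_MS = {
    "stretched_data_to_data": 5.0,
    "stretched_data_to_witness": 100.0,
    "robo_to_witness": 500.0,
}

def within_limit(link_type, measured_rtt_ms):
    """Return True if the measured RTT does not exceed the limit for the link."""
    return measured_rtt_ms <= MAX_RTT_MS[link_type]

def validate_stretched_cluster(data_rtt_ms, witness_rtts_ms):
    """Return a list of violations for a stretched cluster (empty means OK).

    data_rtt_ms is the RTT between the two data sites, witness_rtts_ms maps
    a data site name to its RTT towards the witness site.
    """
    problems = []
    if not within_limit("stretched_data_to_data", data_rtt_ms):
        problems.append(f"data sites: {data_rtt_ms}ms exceeds 5ms")
    for site, rtt in sorted(witness_rtts_ms.items()):
        if not within_limit("stretched_data_to_witness", rtt):
            problems.append(f"site {site} to witness: {rtt}ms exceeds 100ms")
    return problems

# Hypothetical measurements: site B is too far away from the witness.
print(validate_stretched_cluster(3.2, {"A": 40.0, "B": 120.0}))
print(within_limit("robo_to_witness", 350.0))
```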

When it comes to disaster recovery, work has also been done to reduce the recovery point objective (RPO) for vSphere Replication. By default this is 15 minutes, but for Virtual SAN this has now been certified for 5 minutes. Just imagine combining this with a stretched cluster, that would be a great disaster avoidance and disaster recovery solution: sync replication between active sites and then async to wherever it needs to go.

But that is not it in terms of availability, support for SMP FT has also been added. I never expected this to be honest, but I have had many customers asking for this in the past 12 months. Another common request I have seen is support for super fast flash devices like Intel NVMe and Diablo Ultra Dimm, and 6.1 delivers exactly that.

Another big focus in this release has been usability and operations. Many enhancements have been made to make life easier. I like the fact that the Health Check plugin is now included with vCenter Server and that you can do things like upgrading the on-disk format straight from the UI. And of course there is the VR Ops Management Pack, which will enrich your VR Ops installation with all the details you will ever need about Virtual SAN. Very, very useful!

All of this makes Virtual SAN 6.1 definitely a release to check out!

VMworld 2015: vSphere APIs for IO Filtering update

I suspect that the majority of blogs this week will be about Virtual SAN, Cloud Native Apps and EVO. If you ask me, the vSphere APIs for IO Filtering announcements are just as important. I’ve written about VAIO before, in a way. It was first released in vSphere 6.0 and opened to a select group of partners. For those who don’t know what it is, let's recap: the vSphere APIs for IO Filtering is a framework which enables VMware partners to develop data services for vSphere in a fully supported fashion. VMware worked closely with EMC and SanDisk during the design and development phase to ensure that VAIO would deliver what partners require it to deliver.

These data services can be applied at a VM or VMDK granular level, simply by attaching a policy to your VM or VMDK, and can be literally anything. In this first official release, however, you will see two key use cases for VAIO:

  1. Caching
  2. Replication

The great thing about VAIO, if you ask me, is that it is an ESXi user space level API, which over time will make it possible for all the various data services providers (like Atlantis, Infinio etc) who now have a “virtual appliance” based solution to move into ESXi and simplify their customers' environments by removing that additional layer. (To be technically accurate, VAIO APIs are all user level APIs and the filters are all running in user space, only a part of the VAIO framework runs inside the kernel itself.) On top of that, as it is implemented on the “right” layer, it will be supported for VMFS (FC/iSCSI/FCoE etc), NFS, VVols and VSAN based infrastructures. The diagram below shows where it sits.

VAIO software services are implemented before the IO is directed to any physical device and do not interfere with normal disk IO. In order to use VAIO you will need to use vSphere 6.0 Update 1. On top of that you will of course need to procure a solution from one of the VMware partners who are certified for it. VMware provides the framework, partners provide the data services!

As far as I know the first two to market will be EMC and SanDisk. Other partners who are working on VAIO-based solutions, and from whom you can expect to see a release, are Actifio, PrimaryIO, Samsung, HGST and more. I am hoping to be able to catch up with one or two of them this week or over the course of the next week so I can discuss it in a bit more detail.

Virtual SAN Ready Nodes taking charge!

Yes that is right, Virtual SAN Ready Nodes are taking charge! As of today, when you visit the VMware Compatibility Guide for Virtual SAN it will all revolve around Virtual SAN Ready Nodes instead of individual components. You may ask yourself why that is. Basically it is because we want to make it easier for you to purchase the hardware needed while removing the complexity of selecting components. This means that if you are a Dell customer and want to run Virtual SAN you can simply select Dell in the VMware Compatibility Guide and then look at the different models available in the different sizes. It is very easy, as can be seen in the screenshot below.


Traditionally there were 3 different sizes for “Server Virtualization”, but with the full overhaul of the VSAN VCG a new size was added. The naming of the sizes has also changed. Let me explain what it looks like now. Note that these “sizing profiles” are the same across all vendors, so comparing HP to Dell or IBM (etc) has never been easier!

New Name    Old Name
HY-2        Hybrid Server Low
HY-4        ** new **
HY-6        Hybrid Server Medium
HY-8        Hybrid Server High
HY-8        Hybrid VDI Linked Clones
            Hybrid VDI Full Clones
AF-6        All Flash Server Medium
AF-8        All Flash Server High
            AF VDI Linked Clones
            AF VDI Full Clones

The new model introduced is the HY-4 Series. This model was introduced because some customers felt that the price difference between HY-2 and HY-6 was too big. By introducing a model in between we now cover all price ranges. Note that it is still possible to make changes to the configuration when selecting a model. If you want model HY-2 with an additional 2 disks, or with 128GB of memory instead of 32GB, then you can simply request this.

So what are we talking about in terms of capacity etc? Of course this is all documented and listed on the VCG as well, but let me share it with you here also for your convenience. Note that performance and VM numbers may be different for your scenario, this of course will depend on your workload and the size of your VMs etc.

Model   CPU / Mem              Storage Cap   Storage Perf   VMs per node
HY-2    1 x 6 core / 32GB      2TB           4000 IOPS      Up to 20
HY-4    2 x 8 core / 128GB     4TB           10K IOPS       Up to 30
HY-6    2 x 10 core / 256GB    8TB           20K IOPS       Up to 50
HY-8    2 x 12 core / 348GB    12TB          40K IOPS       Up to 100
AF-6    2 x 12 core / 256GB    8TB           50K IOPS       Up to 60
AF-8    2 x 12 core / 348GB    12TB          80K IOPS       Up to 120

In my opinion, this new “Ready Node” driven approach to the VMware Compatibility Guide is definitely 10 times easier than focusing on individual components. You pick the ready node that comes close to what you are looking for, provide your OEM with the SKU listed and tell them about any modifications needed in terms of CPU/Mem or Disk Capacity. PS: If you want to access the “old school HCL” then just click on the “Build Your Own based on Certified Components” link on the VCG page.
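
Picking "the ready node that comes close" can be sketched in a few lines using the numbers from the sizing table in this post. This is just an illustration of the selection logic and not a VMware tool. The `pick_ready_node` function is made up, and as mentioned the performance and VM numbers will differ for your own workload.

```python
# (model, storage capacity in TB, IOPS, max VMs per node), taken from the
# sizing table in this post and ordered from small to large per family.
READY_NODES = [
    ("HY-2", 2, 4_000, 20),
    ("HY-4", 4, 10_000, 30),
    ("HY-6", 8, 20_000, 50),
    ("HY-8", 12, 40_000, 100),
    ("AF-6", 8, 50_000, 60),
    ("AF-8", 12, 80_000, 120),
]

def pick_ready_node(vms_per_node, storage_tb, iops, all_flash=False):
    """Return the smallest profile that meets all three requirements, or None."""
    family = "AF" if all_flash else "HY"
    for model, capacity_tb, max_iops, max_vms in READY_NODES:
        if (model.startswith(family)
                and capacity_tb >= storage_tb
                and max_iops >= iops
                and max_vms >= vms_per_node):
            return model
    return None  # nothing fits, so talk to your OEM about a modified config

# Hypothetical requirement: 25 VMs per node, 3TB and 8000 IOPS on hybrid.
print(pick_ready_node(vms_per_node=25, storage_tb=3, iops=8_000))
```

When the function returns nothing, or returns a profile that is far larger than you need on two of the three dimensions, that is the moment to ask for a modified configuration as described above.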

Tintri announces all-flash storage device and Tintri OS 4.0

Last week I had the pleasure of catching up with Tintri. It has been a while since I spoke with them, but I have been following them from the very start. I met up with them in Mountain View a couple of times when it was just a couple of guys on a rather empty floor with a solution that sounded really promising. Tintri’s big thing is simplicity, if you ask me. It is super simple to set up, really easy to manage, and provides VM granular controls for just about everything you can imagine. The solution comes in the form of a hybrid storage device (disks and flash) which is served up to the hypervisor as an NFS mount.

Today Tintri announces that they will be offering an all-flash system next to their hybrid systems. When talking to Kieran he made it clear that the all-flash system would probably be only for a subset of their customers. The key reason for this is that the hybrid solution already brings great performance and is of course at a much lower cost. The new all-flash model is named VMstore T5000 and comes in two variants: T5060 and T5080. The T5060 can hold up to 2500 VMs and around 36TB with dedupe and compression. For the T5080 that is 5000 VMs and around 73TB. Both are delivered in a 2U form factor, by the way. The expected use case for the all-flash systems is large persistent desktops and multi TB high performance databases. The key thing here is of course not just the number of IOPS it can drive, but the consistent low latency it can deliver.

Besides the hardware, there is also a software refresh. Tintri OS 4.0 and Global Center 2.1 are being announced. Tintri OS 4.0 is what is sitting on the VMstore storage systems and Global Center is their central management solution. With the 2.1 release Global Center now supports up to 100,000 VMs. It allows you to centrally manage both Tintri’s hybrid and all-flash systems from one UI and does smart things like informing you when a VM is provisioned to the wrong storage system (on hybrid while performance wise it requires all-flash, for instance). It does not just inform you, it also has the ability to migrate the VM from storage system to storage system. Note that during the migration all aspects that were associated with the VM (QoS, replication etc) are kept. (Not unlike Storage DRS, but in this case the solution is aware of all that happens on the storage system.) What I liked personally about Global Center is the performance views / health views. It is very easy to see what the state of your environment is, where latency is coming from etc. Also, if you need to configure things like QoS, replication or snapshotting for multiple VMs you can do this from the Global Center console by simply grouping them, as shown in the screenshot below.

Tintri QoS was demoed during the call, and I found this particularly interesting as it allows you to define QoS on a VM (or VMDK) granular level. When you do things like specifying an IOPS limit it is good to know that Tintri normalizes the IOPS based on the size of the IO. Simply said, all IO of 8KB or lower becomes 1 normalized IOPS, an IO which is 16KB will be 2 normalized IOPS etc. This is to ensure fairness in environments where IO sizes vary greatly (which will be almost every environment). Those who have ever tried to profile their workloads will know why this is important. What I’ve always liked about Tintri is their monitoring. How they split up latency into hypervisor, network and storage, for instance, is very useful. They have done an excellent job again for QoS management.
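
The normalization is easy to illustrate. The sketch below is not Tintri's implementation, it only follows the two data points mentioned above (8KB or lower counts as 1, 16KB counts as 2). How sizes in between are treated was not discussed, so rounding up to the next 8KB unit is an assumption on my part, and the function names are made up.

```python
import math

NORMALIZED_IO_UNIT_KB = 8  # an IO of 8KB or lower counts as 1 normalized IO

def normalized_io_count(io_size_kb):
    """Return how many normalized IOs a single IO of the given size represents.

    Assumption: sizes between multiples of 8KB are rounded up.
    """
    if io_size_kb <= 0:
        raise ValueError("IO size must be positive")
    return max(1, math.ceil(io_size_kb / NORMALIZED_IO_UNIT_KB))

def normalized_iops(io_sizes_kb):
    """Sum the normalized IO count for all IOs issued within one second."""
    return sum(normalized_io_count(size) for size in io_sizes_kb)

# Two VMs that both issue 100 IOs per second, but with different IO sizes.
small_io_vm = [4] * 100   # 100 x 4KB
large_io_vm = [64] * 100  # 100 x 64KB

print(normalized_iops(small_io_vm))
print(normalized_iops(large_io_vm))
```

Both VMs do 100 "raw" IOPS, but the second one moves 16 times more data. With an IOPS limit on normalized IOPS the large-IO VM hits its limit much sooner, which is the fairness aspect mentioned above.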

Last but not least, Tintri introduces Tintri VMstack. This is basically their converged offering, where compute, storage and hypervisor are bundled and delivered as a single stack to customers. It will provide you the choice of storage platform (well, it needs to be Tintri of course), hypervisor, compute and network infrastructure. It can also include things like OpenStack or the vRealize Suite. Personally I think this is a smart move, but it is something I would have preferred to have seen launched 12-18 months ago. Nevertheless, it is a good move.