Yellow Bricks

by Duncan Epping


HA/DRS configuration with Virtual SAN Stretched Cluster environment

Duncan Epping · Sep 9, 2015 ·

This question is going to come up sooner or later: how do I configure HA/DRS when I am running a Virtual SAN stretched cluster configuration? I described some of the basics of Virtual SAN stretched clustering in the What's new for Virtual SAN 6.1 post already; if you haven't read it, I urge you to do so first. There are a couple of key things to know: first of all, the latency that can be tolerated between data sites is 5ms, and to the witness location ~100ms.

Imagine a VM that sits in Fault Domain A while reading from Fault Domain B: it could incur a latency of 5ms for each read I/O. From a performance perspective we would like to avoid this 5ms latency, so for stretched clusters we introduce the concept of read locality. We don't have this in a non-stretched environment, as there the latency is measured in microseconds rather than milliseconds. This "read locality" is something we need to take into consideration when we configure HA and DRS.
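To make this concrete, below is a minimal pyVmomi sketch of the kind of DRS VM/Host "should run" rule typically used to keep VMs on the site that holds the data they read. All names (vCenter address, cluster name, group names, host selection) are hypothetical, and certificate handling and error checking are omitted; treat it as a sketch of the approach, not a definitive recipe.

```python
# Minimal sketch (hypothetical names): create a DRS host group, a VM group,
# and a non-mandatory "should run" rule so DRS prefers to keep the VMs in
# one data site, preserving read locality during normal operations.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local", pwd="secret")
content = si.RetrieveContent()

# Locate the stretched cluster by name (assumes a simple datacenter layout).
cluster = None
for dc in content.rootFolder.childEntity:
    for child in dc.hostFolder.childEntity:
        if isinstance(child, vim.ClusterComputeResource) \
                and child.name == "StretchedCluster":
            cluster = child

# Hosts that form data site A; here simply the first two cluster members.
site_a_hosts = list(cluster.host)[:2]

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[
        vim.cluster.GroupSpec(operation="add",
                              info=vim.cluster.HostGroup(name="site-a-hosts",
                                                         host=site_a_hosts)),
        vim.cluster.GroupSpec(operation="add",
                              info=vim.cluster.VmGroup(name="site-a-vms",
                                                       vm=[])),  # add VMs here
    ],
    rulesSpec=[
        vim.cluster.RuleSpec(operation="add",
                             info=vim.cluster.VmHostRuleInfo(
                                 name="site-a-vms-should-run-in-site-a",
                                 enabled=True,
                                 mandatory=False,  # "should", not "must"
                                 vmGroupName="site-a-vms",
                                 affineHostGroupName="site-a-hosts")),
    ])
cluster.ReconfigureComputeResource_Task(spec, modify=True)
Disconnect(si)
```

The non-mandatory rule is deliberate: if site A fails completely, HA is still free to restart the VMs on the hosts in site B.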

[Read more…] about HA/DRS configuration with Virtual SAN Stretched Cluster environment

NexentaConnect for VSAN for free? Get it now!

Duncan Epping · Sep 8, 2015 ·


I was at VMworld last week and bumped into the Nexenta team. They told me about a great promotion they are running (limited time!) for NexentaConnect. NexentaConnect provides you NFS storage on top of VSAN; Cormac wrote an article about it a while back which I suggest reading. The promotion gives you NexentaConnect for $0, and that includes:

  • NexentaConnect for VMware Virtual SAN license(s)
  • Unlimited raw storage capacity to match VMware VSAN Licenses
  • Nexenta technical support for the first 12 months included

Now, I have seen many companies give away software for free, but I can't recall having seen anyone include free support for 12 months. If you are a Virtual SAN user, or about to deploy/implement Virtual SAN, and you are looking to include some form of file services on top of it, then NexentaConnect may be just what you are looking for. I definitely recommend giving it a try; it is free and comes with support… what more can you ask for?

Have fun!


VMworld Session: VSAN – Software Defined Storage Platform of the Future #STO6050

Duncan Epping · Sep 3, 2015 ·

Unfortunately I haven't been able to attend too many sessions, only 2 so far. This is one I didn't want to miss, as it was all about what VMware is working on for VSAN and the layers that could sit on top of VSAN. Rawlinson and Christos first spoke about where VSAN is today, mainly discussing the use cases (monolithic apps like Exchange, SQL etc.) and the simplicity VSAN brings. After that, an explanation of the VSAN object/component model was provided, which was the lead-in to the future.

We are in the middle of an evolution towards cloud native applications, Christos said. Cloud native apps scale in a different way than traditional apps, and their requirements differ. Usually there is no need for HA and DRS, as these apps contain that functionality within their own framework. What does this mean for the vSphere layer?

VMware vSphere Integrated Containers and VMware Photon Platform enable these new types of applications. But how do we enable them from a storage point of view? What kind of scale will we require? Will we need different data services? Will we need different tools, and what about performance?

The first project discussed is the Performance Service, which will come as part of the Health Check plugin, providing metrics at the cluster level, host level, disk group level, disk level… The Performance Service architecture is very interesting, as it is not a "standard vCenter Server service". Providing deep insights using per-host traces collected centrally is not possible, as it would not scale. Instead a distributed model is proposed, which enables this in a decentralized way: each host collects data, each cluster rolls it up, and this can be done for many clusters. Data is both processed and stored in a distributed fashion. The cost for a solution like this should be around 10% of 1 core on a server. Just think what a vCenter Server would look like at the same type of scale and cost: at 1000 hosts, 10% of a core per host adds up to roughly 100 cores, so a centralized solution could easily result in a 100 vCPU requirement, which is not realistic.
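As a back-of-envelope illustration of that claim (the figures are the ones quoted in the session, not an official sizing):

```python
# Back-of-envelope check of the quoted figures: ~10% of one core per host,
# aggregated over a large fleet, versus the same work concentrated in a
# single centralized service.
hosts = 1000
cores_per_host = 0.10                 # ~10% of one core, spent locally

total_cores = hosts * cores_per_host  # the work is spread; no single hot spot
print(f"Distributed: {cores_per_host:.2f} core on each of {hosts} hosts")
print(f"Centralized equivalent: ~{total_cores:.0f} vCPUs in one service")
```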

Rawlinson demoed a potential solution for this; in this scenario we are talking about thousands of hosts from which data is gathered, analyzed, and presented in what appears to be an HTML-5 interface. The solution doesn't just provide details on the environment, it also allows you to mitigate problems. Note that this is a prototype of an interface that may or may not at some point in time be released. If you like what you see though, make sure to leave a comment, as I am sure that helps make this prototype a reality!

Next to be discussed was the potential to leverage VSAN not just for virtual machines but also for containers, having the capability to store files on top of VSAN. A distributed file system for cloud native apps was then introduced. Some of the requirements for such a distributed file system would be a scalable data path, clones at massive scale, multi-tenancy, and multi-purpose use.

VMware is also prototyping a distributed file system and has it running in their labs. It sits on top of VSAN, leveraging that scalable data path to store its data and metadata. Rawlinson demonstrated how he could create 2000 clones of a file in under a second across 1000 hosts and run his application. Note that this application isn't copied to those 1000 hosts; it is simply a mount point on 1000 hosts: a truly distributed file system with extremely scalable clone and snapshot technology.

Christos wrapped up with the key points: VSAN will be the enabler of future storage solutions, as it provides extreme scale at a low resource overhead. Awesome session, a great peek into the future.

VMworld 2015: Site Recovery Manager 6.1 announced

Duncan Epping · Sep 1, 2015 ·

This week Site Recovery Manager 6.1 was announced. There are many enhancements in SRM 6.1, like the integration with NSX for instance and policy-driven protection, but personally I feel that support for stretched storage is huge. When I say stretched storage I am referring to solutions like EMC VPLEX, Hitachi Virtual Storage Platform, IBM SAN Volume Controller, etc. In the past (and you still can today), when you had these solutions deployed you would have a single vCenter Server with a single cluster, and you moved VMs around manually when needed, or let HA take care of restarts in failure scenarios.

As of SRM 6.1, running these types of stretched configurations is now also supported. So how does that work, what does it allow you to do, and what does it look like? Well, in contrast to a vSphere Metro Storage Cluster solution, with SRM 6.1 you will be using two vCenter Server instances. Each of these vCenter Server instances will have an SRM server attached to it, which uses a storage replication adapter to communicate with the array.

But why would you want this? Why not just stretch the compute cluster as well? Many have deployed these stretched configurations for disaster avoidance purposes. The problem, however, is that there is no form of orchestration whatsoever. This means that all workloads will typically come up in a random fashion. In some cases the application knows how to recover from situations like that; in most cases it does not… leaving you with a lot of work, as after a failure you will now need to restart services, or VMs, in the right order. This is where SRM comes in; orchestration is the strength of SRM.
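To illustrate why ordering matters (this is a hypothetical sketch, not SRM code): given "starts after" dependencies between VMs, a recovery workflow is essentially a topological sort, which is exactly what an uncoordinated HA restart cannot give you.

```python
# Hypothetical illustration (not SRM code): compute a safe restart order
# from VM dependencies with a topological sort; HA alone restarts VMs in
# effectively random order.
from graphlib import TopologicalSorter  # Python 3.9+

# VM -> the VMs it depends on (these must be running first).
dependencies = {
    "db01":  [],
    "app01": ["db01"],
    "app02": ["db01"],
    "web01": ["app01", "app02"],
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # e.g. ['db01', 'app01', 'app02', 'web01']
```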

Besides orchestrating a full failover, what SRM can also do in the 6.1 release is evacuate a datacenter using vMotion in an orchestrated/automated way. If there is a disaster about to happen, you can now use the SRM interface to move virtual machines from one datacenter to another with just a couple of clicks; this is called a planned migration.

Personally I think this is a great step forward for stretched storage and SRM; I am very excited about this release!

What is new for Virtual SAN 6.1?

Duncan Epping · Aug 31, 2015 ·

It is VMworld, and of course there are many announcements being made, one of which is Virtual SAN 6.1, which will come as part of vSphere 6.0 Update 1. Many new features have been added, but there are a couple which stand out if you ask me. In this post I am going to talk about what are, in my opinion, the key new features. Let's list them first and then discuss some of them individually.

  • Support for stretched clustering
  • Support for 2 node ROBO configurations
  • Enhanced Replication
  • Support for SMP-FT
  • New hardware options
    • Intel NVMe
    • Diablo Ultra Dimm
  • Usability enhancements
    • Disk Group Bulk Claiming
    • Disk Claiming per Tier
    • On-Disk Format Upgrade from UI
  • Health Check Plug-in shipped with vCenter Server
  • Virtual SAN Management Pack for VR Ops

When explaining the Virtual SAN architecture and concepts, there is always one question that comes up: what about stretched clustering? I guess the key reason is the way Virtual SAN distributes objects across multiple hosts for availability reasons, and people can easily see how that would work across datacenters. With Virtual SAN 6.1, stretched clustering is now fully supported. But what does that mean, what does that look like?

A stretched configuration starts with 3 failure domains, two of which will be "hosting data" and one of which will be a "witness site". All of this is based on the Failure Domains technology that was introduced with 6.0, and those who have used it know how easy it is. Of course there are requirements when it comes to deploying in a stretched fashion, and the key requirements for Virtual SAN are:

  • 5ms RTT latency max between data sites
  • 200ms RTT latency at most from data sites to witness site

Worth noting from a networking point of view: from the data sites to the witness site there is no requirement for multicast routing, and the connection can be across L3. On top of that, the witness can be a nested ESXi host, so there is no need to dedicate a full physical host just for witness purposes. Of course the data sites can also connect to each other over L3 if that is desired, but personally I suspect that VSAN over L2 will be a more common deployment, and it is also what I would recommend. Note that between the data sites there is still a requirement for multicast.
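For completeness, here is a hedged pyVmomi sketch of tagging hosts with a fault domain, building on the Failure Domains feature that shipped with 6.0 (the faultDomainInfo field on the host's vSAN config). The host names and fault domain names are hypothetical, and the witness appliance itself is set up separately by the stretched cluster workflow; this only shows the per-host fault domain assignment.

```python
# Hedged sketch: place each host of a stretched cluster into its fault
# domain via the host's vsanSystem. Assumes the faultDomainInfo field
# introduced with the 6.0 Failure Domains feature; names are hypothetical.
from pyVmomi import vim

FAULT_DOMAINS = {
    "esx-a1.local": "Preferred",   # data site A
    "esx-a2.local": "Preferred",
    "esx-b1.local": "Secondary",   # data site B
    "esx-b2.local": "Secondary",
}

def assign_fault_domains(cluster):
    """Tag every host in `cluster` with its configured fault domain."""
    for host in cluster.host:
        domain = FAULT_DOMAINS.get(host.name)
        if domain is None:
            continue  # e.g. the witness, handled by its own workflow
        config = vim.vsan.host.ConfigInfo(
            faultDomainInfo=vim.vsan.host.ConfigInfo.FaultDomainInfo(
                name=domain))
        host.configManager.vsanSystem.UpdateVsan_Task(config)
```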

When it comes to deploying virtual machines on a stretched cluster, not much has changed. Deploy a VM, and VSAN will ensure that there is one copy of your data in Fault Domain A and one copy in Fault Domain B, with your witness in Fault Domain C. Makes sense, right? If one of the data sites fails, then the other can take over. If the VM is impacted by a site failure, then HA can take action… It is not rocket science and dead simple to set up. I will have a follow-up post with some more specifics in a couple of weeks.
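Why does that layout survive a site failure? A toy illustration (simplified to one vote per component, which is an assumption for clarity): an object stays accessible as long as more than half of its components survive, which is why two data copies plus a witness tolerate the loss of any one fault domain.

```python
# Toy quorum illustration (simplified: one vote per component). An object
# with a data copy in FD-A, a data copy in FD-B, and a witness in FD-C
# remains accessible while a majority of components survive.
def accessible(surviving_domains):
    components = {"FD-A": "data copy", "FD-B": "data copy", "FD-C": "witness"}
    votes = sum(1 for fd in components if fd in surviving_domains)
    return votes > len(components) / 2

print(accessible({"FD-A", "FD-C"}))  # True: site B failed
print(accessible({"FD-A", "FD-B"}))  # True: witness failed
print(accessible({"FD-B"}))          # False: lost site A and the witness
```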

Besides stretched clustering, Virtual SAN 6.1 also brings a 2-node ROBO option. This is based on the same technique as the stretched clustering feature. It basically allows you to have 2 nodes in your ROBO location and a witness in a central location. The max latency (RTT) in this scenario is 500ms, which should accommodate almost every ROBO deployment out there. Considering the low number of VMs typically involved in these scenarios, you are usually also okay with 1GbE networking in the ROBO location, which further reduces the cost.

When it comes to disaster recovery, work has also been done to reduce the recovery point objective (RPO) for vSphere Replication. By default this is 15 minutes, but for Virtual SAN it has now been certified for 5 minutes. Just imagine combining this with a stretched cluster; that would be a great disaster avoidance and disaster recovery solution: synchronous replication between active sites and then async to wherever it needs to go.

But that is not all in terms of availability: support for SMP-FT has also been added. I never expected this to be honest, but I have had many customers asking for it in the past 12 months. Another common request I have seen is support for super fast flash devices like Intel NVMe and Diablo Ultra Dimm, and 6.1 delivers exactly that.

Another big focus in this release has been usability and operations. Many enhancements have been made to make life easier. I like the fact that the Health Check plugin is now included with vCenter Server, and that you can do things like upgrading the on-disk format straight from the UI. And of course there is the VR Ops Management Pack, which will enrich your VR Ops installation with all the details you will ever need about Virtual SAN. Very, very useful!

All of this makes Virtual SAN 6.1 definitely a release to check out!

