It is VMworld, and of course there are many announcements being made, one of which is Virtual SAN 6.1, which will come as part of vSphere 6.0 Update 1. Many new features have been added, but there are a couple which stand out if you ask me. In this post I am going to talk about what are, in my opinion, the key new features. Let's list them first and then discuss some of them individually.
- Support for stretched clustering
- Support for 2 node ROBO configurations
- Enhanced Replication
- Support for SMP-FT
- New hardware options
  - Intel NVMe
  - Diablo Ultra Dimm
- Usability enhancements
  - Disk Group Bulk Claiming
  - Disk Claiming per Tier
  - On-Disk Format Upgrade from UI
- Health Check Plug-in shipped with vCenter Server
- Virtual SAN Management Pack for VR Ops
When explaining the Virtual SAN architecture and concepts there is always one question that comes up: what about stretched clustering? I guess the key reason is that Virtual SAN distributes objects across multiple hosts for availability, and people can easily see how that would work across datacenters. With Virtual SAN 6.1 we now fully support stretched clustering. But what does that mean, and what does it look like?
As you can see in the diagram above it starts with 3 failure domains, two of which will be “hosting data” and one of which will be a “witness site”. All of this is based on the Failure Domains technology that was introduced with 6.0, and those who have used it know how easy it is. Of course there are requirements when it comes to deploying in a stretched fashion, and the key requirements for Virtual SAN are:
- 5ms RTT latency max between data sites
- 200ms RTT latency at most from data sites to witness site
Worth noting from a networking point of view is that from the data sites to the witness site there is no requirement for multicast routing and it can be across L3. On top of that the Witness can be nested ESXi, so no need to dedicate a full physical host just for witness purposes. Of course the data sites can also connect to each other over L3 if that is desired, but personally I suspect that VSAN over L2 will be a more common deployment and it is also what I would recommend. Note that between the data sites there is still a requirement for multicast.
When it comes to deploying virtual machines on a stretched cluster not much has changed. Deploy a VM, and VSAN will ensure that there is one copy of your data in Fault Domain A and one copy in Fault Domain B, with your witness in Fault Domain C. Makes sense, right? If one of the data sites fails then the other can take over. If the VM is impacted by a site failure then HA can take action… It is not rocket science and it is dead simple to set up. I will have a follow-up post with some more specifics in a couple of weeks.
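To make the placement concrete, here is a minimal Python sketch of why a single site failure does not take the object down. It is a simplified model and not VSAN code: the names `COMPONENTS` and `object_accessible` are made up for illustration, and it assumes each of the three components carries one vote and that the object stays accessible while more than half of the votes, including at least one data copy, are reachable.

```python
# Simplified model of one VM object in a stretched cluster (illustrative only).
# Assumption: each component has one vote, and the object is accessible while
# more than half of all votes are reachable and at least one data copy survives.
COMPONENTS = {
    "Fault Domain A": "data copy",
    "Fault Domain B": "data copy",
    "Fault Domain C": "witness",
}


def object_accessible(failed_domains):
    """Return True if the object survives the given set of failed fault domains."""
    surviving = [fd for fd in COMPONENTS if fd not in failed_domains]
    has_quorum = len(surviving) > len(COMPONENTS) / 2
    has_data = any(COMPONENTS[fd] == "data copy" for fd in surviving)
    return has_quorum and has_data


if __name__ == "__main__":
    scenarios = [
        set(),
        {"Fault Domain A"},
        {"Fault Domain C"},
        {"Fault Domain A", "Fault Domain C"},
    ]
    for failed in scenarios:
        print(sorted(failed), "->", object_accessible(failed))
```

In this model, losing any single fault domain (a data site or the witness site) leaves two of three votes, so the object remains accessible. Losing two fault domains at once does not.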
Besides stretched clustering, Virtual SAN 6.1 also brings a 2 node ROBO option. This is based on the same technique as the stretched clustering feature. It basically allows you to have 2 nodes in your ROBO location and a witness in a central location. The maximum round-trip latency (RTT) in this scenario is 500ms, which should accommodate almost every ROBO deployment out there. Given the low number of VMs typically found in these scenarios, you are usually fine with 1GbE networking in the ROBO location as well, which further reduces the cost.
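If you want to sanity-check a design against the latency figures quoted in this post, a small Python sketch like the one below can help. The limits come straight from the text (5ms and 200ms for stretched, 500ms for ROBO). Everything else, including the `MAX_RTT_MS` table, the link names and the `check_latency` helper, is hypothetical and only for illustration. It is not a VMware tool, and you would have to supply the RTT measurements yourself.

```python
# RTT limits in milliseconds, taken from the figures quoted in this post.
# The link names ("data_to_data", "data_to_witness") are illustrative labels.
MAX_RTT_MS = {
    "stretched": {"data_to_data": 5, "data_to_witness": 200},
    "robo": {"data_to_witness": 500},
}


def check_latency(deployment, measured_rtt_ms):
    """Return a list of human-readable violations (an empty list means OK)."""
    limits = MAX_RTT_MS[deployment]
    violations = []
    for link, limit in limits.items():
        rtt = measured_rtt_ms.get(link)
        if rtt is None:
            violations.append(f"{link}: no measurement supplied")
        elif rtt > limit:
            violations.append(f"{link}: {rtt}ms exceeds the {limit}ms limit")
    return violations


if __name__ == "__main__":
    # Example measurements are made up for the sake of the demo.
    print(check_latency("stretched", {"data_to_data": 8, "data_to_witness": 150}))
    print(check_latency("robo", {"data_to_witness": 320}))
```

Keeping the limits in one table makes it easy to adjust them if the supported values change in a later release.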
When it comes to disaster recovery, work has also been done to reduce the recovery point objective (RPO) for vSphere Replication. By default this is 15 minutes, but for Virtual SAN this has now been certified for 5 minutes. Just imagine combining this with a stretched cluster: that would be a great disaster avoidance and disaster recovery solution. Sync replication between active sites and then async to wherever it needs to go.
But that is not all in terms of availability: support for SMP-FT has also been added. I never expected this, to be honest, but I have had many customers asking for it in the past 12 months. Another common request I have seen is support for super fast flash devices like Intel NVMe and Diablo Ultra Dimm, and 6.1 delivers exactly that.
Another big focus in this release has been usability and operations. Many enhancements have been made to make life easier. I like the fact that the Health Check plugin is now included with vCenter Server and that you can do things like upgrading the on-disk format straight from the UI. And of course there is the VR Ops Management Pack, which will enrich your VR Ops installation with all the details you ever need about Virtual SAN. Very very useful!
All of this makes Virtual SAN 6.1 definitely a release to check out!
Jon Retting says
Very cool! Too many cool things to comment on in fact. Super excited for 6.1 and the beta in your latest post. I am literally salivating over all this. Thanks
Soooo… This means that vSAN can finally be used for vMSC?
This kicks ass!
Yes, we love stretched cluster. That’s THE feature to complete this product. Now, we can migrate from VSA to VSAN. Finally!!!
P. Cruiser says
I don’t know.. I feel like over a year has been lost just trying to catch up to VSA’s features and maturity. Also, that witness node requirement per ROBO instance is not going to be pretty if you have a lot of small offices.
We have only one VSA cluster with 2 nodes dedicated for DMZ.
I think that in small offices you can’t use a stretched cluster because there are a lot of requirements. Personally, we don’t have 2 DCs per small office. So in this case, I prefer to use VSAN without the ROBO option.
But today, we just have one node per small office.
@Duncan What is the minimum number of nodes per fault domain: two or three?
Is there a release date targeted yet for vSphere 6.0 Update 1 / VSAN6.1?
Is there a reason you can’t do ‘FTT=2’ and set it up so that you have a redundant copy in the ‘preferred’ data center and then another copy over at the ‘DR data center’? To me, this would be much more preferable than a single host failure causing you to automatically power up in another data center (or even just routine maintenance). Is this being worked on?
I believe the Health Check Plug-in (shipped with vCenter Server 6.1) licensing model is a bit complicated. VMware says that it is offered for free, but then again DRS is needed to install it, which is an enterprise feature.
I believe the Health Check Plug-in should be available for all (standard) clients not only enterprise and above.
Additionally while VMware says the below :
Question: For a 2-node deployment model where the Witness VM is required, does that now require an advanced license?
VMware’s Answer: No, a 2-node clusters with the witness VM can be deployed with any type of license (ROBO, Standard or Advanced).
Any cluster larger than 2-nodes requires Advanced licensing to use the witness VM.
While VMware says that Standard licensing should cover stretched cluster (but no more than 2 hosts & 1 witness VM), there is no documentation covering this, and once again the Health Check plugin cannot be used to verify this installation.
Duncan Epping says
Hi George thanks for your comments.
1) Yes we were aware that DRS was required to install it, that was fixed with a patch release of v1 of the healthcheck. With version 2 being shipped as part of vCenter as well there is no longer even a need to install it. (this version has the fix: https://my.vmware.com/web/vmware/details?downloadGroup=VSANHEALTH600&productId=492)
2) see 1
3) Depends on what it is used for.
– 2 node ROBO: Standard
– Node stretched: Advanced
I agree we have a gap here in documentation / clear guidelines and this has been requested. By the way, our EULA describes what requires which license.
Boris EY says
If I understand correctly, we can do a 2 node stretched cluster with a VSAN Standard license.
Can we play with rack awareness between two datacenters if we have less than 1ms RTT? That way we could use a VSAN Standard license instead of an Advanced one.
Duncan Epping says
No, any type of “stretched” using the “stretched workflow with the external witness” requires Advanced. When using fault domains for stretched without the witness you can use Standard.