
Yellow Bricks

by Duncan Epping


virtual san

Looking for the VROps VSAN Content Pack?

Duncan Epping · Jun 7, 2016 ·

I just noticed the link for the VROps VSAN Content Pack had changed, and it isn't easy to find through Google either. I figured I would post it quickly so at least it is indexed and easier to find. Also including the LogInsight VSAN content pack link for your convenience:

  • LogInsight VSAN content pack: https://solutionexchange.vmware.com/store/products/vmware-vsan
  • VROps VSAN content pack: https://solutionexchange.vmware.com/store/products/vrealize-operations-management-pack-for-storage-devices

Kindle version of Essential Virtual SAN (6.2) available now!

Duncan Epping · Jun 3, 2016 ·

Just noticed the Kindle version of Essential Virtual SAN (6.2), second edition, is available now! We decided to go for an "e-book first" model to get it to market as quickly as possible. I hope you will enjoy reading the book as much as Cormac and I enjoyed writing it. Pick it up!

Fully updated for the newest versions of VMware Virtual SAN, this guide shows how to scale VMware’s fully distributed storage architecture to meet any enterprise storage requirement. World-class Virtual SAN experts Cormac Hogan and Duncan Epping thoroughly explain how Virtual SAN integrates into vSphere 6.x and enables the Software Defined Data Center (SDDC). You’ll learn how to take full advantage of Virtual SAN, and get up-to-the-minute insider guidance for architecture, implementation, and management.

If you want to order the paper version at a local bookstore, here are the ISBN details, or of course just go to Amazon and pre-order it.

  • ISBN-13: 978-0134511665
  • ISBN-10: 0134511662

VSAN everywhere with Computacenter

Duncan Epping · Jun 1, 2016 ·

This week I had the pleasure of having a chat with Marc Huppert (VCDX181). Marc works for Computacenter in Germany as a Senior Consultant and Category Leader VMware Solutions, and he primarily focuses on datacenter technology. I noticed a tweet from Marc that he was working on a project where they will be implementing a 60-site ROBO deployment. It is one of those use cases that I don’t get to see too often in Europe, so I figured I would drop him a note. We talked for about an hour, and below is the story of this project, along with some of the other projects Marc has worked on.

Marc mentioned that he has been involved with VSAN for about 1.5 years now. They first did intensive internal testing to see what VSAN could do for their customers and looked at the various use cases for their customer base. They quickly discovered that combining VSAN with Fusion-IO resulted in a very powerful combination: not only extremely reliable (Marc mentioned he has never seen a Fusion-IO card fail), but also an extremely well performing solution. They compared Fusion-IO against regular SATA-connected SSDs and performance literally doubled, and not just for reads; writes also showed a big difference. One of the other reasons for considering PCIe-based flash is to keep the maximum number of disk slots available for the capacity tier. It all makes sense to me. For current projects, NVMe-based flash from Intel is being explored, and I am very curious to see what Marc’s experience is going to be like in terms of performance, reliability and the operational aspects compared to Fusion-IO.

Which brings us to the ROBO project, as, surprisingly enough, this is the project where NVMe will be used. Marc mentioned that this customer, a large company, has over 60 locations all connected to a central (main) datacenter. Each location will be equipped with 2 hosts, and depending on the size of the location and the number of VMs needed, a particular VSAN configuration can be selected:

  • Small – 1 NVMe device + 5 disks, 128GB RAM
  • Medium – 2 NVMe devices + 10 disks, 256GB RAM
  • Large – 3 NVMe devices + 15 disks, 384GB RAM

Yes, that also leaves room to grow when desired, as every disk group can go up to 7 capacity disks. From a compute point of view the configurations do not differ much beyond memory and disk capacity; the CPU is in fact the same across all three, to keep operations simple. In terms of licensing, the vSphere ROBO and VSAN ROBO editions are being leveraged, which provides a scalable and affordable ROBO infrastructure, especially when coming from a two-node configuration with a traditional storage system per location. It is not just the price point, but primarily the day-2 management.
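
To make the three tiers a bit more tangible, below is a small Python sketch of how such a per-site sizing choice could be expressed. The NVMe, disk and RAM numbers are taken straight from the list above; the VM-count thresholds are purely hypothetical and not part of Marc's actual design.

# Hypothetical sizing helper for the ROBO configurations described above.
# The VM-count thresholds are made up for illustration only; the NVMe,
# disk and RAM figures are the Small/Medium/Large tiers from this post.
from dataclasses import dataclass

@dataclass
class SiteConfig:
    name: str
    nvme_devices: int    # one NVMe cache device per disk group
    capacity_disks: int  # capacity disks (each disk group can grow to 7)
    ram_gb: int

CONFIGS = [
    SiteConfig("Small", 1, 5, 128),
    SiteConfig("Medium", 2, 10, 256),
    SiteConfig("Large", 3, 15, 384),
]

def pick_config(expected_vms: int) -> SiteConfig:
    """Return a per-host configuration for a site, based on a hypothetical VM count."""
    if expected_vms <= 20:
        return CONFIGS[0]
    if expected_vms <= 50:
        return CONFIGS[1]
    return CONFIGS[2]

print(pick_config(35).name)  # -> Medium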

When demonstrating VSAN to the customer, this is what impressed them the most. They have two people managing the entire virtual and physical estate: 60 locations (120 nodes) plus the main datacenter, which houses roughly 5000 VMs and, as a result, many physical machines. You can imagine that they spend a lot of time in vCenter, and they prefer to manage things end to end from that same spot; they definitely do not want to be switching between different management interfaces. Today they manage many small storage systems for their ROBO locations, and they immediately realised that VSAN in a ROBO configuration would significantly reduce the time they spend managing those locations.

And that is just the first step; next up would be the DMZ. They have a separate compute cluster as it stands right now, but it unfortunately connects back to the same shared storage system where their production is running. They fully understand the risk, but never wanted to incur the large cost associated with a storage system dedicated to their DMZ, not just capex but also from an opex point of view. With VSAN the economics change, making a fully isolated and self-contained DMZ compute and storage cluster dead simple to justify, especially when combining it with NSX.

One awesome customer if you ask me, and I am hoping they will become a public VSAN reference at some point in the future, as it is a great testimony to what VSAN can do. We briefly discussed other use cases Marc had seen out in the field, and Horizon View, management clusters and general production came up, which is very similar to what I see. Marc also mentioned that there is a growing interest in all-flash, which is not surprising considering the dollar-per-GB cost of SAS is very close to that of flash these days.

Before we wrapped up, I asked Marc whether he had any challenges with VSAN itself and what he felt was most complex. Marc mentioned that sizing was a critical aspect and that they have spent a lot of time in the past figuring out which configurations to offer to customers. Today the process they use is fairly straightforward: select a Ready Node configuration, swap the SATA SSD for PCIe-based flash or NVMe, and increase or decrease the number of disks. Fully supported, yet still flexible enough to meet all the demands of his customers.

Thanks Marc for the great conversation, and looking forward to meeting up with you at VMworld. (PS: Marc has visited all VMworld events so far in both the US and EMEA, a proper old-timer you could say :-))

You can find him on Twitter or on his blog: http://www.vcdx181.com

600GB write buffer limit for VSAN?

Duncan Epping · May 17, 2016 ·

I get this question on a regular basis, and although it has been explained many times, I figured I would dedicate a blog to it. Cormac has written a very lengthy blog on the topic and I am not going to repeat it; I will simply point you to the math he has provided around it. I do however want to provide a quick summary:

When you have an all-flash VSAN configuration, the current write buffer limit is 600GB (this applies to all-flash only). As a result, many seem to think that when an 800GB device is used for the write buffer, 200GB will go unused. This simply is not the case. We have a rule of thumb of a 10% cache-to-capacity ratio, and this rule of thumb has been developed with both performance and endurance in mind, as described by Cormac in the link above. The 200GB above the 600GB write buffer limit is actively used by the flash device for endurance. Note that an SSD usually is over-provisioned by default; most of them have extra cells for endurance and write performance, which makes the experience more predictable and at the same time more reliable. The same applies in this case with the Virtual SAN write buffer.

The image at the top right shows how this works. This SSD has 800GB of advertised capacity. The “write buffer” is limited to 600GB; however, the white space is considered “dynamic over-provisioning” capacity, as it will be actively used by the SSD automatically (SSDs do this by default). Then there is an additional percentage of over-provisioning by default on all SSDs, which in the example is 28% (typical for enterprise grade), and even after that there usually is an extra 7% for garbage collection and other SSD internals. If you want to know more about why this is and how this works, Seagate has a nice blog on it.
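
To put some numbers on this, here is a minimal Python sketch of the math, using the example figures from this post (600GB buffer limit, 28% factory over-provisioning, 7% internal reserve) and the 10% cache-to-capacity rule of thumb; these are illustrative values, not guarantees for any particular SSD model.

# Minimal sketch of the write buffer / over-provisioning math, using the
# example figures from this post. These are illustrative numbers, not
# specifications for any particular SSD model.
advertised_gb = 800
write_buffer_limit_gb = 600

# Space above the 600GB buffer limit is not wasted: the SSD uses it as
# "dynamic over-provisioning" for endurance and write performance.
dynamic_op_gb = advertised_gb - write_buffer_limit_gb        # 200GB

# Enterprise SSDs also ship with extra raw NAND beyond the advertised
# capacity (28% in the example) plus a reserve for garbage collection (7%).
factory_op_gb = advertised_gb * 0.28                         # ~224GB
gc_reserve_gb = advertised_gb * 0.07                         # ~56GB

print(f"Write buffer:           {write_buffer_limit_gb} GB")
print(f"Dynamic over-provision: {dynamic_op_gb} GB")
print(f"Factory over-provision: {factory_op_gb:.0f} GB")
print(f"GC / internal reserve:  {gc_reserve_gb:.0f} GB")

# The sizing rule of thumb stays the same: cache should be roughly 10% of
# the anticipated consumed capacity, so this 800GB device would front
# around 8TB of consumed capacity.
print(f"Capacity this cache could front: ~{advertised_gb / 0.10:.0f} GB")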

So let’s recap: as a consumer/admin, the 600GB write buffer limit should not be a concern. Although the write buffer is limited in terms of buffer capacity, the flash cells will not go unused, and the rule of thumb as such remains unchanged: a 10% cache-to-capacity ratio. Let’s hope this puts this (non-)discussion to rest for good.

How HA handles a VSAN Stretched Cluster Site Partition

Duncan Epping · Apr 25, 2016 ·

Over the past couple of weeks I have had some interesting questions from folks about different VSAN Stretched Cluster failure scenarios, in particular about site partitions and how HA and VSAN know which VMs to fail over and which VMs to power off. There are a couple of things I would like to clarify. First, let’s start with a diagram that sketches a stretched scenario. In the diagram below you see three sites: two “data” sites and one “witness” site. This is a standard VSAN Stretched Cluster configuration.

[Diagram: How HA handles a VSAN Stretched Cluster Site Partition]

The typical question now is: what happens when Site 1 is isolated from Site 2 and from the Witness Site, while the Witness and Site 2 remain connected? Is the isolation response triggered in Site 1? What happens to the workloads in Site 1? Are the workloads restarted in Site 2? If so, how does Site 2 know that the VMs in Site 1 are powered off? All very valid questions if you ask me, and if you read the vSphere HA deepdive on this website closely, letter for letter, you will find all the answers in there, but let’s make it a bit easier for those who don’t have the time.

First of all, all the VMs running in Site 1 will be powered off. Let it be clear that this is not done by vSphere HA, and it is not the result of an “isolation”, as technically the hosts are not isolated but partitioned. The VMs are killed by a VSAN mechanism, and they are killed because the VMs no longer have access to any of their components. (Local components are not accessible as there is no quorum.) You can disable this mechanism, by the way, through the advanced host settings, although I discourage you from doing so: set the advanced host setting called VSAN.AutoTerminateGhostVm to 0.
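
For those who do have a valid reason to change it, the setting can also be flipped programmatically. Below is a rough pyVmomi sketch that sets the option on every host in the inventory; the connection details are placeholders and I am assuming the option accepts an integer value, so treat it as an illustration rather than a recipe (and again: leave the default in place unless you really know why you are changing it).

# Rough pyVmomi sketch: set VSAN.AutoTerminateGhostVm to 0 on all hosts.
# Connection details are placeholders and the option is assumed to take an
# integer value; leaving the default (enabled) is strongly recommended.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only: skips certificate validation
si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        # Each host exposes its advanced settings through an OptionManager.
        host.configManager.advancedOption.UpdateOptions(changedValue=[
            vim.option.OptionValue(key="VSAN.AutoTerminateGhostVm", value=0)])
        print(f"Disabled AutoTerminateGhostVm on {host.name}")
finally:
    Disconnect(si)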

In the second site a new HA master node will be elected. That master node will validate which VMs are supposed to be powered on; it knows this through the “protectedlist”. The VMs that were running in Site 1 will be missing: they are on the list, but not powered on within this partition… As this partition has ownership of the components (quorum), it will now be capable of powering on those VMs.

Finally, how do the hosts in Partition 2 know that the VMs in Partition 1 have been powered off? Well, they don’t. However, Partition 2 has quorum (meaning it has the majority of the votes/components, 2 out of 3) and as such ownership, and they do know that this means it is safe to power on those VMs, as the VMs in Partition 1 will be killed by the VSAN mechanism.
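
The vote math itself is simple, but for completeness here is a tiny sketch of the majority check, simplified to one vote per site (data site 1, data site 2, witness) as in the diagram above.

# Simplified majority check: one vote per site (data, data, witness).
# A partition can only take ownership of an object if it holds a strict
# majority of the votes.
def has_quorum(votes_in_partition: int, total_votes: int = 3) -> bool:
    return votes_in_partition > total_votes / 2

print(has_quorum(1))  # Site 1 on its own: 1 of 3 votes -> False, its VMs are killed
print(has_quorum(2))  # Site 2 plus the witness: 2 of 3 -> True, VMs restart here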

I hope that helps. For more details, make sure to read the clustering deepdive, which can be downloaded here for free.

