
Yellow Bricks

by Duncan Epping


6.0

vMSC and Disk.AutoremoveOnPDL on vSphere 6.x and higher

Duncan Epping · Mar 21, 2016 ·

I have discussed this topic a couple of times, and want to inform people about a recent change in recommendation. In the past, when deploying a stretched cluster (vMSC), most storage vendors and VMware recommended setting Disk.AutoremoveOnPDL to 0. This disabled the feature that automatically removes LUNs which are in a PDL (permanent device loss) state; upon return of the device, a rescan would then allow you to use the device again. With vSphere 6.0, however, there has been a change to how vSphere responds to a PDL scenario: vSphere does not expect the device to return. To be clear, the PDL behaviour in vSphere was designed around the removal of devices; they should not stay in the PDL state and return for duty. That this worked in previous versions was due to a bug.

With vSphere 6.0 and higher, VMware recommends setting Disk.AutoremoveOnPDL to 1, which is the default setting. If you are a vMSC / stretched cluster customer, please change your environment and design accordingly. Before you do, however, please consult your storage vendor and discuss the change. I would also recommend testing the change and its behaviour, to validate that the environment returns for duty correctly after a PDL! Sorry about the confusion.

Disk.AutoremoveOnPDL

The KB article backing my recommendation was just posted: https://kb.vmware.com/kb/2059622. The documentation (vMSC whitepaper) is also being updated.
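For those who want to verify the setting across a number of hosts at once rather than clicking through the advanced settings one host at a time, below is a minimal sketch using pyVmomi (the Python vSphere SDK). It is my illustration rather than part of the KB article, and the vCenter address and credentials are placeholders; any host that is not at the recommended value can then be remediated through the client, esxcli or the API as described in the KB.

```python
# Minimal pyVmomi sketch: report Disk.AutoremoveOnPDL for every host managed by a vCenter.
# The connection details are placeholders; adjust them to your environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; validate certificates in production
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)

for host in view.view:
    # QueryOptions returns a list of OptionValue objects; this option has a single value.
    opt = host.configManager.advancedOption.QueryOptions("Disk.AutoremoveOnPDL")[0]
    note = "" if opt.value == 1 else "  <-- not the vSphere 6.x recommended value (1)"
    print(f"{host.name}: Disk.AutoremoveOnPDL = {opt.value}{note}")

view.Destroy()
Disconnect(si)
```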

Rebuilding failed disk in VSAN versus legacy storage

Duncan Epping · Jan 26, 2016 ·

This is one of those questions that comes up every now and then. I have written about this before, but it never hurts to repeat some of it. The comment I got was about the rebuild time of failed drives in VSAN: surely it takes longer than with a “legacy” storage system? The answer, of course, is: it depends (on many factors).

But what does it depend on? Well, it depends on what exactly we are talking about, but in general I think the following applies:

With VSAN, components (copies of objects, in other words copies of data) are placed across multiple hosts, multiple disk groups and multiple disks. Basically, if you have a cluster of, let's say, 8 hosts with 7 disks each and you have 200 VMs, then the data of those 200 VMs will be spread across 8 hosts and 56 disks in total. If one of those 56 disks happens to fail, the data that was stored on that disk needs to be reprotected. That data comes from the other 7 hosts, which is potentially 49 disks in total. You may ask: why not 55 disks? Because, for resiliency purposes, replica copies are never stored on the same host. Look at the diagram below, where a single object is split into 2 data components and a witness, all located on different hosts!

We do not “mirror” disks, we mirror the data itself, and the data can and will be placed anywhere. This means that when a disk within a disk group on a host fails, all remaining disk groups / disks / hosts will help rebuild the impacted data, which is potentially 49 disks. Note that not only will the disks and hosts containing impacted objects help rebuild the data, all 8 hosts and the remaining 55 disks will be able to receive the replica data!

Now compare this to a RAID set with a spare disk. In the case of a spare disk, you have 1 disk receiving all the data that is being rebuilt, and that single disk can only take a certain number of IOPS. Let's say it is a really fast disk and it can take 200 IOPS. Compare that to VSAN… Let's say you used really slow disks which only do 75 IOPS… Still, that is (potentially) 49 disks x 75 IOPS for reads, with 55 disks available for writes.
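To make the back-of-the-envelope maths concrete, here is a small sketch of the calculation. The IOPS figures are the illustrative numbers used above, not measurements, and real rebuild throughput will also depend on networking, resync throttling and so on.

```python
# Back-of-the-envelope rebuild comparison using the illustrative figures from the post.
hosts = 8
disks_per_host = 7
total_disks = hosts * disks_per_host          # 56 disks in the cluster

# One disk fails; replicas of its components live on the other 7 hosts.
source_disks = (hosts - 1) * disks_per_host   # up to 49 disks can serve rebuild reads
target_disks = total_disks - 1                # up to 55 disks can receive rebuilt data

spare_disk_iops = 200   # a "really fast" legacy hot spare
vsan_disk_iops = 75     # "really slow" VSAN capacity disks

legacy_rebuild_iops = spare_disk_iops              # the single spare is the bottleneck
vsan_read_iops = source_disks * vsan_disk_iops     # parallel rebuild reads:  49 x 75 = 3675
vsan_write_iops = target_disks * vsan_disk_iops    # parallel rebuild writes: 55 x 75 = 4125

print(f"Legacy hot spare rebuild capability: {legacy_rebuild_iops} IOPS")
print(f"VSAN rebuild reads (up to):          {vsan_read_iops} IOPS")
print(f"VSAN rebuild writes (up to):         {vsan_write_iops} IOPS")
```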

That is the major difference: we don't have a single drive as a designated hot spare (or should I say bottleneck?), we have the whole cluster as a hot spare! As such, rebuild times when using similar drives should always be faster with VSAN compared to traditional storage.

Jumbo Frames and VSAN Stretched Cluster configurations

Duncan Epping · Dec 22, 2015 ·

I received a question last week from a customer who had implemented a stretched VSAN cluster. The Health Check run after the implementation indicated that there was an “issue” with the MTU configuration. The customer explained that he had configured an MTU of 9000 between the two data sites and a (default) MTU of 1500 between the data sites and the witness.

The question, of course, was why the Health Check indicated there was an issue. The problem here is that witness traffic and data traffic in today's version of Virtual SAN use the same VMkernel interface. If the VSAN VMkernel interface on the “data” site is configured for 9000 and the one on the “witness” site is configured for 1500, then there is a mismatch which causes fragmentation and so on. This is what the Health Check calls out. VSAN (and the Health Check as such) expects an “end-to-end” consistently configured MTU, even in a stretched environment.
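If you want to spot this kind of mismatch yourself before the Health Check does, the sketch below lists the MTU of every VMkernel interface on every host so you can confirm the configuration is consistent end to end. This is my illustration, not something from the customer environment; the vCenter address and credentials are placeholders, and grouping by vmk name assumes the same vmk number is used for VSAN on every host, which is common but not guaranteed. Running vmkping between the sites with the don't-fragment option and a large packet size remains the definitive test.

```python
# Sketch: report VMkernel interface MTUs per host and flag inconsistencies.
import ssl
from collections import defaultdict
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)

mtus = defaultdict(set)
for host in view.view:
    for vnic in host.config.network.vnic:       # all vmk interfaces on this host
        mtu = vnic.spec.mtu or 1500             # treat an unset MTU as the 1500 default
        print(f"{host.name} {vnic.device}: MTU {mtu}")
        mtus[vnic.device].add(mtu)

# A stretched VSAN cluster expects one consistent MTU end to end on the VSAN network,
# so any vmk that shows more than one MTU value across the hosts is worth investigating.
for device, values in sorted(mtus.items()):
    if len(values) > 1:
        print(f"Inconsistent MTU for {device}: {sorted(values)}")

view.Destroy()
Disconnect(si)
```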

VSAN enabling Sky to be fast / responsive / agile…

Duncan Epping · Nov 30, 2015 ·

Over the last couple of months I’ve been talking to a lot of VSAN customers. A while ago I had a very interesting use case with a deployment on an Oil Platform. This time it is a more traditional deployment: I had the pleasure of talking to James Cruickshank who works for Sky. Sky is Europe’s leading entertainment company, serving 21 million customers across five countries: Italy, Germany, Austria, the UK and Ireland.

James is part of Sky’s virtualisation group which primarily focusses on new technologies. In short, the team figures out if a technology will benefit Sky, how it works, how to implement it and how to support it. He documents all of his findings then develops and delivers the solution to the operations team.

One of the new products James is working with is Virtual SAN. The project started in March and Sky has a handful of VSAN-ready clusters in each of its strategic data centres. These clusters currently have ESXi 5.5 hosts, each with one 400GB SSD and 4 x 4TB NL-SAS drives, all connected over 10GbE, which is a significant amount of capacity per host. The main reason for that is Sky's requirement to run with FTT=2 (for those who don't know, this means that a 1TB disk will consume ~3TB of raw capacity). James anticipates VSAN 6 will be deployed with a view to delivering production workloads in Q1 2016.
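As a side note, the FTT=2 overhead mentioned above is simple to work out: with mirroring, VSAN keeps FTT + 1 full copies of each object, so raw consumption is roughly three times the usable capacity. A tiny sketch of that calculation (mine, not James'; witness components are ignored as their footprint is negligible):

```python
# Approximate raw capacity consumed by a mirrored VSAN object for a given FTT value.
def raw_capacity_tb(usable_tb: float, ftt: int) -> float:
    """With RAID-1 mirroring, VSAN stores FTT + 1 full copies of the data."""
    return usable_tb * (ftt + 1)

print(raw_capacity_tb(1, 2))   # FTT=2: a 1TB disk consumes ~3TB of raw capacity
print(raw_capacity_tb(1, 1))   # FTT=1: ~2TB
```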

We started talking about the workloads Sky had running and what some of the challenges were for James. I figured that, considering the size of the organisation and the number of workloads it has, getting all the details must not have been easy. James confirmed that it was difficult to get an understanding of the IO profile and that he spent a lot of time developing representative workloads. James mentioned that when he started his trial the VSAN Assessment Tool wasn’t available yet, and that it would have saved him a lot of time.

So what is Sky running? For now, mainly test/dev workloads. These clusters are used by developers for short-term usage, to test what they are building and then trash the environment, all of which is enabled through vRealize Automation. Request a VM (or several), deploy on a VSAN cluster, and done. So far in Sky's deployment all key stakeholders are pleased with the technology as it is fast and responsive, and for the ops team in particular it is very easy to manage.

James mentioned that he has recently been testing both VSAN 5.5 and 6.0. He was so surprised by the performance increase that he re-ran his test multiple times, then had his colleagues do the same, while others reviewed the maths and the testing methodology. Each time they came to the same conclusion: a performance increase in excess of 60% between 5.5 and 6.0 (using a “real-world” IO profile), an amazing result.

My last question was around some of the challenges James faced. The first thing he said was that he felt the technology was fantastic. There were new considerations around the design/sizing of the VSAN hosts, the increased dependency on TCP/IP networks and the additional responsibilities for storage placed within the virtualisation operations team. There were also some minor technical challenges, but these were primarily from an operational perspective, and with vSphere / VSAN 5.5. In some cases he had to use RVC, which is a great tool, but as it is CLI-based it does have a steep learning curve. The Health Check plugin introduced with 6.0 has definitely helped a lot to improve this.

Another thing James wanted to call out is that in the current VSAN host design Sky uses an SSD to boot ESXi, as VSAN hosts with more than 512GB RAM cannot boot from SD card. This means the company is sacrificing a disk slot which could have been used for capacity, when it would prefer to use SD for boot if possible to optimise hardware config.

I guess it is safe to say that Sky is pleased with VSAN and in future the company is planning on adopting a “VSAN first” policy for a proportion of their virtual estate. I want to thank Sky, and James in particular, for taking the time to talk to me about his experience with VSAN. It is great to get direct feedback and hear the positive stories from such a large company, and such an experienced engineer.

 

Virtual SAN: Generic storage platform of the future

Duncan Epping · Nov 20, 2015 ·

Over the last couple of weeks I have been presenting at various events on the topic of Virtual SAN. One of the sections in my deck is about the future of Virtual SAN and where it is heading. Someone recently tweeted one of the diagrams in my slides, which got picked up by Christian Mohn, who provided his thoughts on the diagram and what it may mean for the future. I figured I would share my story behind this slide, which is actually a new version of a slide that was originally presented by Christos and also discussed in one of his blog posts. First, let's start with the diagram:

If you look at VSAN today and ask people what it is, most will answer: a “virtual machine” storage system. But VSAN to me is much more than that. VSAN is a generic object storage platform, which today is primarily used to store virtual machines. But these objects can be anything if you ask me, and on top of that they can be presented as anything.

So what is VMware working towards, what is our vision? VSAN was designed from the start to serve as a generic object storage platform, and is being extended to serve as a platform for different types of data by providing an abstraction layer. In the diagram you see “REST” and “FILE” and things like Mesos and Docker; it isn't difficult to imagine what types of workloads we envision running on top of VSAN and what types of access you have to the resources managed by VSAN. This could be through a native REST API that is part of the platform, which developers can use directly to store their objects, or through the use of a specific driver for direct “block” access, for instance.

Combine that with the prototype of the distributed filesystem which was demonstrated at VMworld and I think it is fair to say that the possibilities are endless. VSAN isn't just a storage system for virtual machines; it is a generic object-based storage platform which leverages local resources and will be able to share those in a clustered fashion in any shape or form in the future. Christian definitely had a point, but in which shape or form all of this will be delivered remains to be seen; that is not something I can (or want to) speculate on. Whether it is through Photon Platform or something else is, in my opinion, beside the point. Even today VSAN has no dependencies on vCenter Server and can be fully configured, managed and monitored using the APIs and/or the different command-line interface options we offer. Agility and choice have always been the key design principles for the platform.

Where things will go exactly, and when, remains to be seen. But if you ask me, exciting times are ahead for sure, and I can't wait to see how everything plays out.

 


