I am on holiday, catching up on some articles I had saved that I still wanted to read. I stumbled on an article about sizing VMFS volumes on flash-based arrays by Ravi Venkat (Pure Storage). I must say that Ravi makes a couple of excellent arguments around the operational and architectural simplicity of these new types of arrays, and I strongly believe they do indeed make the world a lot easier.
IOps requirements? Indeed, forget about them when you have thousands at your disposal… And your RAID penalty doesn’t really matter anymore either, especially as many of these new storage arrays also introduce new types of RAID levels. Great, right?
Yes, in most cases this is great news! One thing to watch out for, though, is the failure domain. If you create a large 32TB volume with hundreds of virtual machines, the impact would be huge if that volume, for whatever reason, blows up. It is not only the impact of the failure itself; the RTO, aka “Recovery Time Objective”, would also be substantially longer. Yes, the array might be lightning fast, but you will be, and probably already are, limited by your backup solution. How long will it take to restore those 32TBs? Have you ever done the math?
It isn’t too complicated to do the math, but I would strongly suggest testing it! When I was an admin we had a clearly defined RTO and RPO. We tested these every once in a while, and even though we were already using tapeless backups, it still took a long time to restore 2TB.
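Just to illustrate the point, here is a minimal back-of-the-napkin sketch; the 500 MB/s effective restore rate is purely an assumption, so plug in whatever your backup solution actually delivers:

```python
# Back-of-the-napkin restore time estimate.
# The 500 MB/s effective restore rate is an assumption for illustration;
# measure what your backup solution really delivers before trusting it.

def restore_hours(volume_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to restore volume_tb terabytes at rate_mb_per_s MB/s."""
    volume_mb = volume_tb * 1024 * 1024   # TB -> MB (binary units)
    return volume_mb / rate_mb_per_s / 3600

for size_tb in (2, 32):
    print(f"{size_tb} TB at 500 MB/s ≈ {restore_hours(size_tb, 500):.1f} hours")
# 2 TB  ≈ 1.2 hours
# 32 TB ≈ 18.6 hours
```

Even on paper a 32TB restore eats most of a day, and real-world numbers tend to be worse.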
Nevertheless, I do feel that Ravi points out the “hidden value” of these types of storage architectures. Definitely something to take into account when you are looking for new storage… I am wondering how many of you are already using flash-based solutions, and how you do your sizing.
Backup is an afterthought for most companies. I think a lot of companies have never really tested restoring their major applications or, in your case, a 32TB volume.
Disclosure: IBMer
Even before working for IBM I utilized the IBM XIV for my VMware environments because it also addresses all of the points that Ravi has brought up in his post. A 2TB volume or a 32TB volume both utilize all resources in the XIV, so performance is essentially the same. The choice of LUN size comes down to what works best for the customer environment. I personally like the idea of staying around 2TB per VMFS and utilizing datastore clusters and Storage DRS to simplify management.
Interesting topic indeed. I’m always interested to hear how other people approach sizing their volumes for VM use, and the introduction of EFD/SSD storage technologies certainly shakes things up somewhat (i.e. fewer IOPS performance issues, etc.). I personally am still a fan of using 2TB volumes as the sweet spot, though of course this may vary in some circumstances.
Loving the new opportunities that EFD/SSD technologies present.
Good conversation guys.
Cheers,
Simon
Shoot! Now where am I supposed to put all those 64TB Hyper-V VHDX files?
I’m not sure this is the right way to look at the Pure Storage array technology. The biggest array is the dual-controller FA-320, which has 22TB of raw capacity with a compression ratio of up to 5:1, so in the best-case scenario you will get roughly 110TB of usable space. Losing a single 64TB LUN would mean you probably lost at least half the disks in the array. A failure this big in the FA-320 would be an array failure rather than multiple disk failures. In that case we would hope the client was running SRM with a redundant datacentre site/array which could be failed over to.
This article is not about Pure Storage itself, but anyway… Filesystems can become corrupt, LUNs can become corrupt, or admins can wipe a LUN… Anything can be the cause of a LUN being gone.
Yeah, good point. It wouldn’t be the first time that someone with too much power and not enough experience installed the ESXi OS on a datastore housing VMs!
It’d be interesting to consider the backup/restore/DR aspects – snapshot and replication technologies exist in pretty much all commercial storage platforms. I think it’d be imperative to test snapshot efficiency – take a snapshot of a 64TB LUN and observe how much space/metadata overhead is needed. In terms of replication, I’d think that, at a minimum, a split between production LUNs and dev/test LUNs would be ideal.
SSD is definitely a disruptive technology – it’d be good to hear from folks how they use SSD in their infrastructure, and how it impacts their BC/DR strategy.
Duncan,
Thanks much for the article, appreciate the mention. As always you bring up excellent points around BC/DR and RTO/RPO in particular.
Disaster recovery is a very important topic for us, and we have innovative technology with RAID-3D and other resiliency built into our FlashArray. On top of that, we are currently investing in and developing a lot of innovation to address BC/DR (can’t say much on that topic yet). Please stay tuned.
Wen, point taken as well. How’s it going?
–Ravi
Good day Duncan
Interesting post. I’ve just had my Sunday lunch and a couple of cans of Bud, and felt like replying.
I think when you’re looking at failure domains, it’s all relative. Take as an example the SAN I’ve recently been nurturing, an old CX600 with 90 x 66GB disks; if someone had said let’s use it to make a 2TB LUN (number of disks in a RAID 6 = (2048 / 66) + 2 = 33 disks or thereabouts, and I’m too lazy to check what the max number of disks in a CX600 RAID 6 is), then we would all have been pretty much aghast, but hey, it would probably have worked. Regarding the restore – assuming we backed up the whole volume and not just individual entities inside it, and had somewhere to restore to – on a 2 Gbps fabric (assuming that was our restore speed, a big assumption I know) it would have taken (2048 * 8 / 2 / 60 / 60) ≈ 2.27 hours.
These days, with 10 Gbps networking, and soon 100 Gbps networking (and varieties in between), you’re looking at possibly restoring 1 TB in either 13.65 minutes (at 10 Gbps) or 1.365 minutes (at 100 Gbps). And perhaps you would be asynchronously replicating the volume in some way or another, so you’d just bring up your DR copy instead.
NetApp have recently developed the Infinite Volume, using NFSv3 and Data ONTAP 8.1.1 operating in Cluster-Mode, which is not quite infinite but scales up to 20 PB or 2 billion files. Now, I don’t think anyone would try to back up an entire volume of that size; you’d be looking at backing up entities inside it, or mirroring parts of it some way – who knows, I don’t. To an extent you need to trust in the technology (and your sysadmins too).
I am a big fan of the Tintri VMstore solutions, where you don’t worry about LUNs, RAID, and whatnot; you just present the appliance as a single NFS volume, and the storage has the intelligence to be aware of the individual VMs on the appliance.
Final word from Ursula Le Guin’s excellent book ‘The Dispossessed,’ – “The duty of an individual is to accept no rule.”
Cheers
Vidad Cosonok
That is assuming your backup solution can actually make use of that bandwidth; in most of the environments I have seen, that is the main constraint. Add the fact that Tier 1 and Tier 2 virtual machines are often mixed without any logic, and you might end up with a 9-hour restore for a bunch of VMs.
If an entire volume of whatever size goes corrupt, goes down, or even gets wiped by some stupid engineer, you would not be looking at your backup solution to get it back online. You would want your array to put back the last good snapshot. And by snapshots I mean that you have bought a copy-on-write solution that doesn’t need twice the volume size, of course 🙂
IMO Backup is for 2 things: restoring ITEMS or going BACK IN TIME. Not for disaster recovery.
If you can afford snapshots…
hey Ravi! doing well thx – hope to catch up with you @ vmworld this year?
I come from the public cloud space, and this is something we have discussed many times internally and with existing and potential vendors. The issues may not be the sort of issues that enterprises face, but they are worth thinking about when deciding on the size of a LUN/volume.
We tried doing multiple VMs per volume (2TB), but quickly found out that as you cycle through VMs (delete them and create new ones in their place) you start chewing through space on that volume and no longer take advantage of thin provisioning (each new VM may use different blocks on the volume than the previous VM, and we couldn’t reclaim this unused space).
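To illustrate why this happens, here is a purely hypothetical toy simulation (block counts and placement are made up): the array allocates a block the first time it is written, but without any space reclamation it never learns that the filesystem has freed it again, so the array-side footprint keeps creeping towards the full volume size even though the amount of live data stays flat.

```python
import random

# Toy model: a 2 TB thin-provisioned volume carved into 1 GB blocks.
# Each cycle a 100 GB VM is created and then deleted again. The array
# allocates blocks on first write but is never told about the deletes,
# so its view of "used" space only ever grows. Numbers are illustrative.

BLOCKS = 2048        # 2 TB in 1 GB blocks
VM_BLOCKS = 100      # each VM writes 100 GB
array_allocated = set()

random.seed(1)
for cycle in range(1, 11):
    # Wherever the new VM happens to land on the volume this time around.
    new_vm = set(random.sample(range(BLOCKS), VM_BLOCKS))
    array_allocated |= new_vm   # first write allocates on the array
    # ...the VM is deleted; the filesystem frees the blocks, the array never hears about it.
    print(f"after {cycle} create/delete cycles the array still holds "
          f"{len(array_allocated)} GB as allocated")
```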
As a result our next iteration will have a volume per VM, which will give us the following flexibility:
* Ability to reclaim all storage upon deletion (easily)
* Choose individual backup policies for each volume/VM
* Choose individual performance policies per volume/VM
The downsides are:
* Many more iSCSI connections (the session count multiplies quickly as the number of VMs and hypervisors grows – see the sketch after this list)
* Automation of volume operations on the SAN (creation, deletion, modification) can be hard
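A rough sketch of the session math behind that first downside (the host, path, and VM counts below are assumptions, not figures from the original comment):

```python
# Rough iSCSI session count comparison: shared volumes vs. volume-per-VM.
# Every host typically logs in to every volume over every path, so
# sessions ≈ hosts × volumes × paths. All numbers here are assumptions.

def iscsi_sessions(hosts: int, volumes: int, paths_per_volume: int = 2) -> int:
    return hosts * volumes * paths_per_volume

# 500 VMs packed onto 25 shared 2 TB volumes across a 16-host cluster:
print(iscsi_sessions(hosts=16, volumes=25))    # 800 sessions
# The same 500 VMs with one volume each:
print(iscsi_sessions(hosts=16, volumes=500))   # 16000 sessions
```

The growth is multiplicative rather than exponential, but it adds up fast enough to run into array or initiator session limits.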