Last week I presented at a couple of VMUGs and at those VMUGs all whole bunch of sessions were recorded. I receive a lot of requests to speak at VMUGs and although I try to attend many of them, there is still quite a few I have to decline unfortunately. Whenever I visit a VMUG I try to attend various sessions just to get a better understanding of what it is our partners offer, how our customers use our products and what type of questions are raised. Below you can find a couple of the sessions (including my own) which I enjoyed and recommend watching. I understand that it is difficult to find a block of 5 hrs to watch these, but I would like to urge to do so as they will prepare you for what is coming in the future.
Last week I presented at the UK VMUG, Nordic VMUG and VMUG Belgium. My topic was vSphere futures… I figured I would share the deck publicly. The deck is based on this blog post and essentially is a collection of what has been revealed at last VMworld. Considering the number of announcements I am guessing that this deck is a nice summary of what is coming, feel free to use it / share it / comment etc.
Once again, I would like to thank the folks of the VMUG organizations throughout EMEA for inviting me, three great events last week with very passionate people. One thing I want to call out in particular that struck me last week: Erik from the VMUG in Belgium has created this charity program where he asks sponsors (and attendees) to contribute to charity. Last event he collected over 8000 euros which went to a local charity, it was the biggest donation that this particular charity received in a long time and you can imagine they were very thankful… all of this while keeping the event free for attendees, great work Erik! Thanks for giving back to the community in various ways.
See you next time.
In the last couple of weeks I had 3 different instances of people encountering weird behaviour in their environment. In two cases it was a VSAN environment and the other one was an environment using a flash caching solution. What all three had in common is that when they were driving a lot of IO the SSD device would be unavailable, for one of them they even had challenges enabling VSAN in the first place before any IO load was placed on it.
With the first customer it took me a while what was going on. I asked him the standard questions:
- Which disk controller are you using?
- Which flash device are you using?
- Which disks are you using?
- Do you have 10GbE networking?
- Are they on the HCL?
- What is the queue depth of the devices?
All the answers were “positive”, meaning that the full environment was supported… the queue depth was 600 so that was fine, enterprise grade MLC devices used and even the HDDs were on the HCL. So what was causing their problems? I asked them to show me the Web Client and the disk devices and the flash devices, then I noticed that the flash devices were connected to a different disk controller. The HDDs (SAS drives) were connected to the disk controllers which was on the HCL, a highly performant and reliable device… The flash device however was connected to the on-board shallow queue depth and non-certified controller. Yes indeed, the infamous AHCI disk controller. When I pointed it out the customers were shocked ,”why on earth would the vendor do that…”, well to be honest if you look at it: SAS drives were connected to the SAS controller and the SATA flash device was connected to the SATA disk controller, from that perspective it makes sense right? And in the end, the OEM doesn’t really know what your plans are with it when you configure your own box right? So before you install anything, open it up and make sure that everything is connected properly in a fully supported way! (PS: or simply go VSAN Ready Node / EVO:RAIL :-))
Someone at out internal field conference asked me a question around why doing a full back up of a virtual machine on a VSAN datastore is slower then when doing the same exercise for that virtual machine on a traditional storage array. Note that the test that was conducted here was done with a single virtual machine. The best way to explain why this is is by taking a look at the architecture of VSAN. First, let me mention that the full backup of the VM on a traditional array was done on a storage system that had many disks backing the datastore on which the virtual machine was located.
Virtual SAN, as hopefully all of you know, creates a shared datastore out of host local resources. This datastore is formed out of disk and flash. Another thing to understand is that Virtual SAN is an object store. Each object typically is stored in a resilient fashion and as such on two hosts, hence 3 hosts is the minimum. Now, by default the component of an object is not striped which means that components are stored in most cases on a single physical spindle, for an object this means that as you can see in the diagram below that the disk (object) has two components and without stripes is stored on 2 physical disks.
Now lets get back to the original question. Why did the backup on VSAN take longer then with a traditional storage system? It is fairly simple to explain looking at the above info. In the case of the traditional storage array you are reading from multiple disks (10+) but with VSAN you are only reading from 2 disks. As you can imagine when reading from disk performance / throughput results will differ depending on the number of resources the total number of disks it is reading from can provide. In this test, as there it is just a single virtual machine being backed up, the VSAN result will be different as it has a lower number of disks (resources) to its disposal and on top of that is the VM is new there is no data cached so the flash layer is not used. Now, depending on your workload you can of course decide to stripe the components, but also… when it comes to backup you can also decided to increase the number of concurrent backups… if you increase the number of concurrent backups then the results will get closer as more disks are being leveraged across all VMs. I hope that helps explaining why results can be different, but hopefully everyone understands that when you test things like this that parallelism is important or provide the right level of stripe width.
I have been receiving various questions around support for non-uniform configurations in VSAN environments (sometimes also referred to as “unbalanced” configurations) . I was a bit surprised by it to be honest as personally I am not a big fan of non-uniform configurations to begin with. First, with “non-uniform” I am referring to different hardware configurations. In other words you have four hosts with 400GB Intel s3700 flash and one host with 200GB Intel s3500 flash. The question was if this is an acceptable configuration if the overall flash capacity still meets the recommended practice of 10% of used capacity.
Although technically speaking this configuration will work and is supported, from an operational and user experience perspective you need to ask yourself if this is a desired scenario. I have seen people doing these type of constructions out in the field as well with “flash caching” solutions and believe me when I say that the result were very mixed. The problem is that when you have a non-uniform configuration your predictability of performance will be impacted. As you can imagine cutting your flash capacity in half on a host could impact the cache hit ratio for that particular host. Also using a different type of flash will change your results / experience more then likely. On top of that, imagine you need to do maintenance on your hosts, it could be that the “non-uniform” host will have different procedures for whatever maintenance you are doing… it just complicates things unnecessarily.
So again, although this is supported and will work from a technical perspective it is not something I would recommend from an operational and user experience point of view.