Virtualization networking strategies…

I was asked a question on LinkedIn about the different virtualization networking strategies from a host point of view. The question came from someone who recently had 10GbE infrastructure introduced into his data center; the network was originally architected with 6 x 1 Gbps NICs carved up into three bundles of 2 x 1 Gbps, so that each of the three types of traffic (Management, vMotion and VM) used its own pair of NICs. 10GbE was added to the current infrastructure, and the question which came up was: should I use 10GbE while keeping my 1 Gbps links for things like management? The classic model has a nice separation of network traffic, right?

Well, I guess from a visual point of view the classic model is nice, as it provides a lot of clarity around which type of traffic uses which NIC and which physical switch port. However, in the end you typically still end up leveraging VLANs, so on top of the physical separation you also provide a logical separation. This logical separation is the most important part if you ask me. Especially when you leverage Distributed Switches and Network IO Control, you can create a great, simple architecture which is fairly easy to implement and maintain, both from a physical and a virtual point of view. Yes, from a visual perspective it may be a bit more complex, but I think the flexibility and simplicity you get in return definitely outweigh that. I would definitely recommend, in almost all cases, keeping it simple: converge physically, separate logically.
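To make that a bit more concrete, the sketch below models what such a converged design could look like: two 10GbE uplinks on a single Distributed Switch, with the logical separation done through VLAN-tagged port groups and Network IO Control shares deciding who gets what when the links are saturated. Note that the VLAN IDs, share values and vmnic names are made-up examples to illustrate the idea, not recommendations.

```python
# Purely illustrative model of a "converge physically, separate logically" design:
# two 10GbE uplinks shared by all traffic types, VLANs for logical separation and
# Network IO Control shares to protect each traffic type under contention.
# All VLAN IDs and share values below are hypothetical.

from dataclasses import dataclass


@dataclass
class PortGroup:
    name: str
    vlan_id: int      # logical separation via VLAN tagging
    nioc_shares: int  # relative priority when the 10GbE uplinks are saturated


UPLINKS = ["vmnic0", "vmnic1"]  # the two 10GbE NICs, active/active on one vDS

PORT_GROUPS = [
    PortGroup("Management", vlan_id=10, nioc_shares=50),
    PortGroup("vMotion",    vlan_id=20, nioc_shares=100),
    PortGroup("VM",         vlan_id=30, nioc_shares=100),
]

if __name__ == "__main__":
    print(f"Uplinks (shared by all traffic types): {', '.join(UPLINKS)}")
    for pg in PORT_GROUPS:
        print(f"{pg.name:<12} VLAN {pg.vlan_id:<4} NIOC shares {pg.nioc_shares}")
```

The point of the shares is that they only kick in when there is contention; as long as the 10GbE links have headroom, every traffic type can burst well beyond what it would have had on a dedicated 1 Gbps pair.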

Recommended viewing: VMUG Sessions

Last week I presented at a couple of VMUGs, and at those VMUGs a whole bunch of sessions were recorded. I receive a lot of requests to speak at VMUGs, and although I try to attend many of them, there are still quite a few I unfortunately have to decline. Whenever I visit a VMUG I try to attend various sessions, just to get a better understanding of what it is our partners offer, how our customers use our products and what type of questions are raised. Below you can find a couple of the sessions (including my own) which I enjoyed and recommend watching. I understand that it is difficult to find a block of 5 hrs to watch these, but I would urge you to do so as they will prepare you for what is coming in the future.

Sharing VMUG presentation “vSphere futures”

Last week I presented at the UK VMUG, Nordic VMUG and VMUG Belgium. My topic was vSphere futures… I figured I would share the deck publicly. The deck is based on this blog post and essentially is a collection of what was revealed at the last VMworld. Considering the number of announcements, I am guessing this deck is a nice summary of what is coming; feel free to use it, share it, comment, etc.

Once again, I would like to thank the folks of the VMUG organizations throughout EMEA for inviting me; three great events last week with very passionate people. One thing I want to call out in particular that struck me last week: Erik from the VMUG in Belgium has created a charity program where he asks sponsors (and attendees) to contribute to charity. At the last event he collected over 8,000 euros, which went to a local charity; it was the biggest donation that this particular charity had received in a long time, and you can imagine they were very thankful… all of this while keeping the event free for attendees. Great work, Erik! Thanks for giving back to the community in various ways.

See you next time.

Validate your hardware configuration when it arrives!

In the last couple of weeks I had 3 different instances of people encountering weird behaviour in their environment. In two cases it was a VSAN environment, and the other one was an environment using a flash caching solution. What all three had in common is that when they were driving a lot of IO the SSD device would become unavailable; one of them even had challenges enabling VSAN in the first place, before any IO load was placed on it.

With the first customer it took me a while to figure out what was going on. I asked him the standard questions:

  • Which disk controller are you using?
  • Which flash device are you using?
  • Which disks are you using?
  • Do you have 10GbE networking?
  • Are they on the HCL?
  • What is the queue depth of the devices?

All the answers were “positive”, meaning that the full environment was supported… the queue depth was 600 so that was fine, enterprise-grade MLC devices were used and even the HDDs were on the HCL. So what was causing their problems? I asked them to show me the Web Client with the disk devices and the flash devices, and then I noticed that the flash devices were connected to a different disk controller. The HDDs (SAS drives) were connected to the disk controller which was on the HCL, a highly performant and reliable device… The flash device, however, was connected to the on-board, shallow-queue-depth and non-certified controller. Yes indeed, the infamous AHCI disk controller.

When I pointed it out the customers were shocked: “why on earth would the vendor do that…” Well, to be honest, if you look at it: the SAS drives were connected to the SAS controller and the SATA flash device was connected to the SATA disk controller, so from that perspective it makes sense, right? And in the end, the OEM doesn’t really know what your plans are when you configure your own box, right? So before you install anything, open it up and make sure that everything is connected properly in a fully supported way! (PS: or simply go VSAN Ready Node / EVO:RAIL :-))
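By the way, you don’t have to open the chassis to get a first indication. Below is a rough sketch using pyVmomi that walks the SCSI topology of each host and reports which storage adapter every flash device is attached to; the vCenter hostname and credentials are obviously placeholders, and exact properties can vary a bit between vSphere versions, so treat it as a starting point rather than a finished tool.

```python
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim


def report_flash_adapters(host):
    """Print the storage adapter each flash (SSD) device on this host sits behind."""
    storage = host.config.storageDevice
    adapters = {hba.key: hba for hba in storage.hostBusAdapter}
    luns = {lun.key: lun for lun in storage.scsiLun}

    # Walk the SCSI topology: adapter -> target -> LUN
    for iface in storage.scsiTopology.adapter:
        hba = adapters.get(iface.adapter)
        if hba is None:
            continue
        for target in iface.target:
            for topo_lun in target.lun:
                lun = luns.get(topo_lun.scsiLun)
                # ScsiDisk exposes an 'ssd' flag on recent vSphere versions
                if isinstance(lun, vim.host.ScsiDisk) and getattr(lun, "ssd", False):
                    print(f"{host.name}: flash device {lun.canonicalName} "
                          f"({lun.model.strip()}) sits behind {hba.device} "
                          f"({hba.model.strip()})")


if __name__ == "__main__":
    ctx = ssl._create_unverified_context()  # lab only, validate certificates in production
    si = SmartConnect(host="vcenter.example.local",        # placeholder
                      user="administrator@vsphere.local",  # placeholder
                      pwd="VMware1!",                      # placeholder
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for esx_host in view.view:
            report_flash_adapters(esx_host)
        view.Destroy()
    finally:
        Disconnect(si)
```

If a report like this shows your SSDs hanging off something that looks like an on-board AHCI controller while your SAS drives sit behind the certified controller, you know where to start looking.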

Slow backup of VM on VSAN Datastore

Someone at our internal field conference asked me why doing a full backup of a virtual machine on a VSAN datastore is slower than doing the same exercise for that virtual machine on a traditional storage array. Note that the test that was conducted here was done with a single virtual machine. The best way to explain this is by taking a look at the architecture of VSAN. First, let me mention that the full backup of the VM on a traditional array was done on a storage system that had many disks backing the datastore on which the virtual machine was located.

Virtual SAN, as hopefully all of you know, creates a shared datastore out of host-local resources. This datastore is formed out of disk and flash. Another thing to understand is that Virtual SAN is an object store. Each object is typically stored in a resilient fashion and as such on two hosts (with a witness on a third, which is why 3 hosts is the minimum). Now, by default the components of an object are not striped, which means that in most cases each component is stored on a single physical spindle. For an object this means, as you can see in the diagram below, that the disk (object) has two components and without striping is stored on 2 physical disks.

Now let's get back to the original question. Why did the backup on VSAN take longer than with a traditional storage system? It is fairly simple to explain with the above info. In the case of the traditional storage array you are reading from multiple disks (10+), but with VSAN you are only reading from 2 disks. As you can imagine, read performance / throughput will differ depending on the resources that the total number of disks being read from can provide. In this test, as it is just a single virtual machine being backed up, the VSAN result will be different as it has a lower number of disks (resources) at its disposal; on top of that, as the VM is new, there is no data cached, so the flash layer is not used.

Now, depending on your workload you can of course decide to stripe the components, but when it comes to backup you can also decide to increase the number of concurrent backups… if you increase the number of concurrent backups, the results will get closer as more disks are being leveraged across all VMs. I hope that helps explain why results can be different, but hopefully everyone understands that when you test things like this, parallelism is important, or you should provide the right level of stripe width.
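To put some (completely made-up) numbers behind this reasoning: if you assume every spindle can stream roughly the same amount of data, throughput for a single full backup is more or less a function of how many spindles you are actually reading from. The little sketch below just does that arithmetic; the 100 MB/s per disk is a hypothetical round number and the stripe width / concurrency figures are examples, but it shows why striping or running multiple backups concurrently closes the gap.

```python
# Back-of-the-envelope arithmetic: read throughput scales with the number of
# spindles you are actually reading from. The per-disk figure is a made-up
# round number, purely to illustrate the reasoning above.

PER_DISK_MBPS = 100  # hypothetical sequential read throughput per spindle


def read_throughput(spindles: int) -> int:
    """Rough aggregate throughput when streaming from `spindles` disks."""
    return spindles * PER_DISK_MBPS


# Traditional array: the datastore in the test was backed by many (10+) disks.
print("Array, 12 spindles:         ", read_throughput(12), "MB/s")

# VSAN default policy: 2 components, no striping -> one VM reads from 2 disks.
print("VSAN, stripe width 1:       ", read_throughput(2), "MB/s")

# Raising the stripe width spreads each component across more spindles...
print("VSAN, stripe width 4:       ", read_throughput(2 * 4), "MB/s")

# ...and concurrent backups fan the reads out across more disks and hosts.
print("VSAN, 6 concurrent backups: ", read_throughput(2 * 6), "MB/s aggregate")
```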