Before reading my take on this, please read this great article by Vijay Ramachandran, in which he explains the difference between ScaleIO and VSAN in the kernel. Before I say anything else, let me reinforce that this is my opinion and not necessarily VMware’s. I’ve seen some negative comments around ScaleIO / VMware / EMC, most of them about the availability of a second storage solution in the ESXi kernel next to VMware’s own Virtual SAN. The big complaint typically is: why is EMC allowed and the rest of the ecosystem isn’t? The question, though, is whether VMware is really preventing other partners from doing the same. While flying to Palo Alto I read an article by Itzik which stated the following:
ScaleIO 1.31 introduces several changes in the VMware environment. First, it provides the option to install the SDC natively in the ESX kernel instead of using the SVM to host the SDC component. The V1.31 SDC driver for ESX is VMware PVSP certified, and requires a host acceptance level of “PartnerSupported” or lower in the ESX hosts.
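For reference, the host acceptance level Itzik mentions can be checked and changed with esxcli. A minimal sketch, run on the ESXi host itself (the exact level you need depends on the VIBs you intend to install):

# Show the current host acceptance level (VMwareCertified, VMwareAccepted, PartnerSupported or CommunitySupported)
esxcli software acceptance get

# Lower it to PartnerSupported so that a PVSP-certified VIB, such as the ScaleIO SDC driver, can be installed
esxcli software acceptance set --level=PartnerSupported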
Let me point out here that the solution EMC developed falls under PVSP (Partner Verified and Supported Products) support. What strikes me is that many seem to think that what ScaleIO achieved is unique, despite that “partner supported” statement. I admit there aren’t many storage solutions that sit within the hypervisor, and it is great innovation, but it is not unique for a partner solution to do so.
If you look at flash caching solutions, for instance, you will see that some sit in the hypervisor (PernixData, SanDisk’s FlashSoft) and some sit on top (Atlantis, Infinio). It is not as if VMware favours one over the other in the case of these partners. It was their design, their way of getting around a problem they had… Some managed to develop a solution that sits in the hypervisor, others did not focus on that. Some probably felt that optimizing the data path first was most important, and, maybe even more importantly, they had the expertise to do so.
Believe me when I say that it isn’t easy to create these types of solutions. There is no standard framework for this today, hence they end up being partner supported, as they leverage existing APIs and frameworks in an innovative way. Until there is one, you will see some partners sitting on top of the hypervisor and others within it, depending on what they want to invest in and what skill set they have… (Yes, a framework is being explored, as discussed in this video by one of our partners, but I don’t know when, or even if, it will be released!)
What ScaleIO did is innovative for sure, but there are others who have done something similar and I expect more will follow in the near future. It is just a matter of time.
scotthdavis says
Good post, Duncan. I wanted to add a few observations… One big difference between these techniques is running in kernel mode vs. user mode. User mode provides a richer application environment and better protects the kernel from bugs and instability, but it adds latency. VMware has chosen (to date) to offer only limited, architecturally sanctioned interfaces for third-party functionality in the hypervisor; for storage this is primarily the PSA framework, which was originally designed for Fibre Channel multipathing plug-ins. This design target means that third-party kernel-mode intercepts can only occur at a very specific point in the IO processing flow. Other operating systems, such as Microsoft Windows, deliver a much more flexible, stackable filter driver architecture, but that architecture and the accompanying third-party drivers have been blamed for many of the “blue screen of death” issues that have historically plagued that platform. VMware’s VAIO filter architecture is a promising innovation that goes beyond these options once it’s in the market. It provides a userworld/user-mode filter environment without the context switching/latency issues involved in running within a VM, and I certainly look forward to its commercial availability.
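As an aside, for anyone curious which PSA plug-ins are actually loaded on a host, esxcli can list them. A quick sketch (output will obviously vary per host and ESXi version):

# List the PSA plug-ins currently loaded on this host (NMP plus any third-party multipathing plug-ins)
esxcli storage core plugin list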
Michael says
Historically we have used “converged storage” solutions such as the HP P4000 VSA, but faced issues with latency spikes which VMware/HP put down to hairpinning on the network. The fix has been to isolate VMs on hosts away from their storage, or to add additional NICs – not ideal.
Although I haven’t tested new
Another consideration is that we also have a challenge under the existing VSPP rules, which means we have to pay for all powered-on VMs’ RAM. Anything which is kernel-based won’t be picked up by the usage tool, which makes life a bit easier.
Duncan Epping says
Never thought about that; the VSPP argument is a nice one 🙂
John says
Very timely post, Duncan. Recently, I experienced a “purple screen of death” when upgrading ESXi to 5.5 due to third-party kernel-mode drivers. I had not encountered a “purple screen of death” for years. Naturally, I am concerned that third-party kernel-mode filter drivers will get out of control and contaminate kernel purity like in the Windows world. User-mode solutions like controller VMs do add latency but leave the kernel untouched. As a VMware admin, I personally prefer user-mode solutions. Let’s not compromise kernel stability, which is VMware’s bread and butter.
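A simple habit that helps here is to check which third-party VIBs are installed, and at what acceptance level, before kicking off an upgrade. A rough sketch (the grep is just a quick way to hide VMware-authored packages, not a precise filter):

# List installed VIBs with their vendor and acceptance level
esxcli software vib list

# Quick-and-dirty view of non-VMware packages worth reviewing before an upgrade
esxcli software vib list | grep -v VMware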
Victor da Costa says
Different goals lead to different approaches to solving the same problem. I prefer the integrated (driver) approach, since it brings better performance and better resilience… Personally, I don’t like to wait in lines; it’s much better with a FastPass…