Datrium finally out of stealth… Welcome Datrium DVX!

Before I get started: I have not been briefed by Datrium, so I am still learning as I type this and it is purely based on the somewhat limited info on their website. Datrium’s name has been in the press a couple of times, as it was the company often associated with Diane Greene. The rumour back then was that Diane Greene was the founder and was going to take on EMC; in reality she is an investor in Datrium. Not just her, of course: Datrium is also backed by NEA (a venture capital firm) and various other well-known people like Ed Bugnion, Mendel Rosenblum, Frank Slootman and Kai Li. Yes, a big buy-in from some of the original VMware founders. Knowing that two of the Datrium founders (Boris Weissman and Ganesh Venkitachalam) are former VMware Principal Engineers (and old-timers), that makes sense. (Source) This morning a tweet was sent out, and it seems today they are officially out of stealth.

So what is Datrium about? Well, Datrium delivers a new type of storage system which they call DVX. Datrium DVX is a hybrid solution composed of host-local data services and a network-accessed capacity shelf called “NetShelf”. I think this quote from their website sums up their intention: move all functionality to the host and let the “shelf” just take care of storing bits. I included a diagram I found on their website as it makes things clearer.

On the host, DiESL manages in-use data in massive deduplicated and compressed caches on BYO (bring your own) commodity SSDs locally, so reads don’t need a network hop. Hosts operate locally, not as a pool with other hosts.
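
That “massive deduplicated cache” idea is easier to picture with a toy example. The sketch below is purely illustrative and assumes nothing about Datrium’s actual implementation; it just shows the general content-addressed caching technique, where cached blocks are keyed by a hash of their content so identical blocks are stored only once.

```python
import hashlib

class DedupReadCache:
    """Toy content-addressed read cache: identical blocks are stored once,
    keyed by a hash of their content. Illustrative only -- not Datrium's code."""

    def __init__(self):
        self.blocks = {}   # content hash -> block data (each unique block stored once)
        self.index = {}    # logical address -> content hash

    def insert(self, address, block):
        digest = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(digest, block)   # dedupe: keep one copy of the content
        self.index[address] = digest

    def read(self, address):
        digest = self.index.get(address)
        # None means a cache miss -> the data would be fetched from the shelf
        return self.blocks.get(digest) if digest else None

cache = DedupReadCache()
cache.insert(0x10, b"A" * 4096)
cache.insert(0x20, b"A" * 4096)   # duplicate content consumes no extra cache space
assert len(cache.blocks) == 1
```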

[Diagram: Datrium DVX architecture, via datrium.com]

It seems that from a host perspective the data services (caching, compression, RAID, cloning, etc.) are implemented through the installation of a VIB, so not VM/appliance based but rather kernel based. The NetShelf is accessible via 10GbE, and Datrium uses a proprietary protocol to connect to it. From the host side (ESXi) they connect locally over NFS, which means they have implemented an NFS server within the host; the NFS connection is terminated within the host, and their own protocol/driver handles the connection to the NetShelf. It is a bit of an awkward architecture, or better said, at first it is difficult to wrap your head around. This is the reason I used the word “hybrid”, but maybe I should have used “unique”. Hybrid, not because of the mixture of flash and HDD, but because it is a hybrid of hyper-converged / host-local caching and more traditional storage, done in a truly unique way. What does that look like? Something like this I guess:

[Diagram: Datrium DVX data path, via datrium.com]

So what does this look like from a storage perspective? Well, each NetShelf will come with 29TB of usable capacity. The expected deduplication and compression rate for enterprise companies is between 2x and 6x, which means you will have roughly between 58TB and 174TB at your disposal. In order to ensure your data is highly available, the NetShelf is a dual-controller setup with dual-port drives (which means the drives are connected to both controllers and used in an “active/standby” fashion). Each controller has NVRAM which is used for write caching, and a write is acknowledged to the VM when it has been written to the NVRAM of both controllers. In other words, if a controller fails there should be no data loss.
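
To make the capacity claim concrete, here is the simple arithmetic behind those numbers (29TB usable multiplied by the claimed 2x–6x data reduction range):

```python
usable_tb = 29                 # per NetShelf, from Datrium's published numbers
for reduction in (2, 6):       # claimed dedupe/compression range for enterprise data
    print(f"{reduction}x data reduction -> ~{usable_tb * reduction} TB effective")
# 2x -> ~58 TB effective, 6x -> ~174 TB effective
```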

Talking about availability, what if a host fails? If I read their website correctly there is no write caching from a host point of view, as it states that each host operates independently from a caching point of view (no mirroring of writes to other hosts). This also means that all the data services (dedupe / compression / RAID) need to happen inline; when those actions complete, the result is stored on the NetShelf and is then accessible by other hosts when needed. It makes me wonder what happens when DRS is enabled and a VM is migrated from one host to another. Will the read cache migrate with it to the other host? And what about very write-intensive workloads, how will those perform when all data services are inline? What kind of overhead can/will it have on the host? How will it scale out? What if I need more than one NetShelf? Those are some of the questions that pop up immediately. Considering the brain-power within Datrium (former VMware, Data Domain, NetApp, EMC, etc.) I am assuming they have a simple answer to those questions. I will try to ask them at VMworld or during a briefing and write a follow-up.

From an operational aspect it is an interesting solution, as it should lower the effort involved with managing storage almost to zero. There is the NFS connection and you have your VMs and VMDKs at the front end; at the back end you have a black box, or better said, a shelf dedicated to storing bits. This should be dead easy to manage and deploy. It shouldn’t require a dedicated storage administrator; the VMware admin should be able to manage it. Some of you may ask: what if I want to connect anything other than a VMware host to it? For now Datrium appears to be mainly targeting VMware environments (which makes sense considering their DNA), but I guess they could implement this for various platforms in a similar fashion.

Again, I was not briefed by Datrium and just happened to see their tweet this morning, but their solution is so intriguing I figured I would share it anyway. Hope it was useful.


Extending your vSphere platform with Virtual SAN

Over the last couple of months I’ve spoken to many customers about Virtual SAN. What struck me during these conversations is how these customers spoke about Virtual SAN. In all cases the conversation starts with what their environment used to look like: what kind of storage they had, how it was configured, the number of disks, you name it. Of course we would discuss the challenges they had with their legacy environment. Thinking back to these conversations there is one thing that really stood out, although never explicitly mentioned: the big difference between Virtual SAN and traditional storage systems is that Virtual SAN is not a storage system but rather an extension of the VMware vSphere platform.

Source: Wiki
Software extension, a file containing programming that serves to extend the capabilities of or data available to a more basic program

I believe this statement is spot on. What is great about Virtual SAN is that it extends the capabilities of vSphere in an extremely easy way. Virtual SAN achieves this simply by abstracting layers of complexity, pooling the resources, and allowing these to be assigned to workloads in an automated fashion, whether through the use of policies and a simple UI or through the vSphere APIs. The keywords here are definitely: abstract, pool and automate.

Maybe I should have used the word “converging” instead of “abstracting”. That is essentially what is happening, and although many other vendors claim the same, I truly believe that Virtual SAN is one of the few solutions which is truly hyper-converged as it seamlessly converges layers instead of adding a layer on top of another layer. Hyper-convergence is more than just stacking layers in a single box.

With Virtual SAN, storage is just there. Not bolted on, layered on top, or mounted to the side, but an integral part of your environment, an extension of your platform. Virtual SAN does for storage what vSphere does for CPU and memory: it becomes a fundamental component of your cluster.

Horizon View and All-Flash VSAN

I typically don’t do these short posts which simply point to a white paper, but I really liked this paper on the topic of VMware Horizon View and all-flash VSAN. The paper demonstrates how to build an all-flash VSAN cluster using Dell servers, SanDisk flash and Brocade switches. Definitely a recommended read if you are looking to deploy Horizon View anytime soon.

VMware Horizon View and All Flash Virtual SAN Reference Architecture
This Reference Architecture demonstrates how enterprises can build a cost-effective VDI infrastructure using VMware All Flash Virtual SAN combined with the fast storage IO performance offered by SSDs. The combination of Virtual SAN and all flash storage can significantly improve ROI without compromising on the high availability and scalability that customers demand.

Virtual SAN enabling PeaSoup to simplify cloud

This week I had the pleasure of talking to fellow Dutchy Harold Buter. Harold is the CTO of PeaSoup, and we had a lively discussion about Virtual SAN and why PeaSoup decided to incorporate it in their architecture. What struck me was the fact that PeaSoup Hosting was brought to life partly as a result of the Virtual SAN release. When we introduced Virtual SAN, Harold and his co-founder realized this was a unique opportunity to build something from the ground up while avoiding the big upfront costs typically associated with legacy arrays. How awesome is that: a new product that results in new ideas and, in the end, a new company and product offering.

The conversation of course didn’t end there, so let’s get into some more details. We discussed the use case first. PeaSoup is a hosting / cloud provider. Today they have two clusters running based on Virtual SAN: a management cluster which hosts all components needed for a vCloud Director environment, and a resource cluster. The great thing for PeaSoup was that they could start out with a relatively low investment in hardware and scale fast when new customers onboard or existing customers require new hardware.

Talking about hardware: PeaSoup looked at many different configurations and vendors and for their compute platform decided to go with Fujitsu RX300 rack-mount servers. Harold mentioned that these were by far the best choice for them in terms of price, build quality and service. Personally it surprised me that Fujitsu came out as the cheapest option; it didn’t surprise me that Fujitsu’s service and build quality were excellent though. Specs-wise the servers have 800GB SSDs, 7200 RPM NL-SAS disks, 256GB of memory and of course two CPUs (Intel 2620 v2, 6 cores).

Harold pointed out that the only downside of this particular Fujitsu configuration was that its disk controller is limited to RAID-0 only, no passthrough. I asked him if they had experienced any issues around that, and he mentioned they have had one disk failure so far, which resulted in having to reboot the server in order to recreate a RAID-0 set for the new disk. Not too big of a deal for PeaSoup, but of course, if possible, he would prefer to avoid that reboot. The disk controller, by the way, is based on the LSI 2208 chipset, and it is one of the things PeaSoup was very thorough about: making sure it was supported and that it had a high queue depth. The HCL came up multiple times during the conversation, and Harold felt that although doing a lot of research up front and creating a scalable and repeatable architecture takes time, it also results in a very reliable environment with predictable performance. For a cloud provider, reliability and user experience are literally your bread and butter; they couldn’t afford to guess. That was also one of the reasons they selected a VSAN Ready Node configuration as a foundation and tweaked it where their environment and anticipated workload required it.

Key take away: RAID-0 works perfectly fine during normal usage; only when disks need to be replaced is a slightly different operational process required.

“Anticipated” is a keyword once again, as it has been in many of the conversations I’ve had before: it is often unknown what kind of workloads will run on top of these infrastructures, which means you need to be flexible in terms of scaling up versus scaling out. Virtual SAN provides just that to PeaSoup. We also spoke about the networking aspect. For a cloud provider running vCloud Director and Virtual SAN, networking is a big part of the overall architecture, so I was interested to know what kind of switching hardware was being used. PeaSoup uses Huawei 10GbE switches (CE6850), and each server is connected to these switches with at least 4 x 10GbE ports. PeaSoup dedicated two of these ports to Virtual SAN, which wasn’t a requirement from a load perspective (or from VMware’s point of view), but they preferred this level of redundancy and performance while having a lot of room to grow. Resiliency and future-proofing are key for PeaSoup. Price versus quality was also a big factor in the decision to go with Huawei: in this case Huawei had the best price/quality ratio.

Key take away: It is worth exploring different network vendors and switch models. Prices vary greatly between vendors and models, which could lead to substantial cost savings without impacting service or quality.

Their host and networking configuration is well documented and can easily be repeated when more resources are needed. They even have discounts / pricing documented with their suppliers, so they can quickly assess what is needed, when, and what it will cost. I also asked Harold if they were offering different storage profiles to provide their customers a choice in terms of performance and resiliency. So far they offer two different policies to their customers (see the sketch after the list for what these settings imply):

  • Failures to tolerate = 1  //  Stripe Width = 2
  • Failures to tolerate = 1  //  Stripe Width = 4
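
For those unfamiliar with these settings: with VSAN’s RAID-1 mirroring, failures to tolerate (FTT) = n keeps n+1 copies of each object, and the stripe width spreads each copy across that many disks. The back-of-the-envelope sketch below is my own illustration (witness components are ignored) of what PeaSoup’s two policies mean for raw capacity and component counts:

```python
def vsan_policy_footprint(vmdk_gb, ftt=1, stripe_width=1):
    """Rough RAID-1 VSAN policy math: FTT=n keeps n+1 full replicas,
    stripe width splits each replica across that many disks.
    Witness components are deliberately left out of this sketch."""
    replicas = ftt + 1
    data_components = replicas * stripe_width
    raw_gb = vmdk_gb * replicas
    return raw_gb, data_components

for sw in (2, 4):   # PeaSoup's two offered policies
    raw, comps = vsan_policy_footprint(100, ftt=1, stripe_width=sw)
    print(f"FTT=1, SW={sw}: 100 GB VMDK -> {raw} GB raw, {comps} data components")
```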

So far it appears that not too many customers are asking for higher availability; they recently had their first request, and it looks like the offering will include “FTT=2” alongside “SW=2 / 4” in the near future. On the topic of customers: they mentioned they have a variety of different customers using the platform, ranging from companies in the business of media conversion and law firms to a company which sells “virtual private servers” on their platform.

Before we wrapped up I asked Harold what the biggest challenge with Virtual SAN had been for them. Harold mentioned that although they were a very early adopter and use it in combination with vCloud Director, they have had no substantial problems. What may have been most challenging in the first months was figuring out the operational processes around monitoring. PeaSoup is a happy Veeam customer, and they decided to use Veeam ONE to monitor Virtual SAN for now, but in the future they will also be looking at the vR Ops Virtual SAN management pack, and potentially creating some custom dashboards in combination with Log Insight.

Key take away: Virtual SAN is not like a traditional SAN; new operational processes and tooling may be required.

PeaSoup is an official reference customer for Virtual SAN by the way, you can find the official video below and the slide deck of their PEX session here.

PernixData announcements at #VFD5

Today PernixData presented at Virtualization Field Day 5. An excellent presentation by Satyam once again. This week I was fortunate to catch up with Frank Denneman to discuss what was going to be announced and what can be expected in the near future. I want to make it clear that no release dates were given, so don’t expect this to drop next week.

There were 4 key announcements:

  1. PernixData Architect – A better way to design, operate, and optimize data centers
  2. PernixData Cloud – Making enterprise IT more transparent
  3. PernixData FVP
    1. Freedom – Yes, “free” is the key word here!
    2. New features / functionality…

I am going to go at this in a different order than the deck, as I want to cover some of the changes with regard to FVP first. Satyam spoke about a new offering called “FVP Freedom”. FVP Freedom is a free version of FVP which can be used by anyone in any environment. Of course there are some constraints / limitations, and these are:

  • Up to a 128GB DFTM cluster for write-through acceleration
  • Community support

However, you can use FVP Freedom for an unlimited number of hosts and an unlimited number of VMs. Note that “DFTM” stands for Distributed Fault Tolerant Memory; FVP Freedom gives you memory (read) caching only, no SSD caching, with a 128GB limit per cluster. I think this is huge, and it is a very smart way of getting people to test your solution and run it in production. So what can you do with 128GB? Well, of course they tested this, and they were capable of increasing the VSImax from 181 to 328 users with that 128GB of memory across two hosts. You may wonder why they took this approach, because what does giving it away for free bring them? Well, that will be obvious when you read the other announcements.
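
A write-through cache like the DFTM tier in FVP Freedom is conceptually simple, and the concept also explains why no write mirroring between hosts is needed: every write is persisted by the backing array before it is acknowledged, so losing a host’s cache never loses data. Here is a minimal toy sketch (my own illustration, not PernixData’s code):

```python
class WriteThroughCache:
    """Toy write-through cache: every write hits the backing store before
    it is acknowledged, so losing the cache loses no data. Illustrative only."""

    def __init__(self, backing_store):
        self.backing = backing_store   # stands in for the storage array
        self.cache = {}                # stands in for host memory (DFTM-style)

    def write(self, key, value):
        self.backing[key] = value      # persisted first: safe even if the host dies
        self.cache[key] = value        # then cached to accelerate future reads

    def read(self, key):
        if key in self.cache:          # hit: served from local memory
            return self.cache[key]
        value = self.backing.get(key)  # miss: fetch from the array and cache it
        if value is not None:
            self.cache[key] = value
        return value

array = {}
cache = WriteThroughCache(array)
cache.write("lba42", b"data")
assert array["lba42"] == b"data"   # already on the array, so cache loss is harmless
```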

Besides a free version of FVP, some enhancements to the current version were also announced. For me, support for vSphere 6.0 and VVols were the two major items. On top of that, new “phone home” functionality is built in, which allows for better and proactive support. What also stood out was the new standalone UI: you will be taken out of the Web Client to a standalone HTML5+JS based interface. You may wonder why they did this; that is where the two new product announcements come into play. FVP is still a Windows installable, by the way. I had hoped they would announce an appliance, which would lower complexity in terms of installation and management, but maybe next time, who knows.

PernixData Architect was the first announcement. It is a piece of software that enables you to monitor your infrastructure (storage focused, of course) and make educated decisions based on the information, and even recommendations, provided. So what are we talking about in terms of metrics? PernixData Architect is (for now?) focused on storage, not just from a cluster point of view but also at the host and virtual machine level. What latency is a virtual machine experiencing? How many IOPS does this VM do on average? What is the throughput? What is the read/write ratio? What are common block sizes? All the things you would like to know when designing, scaling and sizing your storage infrastructure, and of course when using FVP.
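
To give an idea of what such metrics boil down to, here is a trivial sketch that derives a read/write ratio, average block size and average latency from a handful of per-IO samples. The data and field layout are made up for illustration; this is not how Architect gathers its data.

```python
from statistics import mean

# Hypothetical per-IO samples: (operation, block_size_bytes, latency_ms)
ios = [("read", 4096, 0.4), ("write", 8192, 1.2), ("read", 65536, 0.9),
       ("read", 4096, 0.3), ("write", 4096, 1.0)]

reads  = [io for io in ios if io[0] == "read"]
writes = [io for io in ios if io[0] == "write"]

print(f"read/write ratio : {len(reads)}/{len(writes)}")
print(f"avg block size   : {mean(io[1] for io in ios):.0f} bytes")
print(f"avg read latency : {mean(io[2] for io in reads):.2f} ms")
print(f"avg write latency: {mean(io[2] for io in writes):.2f} ms")
```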

Besides the details above you can, for instance, also see the active working set for any of your VMs. You can even get recommendations on how to configure FVP: you may have it set to write-back, but if you are mainly serving reads from cache you may want to change that. It will also give you other recommendations, for instance around networking.

You can imagine that with all the metrics and info they are gathering, they will be able to provide many more recommendations in the future. I can see those dashboards expanding fast, and I think it is valuable for everyone to understand how their workloads behave. On Twitter some comments were made about vR Ops and CloudPhysics. Not a fair comparison, as Pernix is focused on “just” storage for now. Personally I hope they will start tying in other aspects like memory, CPU and networking, as I don’t think customers want to be stuck with two or three monitoring solutions.

Now that you have all that data, what can you do with it? Well, that is where PernixData Cloud comes in. PernixData Cloud can give you insights into how you are doing compared to others in the industry with similar environments, or even with different environments. Those running PernixData Architect can feed their data into the cloud analytics platform and run an analysis on it. But what if they don’t? How useful is this cloud analytics platform going to be? Well, here is the catch: when you use FVP Freedom, one of the requirements is that you upload your statistics and environmental details into PernixData Cloud. So, what kind of data can you get out of it? Let me give you two visual examples, as they show immediately why this is valuable:

Both of the above examples demonstrate what PernixData Cloud Insights can give you: data that is going to help with purchasing decisions, and I can see how it could also be useful in the future for making design decisions (“here is what others did to achieve X”). The best example is the top screenshot: not sure which flash device to buy? What are others buying? What can you expect out of it in terms of latency/throughput/IOPS? Cloud Insights will enable you to make educated decisions based on real-life environments instead of fact sheets, which always appear to be misleading.

All in all, exciting news / announcements from PernixData at Virtualization Field Day 5. Nice work guys, and thanks Mr Denneman for taking the time to have a chat with me and thanks Mr Foskett for streaming the event live!