Introducing startup PernixData – Out of stealth!

There are many startups out there that do something with storage these days. To be honest, many of them do the same thing and at times I wonder why on earth everyone focuses on the same segment and tries to attack it with the same product / feature set. One of the golden rules for any startup should be that you have a unique solution that will sell itself. Yes I realize that it is difficult, but if you want to succeed you will need to stand out.

About a year ago Satyam Vaghani (former VMware principal engineer who was responsible for VMFS, VAAI, VVOLs etc.) and Poojan Kumar (former VMware Data products lead and ex-Oracle Exadata founder) decided to start a company – PernixData. PernixData was conceptualized based on their experiences working on the intersection of virtualization, flash based storage and data. Today PernixData is revealed to the world. For those who don’t know, Pernix means “agile”. But what is PernixData about?

How many of you haven’t experienced storage performance problems? It probably is, in fact, the number one bottleneck in most virtualized environments. Convincing your manager (director / VP) that you need a new ultra-fast (and expensive) storage device is not easy; far from it. On top of that, data will always hit the network first before being acknowledged and every read will go over your storage network. How cool would it be if there was a seamless software solution that solves all your storage performance problems without you requiring to rip and replace your existing storage assets?

Server-side flash overcomes problems associated with network based storage and server-side caching solutions provide some respite. Yet, server-side caching solutions usually neither satisfy enterprise class requirements for availability nor transparently support clustered hypervisor features such as VMware vMotion. In addition, while they accelerate reads they fail to do much for writes. Customers are then stuck between either overhauling their entire storage infrastructure or going with caching solutions that work for limited use cases. PernixData is about to release a cool new product – a flash virtualization platform – that bridges this gap. By picking up where hypervisors left off, PernixData is planning to become the VMware of server flash and is aiming to do to server flash what VMware did to CPU and memory. So, what is this flash virtualization platform and why would you need it?

PernixData’s flash virtualization platform virtualizes all flash resources across all server nodes in a vCenter Server cluster into a single high-performance, enterprise class data tier. The great thing is that this happens in a transparent way. PernixData sits completely within the hypervisor and in the data-path of your virtual machine. Note that there are no requirements to install anything in the guest (virtual machine). PernixData is not a virtual appliance because virtual appliances introduce performance overhead and would need to be managed with all costs and complexity associated.

PernixData is also flash technology agnostic. It can leverage SSD or PCIe flash (or both) within the platform. The nice thing is that PernixData uses a scale-out architecture. As you add hosts with flash they can be dynamically added to the platform. On top of that, PernixData does both read and write acceleration while providing full data protection and is fully compatible with VM mobility solutions like vMotion, Storage vMotion, HA, DRS and Storage DRS.

Even more exciting PernixData will support both Write-through and Write-back modes. The cool part is that PernixData also ensures IO is replicated for high availability purposes. You don’t want to run your VM in Write-back mode when you cannot guaranteed data is highly available right?! I guess that is one of the unique selling points of the solution. A distributed, scale out, flash virtualization platform which is not only flash agnostic but also non-disruptive for your virtual workloads.

I would imagine this is many times cheaper than buying a new storage array. Even without knowing what the cost of PernixData will be, or which flash device (PCIe or SSD) you would decide to use… I bet when it comes to overall costs of the solution (product + implementation costs) it will be many many times cheaper.

As I started off with, the golden rule for any startup should be that they have a unique solution that sells itself. I am confident that PernixData FVP has just that by being a disruptive technology that solves a big problem in virtualized environments  in a scale-out and transparent manner while leveraging your existing storage investments.

If you want to be kept up to date, make sure to follow Satyam, Poojan , Charlie and PernixData on twitter. If you are interested in joining the PernixData FVP Beta, make sure to sign up!

Make sure to also read Frank’s article on PernixData.

Faking an SSD in your virtualized vSphere lab

I have written about this before (and so has William Lam, so all credits go to William), but I wanted to note down these commands for my own use as I find myself digging around often for the same commands these days. So what is my goal: Faking an SSD in my virtualized vSphere lab.

In my lab I have a bunch of virtualized ESXi hosts. Those hosts have multiple disks and I want to mark one of those disks as SSD. To keep things simple I set things up as follows. Just to point out, I use 0:0 / 1:0 / 2:0 so that each device gets a new controller and is easy to identifiy:

  • First Disk – ESXi install disk – 5GB – SCSI 0:0
  • Second Disk – Fake SSD – 40GB – SCSI 1:0
  • Third Disk – Large disk – 1TB – SCSI 2:0

When I boot all disks are recognized as regular disks and in some cases as non-local. In my testing I need local disks and need SSD. So this is what I did to get exactly that. With the first command I mark the “second disk” as SSD and local. With the second command I mark the third disk as local. Next I reclaim the devices so that the new SATP rules are applied.

esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba2:C0:T0:L0 --option "enable_local enable_ssd"
esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba3:C0:T0:L0 --option "enable_local"
esxcli storage core claiming reclaim -d mpx.vmhba2:C0:T0:L0
esxcli storage core claiming reclaim -d mpx.vmhba3:C0:T0:L0

Next you can simply validate if it has worked by typing the following for device vmhba2 and 3 (if you replace the 2 with a 3 ofcourse) :

esxcli storage core device list --device=mpx.vmhba2:C0:T0:L0

As you can see, faking an SSD is fairly straight forward. Note that even if you have an SSD drive you still might need to do this. In some cases the SSD drive is not recognized and you will need to create a rule for it manually.

Swap to host cache aka swap to SSD?

Before we dive in to it, lets spell out the actual name of the feature “Swap to host cache”. Remember that, swap to host cache!

I’ve seen multiple people mentioning this feature and saw William posting a hack on how to fool vSphere (feature is part of vSphere 5 to be clear) into thinking it has access to SSD disks while this might not be the case. One thing I noticed is that there seems to be a misunderstanding of what this swap to host cache actually is / does and that is probably due to the fact that some tend to call it swap to SSD. Yes it is true, ultimately your VM would be swapping to SSD but it is not just a swap file on SSD or better said it is NOT a regular virtual machine swap file on SSD.

When I logged in to my environment first thing I noticed was that my SSD backed datastore was not tagged as SSD. First thing I wanted to do was tag it as SSD, as mentioned William already described this in his article and it is well documented in our own documentation as well so I followed it. This is what I did to get it working:

  • Check the NAA ID in the vSphere UI
  • Opened up an SSH session to my ESXi host
  • Validate which SATP claimed the device:
    esxcli storage nmp device list
    In my case: VMW_SATP_ALUA_CX
  • Verify it is currently not recognized as SSD by typing the following command:
    esxcli storage core device list -d naa.60060160916128003edc4c4e4654e011
    should say: “Is SSD : False”
  • Set “Is SSD” to true:
    esxcli storage nmp satp rule add -s VMW_SATP_ALUA_CX  –device naa.60060160916128003edc4c4e4654e011  –option=enable_ssd
  • I reloaded claim rules and ran them using the following commands:
    esxcli storage core claimrule load
    esxcli storage core claimrule run
  • Validate it is set to true:
    esxcli storage core device list -d naa.60060160916128003edc4c4e4654e011
  • Now the device should be listed as SSD

Next would be to enable the feature… When you go to your host and click on the “Configuration Tab” there should be a section called “Host Cache Configuration” on the left. When you’ve correctly tagged your SSD it should look like this:

Please note that I already had a VM running on the device and hence the reason it is showing some of the space as being in use on this device, normally I would recommend using a drive dedicated for swap. Next step would be enabling the feature and you can do that by opening the pop-up window (right click your datastore and select “Properties”). This is what I did:

  • Tick “Allocate space for host cache”
  • Select “Custom size”
  • Set the size to 25GB
  • Click “OK”

Now there is no science to this value as I just wanted to enable it and test the feature. What happened when we enabled it? We allocated space on this LUN so something must have been done with it? I opened up the datastore browser and I noticed a new folder was created on this particular VMFS volume:

Not only did it create a folder structure but it also created 25 x 1GB .vswp files. Now before we go any further, please note that this is a per host setting. Each host will need to have its own Host Cache assigned so it probably makes more sense to use a local SSD drive instead of a SAN volume. Some of you might say but what about resiliency? Well if your host fails the VMs will need to restart anyway so that data is no longer relevant, in terms of disk resiliency you should definitely consider a RAID-1 configuration. Generally speaking SAN volumes are much more expensive than local volumes and using local volumes also removes the latency caused by the storage network. Compared to the latency of a SSD (less than 100 μs), network latency can be significant. So lets recap that in a nice design principal:

Basic design principle
Using “Swap to host cache” will severely reduce the performance impact of VMkernel swapping. It is recommended to use a local SSD drive to elimate any network latency and to optimize for performance.

How does it work? Well fairly straight forward actually. When there is severe memory pressure and the hypervisor needs to swap memory pages to disk it will swap to the .vswp files on the SSD drive instead. Each of these, in my case, 25 files are shared amongst the VMs running on this host. Now you will probably wonder how you know if the host is using this Host Cache or not, that can of course simply be validated by looking at the performance statistics within vCenter. It contains a couple of new metrics of which “Swap in from host cache” and “Swap out to host cache” (and the “rate”…) metrics are most important to monitor. (Yes, esxtop has metrics as well to monitor it namely LLSWR/s  and LLSWW/s)

What if you want to resize your Host Cache and it is already in use? Well simply said the Host Cache is optimized to allow for this scenario. If the Host Cache is completely filled memory pages will need to be copied to the regular .vswp file. This could mean that the process takes longer than expected and of course it is not a recommended practice as it will decrease performance for your VMs as these pages more than likely at some point will need to be swapped in. Resizing however can be done on the fly, no need to vMotion away your VMs. Just adjust the slider and wait for the process to complete. If you decide to complete remove all host cache for what ever reason than all relevant data will be migrated to the regular .vswp.

What if the Host Cache is full? Normally it shouldn’t even reach that state, but when you run out of space in the host cache pages will be migrated from your host cache to your regular vswap file and it is first in first out in this case, which should be the right policy for most workloads. Now chances of course of having memory pressure to the extend where you fill up a local SSD are small, but it is good to realize what the impact is. If you are going down the path of local SSD drives with Host Cache enabled and will be overcommitting it might be good to do the math and ensure that you have enough cache available to keep these pages in cache rather than on rotating media. I prefer to keep it simple though and would probably recommend to equal the size of your hosts memory. In the case of a host with 128GB RAM that would be a 128GB SSD. Yes this might be overkill, but the price difference between 64GB and 128GB is probably neglect-able.

Basic design principle
Monitor swap usage. Although “Swap to host cache” will reduce the impact of VMkernel swapping it will not eliminate it. Take your expected consolidation ratio into account including your HA (N-X) strategy and size accordingly. Or keep it simple and just use the same size as physical memory.

One interesting use case could be to place all regular swap files on very cheap shared storage (RAID5 of SATA drives) or even local SATA storage using the “VM swapfile location” (aka. Host local swap) feature. Then install a host cache for any host these VMs can be migrated to. This should give you the performance of a SSD while maintaining most of the cost saving of the cheap storage. Please note that the host cache is a per-host feature. Hence in the time of a vMotion all data from the cache will need to be transferred to the destination host. This will impact the time a vMotion takes. Unless your vMotions are time critical, this should not be an issue though. I have been told that VMware will publish a KB article with advise how to buy the right SSDs for this feature.

Summarizing, Swap to SSD is what people have been calling this feature and that is not what it is. This is a mechanism that caches memory pages to SSD and should be referred to as “Swap to host cache”. Depending on how you do the math all memory pages can be swapped to and from SSD. If there is insufficient space available memory pages will move over to the regular .vswp file. Use local SSD drives to avoid any latency associated with your storage network and to minimize costs.