
Yellow Bricks

by Duncan Epping



Startup News Flash part 1

Duncan Epping · Aug 8, 2013 ·

I am on PTO this week and have tried to avoid spending time behind my Mac/iPhone/iPad… well, tried, I guess. It is difficult, as most of you probably know. While I was on vacation a couple of interesting things happened, hence this Startup News Flash blog post. The primary focus of this article is startup and flash-related news, preferably in the storage/flash space: flash caching, flash arrays, hybrid arrays, flash drives… you name it! I guess “new technologies from old companies” would even fit. I will try to make this a regular thing, or at least use the same title whenever something flashy is announced or worth calling out.

For those who have been living under a rock the last week: besides introducing a brand new logo, PernixData announced general availability of FVP 1.0. On Monday my RSS reader was filled with Pernix-related articles, and I was almost at the point of muting “Pernix” on twitter. So why the excitement, what did they announce? Hopefully most of you have read my article on Pernix, or have been following Frank’s series of articles. I guess everyone is aware that Pernix offers a hypervisor-based flash virtualization platform, meaning that their solution is installed as a “vib” within ESXi; indeed, not an appliance-based approach. But others are doing this as well, so what is so unique about it? Write-back caching… Clustered write-back caching that is, guaranteeing consistency of your IO. In other words, when you enable “write-back” caching within FVP, you can select how many replicas of the IO you want. (Currently it ranges from 0 to 2.) Pricing for the enterprise solution was also announced: $7,500 per host. The announcement mentions there will be a different SKU for SMB, so I am looking forward to hearing the details on that. One thing which I didn’t know is that Pernix also has optimization for View environments; it contains a form of “dedupe” for the base images… Frank revealed this on the APAC podcast (episode 77) he was on, hosted by Mike Laverick. (I recommend listening to it.) All in all an exciting and unique 1.0 release… I guess you might wonder what I think they should focus on next; for me that would be NFS support and potentially support for other hypervisors, but if I recall correctly Satyam or Frank mentioned that those are being worked on.

Diablo announced Memory Channel Storage (MCS). The next logical step, if you ask me, when it comes to reducing latency and increasing bandwidth. MCS basically brings flash closer to your CPU by leveraging the memory bus instead of PCIe/SAS/SATA. Interesting concept, something worth exploring. Especially considering you can present it as either normal memory (how about TBs of memory for a fraction of the price?) or as a block device. This means that you could potentially use Diablo as a target for a flash caching solution. One of the benefits many people listed is that this solution would be very useful in blade or hyperconverged environments because it eliminates the need for a PCIe slot or a disk slot… I guess that is somewhat true, but in many of those cases the number of memory slots will also be limited, so it doesn’t really solve those types of constraints immediately. Nevertheless, an interesting solution which is worth exploring and definitely offers new opportunities.

Another interesting announcement came from a startup called Crossbar. Crossbar came out of stealth mode this week and is working on RRAM. With 20x faster write performance at 20x lower power consumption and much higher capacity density compared to best-of-breed flash solutions, you can understand why people are excited about Crossbar. The market opportunity is huge here, and various companies have been working on it… So far not many have been able to execute on it at scale, so congrats to Crossbar, and definitely a company and a solution to keep your eye on. I know I will; I have already added them to my twitter startup watch list.

Evaluating SSDs in Virtualized Datacenters by Irfan Ahmad

Duncan Epping · Jun 3, 2013 ·

Flash-based solid-state disks (SSDs) offer impressive performance capabilities and are all the rage these days. Rightly so? Let’s find out how you can assess the performance benefit of SSDs in your own datacenter before purchasing anything and without expensive, time-consuming and usually inaccurate proofs-of-concept.

** Please note that this article is written by Irfan Ahmad. Follow him on twitter, make sure to attend his webinar on this topic on the 5th of June, and vote for CloudPhysics in the big data startup top 10. **

I was fortunate enough to have started the very first project at VMware that optimized ESX to take advantage of Flash and SSDs. Swap to Host Cache (aka Swap-to-SSD) shipped in vSphere 5. For those customers wanting to manage their DRAM spend, this feature can be a huge cost saving. It also continues to serve as a differentiator for vSphere against competitors.

Swap-to-SSD has the distinction of being the first VMware project to fully utilize the capabilities of Flash but it is certainly not the only one. Since then, every established storage vendor has entered this area, not to mention a dozen awesome startups. Some have solutions that apply broadly to all compute infrastructures, yet others have products that are specifically designed to address the hypervisor platform.

The performance capabilities of flash are indeed impressive. But it can cost a pretty penny. Marketing machines are in full force trying to convince you that you need a shiny hardware or software solution. An important question remains: can the actual benefit keep up with the hype? The results are mixed and worth reading through.


Introducing startup PernixData – Out of stealth!

Duncan Epping · Feb 20, 2013 ·

There are many startups out there that do something with storage these days. To be honest, many of them do the same thing and at times I wonder why on earth everyone focuses on the same segment and tries to attack it with the same product / feature set. One of the golden rules for any startup should be that you have a unique solution that will sell itself. Yes I realize that it is difficult, but if you want to succeed you will need to stand out.

About a year ago Satyam Vaghani (former VMware principal engineer who was responsible for VMFS, VAAI, VVOLs etc.) and Poojan Kumar (former VMware Data products lead and ex-Oracle Exadata founder) decided to start a company – PernixData. PernixData was conceptualized based on their experience working at the intersection of virtualization, flash-based storage and data. Today PernixData is revealed to the world. For those who don’t know, Pernix means “agile”. But what is PernixData about?

How many of you haven’t experienced storage performance problems? Storage is probably, in fact, the number one bottleneck in most virtualized environments. Convincing your manager (director / VP) that you need a new ultra-fast (and expensive) storage device is not easy; far from it. On top of that, data will always hit the network first before being acknowledged, and every read will go over your storage network. How cool would it be if there was a seamless software solution that solves all your storage performance problems without requiring you to rip and replace your existing storage assets?

Server-side flash overcomes problems associated with network-based storage, and server-side caching solutions provide some respite. Yet server-side caching solutions usually neither satisfy enterprise-class requirements for availability nor transparently support clustered hypervisor features such as VMware vMotion. In addition, while they accelerate reads they fail to do much for writes. Customers are then stuck between either overhauling their entire storage infrastructure or going with caching solutions that work for limited use cases. PernixData is about to release a cool new product – a flash virtualization platform – that bridges this gap. By picking up where hypervisors left off, PernixData is planning to become the VMware of server flash and is aiming to do to server flash what VMware did to CPU and memory. So, what is this flash virtualization platform and why would you need it?

PernixData’s flash virtualization platform virtualizes all flash resources across all server nodes in a vCenter Server cluster into a single high-performance, enterprise-class data tier. The great thing is that this happens in a transparent way. PernixData sits completely within the hypervisor and in the data path of your virtual machine. Note that there is no requirement to install anything in the guest (virtual machine). PernixData is not a virtual appliance, because virtual appliances introduce performance overhead and would need to be managed, with all the cost and complexity associated with that.

PernixData is also flash technology agnostic. It can leverage SSD or PCIe flash (or both) within the platform. The nice thing is that PernixData uses a scale-out architecture. As you add hosts with flash they can be dynamically added to the platform. On top of that, PernixData does both read and write acceleration while providing full data protection and is fully compatible with VM mobility solutions like vMotion, Storage vMotion, HA, DRS and Storage DRS.

Even more exciting, PernixData will support both Write-through and Write-back modes. The cool part is that PernixData also ensures IO is replicated for high availability purposes. You don’t want to run your VM in Write-back mode when you cannot guarantee data is highly available, right?! I guess that is one of the unique selling points of the solution: a distributed, scale-out flash virtualization platform which is not only flash agnostic but also non-disruptive for your virtual workloads.

I would imagine this is many times cheaper than buying a new storage array. Even without knowing what PernixData will cost, or which flash device (PCIe or SSD) you would decide to use… I bet that when it comes to the overall cost of the solution (product plus implementation) it will be many, many times cheaper.

As I started off with, the golden rule for any startup should be that they have a unique solution that sells itself. I am confident that PernixData FVP has just that, being a disruptive technology that solves a big problem in virtualized environments in a scale-out and transparent manner while leveraging your existing storage investments.

If you want to be kept up to date, make sure to follow Satyam, Poojan, Charlie and PernixData on twitter. If you are interested in joining the PernixData FVP beta, make sure to sign up!

Make sure to also read Frank’s article on PernixData.

Update: I recommend watching the Storage Field Day videos for more details from Satyam Vaghani himself; note that the playlist contains 4 videos!

Faking an SSD in your virtualized vSphere lab

Duncan Epping · Jan 11, 2013 ·

I have written about this before (and so has William Lam, so all credits go to William), but I wanted to note down these commands for my own use as I find myself digging around often for the same commands these days. So what is my goal: Faking an SSD in my virtualized vSphere lab.

In my lab I have a bunch of virtualized ESXi hosts. Those hosts have multiple disks, and I want to mark one of those disks as SSD. To keep things simple I set it up as follows. Just to point out: I use 0:0 / 1:0 / 2:0 so that each device gets its own controller and is easy to identify:

  • First Disk – ESXi install disk – 5GB – SCSI 0:0
  • Second Disk – Fake SSD – 40GB – SCSI 1:0
  • Third Disk – Large disk – 1TB – SCSI 2:0

When I boot, all disks are recognized as regular disks, and in some cases as non-local. For my testing I need local disks and I need an SSD. So this is what I did to get exactly that. With the first command I mark the “second disk” as SSD and local. With the second command I mark the third disk as local. Next I reclaim the devices so that the new SATP rules are applied.

esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba2:C0:T0:L0 --option "enable_local enable_ssd"
esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba3:C0:T0:L0 --option "enable_local"
esxcli storage core claiming reclaim -d mpx.vmhba2:C0:T0:L0
esxcli storage core claiming reclaim -d mpx.vmhba3:C0:T0:L0

Next you can simply validate whether it has worked by typing the following for devices vmhba2 and vmhba3 (if you replace the 2 with a 3, of course):

esxcli storage core device list --device=mpx.vmhba2:C0:T0:L0
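If you only care about the two fields that matter here, you can filter the output; this is just a convenience, assuming the standard grep available in the ESXi shell:

esxcli storage core device list --device=mpx.vmhba2:C0:T0:L0 | grep -E "Is SSD|Is Local"

For the device tagged above, both fields should now report “true”.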

As you can see, faking an SSD is fairly straightforward. Note that even if you have a real SSD drive you still might need to do this. In some cases the SSD drive is not recognized and you will need to create a rule for it manually.
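For a physical SSD that is not detected automatically, a rule can also be keyed on the vendor and model strings the drive reports instead of on a single device identifier. A minimal sketch; the vendor/model values below are placeholders for whatever “esxcli storage core device list” shows for your drive:

# hypothetical vendor/model values, replace with what your device actually reports
esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --vendor "ATA" --model "ExampleSSD" --option "enable_ssd"
# then reclaim the device (or reboot the host) as shown earlier so the rule takes effect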

Swap to host cache aka swap to SSD?

Duncan Epping · Aug 18, 2011 ·

Before we dive into it, let’s spell out the actual name of the feature: “Swap to host cache”. Remember that, swap to host cache!

I’ve seen multiple people mentioning this feature, and I saw William posting a hack on how to fool vSphere (the feature is part of vSphere 5, to be clear) into thinking it has access to SSD disks while that might not be the case. One thing I noticed is that there seems to be a misunderstanding of what swap to host cache actually is and does, probably because some tend to call it swap to SSD. Yes, it is true that ultimately your VM would be swapping to SSD, but it is not just a swap file on SSD; or better said, it is NOT a regular virtual machine swap file on SSD.

When I logged in to my environment, the first thing I noticed was that my SSD-backed datastore was not tagged as SSD, so the first thing I wanted to do was tag it as such. As mentioned, William already described this in his article, and it is well documented in our own documentation as well, so I followed that. This is what I did to get it working:

  • Check the NAA ID in the vSphere UI
  • Open an SSH session to the ESXi host
  • Validate which SATP claimed the device:
    esxcli storage nmp device list
    In my case: VMW_SATP_ALUA_CX
  • Verify that it is currently not recognized as SSD by typing the following command:
    esxcli storage core device list -d naa.60060160916128003edc4c4e4654e011
    It should say: “Is SSD : False”
  • Set “Is SSD” to true:
    esxcli storage nmp satp rule add -s VMW_SATP_ALUA_CX --device naa.60060160916128003edc4c4e4654e011 --option=enable_ssd
  • Reload the claim rules and run them using the following commands:
    esxcli storage core claimrule load
    esxcli storage core claimrule run
  • Validate that it is set to true:
    esxcli storage core device list -d naa.60060160916128003edc4c4e4654e011
  • The device should now be listed as SSD

Next would be to enable the feature… When you go to your host and click on the “Configuration Tab” there should be a section called “Host Cache Configuration” on the left. When you’ve correctly tagged your SSD it should look like this:

Please note that I already had a VM running on the device, which is why some of the space is shown as in use; normally I would recommend using a drive dedicated to swap. The next step is enabling the feature, which you can do by opening the pop-up window (right-click your datastore and select “Properties”). This is what I did:

  • Tick “Allocate space for host cache”
  • Select “Custom size”
  • Set the size to 25GB
  • Click “OK”

Now there is no science to this value, as I just wanted to enable it and test the feature. What happened when we enabled it? We allocated space on this LUN, so something must have been done with it. I opened up the datastore browser and noticed that a new folder was created on this particular VMFS volume:

Not only did it create a folder structure, it also created 25 x 1GB .vswp files. Now before we go any further, please note that this is a per-host setting. Each host will need to have its own host cache assigned, so it probably makes more sense to use a local SSD drive instead of a SAN volume. Some of you might say: but what about resiliency? Well, if your host fails the VMs will need to restart anyway, so that data is no longer relevant; in terms of disk resiliency you should definitely consider a RAID-1 configuration. Generally speaking, SAN volumes are much more expensive than local volumes, and using local volumes also removes the latency caused by the storage network. Compared to the latency of an SSD (less than 100 μs), network latency can be significant. So let’s recap that in a nice design principle:

Basic design principle
Using “Swap to host cache” will significantly reduce the performance impact of VMkernel swapping. It is recommended to use a local SSD drive to eliminate any network latency and to optimize for performance.

How does it work? Fairly straightforward, actually. When there is severe memory pressure and the hypervisor needs to swap memory pages to disk, it will swap to the .vswp files on the SSD drive instead. Each of these files (25 in my case) is shared amongst the VMs running on this host. Now you will probably wonder how you know whether the host is using this host cache or not; that can simply be validated by looking at the performance statistics within vCenter. It contains a couple of new metrics, of which “Swap in from host cache” and “Swap out to host cache” (and the “rate” variants) are the most important to monitor. (Yes, esxtop has metrics as well to monitor it, namely LLSWR/s and LLSWW/s.)
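If you prefer the command line, one way to capture these counters over time is esxtop batch mode; the interval and sample count below are arbitrary, and the LLSWR/s and LLSWW/s values can be found in the interactive memory view (press “m”) or among the memory counters in the exported CSV:

# capture esxtop counters every 5 seconds, 60 samples, for offline analysis
esxtop -b -d 5 -n 60 > hostcache-stats.csv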

What if you want to resize your host cache and it is already in use? Simply said, the host cache is optimized to allow for this scenario. If the host cache is completely filled, memory pages will need to be copied to the regular .vswp file. This could mean that the process takes longer than expected, and of course it is not a recommended practice, as it will decrease performance for your VMs since these pages more than likely will need to be swapped in again at some point. Resizing, however, can be done on the fly; there is no need to vMotion away your VMs. Just adjust the slider and wait for the process to complete. If you decide to completely remove all host cache for whatever reason, then all relevant data will be migrated to the regular .vswp file.

What if the host cache is full? Normally it shouldn’t even reach that state, but when you run out of space in the host cache, pages will be migrated from your host cache to your regular .vswp file, first in first out, which should be the right policy for most workloads. Now the chances of having memory pressure to the extent where you fill up a local SSD are of course small, but it is good to realize what the impact is. If you are going down the path of local SSD drives with host cache enabled and will be overcommitting, it might be good to do the math and ensure that you have enough cache available to keep these pages in cache rather than on rotating media. I prefer to keep it simple, though, and would probably recommend matching the size of your host’s memory. In the case of a host with 128GB RAM that would be a 128GB SSD. Yes, this might be overkill, but the price difference between 64GB and 128GB is probably negligible.

Basic design principle
Monitor swap usage. Although “Swap to host cache” will reduce the impact of VMkernel swapping, it will not eliminate it. Take your expected consolidation ratio into account, including your HA (N-X) strategy, and size accordingly. Or keep it simple and just use the same size as physical memory.
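As a back-of-the-envelope illustration of the sizing reasoning above (all numbers are made up), a few lines of shell arithmetic compare the simple “cache equals physical memory” rule with a consolidation-based estimate:

# hypothetical sizing sketch, adjust the numbers to your own environment
RAM_GB=128                                 # physical memory in the host
VM_COUNT=40                                # expected number of VMs
VM_MEM_GB=8                                # configured memory per VM
CONFIGURED_GB=$((VM_COUNT * VM_MEM_GB))    # total configured VM memory
OVERCOMMIT_GB=$((CONFIGURED_GB - RAM_GB))  # worst-case amount that could be swapped
echo "Configured VM memory: ${CONFIGURED_GB} GB, potential overcommit: ${OVERCOMMIT_GB} GB"
echo "Simple rule host cache size: ${RAM_GB} GB"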

One interesting use case could be to place all regular swap files on very cheap shared storage (a RAID-5 of SATA drives) or even on local SATA storage using the “VM swapfile location” (aka host-local swap) feature, and then install a host cache on every host these VMs can be migrated to. This should give you the performance of an SSD while maintaining most of the cost savings of the cheap storage. Please note that the host cache is a per-host feature; hence at the time of a vMotion all data from the cache will need to be transferred to the destination host. This will impact the time a vMotion takes, but unless your vMotions are time critical this should not be an issue. I have been told that VMware will publish a KB article with advice on how to buy the right SSDs for this feature.

Summarizing: “Swap to SSD” is what people have been calling this feature, but that is not what it is. This is a mechanism that caches memory pages to SSD and should be referred to as “Swap to host cache”. Depending on how you do the math, all memory pages can be swapped to and from SSD. If there is insufficient space available, memory pages will move over to the regular .vswp file. Use local SSD drives to avoid any latency associated with your storage network and to minimize costs.

