Virtual Volumes vendor demos

I was at the Italian VMUG last week and one of the users asked me what Virtual Volumes would look like. He wanted to know if the experience would be similar to the “VM Storage Policy” experience he has had with Virtual SAN. Unfortunately I didn't have an environment running that was capable of demonstrating Virtual Volumes, so I shared the following videos with him. Considering I already did a blog post on this topic almost two years ago, I figured I would also publicly share these videos. Note that these videos are demos/previews, and no statement is made about when or even if this technology will ever be released.

HA restarts in a DR/DA event

I received a couple of questions last week about HA restarts in the scenario where a full site failure has occurred, or where part of the storage system has failed and needs to be taken over by another datacenter. Yes indeed, this is related to stretched clusters and HA restarts in a DR/DA event.

The questions were straightforward: how does the restart time-out work and what happens after the last retry? I wrote about HA restarts and the sequence last year, so let's just copy and paste that here:

  • Initial restart attempt
  • If the initial attempt failed, a restart will be retried 2 minutes after the previous attempt
  • If the previous attempt failed, a restart will be retried 4 minutes after the previous attempt
  • If the previous attempt failed, a restart will be retried 8 minutes after the previous attempt
  • If the previous attempt failed, a restart will be retried 16 minutes after the previous attempt

You can extend the number of restart retries by increasing the value “das.maxvmrestartcount”; every additional attempt after that will then be made 15/16 minutes after the previous one. The question this triggered, though, was: why would it even take 4 retries? The answer I got was: we don't know whether we will be able to fail over the storage within 30 minutes and whether we will have sufficient compute resources…
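
To make that back-off concrete, here is a minimal sketch in Python. It is purely illustrative: the function name, the cap value and the way the schedule is expressed are my own assumptions, not an official HA formula.

```python
def restart_schedule(max_attempts=5):
    """Approximate delay (in minutes) before each HA restart attempt.

    Sketch of the back-off described above: the initial attempt, then
    retries 2, 4, 8 and 16 minutes after the previous attempt, after
    which the interval stays at roughly 15/16 minutes.
    """
    delays = [0]          # initial restart attempt
    interval = 2
    for _ in range(1, max_attempts):
        delays.append(min(interval, 16))  # doubles: 2, 4, 8, 16, then caps
        interval *= 2
    return delays


if __name__ == "__main__":
    # e.g. with das.maxvmrestartcount raised, more attempts follow at ~16 min
    for attempt, delay in enumerate(restart_schedule(7), start=1):
        label = "initial attempt" if attempt == 1 else f"~{delay} min after the previous attempt"
        print(f"restart attempt {attempt}: {label}")
```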

Here comes the sweet part about vSphere HA: it actually is a pretty smart solution, and it will know whether VMs can be restarted or not. In this case, as the datastore is not available, there is absolutely no point in even trying, and as such HA will not even bother. As soon as the storage becomes available though, the restart attempts will start. The same applies to compute resources: if for whatever reason there are insufficient unreserved compute resources to restart your VMs, then HA will wait for them to become available… nice right!?! Do note that I emphasized the word “unreserved”, as that is what HA cares about, not actually used resources.
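
A tiny, purely hypothetical sketch of that “unreserved vs. used” distinction, just to illustrate the admission-style check; the function and the numbers are mine, not HA's actual placement logic.

```python
def can_restart(vm_reservation_mb, host_capacity_mb, host_reserved_mb):
    """A VM can be placed when enough *unreserved* capacity remains,
    regardless of how much memory is actively being used."""
    unreserved = host_capacity_mb - host_reserved_mb
    return vm_reservation_mb <= unreserved


# Example: a 256 GB host with 64 GB already reserved leaves 192 GB unreserved,
# so a VM with a 16 GB reservation can be restarted even if actual usage is high.
print(can_restart(16 * 1024, 256 * 1024, 64 * 1024))  # True
```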

Re: SFD5 event and negativity / respect

Storage Field Day was hosted last week, and I typically like these events. Mainly because they have start-ups presenting their new technology, and I usually like the flow of the sessions. I also like the interaction between the “delegates” and the vendors, well, at times I do. There were several blog posts on the topic from people who are part of what I would, at this point, call the old boys club (yes, there were women attending as well, but you get the point), as that is what it felt like during the event. I wanted to comment on Bob's article, but it looks like he is not looking for a healthy debate, so I figured a blog post would be the best way to reply.

For those who don't know: the sessions usually start with some background on the company and a problem description, followed by a product session with demos and deep-dives where and when needed. Delegates fire off questions during these sessions; sometimes this leads to a great discussion and sometimes it doesn't.

As some of you may have noticed on Twitter, I personally didn't enjoy the event very much this time. I think this tweet from my friend Jason Boche captures the feeling I had well:

What stood out to me, and judging by Twitter to others as well, was the negativity from some of the delegates towards some of the vendors. When the initial problem statement/marketing fluff would take too long, the “boring” comments from the delegates started to pass by on Twitter; during the start of the EMC session this was particularly bad. (Not the first time I have seen it… and I am definitely not trying to defend a vendor here, as they could have known what they were up against and should know the first rule of presenting: know your audience.) Maybe even more annoying for the person watching the feed were the “inside jokes”, the “anecdotes” and the “in-crowd discussions”. It really disrupted the flow of some of the sessions, and I think the PernixData session was the best example of it… it derailed too often, leading to the presenter running out of time, or as Frank put it:

When several people commented on the tweets/atmosphere, some heated debates kicked off. What stood out to me during these debates was that the “delegates” felt that they were doing the vendors a service and that the vendors should respect their time/effort. (I agree with them to a certain extent.) It was also mentioned various times that they were all experts and there was no need for basics/problem descriptions, as all had done their due diligence and came well prepared. Personally, I don't believe that based on the questions asked, and I think everyone can learn something, even from the basics. Besides that, I would argue that the Tech Field Day website is really clear on this:

Don’t assume all of the attendees are experts in your area. True to the spirit of Gestalt IT, we intentionally mix many IT disciplines at Tech Field Day to spark creativity and out-of-the-box thinking.

And on the topic of respect: it goes both ways, and it seems that the Tech Field Day charter agrees with me on this, as this is what it states in the section on what it is like to be a delegate:

… just treat them with the thoughtfulness, professionalism and mutual respect they deserve.

But what is the underlying problem? What the delegates seem to have forgotten is the vendor's perspective… Why are these vendors there? What is their reason to participate? Are they looking for feedback from a handful of people on their product(s), aiming to make roadmap changes when needed… or are they looking to introduce their product (or new version) to the world through the reach the event has? (Note I said event and not delegates on purpose.) I would expect it to be the latter, as the majority of companies presenting are presenting a new product or version and not a roadmap. On top of that, I would argue that if they were looking for direct product feedback they would do so in a closed setting with a limited group of people under a strict NDA. Even if that were not the case, just as you are asking the vendor to be respectful of your time, you should also be respectful towards them for what they are investing. Which is probably a lot more than just time, as without their sponsorship there would not be an event. (Assuming Mr Stephen Foskett is not a secret billionaire… but who knows :-)) Either way, think about what allows these events to exist. Without these companies investing, it would be difficult for Stephen to organize them. Also, think about the people watching the event online and even about the person sitting next to you. What is glaringly obvious to you may not be so for the person sitting next to you, simply because they come from a different background.

So why am I writing this? Well, hopefully so things will change for the better. As I stated, I like these events, as they are valuable to the community in my opinion, and they provide a nice podium for start-ups to present themselves to the world. That positive aspect should not get lost in unneeded debates and negativity. That is what these events are about in my opinion: providing a service to the community, and I hope it will stay that way.

PS: I have a lot of respect for the endless effort Stephen puts into organizing these sessions / events…

PernixData feature announcements during Storage Field Day

During Storage Field Day today, PernixData announced a whole bunch of features that they are working on and that will be released in the near future. In my opinion there were four major features announced:

  • Support for NFS
  • Network Compression
  • Distributed Fault Tolerant Memory
  • Topology Awareness

Let's go over these one by one:

Support for NFS is something I can be brief about, I guess, as it is what it says it is. It is something that has come up multiple times in conversations around PernixData on Twitter, and it looks like they have managed to solve the problem and will support NFS in the near future. One thing I want to point out: PernixData does not introduce a virtual appliance in order to support NFS, nor does it create an NFS server and proxy the IOs. Sounds like magic, right… Nice work guys!

It gets way more interesting with Network Compression. What is it, and what does it do? Network Compression is an adaptive mechanism that looks at the size of the IO and analyzes whether it makes sense to compress the data before replicating it to a remote host. As you can imagine, especially with larger block sizes (64K and up) this could significantly reduce the amount of data transferred over the network. When talking to PernixData, one of the questions I had was: what about the performance and overhead… give me some details. This is what they came back with as an example:

  • Write back with local copy only = 2700 IOps
  • Write back + 1 replica = 1770 IOps
  • Write back + 1 replica + network compression = 2700 IOps

As you can see, the number of IOps went down when a remote replica was added. However, it went up again to “normal” values when network compression was enabled; of course this test was conducted using large block sizes. When it came to CPU overhead, it was mentioned that the overhead has so far been demonstrated to be negligible. You may ask yourself why; it is fairly simple: the CPU cost of compression is offset by the reduced amount of data that needs to be transferred over the network, resulting in equal performance. What also helps here is that it is an adaptive mechanism that does a cost/benefit analysis before compressing. So if you are doing 512-byte or 4KB IOs, network compression will not kick in, keeping the overhead low and the benefits high!
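
For illustration only, here is a minimal sketch of what such an adaptive compress-before-replicate decision could look like. The threshold, the use of zlib and the function name are all assumptions on my part, not PernixData's actual implementation.

```python
import zlib

# Assumed threshold: small blocks (e.g. 512 B / 4 KB IOs) are never worth compressing
SMALL_IO_THRESHOLD = 32 * 1024


def payload_to_replicate(block: bytes) -> bytes:
    """Return the payload that would be sent to the remote host."""
    if len(block) < SMALL_IO_THRESHOLD:
        return block                      # skip compression, keep CPU overhead low
    compressed = zlib.compress(block, 1)  # fast compression level
    # only ship the compressed payload if it actually saves a meaningful amount
    return compressed if len(compressed) < 0.9 * len(block) else block


# A 64 KB block of repetitive data compresses well and would be sent compressed,
# while a 4 KB block would always be sent as-is.
print(len(payload_to_replicate(b"A" * 64 * 1024)))
print(len(payload_to_replicate(b"A" * 4 * 1024)))
```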

I personally got really excited about this feature: DFTM = Distributed Fault Tolerant Memory. Say what? Yes, distributed fault tolerant memory! FVP, besides virtualizing flash, can now indeed also virtualize memory and create an aggregated pool of resources out of it for caching purposes. Or, put more simply: what they allow you to do is reserve a chunk of host memory as virtual machine cache. Once again this happens at the hypervisor level, so there is no requirement to run a virtual appliance; just enable and go! I do want to point out that there is no “cache tiering” at the moment, but I guess Satyam can consider that as a feature request. Also, when you create an FVP cluster, hosts within that cluster will either provide “flash caching” capabilities or “memory caching” capabilities. This means that technically virtual machines can use “local flash” resources while the remote resources are “memory” based (or the other way around). Personally I would avoid this at all cost though, as it will give some strange, unpredictable performance results.

So what does this add? Well, crazy performance for instance… We are talking 80K IOps easily, with a nice low latency of 50-200 microseconds. Unlike other solutions, FVP doesn't restrict the size of your cache either. By default it will recommend using 50% of the unreserved capacity per host. Personally I think this is a bit high; as most people do not reserve memory, this will typically result in 50% of your memory being recommended… but fortunately FVP allows you to customize this as required. So if you have 128GB of memory and feel 16GB of memory is sufficient for memory caching, then that is what you assign to FVP.
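
To show the sizing math, here is a tiny hypothetical helper that mimics the “default to 50% of unreserved memory, allow an override” behaviour described above; the function name and signature are mine, this is not FVP's API.

```python
from typing import Optional


def recommended_cache_gb(total_gb: float, reserved_gb: float,
                         override_gb: Optional[float] = None) -> float:
    """Illustrative sizing: default to 50% of unreserved host memory,
    or use a manually chosen size capped at what is unreserved."""
    unreserved = total_gb - reserved_gb
    if override_gb is not None:
        return min(override_gb, unreserved)
    return 0.5 * unreserved


print(recommended_cache_gb(128, 0))      # 64.0 -> default recommendation on a 128 GB host
print(recommended_cache_gb(128, 0, 16))  # 16.0 -> manually sized cache, as in the example above
```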

Another feature that will be added is Topology Awareness. Basically, what this allows you to do is group hosts in a cluster and create failure domains. An example may make this a bit easier to grasp: let's assume you have 2 blade chassis, each with 8 hosts. When you enable “write back caching”, you probably want to ensure that your replica is stored on a blade in the other chassis… and that is exactly what this feature allows you to do. Specify replica groups, add hosts to the replica groups, easy as that!

And then you specify for each virtual machine where the replica needs to reside. Yes, you can even specify that the replica needs to reside within its own failure domain if there are requirements to do so, but typically the other “failure domain” would be chosen.
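
A minimal sketch of what failure-domain aware replica placement could look like, assuming a simple mapping of hosts to replica groups (one group per chassis, as in the example above); the names are hypothetical and this is not PernixData's actual API.

```python
from typing import Dict, List


def pick_replica_host(local_host: str,
                      groups: Dict[str, List[str]],
                      same_domain: bool = False) -> str:
    """Return a host to hold the write-back replica, preferring a host in
    another failure domain unless same_domain is explicitly requested."""
    local_group = next(g for g, hosts in groups.items() if local_host in hosts)
    for group, hosts in groups.items():
        if (group == local_group) == same_domain:
            for host in hosts:
                if host != local_host:
                    return host
    raise RuntimeError("no suitable replica host found")


groups = {"chassis-A": ["esx01", "esx02"], "chassis-B": ["esx03", "esx04"]}
print(pick_replica_host("esx01", groups))                    # esx03 -> other chassis
print(pick_replica_host("esx01", groups, same_domain=True))  # esx02 -> same chassis
```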

Is that awesome or what? I think it is, and I am very impressed by what PernixData has announced. For those interested, the SFD video should be online soon, and those who are visiting the Milan VMUG are in luck, as Frank mentioned that he will be presenting on these new features at the event. All in all, an impressive presentation again by PernixData if you ask me… an awesome set of features to be added soon!

Heartbleed Security Bug fixes for VMware

It seems to be patch Saturday, as today a whole bunch of product updates were released. All of these updates relate to the Heartbleed security bug fix. There is no point in listing every single product, as I assume you all know the VMware download page by now, but I do want to link the most commonly used ones for your convenience:

Time to update, but before you do… if you are using NFS-based storage, make sure to read this first before jumping straight to vSphere 5.5 U1a!