If only the world of applications was as simple as that…

In the last 6-12 months I have heard many people making statements around how the application landscape should change. Whether it is a VC writing on GigaOm about how server CPU utilization is still low and how Linux Containers and new application architectures can solve this. (There are various reasons CPU util is low by the way, ranging from being memory constraints to storage perf constraints and by design) Or whether it is network administrators simply telling the business that their application will need to cope with a new IP-Addresses after a disaster recovery fail-over situation. Although I can see where they are coming from I don’t think the world of applications is as simple as that.

Sometimes it seems that we forget something which is fundamental to what we do.  IT as it exists today provides a service to its customers. It provides an infrastructure which “our / your customers” can consume. This infrastructure, to a certain point, should be architected based on the requirements of your customer. Why? What is the point of creating a foundation which doesn’t meet the requirements of what will be put on top of it? That is like building a skyscraper on top of the foundation of a house, bad things are bound to happen! Although I can understand why we feel our customers should change I do not think it is realistic to expect this will happen over night. Or even if it is realistic to ask them to change?

Just to give an example, and I am not trying to pick on anyone here, lets take this quote from the GigaOm article:

Server virtualization was supposed to make utilization rates go up. But utilization is still low and solutions to solve that will change the way the data center operates.

I agree that Server Virtualization promised to make the utilization rates go up, and they did indeed. And overall utilization may still be low, although it depends on who you talk to I guess or what you include in your numbers. Many of the customers I talk to are up to around 40-50% utilized from a CPU perspective, and they do not want to go higher than that and have their reasons for it. Was utilization the only reason for them to start virtualizing? I would strongly argue that it was not the only reason, there are many others! Reducing the number of physical servers to manage, availability (HA) of their workloads, transportability of their workloads, automation of deployments, disaster recovery, maintenance… the reasons are almost countless.

I guess you will need to ask yourself what all of these reasons have in common? They are non-disruptive to the application architecture! Yes there is the odd application that cannot be virtualized, but the majority of all X86 workloads can be, without the need to make any changes to the application! Clearly you would have to talk to the app owner as their app is being migrated to a different platform, but there will be hardly any work for them associated with this migration.

Oh I agree, everything would be a lot better when the application landscape was completely overhauled and magically all applications use a highly available and scalable distributed application framework. Everything would be a lot better when all applications magically were optimized for the infrastructure they are consuming, applications can handle instant ip-address changes, applications can deal with random physical servers disappearing. Reality unfortunately is that that is not the case today, and for many of our customers in the upcoming years. Re-architecting an application, which often for most app owners comes from a 3rd party, is not something that happens overnight. Projects like those take years, if they even successfully finish.

Although I agree with the conclusion drawn by the author of the article, I think there is a twist to it:

It’s a dynamic time in the data center. New hardware and infrastructure software options are coming to market in the next few years which are poised to shake up the existing technology stack. Should be an exciting ride.

Reality is that we deliver a service, a service that caters for the needs of our customers. If our customers are not ready, or even willing, to adapt this will not just be a hurdle but a showstopper for many of these new technologies. Being a disruptive (I’m not a fan of the word) technology is one thing, causing disruption is another.

Startup News Flash part 5

After the VMworld storm has slowly died down it is back to business again… This also means less frequent updates, although we are slowly moving towards VMworld Barcelona and I suspect there will be some new announcements at that time. So what happened in the world of flash/startups the last two weeks? This is Startup News Flash part 5, and it seems to be primarily around either funding rounds or acquisitions.

Probably one of the biggest rounds of funding I have seen for a “new world” storage company… $ 150 million. Yes, that is a lot of money. Congrats Pure Storage! I would expect an IPO at some point in the near future, and hopefully they will be expanding their EMEA based team. PureStorage is one of those companies which has always intrigued me. As GigaOm suggests this boosts will probably be used to lower the prices, but personally I would prefer a heavy investment in things like disaster recovery and availability. It is an awesome platform, but in my opinion it needs dedupe aware sync and a-sync replication! That should include VMware SRM integration from day 1 of course!

Flash is hot… Virident just got acquired by Western Digital for $685 million. Makes sense if you consider WD is known as the “hard disk” company, they need to keep on growing their business and the business of hard disks is going to be challenging in the upcoming years with SSD becoming cheaper and cheaper. Considering this is the second acquisition (sTec being the other) related to flash by WD you can say that they mean business.

I just noticed that Cisco announced they intent to acquire Whiptail for $415 million in cash. Interesting to see Cisco moving in to the storage space and definitely a smart move if you ask me.  With UCS for compute and Whiptail for storage they will be able to deliver the full stack considering they more or less already own the world of networking. Will be interesting to see how they integrate it in their UCS offerings. For those who don’t know, Whiptail is an all flash array (afa) which leverages a “scale out” approach, so start small and increase capacity by adding new boxes. Of course they offer most functionality other AFA vendors do, for more details I recommend reading Cormac’s excellent article.

To be honest, no one knew what to expect from this public VSAN Beta announcement. Would we get a couple of hundred registrations or thousands? Well I can tell you that they are going through the roof, make sure to register though if you want to be a part of this! Run it at home nested, run it on your test cluster at the office, do whatever you want with it… but make sure to provide feedback!


Startup News Flash part 4

This is the fourth part already of the Startup News Flash, we are in the middle of VMworld and of course there were many many announcements. I tried to filter out those which are interesting, as mentioned in one of the other posts if you feel one is missing leave a comment.

Nutanix announced version 3.5  of their OS last week. The 3.5 release contains a bunch of new features, one of them being what they call the “Nutanix Elastic Deduplication Engine”. I think it is great they added this feature is ultimately it will allow you to utilize your flash and RAM tier more efficiently. The more you can cache the better right?! I am sure this will result in a performance improvement in many environment, you can imagine that especially for VDI or environments where most VMs are based on the same template this will be the case. What might be worth knowing is that Nutanix dedupe is inline for their RAM and flash tier and then for their magnetic disks is happening in the background. Nutanix also announced that besides supporting vSphere and KVM they also support Hyper-V as of now, which is great for customers as it offers you choice. On top of all that, they managed to develop a new simplified UI and a rest-based API allowing for customers to build a software defined datacenter! Also worth noting is that they’ve been working on their DR story. They’ve developed a Storage Replication Adapter which is one of the components needed to implement Site Recover Manager with array based replication. They also optimized their replication technology by extending their compression technology to that layer. (Disclaimer: the SRA is not listed on the VMware website, as such it is not supported by VMware. Please validate the SRM section of the VMware website before implementing.)

Of course an update from a flash caching vendor, this time it is Proximal Data who announced the 2.0 version of their software. AutoCache 2.0 includes role-based administration features and multi-hypervisor support to meet the specific needs of cloud service providers. Good to see that multi hypervisor and cloud is part of the proximal story soon. I like the Proximal aggressive price point. It starts at $999 per host for flash caches less than 500GB, which is unique for a solution which does both block and file caching. Not sure I agree with Proximal’s stance with regards to write-back caching and “down-playing” 1.0 solutions, especially not when you don’t offer that functionality yourself or were a 1.0 version yesterday.

I just noticed this article published by Silicon Angle which mentions the announcement of the SMB Edition of FVP, priced at a flat $9,999, supports up to 100 VMs across a maximum of four hosts with two processors and one flash drive each. More details to be found in this press release by PernixData.

Also something which might interest people is Violin Memory filing for IPO. It had been rumored numerous times, but this time it seems to be happening for real. The Register has an interesting view by the way. I hope it will be a huge success for everyone involved!

Also want to point people again to some of the cool announcements VMware did in the storage space, although far from being a startup I do feel this is worth listing here again: introduction to vSphere Flash Read Cacheintroduction to Virtual SAN.

Startup News Flash part 3

Who knew so quickly after part 1 and part 2 there would be a part 3, I guess not strange considering VMworld is coming up soon and there was a Flash Memory Summit last week. It seems that there is a battle going on in the land of the AFA’s (all flash arrays), it isn’t about features / data services as one would expect. No they are battling over capacity density aka how many TBs can I cram in to a single U, not sure how relevant this is going to be over time, yes it is nice to have dense configurations, yes it is awesome to have a billion IOps in 1U but most of all I am worried about availability and integrity of my data.  So instead of going all out on density, how about going all out on data services? Not that I am saying density isn’t useful, it is just… Anyway, I digress…

One of the companies which presented at Flash Memory Summit was Skyera. Skyera announced an interesting new product called skyEagle. Another all-flash array is what I can hear many of you thinking, and yes I thought exactly the same… but skyEagle is special compared to others. This 1u box manages to provide 500TB of flash capacity, now that is 500TB of raw capacity. So just imagine what that could end up being after Skyera’s hardware-accelerated data compression and data de-duplication has done its magic. Pricing wise? Skyera has set a list price for the read-optimized half petabyte (500 TB) skyEagle storage system of $1.99 per GB, or $.49 per GB with data reduction technologies. More specs can be found here. Also, I enjoyed reading this article on The Register which broke the news…

David Flynn (Former Fusion-io CEO) and Rick White (Fusion-io founder) started a new company called Primary Data. The WallStreet Journal reported on this and more or less revealed what they will be working on:”that essentially connects all those pools of data together, offering what Flynn calls a “unified file directory namespace” visible to all servers in company computer rooms–as well as those “in the cloud” that might be operatd by external service companies.” This kind of reminds me of Aetherstore, or at least the description aligns with what Aetherstore is doing. Definitely a company worth tracking if you ask me.

One of the companies I did an introduction post on is Simplivity. I liked their approach to converged as it not only combines just compute and storage, but they also included backup, replication, snapshots, dedupe and cloud integration. They announced this week an update on their Omnicube CN-3000 platform and introduced two new platforms Omnicube CN-2000 and the Omnicube CN-5000. So what are these two new Omnicubes? Basically the CN-5000 is the big brother of the CN-3000 and the CN-2000 is its kid brother. I can understand why they introduced these as it will help expanding the target audience, “one size fits all” doesn’t work when the cost for “all” is the same and so the TCO/ROI changes based on your actual requirements, but in a negative way. One of the features that made SimpliVity unique that has had a major update is the OmniStack Accelerator, this is a custom designed PCIe card that does inline dedupe and compression. Basically an offload mechanism for dedupe and compression where others are leveraging the server CPU. Another nice thing SimpliVity added is support for VAAI. If you are interested in getting to know more, two white papers were released which are interesting to read: a deep dive by Hans de Leenheer and Stephen Foskett and one with a focus on “data management” by Howard Marks.

A bit older announcement, but as I spoke with these folks this week and they demoed their GA product I figured I would add them to the list. Ravello Systems developed a cloud hypervisor which abstracts your virtualization layer and allows you to move virtual machines / vApps between clouds (private and public) without the need to rebuild your virtual machines or guest OS’s. What I am saying is that they can move your vApps from vSphere to AWS to Rackspace without painful conversions every time. Pretty neat right? On top of that, Ravello is your single point of contact meaning that they are also a cloud broker. You pay Ravello and they will take care of AWS / RackSpace etc. of course they allow you to do stuff like snapshotting, cloning and create complex network configurations if needed. They managed to impress me during the short call we had, and if you want to know more I recommend reading this excellent article by William Lam or visit their booth during VMworld!

That is it for part 3, I bet I will have another part next week during or right after VMworld as press releases are coming in every hour at this point. Thanks for reading,

Top 3 Skills Your IT Team Needs to Prepare for the Cloud

I just wrote an article for the vCloud blog which is titled “Top 3 Skills Your IT Team Needs to Prepare for the Cloud“. Although it is far less technical then I normally post here, it might still be worth a read for those who are considering a private or hybrid cloud, and even consuming a public cloud. Below is a short out take from the post, but for the full post you will need to head over to the vCloud Blog.

When I am talking about skills, I am not only talking about your team’s technical competency. For a successful adoption of cloud, it is of a great importance that the silos within the IT organization are broken down, or at a bare minimum bridged. Now more than ever, inter- and intra-team communication is of utmost importance. Larger organizations have realized this over the years while doing large virtualization projects, leading many to introduce a so-called “Center of Excellence.” This Center of Excellence was typically a virtual team formed out of the various teams (network, storage, security, server, application, business owners), and would ensure everyone’s requirements were met during the course of the project. With cloud, a similar approach is needed.

What is static overhead memory?

We had a discussion internally on static overhead memory. Coincidentally I spoke with Aashish Parikh from the DRS team on this topic a couple of weeks ago when I was in Palo Alto. Aashish is working on improving the overhead memory estimation calculation so that both HA and DRS can be even more efficient when it comes to placing virtual machines. The question was around what determines the static memory and this is the answer that Aashish provided. I found it very useful hence the reason I asked Aashish if it was okay to share it with the world. I added some bits and pieces where I felt additional details were needed though.

First of all, what is static overhead and what is dynamic overhead:

  • When a VM is powered-off, the amount of overhead memory required to power it on is called static overhead memory.
  • Once a VM is powered-on, the amount of overhead memory required to keep it running is called dynamic or runtime overhead memory.

Static overhead memory of a VM depends upon various factors:

  1. Several virtual machine configuration parameters like the number vCPUs, amount of vRAM, number of devices, etc
  2. The enabling/disabling of various VMware features (FT, CBRC; etc)
  3. ESXi Build Number

Note that static overhead memory estimation is calculated fairly conservative and we take a worst-case-scenario in to account. This is the reason why engineering is exploring ways of improving it. One of the areas that can be improved is for instance including host configuration parameters. These parameters are things like CPU model, family & stepping, various CPUID bits, etc. This means that as a result, two similar VMs residing on different hosts would have different overhead values.

But what about Dynamic? Dynamic overhead seems to be more accurate today right? Well there is a good reason for it, with dynamic overhead it is “known” where the host is running and the cost of running the VM on that host can easily be calculated. It is not a matter of estimating it any longer, but a matter of doing the math. That is the big difference: Dynamic = VM is running and we know where versus Static = VM is powered off and we don’t know where it might be powered!

Same applies for instance to vMotion scenarios. Although the platform knows what the target destination will be; it still doesn’t know how the target will treat that virtual machine. As such the vMotion process aims to be conservative and uses static overhead memory instead of dynamic. One of the things or instance that changes the amount of overhead memory needed is the “monitor mode” used (BT, HV or HWMMU).

So what is being explored to improve it? First of all including the additional host side parameters as mentioned above. But secondly, but equally important, based on the vm -> “target host” combination the overhead memory should be calculated. Or as engineering calls it calculating “Static overhead of VM v on Host h”.

Now why is this important? When is static overhead memory used? Static overhead memory is used by both HA and DRS. HA for instance uses it with Admission Control when doing the calculations around how many VMs can be powered on before unreserved resources are depleted. When you power-on a virtual machine the host side “admission control” will validate if it has sufficient unreserved resource available for the “static memory overhead” to be guaranteed… But also DRS and vMotion use the static memory overhead metric, for instance to ensure a virtual machine can be placed on a target host during a vMotion process as the static memory overhead needs to be guaranteed.

As you can see, a fairly lengthy chunk of info on just a single simple metric in vCenter / ESXTOP… but very nice to know!

Automating vCloud Director Resiliency whitepaper released

About a year ago I wrote a whitepaper about vCloud Director resiliency, or better said I developed a disaster recovery solution for vCloud Director. This solution allows you to fail-over vCloud Director workloads between sites in the case of a failure. Immediately after it was published various projects started to implement this solution. As part of our internal project our PowerCLI guru’s Aidan Dalgleish and Alan Renouf started looking in to automating the solution. Those who read the initial case study probably have seen the manual steps required for a fail-over, those who haven’t read this white paper first

The manual steps in the vCloud Director Resiliency whitepaper is exactly what Alan and Aidan addressed. So if you are interested in implementing this solution then it is useful to read this paper new white paper about Automating vCloud Director Resiliency as well. Nice work Alan and Aidan!