In the last 6-12 months I have heard many people making statements about how the application landscape should change. Whether it is a VC writing on GigaOm about how server CPU utilization is still low and how Linux Containers and new application architectures can solve this (there are various reasons CPU utilization is low, by the way, ranging from memory constraints to storage performance constraints to deliberate design choices), or whether it is network administrators simply telling the business that their application will need to cope with a new IP address after a disaster recovery fail-over. Although I can see where they are coming from, I don’t think the world of applications is as simple as that.
Sometimes it seems that we forget something fundamental to what we do. IT as it exists today provides a service to its customers: an infrastructure which “our / your customers” can consume. This infrastructure should, to a large extent, be architected based on the requirements of those customers. Why? What is the point of creating a foundation which doesn’t meet the requirements of what will be put on top of it? That is like building a skyscraper on the foundation of a house; bad things are bound to happen! Although I can understand why we feel our customers should change, I do not think it is realistic to expect this to happen overnight. I even question whether it is realistic to ask them to change at all.
Just to give an example, and I am not trying to pick on anyone here, let’s take this quote from the GigaOm article:
Server virtualization was supposed to make utilization rates go up. But utilization is still low and solutions to solve that will change the way the data center operates.
I agree that server virtualization promised to make utilization rates go up, and indeed it did. Overall utilization may still be low, although that depends on who you talk to and what you include in your numbers. Many of the customers I talk to run at around 40-50% CPU utilization, do not want to go higher than that, and have their reasons for it. Was utilization the only reason for them to start virtualizing? I would strongly argue that it was not; there are many others! Reducing the number of physical servers to manage, availability (HA) of their workloads, portability of their workloads, automation of deployments, disaster recovery, maintenance… the reasons are almost countless.
Ask yourself what all of these reasons have in common: they are non-disruptive to the application architecture! Yes, there is the odd application that cannot be virtualized, but the majority of x86 workloads can be, without any changes to the application. Clearly you would have to talk to the app owner as their app is migrated to a different platform, but there is hardly any work for them associated with that migration.
Oh, I agree, everything would be a lot better if the application landscape were completely overhauled and all applications magically used a highly available and scalable distributed application framework. Everything would be a lot better if all applications were magically optimized for the infrastructure they consume, could handle instant IP address changes, and could deal with random physical servers disappearing. The reality, unfortunately, is that this is not the case today, and it will not be for many of our customers in the upcoming years. Re-architecting an application, which for most app owners often comes from a third party, is not something that happens overnight. Projects like those take years, if they even finish successfully.
Although I agree with the conclusion drawn by the author of the article, I think there is a twist to it:
It’s a dynamic time in the data center. New hardware and infrastructure software options are coming to market in the next few years which are poised to shake up the existing technology stack. Should be an exciting ride.
The reality is that we deliver a service, a service that caters to the needs of our customers. If our customers are not ready, or even willing, to adapt, this will not just be a hurdle but a showstopper for many of these new technologies. Being a disruptive technology (I’m not a fan of the word) is one thing; causing disruption is another.
If most servers sold are used for virtualisation (and since they stopped making 32-bit servers, there are very few reasons why they shouldn’t be), perhaps server resources should be geared towards the way these virtualisation technologies work, rather than the tech trying to make the most of the hardware. I’m thinking that less effort should be put into making CPUs faster and more effort into faster bus speeds and cheaper RAM, etc…
There are other bottlenecks: storage IO (queue depths etc.). There are failure zone constraints (when I start having hosts with 100 VMs each in an 8-host cluster, having 100 VMs reboot at once is going to cause a crazy burst in resources, a burst that is cheaper to avoid up front by spreading out). There is also the concern that the CPU is the most obnoxious thing in a server to upgrade: it is going to live the full life of the server, while memory going from 96GB to 192GB to 256GB may well be a reality. New 64-bit applications (legacy apps are migrating every day) are reducing IO and compute with giant memory caches. The equation is constantly moving, and going under-spec on CPU (one of the cheaper parts of the server) can be a costly mistake.
My reaction ramble to the same article (http://thenicholson.com/containers-future/) sums up some of these concerns, and why I think he’s got the wrong KPIs.
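As a rough illustration of the failure-zone math above, here is a minimal Python sketch; the cluster size matches the example in the comment, but the per-VM IOPS figures are invented assumptions, not measurements:

```python
# Back-of-the-envelope sketch of the failure-zone concern described above.
# The IOPS figures are invented assumptions, for illustration only.

hosts = 8                # hosts in the cluster
vms_per_host = 100       # VMs consolidated onto each host
vm_boot_iops = 300       # assumed IOPS a VM demands while booting
vm_steady_iops = 30      # assumed steady-state IOPS per VM

# One host fails: its VMs restart on the surviving hosts.
failed_vms = vms_per_host
survivors = hosts - 1
extra_vms_per_survivor = failed_vms / survivors      # ~14 extra VMs per host

# Boot-storm IO for the restarting VMs versus their normal load.
boot_burst_iops = failed_vms * vm_boot_iops          # 30,000 IOPS
steady_iops = failed_vms * vm_steady_iops            # 3,000 IOPS

print(f"Each surviving host absorbs ~{extra_vms_per_survivor:.1f} extra VMs")
print(f"Restart burst: {boot_burst_iops} IOPS vs {steady_iops} IOPS steady "
      f"({boot_burst_iops / steady_iops:.0f}x)")
```

The exact numbers do not matter; the point is that the burst scales with how many VMs sit in a single failure zone, which is why spreading out up front can be cheaper than sizing every host for the worst case.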
Great read!
Not to mention the real 800 lb gorilla in the room: average utilization is a meaningless number. If your company is 80% US-based and 80% of the users work first shift, then your servers are going to be underutilized from 8 PM to 8 AM Eastern time.
The data center is designed for peak utilization, not average utilization, and then there’s headroom on top of that. If we design a data center for 100% utilization at peak and demand is even a little more than projected, we lose our jobs. If we design for 70% at peak, average utilization will probably be in the 25-30% range, because the slow periods will be down around 10%.
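To put some rough numbers behind the peak-versus-average point, here is a small Python sketch; the hourly demand profile is an invented, US-centric assumption, purely to show how a 70%-at-peak design ends up averaging in the 25-30% range:

```python
# Illustrative sketch: why average utilization lands well below the peak design point.
# The hourly demand profile is an invented assumption (US-centric, first-shift heavy).

# Relative demand for each hour of the day (0-23), peaking during US business hours.
demand = [0.10, 0.10, 0.10, 0.10, 0.10, 0.10,   # overnight
          0.20, 0.40, 0.70, 0.90, 1.00, 1.00,   # morning ramp to peak
          0.95, 0.95, 0.90, 0.80, 0.60, 0.40,   # afternoon
          0.25, 0.15, 0.10, 0.10, 0.10, 0.10]   # evening

design_peak = 0.70   # capacity sized so the busiest hour hits 70% utilization

# Scale the profile so its peak equals the design point, then average over the day.
scale = design_peak / max(demand)
hourly = [d * scale for d in demand]
average = sum(hourly) / len(hourly)

print(f"Peak utilization:    {max(hourly):.0%}")   # 70%
print(f"Average utilization: {average:.0%}")       # ~30%
```

With this made-up profile the average works out to roughly 30% while the overnight hours sit around 7%, which is exactly the dynamic described above.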
Spot on. That said, using public cloud to burst for the real outliers (the two-days-a-year stuff) can help even things out in some use cases. The joke I heard was that AWS as an idea started as Amazon basically selling off the spare capacity it had to keep around for the Christmas season.