In previous roles, before I joined VMware, I was a system administrator and a consultant. The tweets below reminded me of the kind of work I did in the past and triggered a train of thought that I wanted to share…
@jtmcarthur56 That's only achievable when you have 50,000 servers running one application
— Howard Marks (@DeepStorageNet) December 3, 2014
Howard has a great point here. For some reason, many people have started using Google, Facebook or Netflix as the prime example of operational efficiency. Startups use them in their pitches to describe what they can bring and how they can simplify your life, and yes, I've also seen companies like VMware use them in their presentations.
When I look back at when I managed these systems, my pain was not the infrastructure (servers / network / storage), even though the environment I was managing was based on what many refer to as legacy: EMC Clariion, NetApp FAS or HP EVA. The servers were never really a problem to manage either; sure, updating firmware was a pain, but it was not my biggest pain point. Provisioning virtual machines was never a huge deal either… My pain was caused by the application landscape many of my customers had.
At companies like Facebook and Google the application-to-admin ratio is different, as Howard points out. I would also argue that in many cases the applications are developed in-house and are designed around agility, availability and efficiency… Unfortunately, for most of you this is not the case. Most applications are provided by vendors who don't really seem to care about your requirements; they don't design for agility and availability. No, instead they do what is easiest for them. In the majority of cases these are legacy, monolithic (cr)applications with a simple database, all of which needs to be hosted on a single VM, and when you get an update, that is where the real pain begins. At one of the companies I worked for, a single department used over 80 different applications to calculate mortgages for the different banks and offerings out there. Believe me when I say that is not easy to manage, and that is where I would spend most of my time.
I do appreciate the whole DevOps movement and I do see the value in optimizing your operations to align with your business needs, but we also need to be realistic. Expecting your IT org to run as efficiently as Google/Facebook/Netflix is just not realistic and is not going to happen, unless of course you invest deeply and develop the majority of your applications in-house, using the same design principles these companies use. Even then I doubt you would reach the same efficiency, as most simply won't have the scale to reach it. This does not mean you should not aim to optimize your operations though! Everyone can benefit from optimizing operations, from re-aligning the IT department to the demands of today's world, from revising procedures… Everyone should go through this exercise, constantly, but at the same time stay realistic. Set your expectations based on what lands on the infrastructure, as that is where a lot of the complexity comes in.
Howard Marks (@DeepStorageNet) says
Duncan,
I’m glad my conversation with John McAurthur got you thinking about the delusion that corporate users can become Facebook overnight. I’ve written about this at NetworkComputing http://www.networkcomputing.com/data-centers/the-facebook-data-center-delusion/a/d-id/1316461?
Happy to continue the conversation wherever we can.
– Howard
rnelson0 says
Duncan,
I'd love to see people throw the word "efficient" out of their dictionary. It's about throughput. You can have massive throughput when everything is so interchangeable, especially when it's all your company does; focusing on efficiency is the problem. Facebook's bottleneck has been getting 50k servers to work, while most companies' bottlenecks are in the human components. Having all your servers run efficiently doesn't help your bottom line if people can't make good use of them. Focus on the right thing.
Vince M. says
Agree wholeheartedly. The demands of our applications take up so much of our troubleshooting time, and we have Citrix on top of that to complicate things even more. Deploying one application that can be scaled easily would be a cakewalk.
Kyle Hailey says
Good explanation of some of the enterprise challenges with DevOps. I heard Andi Mann saying there should be an Enterprise DevOps movement, and this helps explain why. Yes, the DevOps movement has a lot to offer everyone, but it will be more efficient to learn lessons from someone who has a similar environment than from someone who doesn't. If one is stuck with legacy 3rd-party apps, then the approach will be different than for someone who is building it all in-house from scratch. I know people will say it's all about CAMS (culture, automation, measurement and sharing), which I agree with, but the actual grit and gristle of how to get there will be different. I'm more interested in those hands-getting-dirty details than the lofty stories. OK, the lofty stories are good a couple of times, but after that it's time to hear about when someone rolled their sleeves up and got working: what tools, what hardware, code examples, who took charge, where was resistance, how was it overcome, how long did it take, how much did it cost before there was a return on investment, etc.
– Kyle Hailey
Michael Ryom (@MichaelRyom) says
Well, I clearly understand that normal businesses have a lot of legacy apps, and it doesn't matter if they are built in-house or not; they are built crappy. But that has to be on the architects and business units building them. Doing only infrastructure ops around the vCloud Suite, I don't believe it's true that people or legacy apps steal the most time.
We had a single hardware vendor providing the blades, and they were all the same specs. We went to management and told them we needed to change hardware vendors because QA and support were way too bad. They couldn't see the problem from their reports (from the service desk / SLAs). So we started keeping stats on when a PSOD occurred and what the reason was. From our stats it was clear that we had a PSOD on average once a week (on a setup of only 100+ blades). But more importantly, the stats showed that 90% of the PSODs were caused by drivers/firmware, 6% by the network (mainly due to the Nexus 1000V), 3% by storage, and only once did we have an issue that was due to an ESXi error.
And everything is on the VMware HCL.
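A tally like the one Michael describes is straightforward to keep once the incidents are logged in a structured way. Below is a minimal sketch, assuming a hypothetical psod_incidents.csv with date and root_cause columns; the file name and format are made up for illustration.

```python
# Minimal sketch: tally PSOD incidents by root cause and compute the weekly average.
# Assumes a hypothetical psod_incidents.csv with columns:
#   date (YYYY-MM-DD), root_cause (e.g. driver/firmware, network, storage, esxi)
import csv
from collections import Counter
from datetime import date

causes = Counter()
dates = []

with open("psod_incidents.csv", newline="") as f:
    for row in csv.DictReader(f):
        causes[row["root_cause"]] += 1
        dates.append(date.fromisoformat(row["date"]))

total = sum(causes.values())
weeks = max((max(dates) - min(dates)).days / 7, 1)

print(f"{total} PSODs, {total / weeks:.1f} per week on average")
for cause, count in causes.most_common():
    print(f"  {cause}: {count} ({100 * count / total:.0f}%)")
```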
I disagree, though, because not many things scale well or are built for the task they need to perform; that is something we all could learn from FB or Google. I'm not saying that we should build our own network switches or anything like that.
But look at the forthcoming releases from VMware: for the first time VMware is going to deliver scale-out management platforms. This is a reason to stay with VMware and not go Hyper-V or open source.
The same is true for server/storage/app/network, etc. If you think at scale, you are much more likely to end up with a platform which does great.
Duncan Epping says
Depends on the separation of duties and your responsibilities, right… If you only own the virtualization layer and are responsible for 5000 VMs, then yeah, sure, faulty hardware would be a bigger pain point. If you have 500 VMs and own everything up to the config / install / update of the app, then things change.
lynxbat says
Few comments:
1) The ratio of sysadmins to apps or servers is not the most important efficiency. It is just one measure, depending on the size of the org. Some orgs care more about dev cycles saved than ops.
2) Efficiencies in skipping technology hurdles and going straight to delivering the application will always win. Google just realized this 10 years ago and made a massive effort to build on that idea. The most fanatical Docker customers are the ones that use it. Reminds me of VMware customers back when it was changing things.
3) Go back far enough and almost any technology was used in limited quantities by the ones with the resources or wisdom to do it first. Over time things commoditize. We saw it with mainframe to x86, Microsoft stacks to composable open source, and more. We will start to see it now with the same tech Google started on 8 years ago (containers + scheduling) becoming commodity pieces anyone can install. This will change the way people build apps internally and externally. And just as this was a huge part of how Google was able to achieve efficiency through low-friction app deployment, it will be the same for everyone else, at a lesser scale.
4) Not all the crazy stuff the big guys do will happen in the enterprise. But it will commoditize into things that are common for everyone: scheduling, open-source commodity switches, containerized apps, distributed pooling, cheap storage layers, and more.
It is not that we are all going to become Google or Facebook. It is that some of the problems they have already solved are going to evolve into solutions we take advantage of in some respect. And yes, this will make things easier, just like virtualization did. Slowly and steadily, things that make technology less of a roadblock will win. And then we start over, building on top of the new foundation.
And for the record, I think VMware is a company extremely well suited to helping us achieve much of the above.
.nick
Duncan Epping says
Thanks for your insightful comment, Nick. Very much agree with you on this. Containerized apps do have huge potential, but the challenge will be broad adoption by those who develop the applications. And agreed, a virtual infrastructure is an awesome landing place for containers, and with the right level of integration between storage / networking / compute / security it could take things to the next level.
Victor da Costa says
Duncan, I really appreciated your text and I do agree with some of your statements, especially the view that enterprises trying to be at the same level as Google/Facebook isn't realistic.
What I can state from my experience in the field is this: the efficiency level is directly associated with the company's capacity to transform itself. I've seen the three cases below in previous companies where I worked:
1. Big (multi-billion): Provisioning a single server takes weeks or months for a simple two-tier service (2x web, 2x DB), due to a people and process chain made up of multiple different administrators: network, storage, compute, DC facilities, application, database, and finally non-IT-aware project managers.
* It can be even worse with global projects.
2. Mid-size (provider): The provisioning process is more agile, as the administrators for all the different items are part of the same division/team.
3. Small: Offshoring of the management and hosting. Most of the time the environment itself is monolithic, but it can take advantage of the provider's processes and knowledge.
Sincerely,
Victor da Costa
Duncan Epping says
Agreed, I was talking more about mid-size and up than the SMB or smaller mid-size market.
mimmus says
Similar arguments are valid for the growing hype around Docker: not all applications can be containerized and scale linearly simply by adding more containers. Microservices apps are still a chimera, at least here in Italy.
Look at this post of mine:
https://groups.google.com/forum/#!topic/docker-user/pNaBYJkmnAA
Duncan Epping says
Cannot agree more; it is a conversation I have been having with more and more folks…