
Yellow Bricks

by Duncan Epping


vcdx

Underlying Infrastructure for your pets and cattle

Duncan Epping · Oct 10, 2014 ·

Last week on Twitter Maish asked a question that got me thinking. Actually, I have been thinking about this for a while now. The question deals with how you design your infrastructure for the various types of workloads, whether they fall in the “pet” category or the “cattle” category. (If you are not familiar with the terms pets/cattle, read this article by Massimo.)

Pets vs. cattle – I think that people are mistaken as to what that exactly means for the underlying infrastructure.

— Maish Saidel-Keesing (@maishsk) October 5, 2014

I asked Maish what it actually means for your infrastructure, and at the same time I gave it some more thought over the last week. Cattle is the type of application architecture that handles failures by providing a distributed solution: it typically scales out instead of up, and the VMs are disposable as they usually won’t hold state. With pets this is different: they typically scale up, resiliency is often provided by either a 3rd party clustering mechanism or the infrastructure underneath, they in many cases contain state, and recoverability is key. As you can imagine, both types of workloads have different requirements of the infrastructure. Going back to Maish’s question, I guess the real question is whether you can afford what it means for the underlying infrastructure. What do I mean by that?

If you look at the requirements of both architectures, you could say that “pets” will typically demand more from the underlying infrastructure when it comes to resiliency / recoverability. Cattle will have fewer demands from that perspective, but flexibility / agility is more important. You can imagine that you could implement two different infrastructure architectures for these specific workloads, but does this make sense? If you are Netflix, Google, YouTube etc. it may make sense to do this due to the scale at which they operate and the fact that IT is their core business. In those cases “cattle” is what drives the business, and the “pets” are the back-end systems. Reality is though that for the majority this is not the case. Your environment will be a hybrid, and more than likely “pets” will have the upper hand, as that is simply the state of the world today.

That does not mean they cannot co-exist. That is what I believe is the true strength of virtualization: it allows you to run many different types of workloads on the same infrastructure. Whether that is your Exchange environment or your in-house developed scale-out web application which serves hundreds of thousands of customers does not make a difference to your virtualization platform. From an operational perspective the big benefit here is that you will not have to maintain different run books to manage your workloads. From an ops perspective they will look the same on the outside, although they may differ on the inside. What may change though is the services required for those systems, but with the rich ecosystem available for virtualization platforms these days that should not be a problem. Need extra security / micro-segmentation? VMware NSX can provide the security isolation needed to run these applications smoothly. Sub-millisecond latency requirements? Plenty of storage / caching solutions out there that can deliver this!
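To make this a bit more tangible, below is a minimal sketch of what “one platform, different service levels per workload” could look like when captured as data. All names and fields are hypothetical and purely illustrative; they are not tied to any specific VMware (or other vendor) API. The point is simply that the same platform and the same run book can attach different policies (restart behavior, isolation, storage tier, backup) to pets and cattle.

```python
from dataclasses import dataclass

@dataclass
class WorkloadPolicy:
    """Hypothetical policy profile; all field names are illustrative only."""
    name: str
    scale_model: str           # "scale-up" (pets) or "scale-out" (cattle)
    ha_restart_priority: str   # how aggressively the platform restarts failed VMs
    microsegmentation: bool    # e.g. NSX-style security isolation
    low_latency_storage: bool  # e.g. server side caching / all-flash tier
    backup_required: bool      # pets hold state, cattle usually do not

# Two profiles, one virtualization platform, one run book.
PET_POLICY = WorkloadPolicy(
    name="pet",
    scale_model="scale-up",
    ha_restart_priority="high",
    microsegmentation=True,
    low_latency_storage=True,
    backup_required=True,
)

CATTLE_POLICY = WorkloadPolicy(
    name="cattle",
    scale_model="scale-out",
    ha_restart_priority="low",   # the application layer handles failures itself
    microsegmentation=True,
    low_latency_storage=False,
    backup_required=False,       # disposable, stateless instances
)

def policy_for(vm_tags):
    """Pick a profile based on how the VM is tagged."""
    return CATTLE_POLICY if "cattle" in vm_tags else PET_POLICY

if __name__ == "__main__":
    print(policy_for({"cattle", "web-tier"}).name)   # -> cattle
    print(policy_for({"exchange", "mailbox"}).name)  # -> pet
```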

Will the application architecture shift that is happening right now impact your underlying infrastructure? We have made huge steps in operational efficiency in the last 5 years, and with SDDC we are about to take the next big step. Although I do believe that the application architecture shift will result in infrastructure changes, let’s not make the same mistakes we made in the past by creating infrastructure silos per workload. I strongly believe that repeatability, consistency, reliability and predictability are key, and this starts with a solid, scalable and trusted foundation (infrastructure).

RE: The VCDX candidates advantage over the panellists

Duncan Epping · Oct 6, 2014 ·

I was reading Josh Odgers’ post on the VCDX Defense. Josh’s article can be summarised with the following part:

As a result, the candidate should be an expert in the design being presented and answering questions from the panel about the design should not be intimidating.

Having gone through the process myself, knowing many of the VCDXs and having been on countless panels, I completely disagree with Josh. Sure, you do need to know your design inside out… but it is not about who has an advantage. The panel members are not there to fail or pass the candidate; they are there to assess your skills as an architect!

If you look at the defense day there are three parts:

  1. Defend your design
  2. Design scenario
  3. Troubleshooting scenario

For the design and troubleshooting scenarios you get a random exercise, so you have no prior knowledge of what will be asked. When it comes to defending your design, of course you will (hopefully) know your design better than anyone else. However, the questions you get will not necessarily be about the specifics or details of your design. The VCDX panel is there to assess your skills as an architect and not your “fact cramming skills”. A good panel will ask a lot of hypothetical questions like:

  • Your design uses NFS based storage; how would FC connected storage have changed your design?
  • Your design is based on capacity requirements for 80 virtual machines; what would you have done differently if the requirement had been 8,000 virtual machines?
  • Your design …

So when you do mock exams, prepare for these types of hypothetical questions. That is when you really start to understand the impact decisions can have. And when you get one of these questions during your defense and do not know the answer, make sure you guide the panel through your thought process. That is what differentiates someone who can learn facts (VCP exam) from someone who can digest them, understand them and apply them in different scenarios (VCDX exam).

As I stated, it may sound like knowing your design inside out gives you a big advantage over the panel members, but it probably doesn’t… that is not what they are testing you on! Your ability to assess and adapt is put through the wringer, your skills as an architect are tested thoroughly, and that is where you will need to do well.

Good luck!

It is all about choice

Duncan Epping · Sep 25, 2014 ·

The last couple of years we’ve seen a major shift in the market towards the software-defined datacenter. This has resulted in many new products, features and solutions being brought to market. What struck me though over the last couple of days is that many of the articles I have read in the past 6 months (and written as well) were about hardware, and in many cases about the form factor or how it has changed. Then there are the posts around hyper-converged vs traditional, or all-flash storage solutions vs server side caching. Although we are moving towards a software-defined world, it seems that administrators / consultants / architects still very much live in the physical world. In many of these cases it even seems like there is a certain prejudice when it comes to the various types of products and the form factor they come in; whether that is 2U vs blade or software vs hardware is beside the point.

When I look at discussions being held around whether a server side caching solution is preferred over an all-flash array, which is just another form factor discussion if you ask me, the only right answer that comes to mind is “it depends”. It depends on what your business requirements are, what your budget is, whether there are any constraints from an environmental perspective, the hardware life cycle, what your staff’s expertise / knowledge is, etc. It is impossible to provide a single answer and solution to all the problems out there. What I realized is that what the software-defined movement actually brought us is choice, and in many of these cases the form factor is just a tiny aspect of the total story. It seems to be important though for many people; maybe it is still an inheritance from the “server hugger” days where hardware was still king? Those times are long gone though if you ask me.

In some cases a server side caching solution will be the perfect fit, for instance when ultra-low latency and use of the existing storage infrastructure are requirements. In other cases bringing in an all-flash array may make more sense, or a hyper-converged appliance could be the perfect fit for that particular use case. What is more important though is how these components will enable you to optimize your operations, how they will enable you to build that software-defined datacenter and help you meet the demands of the business. This is what you will need to ask yourself when looking at these various solutions, and if there is no clear answer… there is plenty of choice out there, so stay open minded and go explore.

Scale out building block style, or should I say (yellow) brick style!

Duncan Epping · Mar 2, 2012 ·

I attended VMware PEX a couple of weeks back, and during some of the sessions and the discussions I had afterwards I realized that many customers out there still design using legacy concepts. Funny thing is that this mainly applies to server virtualization projects and to a certain extent to cloud environments. It appears that designing in building blocks is something the EUC side of the world embraced a long time ago.

I want to use this post to get feedback about your environments and how you scale up / scale out. I discussed a concept with one of the PEX attendees which I want to share. (This is no rocket science or something revolutionary, let that be clear.) This attendee worked for one of our partners, a service provider in the US, and was responsible for creating a scalable architecture for an Infrastructure as a Service (IaaS) offering.

The original plan they had was to build an environment that would allow for 10,000 virtual machines. Storage, networking and compute sizing and scaling were all done with these 10k VMs in mind. However, it was expected that in the first 12 months only 1,000 virtual machines would be deployed. You can imagine that internally there was a lot of debate around the upfront investment. Especially the storage and compute platforms were a huge discussion. What if the projections were incorrect? What if 10k virtual machines was not realistic within three years? What if the estimated compute and IOps requirements were incorrect? This could lead to substantial underutilization of the environment, and especially in IaaS, where it is difficult to predict how the workload will behave, this could lead to a significant loss. On top of that, they were already floor space constrained… which made it impossible to scale / size for 10k virtual machines straight from the start.

During the discussion I threw the building block (pod, stack, block… all the same) method on the table, as mentioned not unlike what the VDI/EUC folks have been doing for years and not unlike what some of you have been preaching. Kris Boyd mentioned this in his session at Partner Exchange, and let me quote him on this as I fully agree with his statement: “If you know what works well at a certain scale, why not just repeat that?!” The advantage would be that the costs are predictable, but, even more important for the customers and the ops team, the result of the implementation would be predictable. So what was discussed, and what will be the approach for this particular environment, or at least will be proposed as a possible architecture?

First of all a management cluster would be created. This is the mothership of the environment. It will host all vCenter virtual machines, vCloud Director, Chargeback, databases etc. This environment does not have high IOps or compute requirements, so it would be implemented on a small storage device, NFS based storage that is. The reason NFS was chosen is that the vCloud Director cells require an NFS share to transfer files. Chris Colotti wrote an article about when this NFS share is used, which might be useful to read for those interested. This “management cluster” approach is discussed in depth in the vCloud Architecture Toolkit.

For the vCloud Director resources the following was discussed. The expectation was 1,000 VMs in the first 12 months, and the architecture would need to cater for this. It was decided to use averages to calculate the requirements for this environment, as the workload was unknown and could literally be anything. How did they come up with a formula in this case? Well, what I suggested was looking at their current “hosted environment” and simply averaging things out: do a dump of all the data and try to come up with some common numbers. This is what it resulted in:

  • 1,000 VMs (4:1 VM-to-core consolidation ratio, average of 6 GB of memory per VM)
    • Required cores = 250 (for example 21 x dual-socket 6-core hosts)
    • Required memory = 6 TB (for example 24 x 256 GB hosts)

This did not take any savings due to TPS into account, and the current hardware platform used wasn’t as powerful as the new one. In my opinion it is safe to say that 24 hosts would cater for these 1,000 VMs, and that would include N+2. Even if it did not, they agreed that this would be their starting point and max cluster size. They wanted to avoid any risks and did not like to push the boundaries too much with regards to cluster sizes. Although I believe 32 hosts is no problem at all in a cluster, I can understand where they were coming from.
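For those who want to repeat the exercise, the math behind those numbers is simple enough to capture in a few lines. The sketch below (Python, purely illustrative) just reproduces the averages listed above: 1,000 VMs, 4 VMs per core, 6 GB per VM, and a building block of dual-socket 6-core hosts with 256 GB of memory each. It is a back-of-the-napkin illustration of the method, not a sizing tool.

```python
import math

# Averages taken from the current hosted environment (see the list above).
vm_count = 1000
vms_per_core = 4            # 4:1 VM-to-core consolidation ratio
mem_per_vm_gb = 6

# Proposed building block: dual-socket, 6-core hosts with 256 GB each.
cores_per_host = 2 * 6
mem_per_host_gb = 256

required_cores = vm_count / vms_per_core        # 250 cores
required_mem_gb = vm_count * mem_per_vm_gb      # 6,000 GB, roughly 6 TB

hosts_for_cpu = math.ceil(required_cores / cores_per_host)    # 21 hosts
hosts_for_mem = math.ceil(required_mem_gb / mem_per_host_gb)  # 24 hosts

# Memory is the binding constraint here; the 24-host figure also covers the
# N+2 buffer mentioned above, since TPS savings and the faster new hardware
# were deliberately left out of the calculation.
cluster_size = max(hosts_for_cpu, hosts_for_mem)
print(f"CPU-bound host count:    {hosts_for_cpu}")
print(f"Memory-bound host count: {hosts_for_mem}")
print(f"Proposed cluster size:   {cluster_size}")
```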

The storage part is where it got more interesting. They had a huge debate around upfront costs and did not want to invest in a huge enterprise-level storage solution at this point. As I said, they wanted to make sure the environment would scale, but also wanted to make sure the costs made sense. On average, the disk size in their current environment was 60 GB. Multiply that by 1,000 and you know you will need at least 60 TB of storage. That is a lot of spindles. Datacenter floor space was definitely a constraint, so this would be a huge challenge… unless you use techniques like deduplication / compression and you have a proper amount of SSD to maintain a certain service level / guarantee performance.
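The same napkin math applies to capacity. The sketch below multiplies the observed average disk size by the VM count and then shows how an assumed data-reduction ratio changes the physical capacity you would actually need to buy. The 60 GB average and the 1,000 VM count come from the text above; the reduction ratios are purely hypothetical, as real savings depend entirely on the workload and on what the array delivers.

```python
vm_count = 1000
avg_disk_gb = 60                          # average disk size in the current environment

raw_capacity_tb = vm_count * avg_disk_gb / 1000   # 60 TB of provisioned capacity

# Hypothetical deduplication / compression ratios; actual results will vary
# per workload and per storage platform.
for reduction_ratio in (1.5, 2.0, 3.0):
    physical_tb = raw_capacity_tb / reduction_ratio
    print(f"{reduction_ratio}:1 data reduction -> ~{physical_tb:.0f} TB of physical capacity")
```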

During the discussion it was mentioned several times that they would be looking at up-and-coming storage vendors like Tintri, Nimble and Pure Storage. These were the three specifically mentioned by this partner, but I realize there are many others out there. I have to agree that the solutions offered by these vendors are really compelling and each of them has something unique. It is difficult to compare them on paper though, as Tintri does NFS, Nimble does iSCSI, and Pure Storage does FC (and iSCSI soon) but is also SSD only. Especially Pure Storage intrigued them due to the power/cooling/rackspace savings. Also, the great thing about all of these solutions is again that they are predictable from a cost / performance perspective, which allows for an easy repeatable architecture. They haven’t made a decision yet and are planning on doing an eval with each of the solutions to see how it integrates, scales, performs and, most importantly, what the operational impact is.

Something we did not discuss, unfortunately, was networking. These guys, being a traditional networking provider, did not have much control over what would be deployed, as their network department was in charge of this. In order to keep things simple they were aiming for a 10GbE infrastructure; the cost of networking ports was significant, and they wanted to reduce the number of cables coming out of the rack.

All in all it was a great discussion which I thought was worth sharing. Although the post is anonymized, I did ask their permission before I wrote this up :-). I realize that this is far from a complete picture, but I hope it does give an idea of the approach; if I can find the time I will expand on this with some more examples. I hope that those working on similar architectures are willing to share their stories.

Cloud Infrastructure Architecture Case Study – vSphere + vShield App

Duncan Epping · Feb 29, 2012 ·

A white paper which I have worked on extensively has just been published. The case study takes a design / architecture approach and lists design considerations, requirements and assumptions throughout the document. I want to thank the people who worked with me on this document: Aidan Dalgleish, Frank Denneman, Matthew Northam, Venky Deshpande and Cormac Hogan. Below you can find more details… don’t forget to download it! I made sure it was available in various formats so each and every one of you can read it on your favorite device.

Source – Cloud Infrastructure Architecture Case Study

Description: The VMware Cloud Infrastructure Architecture Case Study Series was developed to provide an understanding of the various components of the VMware Cloud Infrastructure Suite. The goal is to explain how these components can be used in specific scenarios, which are based on real-world customer examples and therefore contain real-world requirements and constraints. The VMware Cloud Infrastructure Suite consists of five technologies that together expand the capabilities and value that customers can realize from a virtualized infrastructure. This case study focuses on vSphere 5.0 and vShield App 5.0.

EPUB: http://www.vmware.com/files/pdf/techpaper/cloud-infrastructure-architecture.epub
MOBI: http://www.vmware.com/files/pdf/techpaper/cloud-infrastructure-architecture.mobi
PDF: http://www.vmware.com/files/pdf/techpaper/cloud-infrastructure-achitecture-case-study.pdf

