architecture

Underlying Infrastructure for your pets and cattle

Duncan Epping · Oct 10, 2014 ·

Last week on twitter Maish asked a question that got me thinking. Actually, I have been thinking about this for a while now. The question deals with how you design your infrastructure for the various types of workloads (pets and cattle). Whether your workload falls in the “pet” category or the “cattle” category. (If you are not familiar with the terms pets/cattle read this article by Massimo)

Pets vs. cattle – I think that people are mistaken as to what that exactly means for the underlying infrastructure.

— Maish Saidel-Keesing (@maishsk) October 5, 2014

I asked Maish what it actually means for your infrastructure, and at the same time I gave it some more thought over the last week. Cattle is the type of application architecture which handles failures by providing a distributed solution, it scales out instead of up typically, the VMs are disposable as they usually won’t hold state. With pets this is different, they typically scale up and resiliency is often provided by either a 3rd party clustering mechanism or the infrastructure under neath, in many cases contain state and recoverability is key. As you can imagine both types of workloads have different requirements of the infrastructure. Going back to Maish’s question, I guess the real question is if you can afford the “what it means for the underlying infrastructure”. What do I mean with that?

If you look at the requirements of both architectures, you could say that “pets” will typically demand more from the underlying infrastructure when it comes to resiliency / recoverability. Cattle will have less demands from that perspective but flexibility / agility is more important. You can imagine that you could implement two different infrastructure architectures for these specific workloads, but does this make sense? If you are Netflix, Google, Youtube etc then it may make sense to do this due to the scale they operate at and the fact that IT is their core business. In those cases “cattle” is what drives the business, and there are back-end systems. Reality is though that for the majority this is not the case. Your environment will be a hybrid, and more than likely “pets” will have the overhand as that is simply what the state of the world is today.

That does not mean they cannot co-exist. That is what I believe is the true strength of virtualization, it allows you to run many different types of workloads on the same infrastructure. Whether that is your Exchange environment or your in-house developed scale out web application which serves hundreds of thousands of customers does not make a difference to your virtualization platform. From an operational perspective the big benefit here is that you will not have to maintain different run books to manage your workloads. From an ops perspective they will look the same on the outside, although they may differ on the inside. What may change though is the services required for those systems, but with the rich ecosystem available for virtualization platforms these days that should not be a problem. Need extra security / microsegmentation? VMware NSX can provide the security isolation needed to run these applications smoothly. Sub milliseconds latency requirements? Plenty of storage / caching solutions out there that can deliver this!

Will the application architecture shift that is happening right now impact your underlying infrastructure? We have made these huge steps in operational efficiency in the last 5 years, and with SDDC we are about to take the next big step, and although I do believe that the application architecture shift will result in infrastructure changes lets not make the same mistakes we made in the past by creating these infrastructure silos per workload. I strongly believe that repeatability, consistency, reliability and predictability are key and this starts with a solid, scalable and trusted foundation (infrastructure).

RE: The VCDX candidates advantage over the panellists

Duncan Epping · Oct 6, 2014 ·

I was reading Josh Odger’s post on the VCDX Defense. Josh’s article can be summarised with the following part:

As a result, the candidate should be an expert in the design being presented and answering questions from the panel about the design should not be intimidating.

Having gone through the process myself, knowing many of the VCDX’s and having been on countless of panels I completely disagree with Josh. Sure, you do need to know your design inside/out… but, it is not about “who’s having an advantage”, the panel member is not there to fail or pass the candidate… they are there to assess your skills as an architect!

If you look at the defense day there are three parts:

Defend your design
Design scenario
Troubleshooting scenario

For the design and troubleshooting scenario you get a random exercise, so you have no prior knowledge of what will be asked. When it comes to defending your design of course you will know your design (hopefully) better then anyone else. However, the questions you get will not necessarily be about the specifics or details of your design. The VCDX panel is there to assess your skills as an architect and not your “fact cramming skills”. A good panel will ask a lot of hypothetical questions like:

Your design uses NFS based storage, how would FC connected storage have changed your design?
Your design is based on capacity requirements for 80 virtual machine, what would you have done differently when the requirement would be 8000 virtual machines?
Your design …

So when you do mock exams, prepare for these types of hypothetical questions. That is when you really start to understand the impact decisions can have, and when during your defense you get one of these questions and you do not know the answer make sure you guide the panel through your thought process. That is what differentiates someone who can learn facts (VCP exam) and someone who can digest them, understand them and apply them in different scenarios (VCDX exam).

As I stated, it may sound like that you knowing your design inside out means having a big advantage over the panel members but it probably isn’t… that is not what they are testing you on! Your ability to assess and adapt are put through the wringer, your skills as an architect are tested thoroughly and that is where you will need to do well.

Good luck!

VMware vSphere Virtual SAN design considerations…

Duncan Epping · Sep 9, 2013 ·

I have been playing a lot with vSphere Virtual SAN (VSAN) in the last couple of months… I figured I would write down some of my thoughts around creating a hardware platform or constructing the virtual environment when it comes to VSAN. There are some recommended practices and there are some constraints, I aim to use this blog post to gather all of these Virtual SAN design considerations. Please read the VSAN introduction, how to install VSAN in your virtual lab and “How do you know where an object is located” to get a better understanding of the product. There is a long list of VSAN blogs that can be found here: vmwa.re/vsan

The below is all based on vSphere 5.5 Virtual SAN (public) Beta and my interpretation and thoughts based on various conversations with colleagues, engineering and reading various documents.

vSphere Virtual SAN (VSAN) clusters are limited to a maximum total of 32 hosts and there is a minimum of 3 hosts. VSAN is also currently limited to 100 VMs per host, resulting in a maximum of 3200 VMs in a 32 host cluster. Please note that HA currently has a limit of 2048 protected VMs in a single Datastore.
It is recommended to dedicate a 10GbE NIC port to your VSAN VMkernel traffic, although 1GbE is fully supported it could be a limiting factor in I/O intensive environments. Both VSS and VDS are supported.
It is recommended to have a VSAN VMkernel on every physical NIC! Ensure to configure them in a “active/standby” configuration so that when you have 2 physical NIC ports and 2 VSAN VMkernel’s each of them will have its own port. Do note that multiple VSAN VMkernel NICs on a single host on the same subnet is not a supported configuration, in different subnets it is supported.
IP Hash Load Balancing is supported by VSAN, but due to limited number of IP-addresses between source/destination load balancing benefits could be limited. In other words, an etherchannel formed out of 4x1GbE NIC will most likely not result in 4GbE.
Although Jumbo Frames are fully supported with VSAN they do add a level of operational complexity. When Jumbo Frames are enabled ensure these are enabled end-to-end!
VSAN requires at a minimum 1 SSD and 1 Magnetic Disk per diskgroup on a host which is contributing storage. Each diskgroup can have a maximum of 1 SSD and 7 magnetic disks. When you have more than 7 HDDs or two or more SSDs you will need to create additional diskgroups.
Each host that is providing capacity to the VSAN datastore has at least one local diskgroup. There is a maximum of 5 disk groups per host!
It can beneficial to create multiple smaller disk groups instead of larger diskgroups. More diskgroups means smaller failure domains and more cache drives / queues.
Ensure when sizing your environment to take data replicas in to account. If your environment needs N+1 or N+2 (etc) resiliency factor this in accordingly.
SSD capacity does not count towards total VSAN datastore capacity. When sizing your environment, do not include SSD capacity in your totalized capacity calculation.
It is a recommended practice to have a minimum 1:10 ratio of SSD capacity to HDD capacity in each disk group. In other words, when you have 1TB of HDD capacity, it is recommended to have at least 100GB of SSD capacity. Note that VMware’s recommendation has changed since BETA, new recommendation is:
- 10 percent of the anticipated consumed storage capacity before the number of failures to tolerate is considered
By default, 70% of the available SSD capacity will be used as read cache and 30% will be used as a write buffer. As in most designs, when it comes to cache/buffer –> more = better.
Selecting the SSD with the right performance profile can make a 5x-10x difference in VSAN performance easily, chose carefully and wisely. Both SSD and PCIe flash solutions are supported, but there are requirements! Make sure to check the HCL before purchasing new hardware. My tip Intel S3700, great price/performance balance.
VSAN relies on VM Storage Policies for policy based management. There is a default policy under the hood, but you cannot see this within the UI. As such it is a recommended practice to create a new standard policy for your environment after VSAN has been configured. It is recommended to start with all settings set to default, ensure “Number of failures to tolerate” is configured to 1. This guarantees that when a single host fails virtual machines can be restarted and recovered from this failure with minimal impact on the environment. Attach this policy to your virtual machines when migrating them to VSAN or during virtual machine provisioning.
Configure vSphere HA isolation response to “power-off” to ensure that virtual machines which reside on an isolated host can be safely restarted.
Ensure vSphere HA admission control policy (“host failures to tolerate” or the “percentage based) aligns with your VSAN availability strategy. In other words, ensure that both compute and storage are configured using the same “N+x” availability approach.
When defining your VM Storage Policy avoid unnecessary usage of “flash read cache reservation”. VSAN has internal read cache optimization algorithms, trust it like you trust the “host scheduler” or DRS!
VSAN does not support virtual machine disks greater than 2TB-512b, VMs which require larger VMDKs are not suitable candidates at this point in time for VSAN.
VSAN does not support FT, DPM, Storage DRS or Storage I/O Control. It should be noted though that VSAN internally takes care of scheduling and balancing when required. Storage DRS and SIOC are designed for SAN/NAS environments.
Although supported by VSAN, it is recommended practice to keep the hosts/disk configuration for a VSAN cluster similar. Non-uniform cluster configuration could lead to variations in performance and could make it more complex to stay compliant to defined policies after a failure.
When adding new SSDs or HDDs ensure these are not pre-formatted. Note that when VSAN is configured to “automatic mode” disks are added to existing disk groups or new disk groups are created automatically.
Note that vSphere HA behaves slightly different in a VSAN enabled cluster, here are some of the changes / caveats
- Be aware that when HA is turned on in the cluster, FDM agent (HA) traffic goes over the VSAN network and not the Management Network. However, when an potential isolation is detected HA will ping the default gateway (or specified isolation address) using the Management Network.
- When enabling VSAN ensure vSphere HA is disabled. You cannot enable VSAN when HA is already configured. Either configure VSAN during the creation of the cluster or disable vSphere HA temporarily when configuring VSAN.
- When there are only VSAN datastores available within a cluster then Datastore Heartbeating is disabled. HA will never use a VSAN datastore for heartbeating.
- When changes are made to the VSAN network it is required to re-configure vSphere HA.
VSAN requires a RAID Controller / HBA which supports passthrough mode or pseudo passthrough mode. Validate with your server vendor if the included disk controller has support for passthrough. An example of a passthrough mode controller which is sold separately is the LSI SAS 9211-8i.
Ensure log files are stored externally to your ESXi hosts and VSAN by leveraging vSphere’s syslog capabilities.
ESXi can be installed on: USB, SD and Magnetic Disk. Hosts with 512GB or more memory are only supported when ESXi is installed on magnetic disk.

That is it for now. When more comes to mind I will add it to the list!

Cloud Infrastructure Architecture Case Study – vSphere + vShield App

Duncan Epping · Feb 29, 2012 ·

A white paper which I have worked on extensively has just been published. The case study takes a design / architecture approach and lists design considerations, requirements and assumptions throughout the document. I want to thank the people who worked with me on this document: Aidan Dalgleish, Frank Denneman, Matthew Northam, Venky Deshpande and Cormac Hogan. Below you can find more details… don’t forget to download it! I made sure it was available in various formats so each and everyone of you can read it on its favorite device.

Source – Cloud Infrastructure Architecture Case Study

Description: The VMware Cloud Infrastructure Architecture Case Study Series was developed to provide an understanding of the various components of the VMware Cloud Infrastructure Suite. The goal is to explain how these components can be used in specific scenarios, which are based on real-world customer examples and therefore contain real-world requirements and constraints. The VMware Cloud Infrastructure Suite consists of five technologies that together expand the capabilities and value that customers can realize from a virtualized infrastructure. This case study focuses on vSphere 5.0 and vShield App 5.0.

EPUB: http://www.vmware.com/files/pdf/techpaper/cloud-infrastructure-architecture.epub
MOBI: http://www.vmware.com/files/pdf/techpaper/cloud-infrastructure-architecture.mobi
PDF: http://www.vmware.com/files/pdf/techpaper/cloud-infrastructure-achitecture-case-study.pdf