Various

Scale out building block style, or should I say (yellow) brick style!

Duncan Epping · Mar 2, 2012 ·

I attended VMware PEX a couple of weeks back and during some of the sessions and discussions I had after the sessions I realized that many customers out there still design using legacy concepts. Funny thing is that this mainly applies to server virtualization projects and to a certain extend to cloud environments.It appears that designing in building blocks is something that EUC side of this world has embraced a long time ago.

I want to use this post to get feedback about your environments. How you scale up / scale out. I discussed a concept with one of the PEX attendees which I want to share. (This is no rocket science or something revolutionary, let that be clear.) This attendee worked for one of our partners, a service provider in the US, and was responsible for creating a scalable architecture for an Infrastructure as a Service (IaaS) offering.

The original plan they had was to build an environment that would allow for 10.000 virtual machines. Storage, networking and compute sizing and scaling was all done with these 10k VMs in mind. However it was expected that in the first 12 months only 1000 virtual machines would be deployed. You can imagine that internally there was a lot of debate around the upfront investment. Especially the storage and compute platform was a huge discussion. What if the projections where incorrect, what if 10k virtual machines was not realistic in three years. What if the estimated compute and IOps requirements where incorrect? This could lead to substantial underutilization of the environment, especially in IaaS where it is difficult to predict how the workload will behave this could lead to a significant loss. On top of that, they were already floor space constraint… which made it impossible to scale / size for 10k virtual machines straight from the start,

During the discussion I threw the building block (pod, stack, block… all the same) method on the table, as mentioned not unlike what the VDI/EUC folks have been doing for years and not unlike some of you have been preaching. Kris Boyd mentioned this in his session at Partner Exchange and let me quote him on this as I fully agree with his statemenet “If you know what works well on a certain scale, why not just repeat that?!” The advantage to that would be that the costs are predictive, but even more important for the customers and ops team the result of the implementation would be predictive. So what was discussed and what will be the approach for this particular environment, or at least will be the proposed as a possible architecture?

First of all a management cluster would be created. This is the mothership of the environment. It will host all vCenter virtual machines, vCloud Director, Chargeback, Databases etc. This environment does not have high IOps requirements or high compute requirements. It would be implemented on a small storage device, NFS based storage that is. The reason it was decided to use NFS is because of the fact that the vCloud Director cells require NFS to transport files. Chris Colotti wrote an article about when this NFS share is used, might be useful to read for those interested in it. This “management cluster” approach is discussed in-depth in the vCloud Architecture Toolkit.

For the vCloud Director resource the following was discussed. The expectation was a 1000 VMs in the first 12 months. The architecture would need to cater for this. It was decided to use averages to calculate the requirements for this environment as the workload was unknown and could literally be anything. How did they come up with a formula in this case? Well what I suggested was looking at their current “hosted environment” and simply averaging things out. Do a dump of all data and try to come up with some common numbers. This is what it resulted in:

1000 VMs (4:1 core / VM, average of 6GB memory per VM)

Required cores = 250 (for example 21 x dual socket 6 core host)
Required memory = 6TB (for example 24 x 256GB host)

This did not take any savings due to TPS in to account and the current hardware platform used wasn’t as powerful as the new one. In my opinion it is safe to say that 24 hosts would cater for these 1000 VMs and that would include N+2. Even if it did not, they agreed that this would be their starting point and max cluster size. They wanted to avoid any risks and did not like to push the boundaries too much with regards to cluster sizes. Although I believe 32 hosts is no problem at all in a cluster I can understand where they were coming from.

The storage part is where it got more interesting. They had a huge debate around upfront costs and did not want to invest at this point in a huge enterprise level storage solution. As I said they wanted to make sure the environment would scale, but also wanted to make sure the costs made sense. On average in their current environment the disk size was 60GB. Multiply that by a 1000 and you know you will need at least 60TB of storage. This is a lot of spindles. Datacenter floor space was definitely a constraint, so this would be huge challenge… unless you use techniques like deduplication / compression and you have a proper amount of SSD to maintain a certain service level / guarantee performance.

During the discussion it was mentioned several times that they would be looking at the upcoming storage vendors like Tintri, Nimble and Pure Storage. There were the three specifically mentioned by this partner, but I realize there are many others out there. I have to agree that the solutions offered by these vendors are really compelling and each of them have something unique. It is difficult to compare them on paper though as Tintri does NFS, Nimble iSCSI and Pure Storage HC (and iSCSI soon) but is also SSD only. Especially Pure Storage intrigued them due to the power/cooling/rackspace savings. Also the great thing about all of these solutions is again that they are predictable from a cost / performance perspective and it allows for an easy repeatable architecture. They haven’t made a decision yet and are planning on doing an eval with each of the solutions to see how it integrates, scales, performs and most importantly what the operational impact is.

Something we did not discuss unfortunately was networking. These guys, being a traditional networking provider, did not have much control over what would be deployed as their network department was in charge of this. In order to keep things simple they were aiming for a 10Gbit infrastructure, the cost of networking ports was significant and they wanted to reduce the amount of cables coming out of the rack for simplicity reasons.

All in all it was a great discussion which I thought was worth sharing, although the post is anonymized I did ask their permission before I wrote this up :-). I realize that this is by far a complete picture but I hope it does give an idea of the approach, if I can find the time I will expand on this with some more examples. I hope that those working on similar architectures are willing to share their stories.

vCenter Plugin Survey

Duncan Epping · Mar 1, 2012 ·

This is just a short survey so I am hoping all of you are willing to provide feedback about the use of vCenter Plugins. It will only take you two minutes max to fill out, only 6 questions!

As VMware moves from the traditional desktop admin client, to the new Web Based client for vSphere administration, we’d like to get a feeling from you on which partner plug-ins are crucial for to make the upgrade to a new client

http://www.surveymonkey.com/s/Y3TR2CB

Resource pool shares don’t make sense with vCloud Director?

Duncan Epping · Feb 28, 2012 ·

I’ve had multiple discussions around Resource Pool level shares in vCloud Director over the last 2 years so I figured I would write an article about it. A lot easier to point people to the article instead, and it also allows me to gather feedback on this topic. If you feel I am completely off, please comment… I am going to quote a question which was raised recently.

One aspect of “noise neighbor” that seems to never be discussed within vCloud is the allocation of shares. An organization with a single VM has better CPU resource access per VM than an organization that has 100 VMs. The organization resource pools have equal number of shares, so each VM gets a smaller and smaller allocation of shares as the VM count in an organization virtual data center increases.

Before I explain the rationale behind the design decision around shares behavior in a vCloud environment it is important to understand some of the basics. An Org vDC is nothing more than a resource pool. The chosen “allocation model” for your Org vDC and the specified charateristics determine what your Resource Pool will look like. I wrote a fairly lengthy article about it a while back, if you don’t understand allocation models take a look at it.

When an Org vDC is created on a vSphere layer a resource pool is created and it will typically have the following characteristics. In this example I will use the “Allocation Pool” allocation model as it is the most commonly used:

Org vDC Characteristics –> Resource Pool Characteristics

Total amount of resources –> Limit set to Y
Percentage of resources guaranteed –> Reservation set to X

On top of that each resource pool has a fixed number of shares. The difference between the limit and the reservation is often referred to as the “bust space”. Typically each VM will also have a reservation set. If 80% of your memory resources are guaranteed this will result in a 80% reservation on memory on your VM as well. This means that when you start deploying new VMs in to that resource pool you will be able to create as many until the limit is reached. In other words:

10GHz/10GB allocation pool Org vDC with 80% guaranteed resources = Resource pool with a 10GHz/GB limit and an 8GHz/GB reservation. In this pool you can create as many VMs until you hit those limits. Resources are guaranteed up to 8GHz/8GB!

Now what about those shares? The statement is, will the Org vDC with 100 VMs have less resource access than the Org vDC with only 10 VMs? Lets use that previous example again:

10GHz/10GB allocation pool with 80% resource guaranteed. This results in a resource pool with a 10GHz/10GB limit and an 8GHz/GB reservation.

Two Org VDCs are deployed, and each have the exact same characteristics. In “Org VDC – 1” 10 VMs were provisioned, while in “Org VDC – 2” 100 VMs are provisioned. It should be pointed out that the provider charges these customers for their Org VDC. As both decided to have 8GHz/GB guaranteed that is what they will pay for and when they exceed that “guarantee” they will be charged for it on top of that. They are both capped at 10GHz/GB however.

If there is contention than shares come in to play. But when is that exactly? Well after the 8GHz/GB of resources has been used. So in that case Org VDCs will be fighting over:

limit - reservation

In this scenario that is “10GHz/GB – 8GHz/GB = 2GHz/GB”. Is Org VDC 2 entitled to more resource access than Org VDC 1? No it is not. Let me repeat that, NO Org VDC 2 is not entitled to more resources.

Both Org VDC 1 and Org VDC 2 bought the exact same amount of resource. The only difference is that Org VDC 2 chose to deploy more VMs. Does that mean Org VDC 1’s VMs should receive less access to these resources just because they have less VMs? No they should not have less access! A provider cannot, in any shape or form, decide which Org VDC is entitled to more resources in that burst space, especially not based on the amount of VMs deployed as this gives absolutely no indication of the importance of these workloads.Org VDC 2 should buy more resources to ensure their VMs get what they are demanding.

Org VDC 1 cannot suffer because Org VDC 2 decided to overcommit. Both are paying for an equal slice of the pie… and it is up to themselves to determine how to carve that slice up. If they notice their slice of the pie is not big enough, they should buy a bigger or an extra slice!

However, there is a a scenario where shares can cause a “problem”… If you use “Pay As You Go” and remove all “guarantees” (reservations) and have contention in that scenario each resource pool will get the same access to the resources. If you have resource pools (Org VDCs) with 500 VMs and resource pools with 10 VMs this could indeed lead to a problem for the larger resource pools. Keep in mind that there’s a reason these “guarantees” were introduced in the first place, and overcommitting to the point where resources are completely depleted is most definitely not a best practice.

Fling: Auto Deploy GUI

Duncan Epping · Feb 9, 2012 ·

Many of you probably know the PXE Manager fling which Max Daneri created… Max has been working on something really cool, a brand new fling: Auto Deploy GUI! I had the pleasure of test driving the GUI and providing early feedback to Max when he had just started working on it and since then it has come a long way! It is a great and useful tool which I hope will at some point be part of vCenter. Once again, great work Max! I suggest that all of you check out this excellent fling and provide Max with feedback so that he can continue to develop and improve it.

The Auto-Deploy GUI fling is an 8MB download and allows you to configure auto-deploy without the need to use PowerCLI. It comes with a practical deployment guide which is easy to follow and should allow all of you to test this in your labs! Download it it now and get started!

source
The Auto Deploy GUI is a vSphere plug-in for the VMware vSphere Auto Deploy component. The GUI plug-in allows a user to easily manage the setup and deployment requirements in a stateless environment managed by Auto Deploy. Some of the features provided through the GUI include the ability to add/remove Depots, list/create/modify Image Profiles, list VIB details, create/modify rules to map hosts to Image Profiles, check compliance of hosts against these rules and re-mediate hosts.

Top VMware/virtualization blogs 2012 voting starts today

Duncan Epping · Jan 24, 2012 ·

Yes, it is that time of the year again… vSphere-land.com’s voting for the Top 25 Blogs worldwide has started again. I had the honor of placing 1st four consecutive times, but the competition is huge this year with excellent newcomers like Chris Colotti, scripting warriors like William Lam and Alan Renouf and of course my long time rival/friend Chad Sakac.

I am hoping each of you will select the top-10 blogs based on quality, longevity and frequency. (I personally find length of the article irrelevant, content is King!) I did want to list my top 10 articles over the last 12 months:

The voting is very straight forward and will only take 2 minutes of your time, all you have to do is select your Top 10 favourite VMware related virtualization blog sites and then sort them in your order of preference (ie: 1 – 10) – it’s as easy as that! Don’t wait any longer, cast your vote now!