Lately I am having a lot of discussions with customers around sizing of their hosts. Especially Cisco UCS(with the 384GB option) and the upcoming Intel Xeon 5600 series with six cores per CPU takes the “Scale Up” discussion to a new level.
I guess we had this discussion in the past as well when 32GB became a commodity. The question I always have is how many eggs do you want to have in one basket. Basically do you want to scale up(larger hosts) or scale out(more hosts).
I guess it’s a common discussion and a lot of people don’t see the impact sizing your hosts correctly. Think about this environment, 250 VMs in total with the need of roughly 480GB of memory:
- 10 Hosts, each having 48GB and 8 Cores, 25 VMs each.
- 5 Hosts, each having 96GB and 16 Cores, 50 VMs each.
If you look at it from an uptime perspective; Would a failure occur in scenario 1 you will lose 10% of your environment. If you look at scenario 2 this is 20%. Clearly the associated cost with the down time for 20% of your estate is higher than for 10% of your estate.
Now it’s not only the associated cost with the impact of a host failure it is also for instance the ability of DRS to load balance the environment. The less hosts you will have the smaller the chances are DRS will be able to balance the load. Keep in mind DRS uses a deviation to calculate the imbalance and simulates a move to see if it results in a balanced cluster.
Another thing to keep in mind is HA. When you design for N+1 redundancy and need to buy an extra host the costs associated for redundancy is high for a scale up scenario. Not only the costs associated are high, the load when the fail-over needs to occur will also increase immense. If you only have 4 hosts and 1 host fails the added load on the 3 hosts will have a higher impact than it would have on for instance 9 hosts in a scale out scenario.
Licensing is another often used argument for buying larger hosts but for VMware it usually will not make a difference. I’m not the “capacity management” or “capacity planning” guru to be honest but I can recommend VMware Capacity Planner as it can help you to easily create several scenarios. (Or Platespin Recon for that matter.) If you have never tried it and are a VMware partner check it out and run the scenarios based on scale up and scale out principles and do the math.
Now, don’t get me wrong I am not saying you should not buy hosts with 96GB but think before you make this decision. Decide what an acceptable risk is and discuss the impact of the risk with your customer(s). As you can imagine for any company there’s a cost associated with down time. Down time for 20% of your estate will have a different financial impact than down time for 10% of your estate and this needs to be weighted against all the pros and cons of scale out vs scale up.