** Disclaimer: I am a VMware employee **
** I am not affiliated with Dell, just picked them as their website is straight forward **
About a year ago I wrote an article about scaling up. I have been receiving multiple requests to update this article as with vRAM many seem to be under the impression that the world has changed but did it really? Yes I know I am about to burn myself but then again I am Dutch and we are known for our bluntness so let me be that Dutch guy again. Now before this turns into a “burn the witch who dares to speak about vRAM” thread let me be clear, this article is not about vRAM per se. Of course I will touch upon it and explain why I don’t think there is a problem in the scenario I am describing, but that is not what this article is about.
In my previous article I discussed the benefits of both scaling up and scaling out. Now as I stated, I had that discussion with customers when hosts were moving towards 32GB per host and now we are moving towards 32GB dimms instead easily cramming 256GB in a host. The world is changing and so is your datacenter, with or without vRAM (there is that word again). Once again I am not going to discuss vRAM by itself as I am not an analyst or responsible for pricing and packaging within VMware but what I do want to discuss is if vRAM has an impact on Scale-out vs Scale-up discussion as some are under the impression it does.
Lets assume the following:
- To virtualize: 300 servers
- Average Mem configured: 3GB
- Average vCPU configured: 1.3
That would be a total of 900GB and 390 vCPUs. Now from a CPU perspective the recommended best practice that VMware PSO has had for the last years has been 5-8 vCPUs per core and we’ll come back to why this is important in second. Lets assume we will use 2U servers for now with different configurations. (When you do the math fiddle around with the RAM/Core/Server ratio, 96 vs 192 vs 256 could make a nice difference!)
- Config 1:
- Dell r710
- 2 x 4 Core – Intel
- 96GB of memory
- $ 5500 per host
- Config 2:
- Dell R810
- 2 x 8 Core – Intel
- 192GB of memory
- $13,000 per host
If we do the quick math that means if we look at it from a memory perspective and assume a roughly 20% TPS benefit (that is very conservative however) and round it up we need 10 x R710s or 5 x R810s. I noticed multiple people making statements about not recommending over-committing on memory because of vRAM, that doesn’t make any sense to me as memory techniques like TPS only lower the overall costs. As mentioned it is recommended to have 5-8 vCPUs per core… Let’s go for 6 vCPUs per core. That means from a vCPU perspective we will need 9 x R710s or 5 x R810s. Now we will take the worst case scenario into account and will go with the larger number for either RAM or CPU. So that results in:
- 10 x Dell R710 = 55k
- 5 x Dell R810 = 65k
Before anyone asks, I also looked at AMD 12-Core systems with 256GB and they come in around 16.5k with 256GB and you would need roughly 4 hosts to accomplish the same looking at the cost of those boxes and comparing it with Intel I would expect a broader adoption of AMD to be honest, but lets focus on the Intel comparison for now. So that is only a 10k difference when looking at hardware but the costs of managing it is lower for the R810s (fewer hosts) and not even talking about I/O ports, cooling and power. (Trying to keep things simple, but when adding these costs the difference will be even bigger.)
So what about that vRAM thingie? What about that huh! Well as I said this is not about vRAM but will it matter when buying large hosts? Well it might, but only when you buy more capacity than you need in this example and want to license all of it before hand… In this case, does it matter? 300 VMs x 3GB vRAM is 900GB vRAM (18.75 licenses for enterprise+), the type of host will not change this or will it? Well actually it will. If you look at the R710 you will need 20 (10 x 2) socket licenses assuming and lets assume Enterprise Plus is used. With the Dell R810 we will need 10 licenses from a socket perspective but 19 from a vRAM perspective using Enterprise Plus.
Lets place it in perspective:
- Scale out
- 20 Enterprise+ licenses required
- 10 Hosts required
- Estimated costs for hosts + licenses 105k
- Scale up
- 19 Enterprise+ licenses required
- 5 Hosts required
- Estimated costs for hosts + licenses 112.5k
Looking at the total costs of acquisition for Scale-Up only in terms of hardware and vSphere licenses in this scenario is indeed slightly more (7.5k) so should you go big?
As mentioned in my other posts there are a couple of things to keep in mind when making this decision and I cannot make it for you unfortunately but there are of course things to factor in. Many of these also have a substantial cost associated with it and I can guarantee that the costs associated with it will more than make up for that 7.5k!
- Cost of Guest Operating System and Applications (licensed per socket in some cases)
- Cost of I/O ports (storage + network)
- Cost of KVM / Rackspace
- Cost of Power / Cooling
- Cost of operating per host (think firmware etc)
- Cost of support (Hardware + Software)
- Total number of VMs
- Total number of vCPUs
- Total number of vRAM
- vCPUs per core ratio
- Redundancy, taking N+1 into account
- Impact of failure
- Impact on DRS (less hosts is less balancing options)
- Impact on TPS (less hosts means more memory sharing means less physical RAM needed)
Now once again I cannot make the call for you, it will depend on what you feel is most important. If you are concerned about placing all eggs in one basket you should probably go for scale out, but if your primary concern is cost and trust your hardware platform, scale up would be the way to go. I guess one thing to consider before you make your decision, how often does a server fail due to a hardware defect vs a human error? Would less servers also imply less chances of human error? But would it also imply a larger impact of a human error?
For those looking for more exact details I would recommend reading this excellent post by Bob Plankers! Bob and I exchanged a lot of DMs and emails on this topic over the last couple of days and I want to thank him for validating my logic and for the tremendous amount of effort he has put in to his article and spreadsheet!. I also want to thank Massimo Re Ferre for proof reading. This article by Aaron Delp is also worth reading, Aaron released it just after I finished this article. Talking about useful articles I would also like to refer to Massimo’s article which was published in 2009 but still very relevant! Scale-Up vs Scale-Out is a hot topic I guess.
Now looking at you guys to chip, and please keep the fight nice and clean, no clinching, spitting or cursing…