
Yellow Bricks

by Duncan Epping



Sizing a vSAN Stretched Cluster

Duncan Epping · May 30, 2017 ·

I have had this question a couple of times already: how many hosts do I need per site when the Primary FTT (PFTT) is set to 1, the Secondary FTT (SFTT) is set to 1, and RAID-5 is used as the Failure Tolerance Method? The answer is straightforward: you have a RAID-5 set locally in each site. RAID-5 is a 3+1 configuration, meaning 3 data blocks and 1 parity block, so each site will need 4 hosts at a minimum. If the requirement is PFTT=1 and SFTT=1 with the Failure Tolerance Method (FTM) set to RAID-5, the vSAN Stretched Cluster configuration will be 4+4+1. Note that when you use RAID-1 you will need a minimum of 3 hosts per site, because locally you will have 2 “data” components and 1 witness component.

From a capacity standpoint: if you have a 100GB VM with PFTT=1, SFTT=1 and FTM set to RAID-1, you have a local RAID-1 set in each site. That means the 100GB VM requires 200GB in each location: 200% of the VM size in local capacity per site, and 400% across the total cluster. Using the table below you can easily see the overhead. Note that RAID-5 and RAID-6 are only available when using all-flash.

I created a quick table to help those going through this exercise (there is also a small scripted version of the same math right after the table). I did not include FTT=3, as in practice it is not used often in stretched configurations.

Description | PFTT | SFTT | FTM | Hosts per site | Stretched config | Single-site capacity | Total cluster capacity
Standard stretched across locations with local protection | 1 | 1 | RAID-1 | 3 | 3+3+1 | 200% of VM | 400% of VM
Standard stretched across locations with local RAID-5 | 1 | 1 | RAID-5 | 4 | 4+4+1 | 133% of VM | 266% of VM
Standard stretched across locations with local RAID-6 | 1 | 2 | RAID-6 | 6 | 6+6+1 | 150% of VM | 300% of VM
Standard stretched across locations, no local protection | 1 | 0 | RAID-1 | 1 | 1+1+1 | 100% of VM | 200% of VM
Not stretched, only local RAID-1 | 0 | 1 | RAID-1 | 3 | n/a | 200% of VM | n/a
Not stretched, only local RAID-5 | 0 | 1 | RAID-5 | 4 | n/a | 133% of VM | n/a
Not stretched, only local RAID-6 | 0 | 2 | RAID-6 | 6 | n/a | 150% of VM | n/a
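For those who prefer to script this exercise, here is a minimal Python sketch of the same arithmetic (my own illustration, not an official tool): the multipliers simply encode how many blocks each FTM writes per data block, and PFTT=1 doubles the result across the two data sites.

    # Local capacity multiplier per (FTM, SFTT) combination from the table above.
    # RAID-1 with SFTT=1 stores 2 full copies; RAID-5 (3+1) stores 4 blocks for
    # every 3 data blocks; RAID-6 (4+2) stores 6 blocks for every 4 data blocks.
    LOCAL_MULTIPLIER = {
        ("RAID-1", 1): 2.0,     # 200% of VM
        ("RAID-5", 1): 4 / 3,   # ~133% of VM
        ("RAID-6", 2): 6 / 4,   # 150% of VM
    }

    def stretched_capacity(vm_gb, pftt, sftt, ftm):
        """Return (GB per site, GB across the cluster) for a single VM."""
        per_site = vm_gb * LOCAL_MULTIPLIER[(ftm, sftt)]
        sites = 2 if pftt == 1 else 1  # PFTT=1 mirrors the object across both sites
        return per_site, per_site * sites

    # The 100GB RAID-1 example from above: prints (200.0, 400.0)
    print(stretched_capacity(100, pftt=1, sftt=1, ftm="RAID-1"))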

Hope this helps!

Virtual SAN Datastore Calculator

Duncan Epping · Jan 17, 2014 ·

Over a week ago I wrote an article about how to size your Virtual SAN datastore. I included an equation to help everyone who is going through the same exercise. I figured I could make life even easier by creating a simple Virtual SAN Datastore calculator based on that equation.

<EDIT> VMware has launched a great calculator. My tool served as a temporary solution when there was no calculator available; now that there is one, I highly recommend using the official tool: http://virtualsansizing.vmware.com/ </EDIT>
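The equation itself is in the earlier article. Purely as an illustration of the kind of math a calculator like this does (a rough sketch, not the exact equation from that post), remember that the original Virtual SAN release only supported mirroring, so every object consumes FTT+1 times its size, plus whatever slack you reserve:

    import math

    def raw_capacity_gb(vm_count, avg_disk_gb, ftt=1, slack=0.30):
        """Rough raw-capacity estimate for a Virtual SAN datastore.

        Sketch only: each object is stored ftt+1 times (mirroring was the only
        scheme in the original release), and 'slack' is an assumed headroom
        fraction for snapshots, swap and rebuilds, not a number from the article.
        """
        mirrored = vm_count * avg_disk_gb * (ftt + 1)
        return math.ceil(mirrored / (1 - slack))

    # 100 VMs with 50GB disks at FTT=1: 10,000GB mirrored, ~14,286GB raw.
    print(raw_capacity_gb(100, 50))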

Scale Up/Out and impact of vRAM?!? (part 2)

Duncan Epping · Jul 21, 2011 ·

** Disclaimer: I am a VMware employee **
** I am not affiliated with Dell, I just picked them as their website is straightforward **

About a year ago I wrote an article about scaling up. I have been receiving multiple requests to update this article, as with vRAM many seem to be under the impression that the world has changed. But did it really? Yes, I know I am about to burn myself, but then again I am Dutch and we are known for our bluntness, so let me be that Dutch guy again. Before this turns into a “burn the witch who dares to speak about vRAM” thread, let me be clear: this article is not about vRAM per se. Of course I will touch upon it and explain why I don’t think there is a problem in the scenario I am describing, but that is not what this article is about.

In my previous article I discussed the benefits of both scaling up and scaling out. As I stated, I had that discussion with customers when hosts were moving towards 32GB per host; now we are moving towards 32GB DIMMs instead, easily cramming 256GB into a host. The world is changing and so is your datacenter, with or without vRAM (there is that word again). Once again, I am not going to discuss vRAM by itself, as I am not an analyst or responsible for pricing and packaging within VMware. What I do want to discuss is whether vRAM has an impact on the scale-out vs scale-up discussion, as some are under the impression it does.

Let’s assume the following:

  • To virtualize: 300 servers
  • Average Mem configured: 3GB
  • Average vCPU configured: 1.3

That would be a total of 900GB and 390 vCPUs. From a CPU perspective, the recommended best practice that VMware PSO has had for the last few years has been 5-8 vCPUs per core; we’ll come back to why this is important in a second. Let’s assume we will use 2U servers for now, with different configurations. (When you do the math, fiddle around with the RAM/core/server ratio; 96 vs 192 vs 256 could make a nice difference!)

  • Config 1:
    • Dell R710
    • 2 x 4 Core – Intel
    • 96GB of memory
    • $5,500 per host
  • Config 2:
    • Dell R810
    • 2 x 8 Core – Intel
    • 192GB of memory
    • $13,000 per host

If we do the quick math from a memory perspective, assuming a roughly 20% TPS benefit (which is very conservative) and rounding up, we need 10 x R710s or 5 x R810s. I noticed multiple people making statements about not recommending memory over-commitment because of vRAM; that doesn’t make any sense to me, as memory techniques like TPS only lower the overall costs. As mentioned, the recommendation is 5-8 vCPUs per core, so let’s go for 6 vCPUs per core. That means from a vCPU perspective we will need 9 x R710s or 5 x R810s. Taking the worst-case scenario into account, we go with the larger of the two numbers (RAM or CPU). That results in:
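For completeness, here is that back-of-the-napkin math as a small Python sketch (my own restatement of the numbers above; note it sizes memory on the full 900GB and treats the TPS benefit as headroom):

    import math

    VMS, AVG_MEM_GB, AVG_VCPU = 300, 3, 1.3
    VCPU_PER_CORE = 6                 # middle of the 5-8 PSO guideline

    TOTAL_MEM_GB = VMS * AVG_MEM_GB   # 900GB
    TOTAL_VCPU = VMS * AVG_VCPU       # 390 vCPUs

    def hosts_needed(mem_gb_per_host, cores_per_host):
        """Hosts required: the worst of the memory and the CPU requirement."""
        by_mem = math.ceil(TOTAL_MEM_GB / mem_gb_per_host)
        by_cpu = math.ceil(TOTAL_VCPU / (cores_per_host * VCPU_PER_CORE))
        return max(by_mem, by_cpu)

    print("R710:", hosts_needed(96, 8))    # 10 (memory bound; CPU alone needs 9)
    print("R810:", hosts_needed(192, 16))  # 5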

  • 10 x Dell R710 = 55k
  • 5 x Dell R810 = 65k

Before anyone asks: I also looked at AMD 12-core systems with 256GB, which come in around 16.5k, and you would need roughly 4 hosts to accomplish the same. Looking at the cost of those boxes and comparing them with Intel, I would honestly expect broader adoption of AMD, but let’s focus on the Intel comparison for now. So that is only a 10k difference when looking at hardware, but the cost of managing the R810s is lower (fewer hosts), and that is not even counting I/O ports, cooling and power. (Trying to keep things simple, but when adding these costs the difference will be even bigger.)

So what about that vRAM thingie? What about that, huh! Well, as I said, this is not about vRAM, but will it matter when buying large hosts? It might, but only when you buy more capacity than you need in this example and want to license all of it beforehand. In this case, does it matter? 300 VMs x 3GB vRAM is 900GB of vRAM (18.75 licenses for Enterprise Plus); will the type of host change this? Actually, it will. With the R710 you will need 20 (10 x 2) socket licenses, assuming Enterprise Plus is used. With the Dell R810 we will need 10 licenses from a socket perspective, but 19 from a vRAM perspective using Enterprise Plus. (There is a quick scripted sanity check of this math after the list below.)

Let’s put it in perspective:

  • Scale-out
    • 20 Enterprise+ licenses required
    • 10 hosts required
    • Estimated cost for hosts + licenses: 105k
  • Scale-up
    • 19 Enterprise+ licenses required
    • 5 hosts required
    • Estimated cost for hosts + licenses: 112.5k
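As a sanity check, here is the license and cost arithmetic in a few lines of Python. The 48GB vRAM entitlement per Enterprise Plus license is the figure that applied at the time, and the ~2.5k per-license price is simply backed out of the totals above rather than a quoted list price:

    import math

    VRAM_PER_LICENSE_GB = 48   # Enterprise Plus vRAM entitlement at the time
    TOTAL_VRAM_GB = 300 * 3    # 900GB -> 900 / 48 = 18.75, rounded up to 19
    LICENSE_PRICE_K = 2.5      # inferred from the totals above, not a list price

    def eplus_licenses(hosts, sockets_per_host=2):
        """Licenses required: per-socket count or pooled vRAM, whichever is higher."""
        by_socket = hosts * sockets_per_host
        by_vram = math.ceil(TOTAL_VRAM_GB / VRAM_PER_LICENSE_GB)
        return max(by_socket, by_vram)

    for name, hosts, host_price_k in (("Scale-out (R710)", 10, 5.5),
                                      ("Scale-up (R810)", 5, 13.0)):
        licenses = eplus_licenses(hosts)
        total_k = hosts * host_price_k + licenses * LICENSE_PRICE_K
        print(f"{name}: {licenses} licenses, ~{total_k}k total")
    # Scale-out (R710): 20 licenses, ~105.0k total
    # Scale-up (R810): 19 licenses, ~112.5k total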

Looking at the total cost of acquisition in terms of hardware and vSphere licenses alone, scale-up in this scenario is indeed slightly more expensive (7.5k). So should you go big?

As mentioned in my other posts, there are a couple of things to keep in mind when making this decision, and unfortunately I cannot make it for you. There are of course things to factor in; many of these have a substantial cost associated with them, and I can guarantee that those costs will more than make up for that 7.5k!

  • Cost of Guest Operating System and Applications (licensed per socket in some cases)
  • Cost of I/O ports (storage + network)
  • Cost of KVM / Rackspace
  • Cost of Power / Cooling
  • Cost of operating per host (think firmware etc)
  • Cost of support (Hardware + Software)
  • Total number of VMs
  • Total number of vCPUs
  • Total amount of vRAM
  • vCPUs per core ratio
  • Redundancy, taking N+1 into account
  • Impact of failure
  • Impact on DRS (fewer hosts means fewer balancing options)
  • Impact on TPS (fewer hosts means more memory sharing, which means less physical RAM needed)

Once again, I cannot make the call for you; it will depend on what you feel is most important. If you are concerned about placing all your eggs in one basket, you should probably go for scale-out, but if your primary concern is cost and you trust your hardware platform, scale-up would be the way to go. One thing to consider before you make your decision: how often does a server fail due to a hardware defect vs a human error? Would fewer servers also imply fewer chances of human error? But would it also imply a larger impact per human error?

For those looking for more exact details, I recommend reading this excellent post by Bob Plankers! Bob and I exchanged a lot of DMs and emails on this topic over the last couple of days, and I want to thank him for validating my logic and for the tremendous amount of effort he has put into his article and spreadsheet! I also want to thank Massimo Re Ferre for proofreading. This article by Aaron Delp is also worth reading; Aaron released it just after I finished this article. Talking about useful articles, I would also like to refer to Massimo’s article, which was published in 2009 but is still very relevant! Scale-up vs scale-out is a hot topic, I guess.

Now I am looking at you guys to chime in, and please keep the fight nice and clean: no clinching, spitting or cursing…

 

RE: Maximum Hosts Per Cluster (Scott Drummonds)

Duncan Epping · Nov 29, 2010 ·

I love blogging because of the discussions you sometimes get into. One of the bloggers I highly respect and closely follow is EMC’s vSpecialist Scott Drummonds (former VMware performance guru). Scott posted a question on his blog about what the size of a cluster should be. Scott discussed this with Dave Korsunsky and Dan Anderson, both VMware employees, and more or less came to the conclusion that 10 is probably a good number.

So, have I given a recommendation? I am not sure. If anything, I feel that Dave, Dan and I believe that a minimum cluster size should be set to guarantee that the CPU utilization target, and not the HA failover capacity, is defining the number of wasted resources. This means a minimum cluster of something like four or five hosts. While neither of us claims a specific problem that will occur with very large clusters, we cannot imagine the value of a 32-host cluster. So, we think the right cluster size is somewhere shy of 10.

And of course they have a whole bunch of arguments for both large (12+) and small (8-) clusters, which I have summarized below for your convenience:

  • Pro Large: DRS efficiency.  This was my primary claim in favor of 32-host clusters.  My reasoning is simple: with more hosts in the cluster there are more CPU and memory resource holes into which DRS can place running virtual machines to optimize the cluster’s performance.  The more hosts, the more options to the scheduler.
  • Pro Small: DRS does not make scheduling decisions based on the performance characteristics of the server, so a new, powerful server in a cluster is just as likely to receive a mission-critical virtual machine as an older, slower host. This would be unfortunate if a cluster contained servers with radically different (although EVC-compatible) CPUs, like the Intel Xeon 5400 and Xeon 5500 series.
  • Pro Small: By putting your mission-critical applications in a cluster of their own your “server huggers” will sleep better at night.  They will be able to keep one eye on the iron that can make or break their job.
  • Pro Small: The cumbersome nature of change control. Clusters have to be managed to a consistent state, and the complexity of this process depends on the number of items being managed. A very large cluster will present unique challenges when managing change.
  • Pro Small: To size a 4+1 cluster to 80% utilization after a host failure, you will want to restrict CPU usage in the five hosts to 64%. Going to a 5+1 cluster results in a pre-failure CPU utilization target of 66%. The targets slowly approach 80% as the clusters get larger and larger, but the incremental resource utilization improvement is never more than 2%. So growing a cluster slightly provides very little value in terms of resource utilization (see the quick check below).
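That flattening effect is easy to verify: with one failover host reserved, the pre-failure target for an N+1 cluster is simply 80% x N/(N+1). A quick check of the numbers:

    # Pre-failure CPU target so utilization stays at or below 80% after one
    # host failure: the surviving N hosts must absorb the load of N+1 hosts.
    for n in range(4, 10):
        print(f"{n}+1 cluster: {0.80 * n / (n + 1):.1%} pre-failure target")
    # 4+1 -> 64.0%, 5+1 -> 66.7%, 6+1 -> 68.6% ... slowly creeping toward 80%.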

It is probably an endless debate, and the arguments for both “pro large” and “pro small” are all very valid, although I seriously disagree with their conclusion of not seeing the value of a 32-host cluster. As always, it fully depends. On what, you might say; why would you ever want a 32-host cluster? For instance, when you are deploying vCloud Director: clusters are currently the boundary for your vDC, and who wants to give a customer 6 vDCs instead of just 1 because the cluster size was limited to 6 hosts instead of leaving the option open to go to the max? This might just be an exception and nowhere near reality for some of you, but I wanted to use it as an example to show that you will need to take many factors into account.
Now I am not saying you should, but at least leave the option open.

One of the arguments I do want to debate is the change control argument. This used to be valid in a lot of enterprise environments where ESX was used. I am deliberately using “ESX” and “Enterprise” here, as the reality is that many companies don’t even have a change control process in place. (I worked for a few large insurance companies which didn’t!) On top of that, there is a large discrepancy in the amount of work associated with patching ESX vs ESXi. I have spent many weekends upgrading ESX, but today I literally spent minutes upgrading ESXi. The impact and risks associated with patching have most certainly decreased with ESXi in combination with VUM and the staging options. On top of that, many organizations treat ESXi as an appliance, and with stateless ESXi and the Auto Deploy appliance being around the corner, I guess that notion will only grow to become a best practice.

A couple of arguments that I have often seen being used to restrict the size of a cluster are the following:

  • HA limits (a different maximum number of VMs when clusters are > 8 hosts)
  • SCSI Reservation Conflicts
  • HA Primary nodes

Let me start by saying that for every new design you create, challenge your design considerations and best practices: are they still valid?

The first one is obvious, as most of you know by now that there is no longer such a thing as an 8-host boundary with HA. The second one needs some explanation. Around the VI3 time frame, cluster sizes were often limited because of possible storage performance issues. These alleged issues were mainly blamed on SCSI reservation conflicts. The conflicts were caused by having many VMs on a single LUN in a large cluster: whenever a metadata update was required, the LUN would be locked by a host, and this would/could increase overall latency. To avoid this, people would keep the number of VMs per VMFS volume low (10-15) and keep the number of VMFS volumes per cluster low, also resulting in a fairly low consolidation factor. But hey, 10:1 beats physical.

Those arguments used to be valid; however, things have changed. vSphere 4.1 brought us VAAI, which is a serious game changer in terms of SCSI reservations. I understand that for many storage platforms VAAI is currently not supported; however, the original mechanism used for SCSI reservations has also severely improved over time (optimistic locking), which in my opinion reduced the need for many small LUNs, a practice that would eventually limit you from a maximum-LUNs-per-host perspective. So with VAAI or optimistic locking, and of course NFS, the argument for small clusters is not really valid anymore. (Yes, there are exceptions.)

The one design consideration that is crucial but missing, in my opinion, is HA primary node placement. Many have limited their cluster sizes because of hardware and HA primary node constraints. As is hopefully known (if not, be ashamed), HA has a maximum of 5 primary nodes in a cluster, and a primary is required for restarts to take place. In large clusters, the chances of losing all primaries increase if and when the placement of the hosts is not taken into account. The general consensus is usually to keep your cluster limited to 8 hosts and spread it across two racks or chassis, so that each rack always has at least a single primary node to restart VMs. But why would you limit yourself to 8? Why, if you just bought 48 new blades, would you create 6 clusters of 8 hosts instead of 3 clusters of 16 hosts? By simply layering your design you can mitigate all risks associated with primary node placement while benefiting from additional DRS placement options. (Do note that if you “only” have two chassis, your options are limited.)

Which brings us to another thing I wanted to discuss. Scott’s argument against increased DRS placement options was that hundreds of VMs in an 8-host cluster already lead to many placement options. Indeed, you will have many load-balancing options in an 8-host cluster, but is it enough? In the field I also see a lot of DRS rules. DRS rules restrict the DRS load-balancing algorithm when it looks for suitable options; as such, more opportunities will more than likely result in a better-balanced cluster. Heck, I have even seen cluster imbalances that could not be resolved due to DRS rules in a five-host cluster with 70 VMs.

Don’t get me wrong, I am not advocating going big, but neither am I advocating a limited cluster size for reasons that might not even apply to your environment. Write down the requirements of your customer or your environment, and don’t limit yourself to design considerations around compute alone. Think about storage, networking, update management, configuration maximums, DRS & DPM, HA, and resource and operational overhead.
