How to calculate what your Virtual SAN datastore size should be

I have had this question so many times that I figured I would write an article about it: how do you calculate what your Virtual SAN datastore size should be? Ultimately this determines which kind of server hardware you can use, which disk controller you need and which disks… So it is important that you get it right. I know the VMware Technical Marketing team is developing collateral around this topic; when that has been published I will add a link here. Let’s start with a quote by Christian Dickmann, one of our engineers, as it is the foundation of this article:

In Virtual SAN your whole cluster acts as a hot-spare

Personally I like to work top-down, meaning that I start with an average for virtual machines or a total combined number. Let’s take an example to go through the exercise, as that makes it a bit easier to digest.

Let’s assume the average VM disk size is 50GB. On average the VMs have 4GB of memory provisioned. And we have 100 virtual machines in total that we want to run on a 4 host cluster. Based on that info the formula would look something like this:

(total number of VMs * average VM size) + (total number of VMs * average VM memory size) = total capacity required

In our case that would be:

(100 * 50GB) + (100 * 4GB) = 5400 GB

So that is it? Well not really. Like every storage / file system there is some overhead, and we will need to take the “failures to tolerate” into account. If I set my “failures to tolerate” to 1 then I would have 2 copies of my VMs, which means I need 5400 GB * 2 = 10800 GB. Personally I also add an additional 10% in disk capacity to ensure we have room for things like: meta data, log files, vmx files and some small snapshots when required. Note that VSAN by default provisions all VMDKs as thin objects (note that swap files are thick, Cormac explained that here), so there should be room available regardless. Better safe than sorry though. This means that 10800 GB actually becomes 11880 GB. I prefer to round this up to 12TB. The formula I have been using thus looks as follows:

(((Number of VMs * Avg VM size) + (Number of VMs * Avg mem size)) * (FTT + 1)) + 10%
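
As a rough illustration, the formula can be written as a small Python function. The function and parameter names are made up for this sketch, and the 10% overhead is simply the personal rule of thumb mentioned above, not an official figure.

```python
def required_capacity_gb(num_vms, avg_disk_gb, avg_mem_gb, ftt=1, overhead_pct=10):
    """Estimate the raw Virtual SAN capacity in GB (sketch of the formula above)."""
    # Disk footprint plus swap (memory) footprint for all VMs
    base = (num_vms * avg_disk_gb) + (num_vms * avg_mem_gb)
    # Per the formula, "failures to tolerate" of N means N + 1 copies
    replicated = base * (ftt + 1)
    # Extra headroom for meta data, log files, vmx files and small snapshots
    return replicated * (100 + overhead_pct) / 100


# The example from this article: 100 VMs, 50GB disk, 4GB memory, FTT=1
print(required_capacity_gb(100, 50, 4))  # 11880.0
```

Rounding the result up to a convenient number (12TB in this example) is a manual step and deliberately left out of the function.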

Now the next step is to see how you divide that across your hosts. I mentioned we would have 4 hosts in our cluster. We have two options: we create a cluster that can re-protect itself after a full host failure, or we create a cluster that cannot. Just to clarify, in order to have 1 host of spare capacity available we will need to divide the total capacity by 3 instead of 4. Let’s look at those two options, and what the impact is:

  • 12TB / 3 hosts = 4TB per host (for each of the 4 hosts)
    • Allows you to re-protect (sync/mirror) all virtual machine objects even when you lose a full host
    • All virtual machines will maintain availability levels when doing maintenance
    • Requires an additional 1TB per host!
  • 12TB / 4 hosts = 3TB per host (for each of the 4 hosts)
    • If all disk space is consumed, when a host fails virtual machines cannot be “re-protected” as there would be no capacity to sync/mirror the objects again
    • When entering maintenance mode data availability cannot be maintained as there would be no room to sync/mirror the objects to another disk
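
The two options above come down to a single division. A minimal sketch, with a hypothetical helper name and a `spare_hosts` parameter that is my own shorthand for "hosts worth of capacity kept free":

```python
def capacity_per_host_tb(total_tb, hosts, spare_hosts=0):
    """Capacity each host needs so that `spare_hosts` hosts of capacity stay free."""
    usable_hosts = hosts - spare_hosts
    if usable_hosts <= 0:
        raise ValueError("spare_hosts must be smaller than hosts")
    # Every host gets the same amount, but we size as if fewer hosts existed
    return total_tb / usable_hosts


print(capacity_per_host_tb(12, 4, spare_hosts=1))  # 4.0 -> can re-protect after a host failure
print(capacity_per_host_tb(12, 4))                 # 3.0 -> no room to re-protect when full
```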

Now if you look at the numbers, we are talking about an additional 1TB per host. With 4 hosts, and let’s assume we are using 2.5″ SAS 900GB Hitachi drives, that would be 4 additional drives, at a cost of around 1000 per drive. When using 3.5″ SATA drives the cost would be even lower. Although this is just a number I found on the internet, it does illustrate that the cost of providing additional availability could be small. Prices could differ though depending on the server brand used. But even at double the cost, I would go for the additional drive and as such additional “hot spare capacity”.

To make life a bit easier I created a calculator. I hope this helps everyone who is looking at configuring hosts for their Virtual SAN based infrastructure.

Confessions of a VMUG speaker

I started reading this book by Scott Berkun titled “Confessions of a public speaker”. After the first couple of chapters I felt I wasn’t alone… What am I talking about? Stage Fright / Fear of Public Speaking. Let me start with a quote first…

Mark Twain, who made most of his income from speaking, not writing, said, “There are two types of speakers: those that are nervous and those that are liars.”

For those who are considering speaking at a VMUG but are terrified, I hope you find comfort in knowing that the majority of people you see presenting at these events have (or had) similar feelings. I don’t know anyone who is not nervous when they go up on stage. Those who say they are not are probably lying about it. Yes, there are some exceptions to the rule as always, but I can tell you that I am not one of those. I used to be terrified; stage fright is the right word.

Just to speak from my own experience, a lot of people seem to think that presenting is part of my role and is something I enjoy doing. I do enjoy it when the session is over, but the journey there I don’t enjoy. I am still nervous when I go up on stage, and depending on the size that is either nervous/excited or nervous/scared. Yes, like many of you reading this, the first couple of times presenting I wondered WHY am I doing this? It was painful being up on stage, it was painful doing dry-runs, and it even felt crap afterwards. WHY am I doing this?

Personally I believe I need to place myself in an uncomfortable situation to grow / learn. This applies to learning new skills, like public speaking, but also broadening the horizon from a job/career perspective. You can be a “virtualization admin” for the rest of your life and do it with your eyes closed… You can also take on a completely new set of responsibilities, yes you will feel uncomfortable for a couple of weeks or even months, but guess what after a while it all feels like you have been doing it for years… Same applies to public speaking, only way to get comfortable with that fear or nervous feeling is by doing it!

So what are some of the mistakes I made, and probably still make every once in a while, and what should you be doing or not doing?

  • Don’t overdo it! Practicing will help your delivery, overdoing it will probably hurt it! I did this for a long time, and I noticed I got nervous about forgetting things, and guess what… You will forget things, but don’t worry about that because the audience typically doesn’t know what you are going to tell them anyway!
  • When practicing focus on your opinion, your story, your considerations. Don’t practice it “word by word”, think big and feel comfortable with the content.
  • Don’t cram your slide-deck! Less = more. This is especially true for a slide-deck; understand that the deck is there to support your presentation. But still keep in mind that many people use the slide-deck afterwards as study notes, so keep it balanced. Typically when you have 60 minutes, aim for 50 minutes talking and 10 minutes Q&A. Believe me when I say that 30/40 slides is MORE than enough. 30 would probably be better, and if you can do with less you’ve mastered it!

Practice while you build your deck… I do this regularly to test the flow and see if the points / diagram / screenshot works in the presentation, and I will tweak the deck while doing a dry-run when something doesn’t work.

And it’s often the case that the things speakers obsess about are the opposite of what the audience cares about. They want to be entertained. They want to learn. And most of all, they want you to do well.

That is key to remember, they want you to do well! Now, please take the time in the upcoming days to think about what you would like to talk about at a local VMUG. Everyone has something interesting to tell. It doesn’t need to be a deepdive on Storage, not everyone is Cormac Hogan, right… No, a presentation on your migration between storage systems or datacenters could be just as interesting! A presentation on the introduction of a Disaster Recovery tool and how it changed your life would be a good way to help people make the right decision. There are many, many things one can talk about without the need to go extremely deep.

Once again, think about what you would like to talk about, create a slidedeck, practice and more importantly go have fun and support your local VMUG!!

Just some random Virtual SAN tweets…

I’ve been following Twitter fairly closely for Virtual SAN / VSAN related tweets. I do this to track feedback / sentiment and forward it to the engineering teams when and where applicable. There were a couple I wanted to share with you; I just like the vibe of these or find them funny… Some easy pre-holiday / Friday reading I guess. I’ll be taking a couple of weeks off myself, so it will be quiet around here.

What happens in a VSAN cluster in the case of an SSD failure?

The question that keeps coming up over and over again at VMUG events, on my blog and the various forums is: What happens in a VSAN cluster in the case of an SSD failure? I answered the question in one of my blog posts around failure scenarios a while back, but figured I would write it down in a separate post considering people keep asking for it. It makes it a bit easier to point people to the answer and also makes it a bit easier to find the answer on Google. Let’s sketch a situation first: what does (or will) the average VSAN environment look like?

In this case what you are looking at is:

  • 4 host cluster
  • Each host with 1 disk group
  • Each disk group has 1 SSD and 3 HDDs
  • Virtual machine running with a “failures to tolerate” of 1

As you hopefully know by now a VSAN Disk Group can hold 7 HDDs and requires an SSD on top of that. The SSD is used as a Read Cache (70%) and a Write Buffer (30%) for the components stored on it. The SSD is literally the first location IO is stored; as depicted in the diagram above. So what happens when the SSD fails?

When the SSD fails the whole Disk Group and all of the components will be reported as degraded or absent. The state (degraded vs absent) will depend on the type of failure, typically though when an SSD fails VSAN will recognize this and mark it as degraded and as such instantly create new copies of your objects (disks, vmx files etc) as depicted in the diagram above.

From a design perspective it is good to realize the following (for the current release):

  • A disk group can only hold 1 SSD
  • A disk group can be seen as a failure domain
    • E.g. there could be a benefit in creating 2 x (3 HDD + 1 SSD) disk groups versus a single 6 HDD + 1 SSD disk group
  • SSD availability is critical, select a reliable SSD! Yes some consumer grade SSDs do deliver a great performance, but they typically also burn out fast.
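
To make the failure domain point a bit more concrete, here is a minimal sketch that compares how many HDDs in a host become unavailable when a single SSD fails. It only encodes what is stated above (one SSD per disk group, and an SSD failure takes the whole disk group with it); the function name and layout tuples are made up for illustration.

```python
def hdds_lost_on_ssd_failure(disk_groups, hdds_per_group):
    """Return (HDDs unavailable, total HDDs) in a host when one SSD fails."""
    total_hdds = disk_groups * hdds_per_group
    # One SSD failure takes down exactly one disk group, including its HDDs
    return hdds_per_group, total_hdds


for groups, hdds in [(2, 3), (1, 6)]:
    lost, total = hdds_lost_on_ssd_failure(groups, hdds)
    print(f"{groups} x ({hdds} HDD + 1 SSD): {lost} of {total} HDDs affected")
```

With two smaller disk groups only half of the host's HDDs are impacted by an SSD failure, at the cost of a second SSD per host.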

Let it be clear that if you run with the default storage policies you are protecting yourself against 1 component failure. This means that 1 SSD can fail or 1 host can fail or 1 disk group can fail without loss of data, and as mentioned VSAN will typically recreate the impacted objects quickly on top of that.

That doesn’t mean you should try to save money on reliability, if you ask me. If you are wondering which SSD to select for your VSAN environment I recommend reading this post by Wade Holmes on the VMware vSphere Blog. Especially take note of the Endurance Requirements section! If I had to give a recommendation though, the Intel S3700 seems to still be the sweet spot when it comes to price / endurance / performance!

For 2014 I predict…

John Troyer just blogged about how he doesn’t see much value in “2014 predictions” blog posts. I agree, but I do love predictions.

For 2014, I predict…

Pretty sure that those who know the song will be humming the tune the rest of the week “aahaahaa lalalalala I predict a riot…”