I have been playing a lot with vSphere Virtual SAN (VSAN) in the last couple of months… I figured I would write down some of my thoughts around creating a hardware platform or constructing the virtual environment when it comes to VSAN. There are some recommended practices and some constraints; I aim to use this blog post to gather all of these Virtual SAN design considerations. Please read the VSAN introduction, how to install VSAN in your virtual lab and “How do you know where an object is located” to get a better understanding of the product. There is a long list of VSAN blogs that can be found here: vmwa.re/vsan
Everything below is based on the vSphere 5.5 Virtual SAN (public) beta and on my own interpretation and thoughts, formed through various conversations with colleagues and engineering and by reading various documents.
- vSphere Virtual SAN (VSAN) clusters are limited to a maximum of 32 hosts, with a minimum of 3 hosts. VSAN is also currently limited to 100 VMs per host, resulting in a maximum of 3200 VMs in a 32-host cluster. Please note that HA currently has a limit of 2048 protected VMs in a single datastore.
- It is recommended to dedicate a 10GbE NIC port to your VSAN VMkernel traffic; although 1GbE is fully supported, it could be a limiting factor in I/O-intensive environments. Both VSS and VDS are supported.
- It is recommended to have a VSAN VMkernel on every physical NIC! Ensure you configure them in an “active/standby” configuration so that when you have 2 physical NIC ports and 2 VSAN VMkernel interfaces, each of them has its own port. Do note that multiple VSAN VMkernel NICs on a single host in the same subnet is not a supported configuration; in different subnets it is supported.
- IP Hash load balancing is supported by VSAN, but due to the limited number of source/destination IP addresses the load-balancing benefits could be limited. In other words, an EtherChannel formed out of 4 x 1GbE NICs will most likely not result in 4GbE.
- Although Jumbo Frames are fully supported with VSAN, they do add a level of operational complexity. When Jumbo Frames are enabled, ensure they are enabled end-to-end!
- VSAN requires at a minimum 1 SSD and 1 magnetic disk per disk group on a host that is contributing storage. Each disk group can have a maximum of 1 SSD and 7 magnetic disks. When you have more than 7 HDDs, or two or more SSDs, you will need to create additional disk groups.
- Each host that is providing capacity to the VSAN datastore has at least one local disk group. There is a maximum of 5 disk groups per host!
- It can be beneficial to create multiple smaller disk groups instead of fewer larger ones. More disk groups mean smaller failure domains and more cache drives / queues (see the disk group sketch after this list).
- When sizing your environment, ensure you take data replicas into account. If your environment needs N+1 or N+2 (etc.) resiliency, factor this in accordingly.
- SSD capacity does not count towards the total VSAN datastore capacity. When sizing your environment, do not include SSD capacity in your total capacity calculation.
- It is a recommended practice to have a minimum 1:10 ratio of SSD capacity to HDD capacity in each disk group. In other words, when you have 1TB of HDD capacity, it is recommended to have at least 100GB of SSD capacity. Note that VMware’s recommendation has changed since the beta; the new recommendation is:
- 10 percent of the anticipated consumed storage capacity before the number of failures to tolerate is considered
- By default, 70% of the available SSD capacity will be used as read cache and 30% will be used as a write buffer. As in most designs, when it comes to cache/buffer: more = better (see the sizing sketch after this list).
- Selecting an SSD with the right performance profile can easily make a 5x-10x difference in VSAN performance, so choose carefully and wisely. Both SSD and PCIe flash solutions are supported, but there are requirements! Make sure to check the HCL before purchasing new hardware. My tip: the Intel S3700, a great price/performance balance.
- VSAN relies on VM Storage Policies for policy-based management. There is a default policy under the hood, but you cannot see this within the UI. As such, it is a recommended practice to create a new standard policy for your environment after VSAN has been configured. It is recommended to start with all settings set to default and to ensure “Number of failures to tolerate” is configured to 1. This guarantees that when a single host fails, virtual machines can be restarted and recovered with minimal impact on the environment. Attach this policy to your virtual machines when migrating them to VSAN or during virtual machine provisioning.
- Configure vSphere HA isolation response to “power-off” to ensure that virtual machines which reside on an isolated host can be safely restarted.
- Ensure the vSphere HA admission control policy (“host failures to tolerate” or the percentage-based policy) aligns with your VSAN availability strategy. In other words, ensure that both compute and storage are configured using the same “N+x” availability approach.
- When defining your VM Storage Policy, avoid unnecessary use of “flash read cache reservation”. VSAN has internal read cache optimization algorithms; trust them like you trust the host scheduler or DRS!
- VSAN does not support virtual machine disks greater than 2TB minus 512 bytes; VMs that require larger VMDKs are not suitable candidates for VSAN at this point in time.
- VSAN does not support FT, DPM, Storage DRS or Storage I/O Control. It should be noted though that VSAN internally takes care of scheduling and balancing when required. Storage DRS and SIOC are designed for SAN/NAS environments.
- Although non-uniform configurations are supported by VSAN, it is a recommended practice to keep the host/disk configuration similar across a VSAN cluster. A non-uniform cluster configuration could lead to variations in performance and could make it more complex to stay compliant with defined policies after a failure.
- When adding new SSDs or HDDs, ensure these are not pre-formatted. Note that when VSAN is configured in “automatic mode”, disks are added to existing disk groups or new disk groups are created automatically.
- Note that vSphere HA behaves slightly differently in a VSAN-enabled cluster; here are some of the changes / caveats:
- Be aware that when HA is turned on in the cluster, FDM agent (HA) traffic goes over the VSAN network and not the Management Network. However, when a potential isolation is detected, HA will ping the default gateway (or the specified isolation address) using the Management Network.
- When enabling VSAN, ensure vSphere HA is disabled. You cannot enable VSAN when HA is already configured. Either configure VSAN during the creation of the cluster or disable vSphere HA temporarily when configuring VSAN.
- When there are only VSAN datastores available within a cluster, datastore heartbeating is disabled. HA will never use a VSAN datastore for heartbeating.
- When changes are made to the VSAN network, vSphere HA must be reconfigured.
- VSAN requires a RAID controller / HBA which supports passthrough mode or pseudo passthrough mode. Validate with your server vendor whether the included disk controller supports passthrough. An example of a passthrough-mode controller which is sold separately is the LSI SAS 9211-8i.
- Ensure log files are stored externally to your ESXi hosts and VSAN by leveraging vSphere’s syslog capabilities.
- ESXi can be installed on USB, SD, and magnetic disk. Hosts with 512GB or more of memory are only supported when ESXi is installed on a magnetic disk.
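
To make the disk group rules above a bit more concrete, here is a minimal sketch in plain Python (nothing VSAN-specific; the layout and disk sizes in the example are made up purely for illustration) that checks a proposed per-host disk layout against the limits discussed in this post: at most 5 disk groups per host, 1 SSD plus 1–7 magnetic disks per disk group, and the rough 10% flash-to-HDD guideline.

```python
# Minimal sketch: check a proposed per-host disk layout against the VSAN
# (beta) limits discussed in this post. Layout and sizes are made-up examples.

MAX_DISK_GROUPS_PER_HOST = 5
MAX_HDDS_PER_DISK_GROUP = 7
FLASH_TO_HDD_RATIO = 0.10  # the ~1:10 SSD-to-HDD guideline

def validate_host_layout(disk_groups):
    """disk_groups: list of dicts like {"ssd_gb": 400, "hdd_gb": [600, 600]}"""
    problems = []
    if not disk_groups:
        problems.append("a host contributing storage needs at least one disk group")
    if len(disk_groups) > MAX_DISK_GROUPS_PER_HOST:
        problems.append(f"{len(disk_groups)} disk groups exceeds the maximum of {MAX_DISK_GROUPS_PER_HOST}")
    for i, dg in enumerate(disk_groups, start=1):
        hdds = dg["hdd_gb"]
        if not 1 <= len(hdds) <= MAX_HDDS_PER_DISK_GROUP:
            problems.append(f"disk group {i}: needs 1-{MAX_HDDS_PER_DISK_GROUP} HDDs, has {len(hdds)}")
        if dg["ssd_gb"] < FLASH_TO_HDD_RATIO * sum(hdds):
            problems.append(f"disk group {i}: SSD is below the ~10% flash guideline")
    return problems

# Example: 2 disk groups, each with 1 x 400GB SSD + 6 x 600GB SAS.
layout = [{"ssd_gb": 400, "hdd_gb": [600] * 6} for _ in range(2)]
print(validate_host_layout(layout) or "layout fits within the documented limits")
```

The tradeoff this illustrates is the one hinted at in the disk group bullet: more, smaller disk groups give you more cache devices and smaller failure domains, at the cost of more SSDs.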
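
The capacity and flash sizing bullets can be turned into the same kind of back-of-the-napkin arithmetic. This is a rough sketch only, based on the guidelines quoted above: raw HDD capacity = anticipated consumed capacity x (failures to tolerate + 1), flash roughly 10% of the anticipated consumed capacity before FTT is considered, split 70/30 between read cache and write buffer by default, with SSD capacity excluded from datastore capacity. The 20TB input is a made-up example, not a sizing recommendation.

```python
# Back-of-the-napkin VSAN sizing sketch based on the guidelines in this post.
# Inputs are made-up example values; this is not a definitive sizing tool.

def size_vsan(consumed_tb, failures_to_tolerate=1):
    """Return rough raw HDD and flash requirements for the anticipated consumed capacity."""
    replicas = failures_to_tolerate + 1      # number of data copies (N+1)
    raw_hdd_tb = consumed_tb * replicas      # SSDs do not add datastore capacity
    flash_tb = 0.10 * consumed_tb            # ~10% of consumed capacity, before FTT
    return {
        "raw_hdd_tb": raw_hdd_tb,
        "flash_tb": flash_tb,
        "read_cache_tb": 0.70 * flash_tb,    # default 70% read cache
        "write_buffer_tb": 0.30 * flash_tb,  # default 30% write buffer
    }

# Example: 20TB of anticipated consumed capacity, "Number of failures to tolerate" = 1.
for name, value in size_vsan(20, failures_to_tolerate=1).items():
    print(f"{name}: {value:.1f} TB")
```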
That is it for now. When more comes to mind I will add it to the list!
Cormac Hogan says
Just as an FYI regarding VM Storage Policies: the default policy has “Number of failures to tolerate” set to 1. When you deploy a VM without explicitly selecting a policy, you automatically get this one, and you automatically get a mirrored copy of your VM.
Duncan says
I should have made that a bit clearer indeed. Edited the text, should make more sense now.
Andy G says
Hi Duncan
Thanks for the notes!
Can you please expand on the last bullet where you indicate “It should be noted though that VSAN takes care of storage services like Storage DRS and Storage IO Control internal to the platform.”
Any details you can share on the internal SDRS/SIOC implementation?
Duncan says
What I mean by that is that VSAN and the host will take care of placement and scheduling / queueing. It is not that VSAN has a specific implementation of SDRS or SIOC; they solved things in a different way.
Agustín Malanco says
Great post Duncan!
iwan rahabok says
Very informative!
Does VSAN only present 1 datastore? If yes, this changes the datastore design. We normally don’t place 100 server VMs in a single datastore. VDI VMs yes, but not server VMs.
What is the purpose of “disk groups” since VSAN will create 1 datastore anyway?
Since VSAN needs the whole disk, does it mean we will need at least 3 magnetic disks (2 for the ESXi boot drive, which is the common practice)? As a side effect, this will encourage customers to move toward USB drives or Auto Deploy.
Thanks from Singapore
e1
Duncan says
1) Yes, VSAN presents a single datastore to all hosts that are part of that VSAN cluster.
2) 1 SSD and one or more HDDs form a disk group. This disk group provides capacity to the VSAN datastore. You can create multiple disk groups per host.
3) It means you need at least 1 HDD; how you end up booting ESXi is beside the point. You can do USB, SD, Auto Deploy, etc.
Thomas Findelkind says
Hello Duncan,
I found that SSD capacity does count when it comes to the disk group capacity. In my case I use a 640GB ioDrive and the HDD is only 250GB, so the max capacity is 640GB.
Duncan says
That is a bug then… Can you please file this…
Ben says
Hello Duncan,
I am a newbie to VMware as well as VSAN. I would like to seek your advice on whether I could use an SSD instead of an HDD for the VSAN datastore.
Jim Millard says
Duncan,
You advise putting a VSAN-enabled VMkernel port on each physical NIC; is there a limit to the number of VMkernel ports that can be enabled in this fashion? I ask because I get errors when enabling the 5th port on a host…
Duncan says
We are looking at those recommendations again as we speak. It appears that this level of load balancing won’t bring the expected benefits. I recommend using LACP instead at the moment!
http://www.yellow-bricks.com/2013/11/12/vsan-network-io-control-vds-part-2/
Jim Millard says
So what about the folks who—for whatever reason—are “stuck” with six or eight per-host 1Gbps uplinks and no way to do multi-chassis LAG+LACP?
I can predict that there will be MANY businesses that will have to make the choice to add the bare minimum for implementing VSAN, especially without any ability to factor in the licensing costs while we’re still in beta. Adding flash (and an HDD or two) is a pretty cheap upgrade compared to a network refresh that includes switches with cross-chassis aggregation.
Although I would agree that a 10Gbps uplift would benefit the entire datacenter, it’s a significant cost, especially if you’re also having to add higher-tier features like cross-chassis LAG+LACP.
Duncan says
I understand that and I gave the same feedback. Not much I can change about it though. It is being looked at, and I recommend you file a support request.
Bill says
Hello! Amazing write-up! I just started playing around with VSAN in my home lab, testing it out to see if it’ll help any of our clients. I did have a question pop up in my head which I can’t seem to find on Google.
If I create multiple VMkernels with VSAN enabled, will LBT on my DVSwitch balance them across my uplinks?
Duncan Epping says
Yes, they will be balanced when the utilization threshold is exceeded.
Matteo T says
Hi Duncan, excellent blog, just one question:
Is InfiniBand supported in VSAN? Are there any plans to support RDMA implementations? Can we re-use our ConnectX3 VPI adapters?
Duncan Epping says
As far as I know, InfiniBand is not supported today… Not sure if RDMA or ConnectX3 will be supported; I am far removed from the certification process.
Srinivas says
Hey Duncan!!
I just wanted to check whether, with the GA release as well, the limit on the VMDK is still 2TB minus 512 bytes.
If so, what happens or what can be done if one needs a VMDK larger than that but still wants the VSAN capability?
Would be great to know of any flings that will work!
duncan@yellow-bricks says
Yes, that is still the limit.
Rick Enright says
First of all, I want to say how much I’ve loved your blogs on VSAN. You have done a terrific job documenting this exciting new feature.
I think the problem that customers are going to have is justifying the cost of the licenses. If you have 32 hosts with 8 pCPUs each, the cost of the licenses is $640,000. You can buy a lot of SAN for that kind of money. How can we justify that?
Duncan Epping says
To be honest, I don’t know many customers running 8-socket servers; 2 sockets still seems to be the sweet spot and may make more sense with VSAN… go higher in core count instead 🙂
Anyway, I am not focused on pricing and packaging and not the right person to comment.
CP says
Hi, for VSAN on a blade solution, most full-height blades can only support 4 HDDs. Let’s assume ESXi is SAN-booted and the remaining disk slots are split into 1 x SSD + 3 x SAS/SATA. Would VMware recommend the use of VSAN?
Duncan Epping says
I guess it depends on what you are looking to do.
CP says
Hi, say I just want to have my VMDK files on VSAN. The VMs are just Windows OS with standard Microsoft Office products. Thanks.
GAP says
Do you recommend running smaller disk groups, such as 4 sets of 200GB SSD / 3 x 600GB SAS, or 2 disk groups of 400GB SSD / 6 x 600GB SAS? What would be the tradeoffs?
Tung Vu says
Hi Duncan,
Thanks for your great post.
I have a question regarding the HBA requirement. Since VSAN leverages local disks, why does it still require an HBA, which is usually used for a SAN?
Thank you.
hakalugi says
Tung, the LSI he gave as an example is your answer: “i” = internal SAS adapter. When drives are put behind a RAID card, drive health diagnostics are hidden by the RAID card, and only the RAID card’s virtual disks are seen by the host OS/hypervisor. With an internal HBA in pass-through mode, you see the drives themselves at the OS/hypervisor level (but it means you’re not doing RAID 1 on that card if it’s in pass-through mode), so consider booting off of another source.
Shawn says
A quick question – can an SSD be in more than one disk group? Or does VSAN grab the entire SSD when you put it into a disk group? I have a server with an FIO card and four spindle drives in two H700 mirrors. I’d like to split the HDs into two separate disk groups, but I can use the FIO card in only one disk group.
Would partitioning the FIO card (not formatting, just partitioning) help at all?
Thank you!