After I published the vSphere Flash Read Cache FAQ many asked if I would also do a blog post with frequently asked questions about Virtual SAN / VSAN. I guess it makes sense, considering Virtual SAN / VSAN is such a hot topic. So here are the questions I have received so far, followed by the answers of course. If you have a question do not hesitate to leave a comment.
** updated to reflect VSAN GA **
- Can I add a host to a VSAN cluster which does not have local disks?
- Yes, a VSAN cluster can consist of hosts which are not contributing storage to VSAN. You will need to create a VSAN VMkernel interface on the host and simply add it to the cluster. Note that you will need a minimum of 3 hosts which do contribute storage to VSAN.
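As a rough sketch (assuming the VMkernel interface is called vmk1, adjust to your own environment), VSAN traffic can also be enabled on an interface from the command line instead of ticking the "Virtual SAN traffic" checkbox in the Web Client:
esxcli vsan network ipv4 add -i vmk1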
- VSAN requires an SSD, what is it used for?
- The SSD is used for read caching (70%) and write buffering (30%). Every write will go to SSD first and will be destaged to HDD later.
- When creating my VSAN VM Storage Policy, when do I use “failures to tolerate” and when do I use “stripe width”?
- Failures to tolerate is all about availability: it defines how many host or disk group failures your virtual machine must be able to survive while remaining available. So if you want to take 1 host failure into account, you set the policy to 1. This will then create 2 copies of the data and 1 witness in your cluster. Stripe width is about performance (read performance when the data is not in cache, and write destaging). Setting it to 2 or higher will result in data being striped across multiple disks. When used in conjunction with "failures to tolerate" this could potentially result in the data of a single VM being stored on multiple disks on multiple hosts.
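To make this concrete, a policy combining both capabilities would contain rules along these lines (illustrative values only, the Web Client presents these as individual rules):
Number of failures to tolerate = 1
Number of disk stripes per object = 2
The result is 2 mirror copies plus a witness, with each copy striped across 2 disks, so components of a single VMDK can end up on 4 or more disks spread over multiple hosts.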
- Is there a default storage policy for VSAN?
- Yes, there is a policy applied by default to all VMs on a VSAN datastore, but you cannot see this policy within the vSphere UI. You can see the default policy defined for the various object classes using the following command: esxcli vsan policy getdefault. By default an N+1 "failures to tolerate" policy is applied, so that even when a user forgets to create and assign a policy, objects are still made resilient. It is not recommended to change the default policy.
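For reference, the output of that command looks roughly like this (exact classes and values may differ per release):
esxcli vsan policy getdefault
Policy Class  Policy Value
cluster       (("hostFailuresToTolerate" i1))
vdisk         (("hostFailuresToTolerate" i1))
vmnamespace   (("hostFailuresToTolerate" i1))
vmswap        (("hostFailuresToTolerate" i1) ("forceProvisioning" i1))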
- How is data striped across multiple disks on a host when stripe width is set to 2?
- When stripe width is set to 2, first of all there is no guarantee that the data is striped across disks within a host. VSAN has its own algorithm to determine where data should be placed, and as such it could happen that although you have sufficient disks in all hosts, your data is striped across multiple hosts instead of across disks within a host. When data is striped this is done in chunks of 1MB.
- What is the purpose of “disk groups” since VSAN will create one datastore anyway?
- A disk group defines the SSD that is used for caching/buffering in front of a set of HDDs. Basically a disk group is a way of mapping HDDs to an SSD. Each disk group will have 1 SSD and a maximum of 7 HDDs.
- How many disks can a single host contribute to VSAN?
- A maximum of 5 disk groups per host
- Each disk group needs 1 SSD and 1 HDD at a minimum and 7 HDDs at a maximum
- HDD count max per host = 5 x 7 = 35
- SSD count max per host = 5 x 1 = 5
- Are both SSD and PCIe Flash cards supported?
- Yes both are supported but check the HCL for more details around this as there are guidelines and requirements
- Is 10GbE a hard requirement for VSAN?
- 10GbE is not a hard requirement for VSAN. VSAN works perfectly fine in smaller environments, including labs, with 1GbE. Do note that 10GbE is a recommendation.
- Why is it recommended for HA’s isolation response to be configured to “powered-off”?
- When VSAN is enabled, vSphere HA uses the VSAN VMkernel network for heartbeating. When a host does not receive any heartbeats, it is most likely that the host is also isolated/partitioned from the rest of the cluster from a VSAN perspective. In this state it is recommended to power off the virtual machines, as a new copy will be powered on by HA on the remaining hosts in the cluster automatically. This way, when the host comes out of isolation, you avoid the situation where 2 VMs with the same identity are active on the network.
- Can I partition my SSD or disks so that I can use them for other (install ESXi / vFlash) purposes?
- No, you cannot partition your SSD or HDD(s). Virtual SAN will only, and always, claim entire disks. With VSAN it probably makes most sense to install ESXi on an internal USB/SD card, to maximize the capacity available to VSAN.
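If you want to verify which devices VSAN has claimed on a host, the following command lists every claimed disk, whether it is an SSD, and the disk group it belongs to:
esxcli vsan storage list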
- Does VSAN support deduplication or compression?
- In the current version VSAN does not support deduplication or compression. The most expensive resource in your VSAN cluster is SSD/flash, hence avoiding duplicate data matters most on that layer. While mirroring your data results in two copies on the HDDs, and two temporary copies in the distributed write buffer (30% of the SSDs), the distributed read cache portion of the flash (70%) will only contain a single copy of any cached data.
- Can VSAN leverage SAN/NAS datastores?
- VSAN currently does not support the use of SAN/NAS datastores. Disks will need to be “local” and directly passed to the host.
- I was told VSAN does thin disks by default, if I set Object Space Reservation to 100% does that mean the VMDK will be eager zero thick provisioned?
- No, it does not mean the VM (or a portion of it) will be thick provisioned when you define Object Space Reservation. Object Space Reservation is all about the numbers used by VSAN when calculating used disk space / available disk space etc. When Object Space Reservation is set to 100% on a disk of 25GB, the disk will still be thin provisioned, but VSAN will do its math with 100% of the 25GB counted as used. I guess you can compare it to a memory reservation.
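A quick back-of-the-envelope example of that math (numbers purely illustrative): a 25GB thin VMDK with Object Space Reservation set to 100% and failures to tolerate set to 1 is accounted for as 2 x 25GB = 50GB of reserved capacity on the VSAN datastore, even though blocks are only physically written as the guest actually uses them.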
- Does VSAN use iSCSI or NFS to connect hosts to the datastore?
- VSAN does not use either of these two to connect hosts to a datastore. It uses a proprietary mechanism.
- What is the impact of maintenance mode in a VSAN enabled cluster?
- There are three ways of placing a host which is providing storage to your VSAN datastore in maintenance mode:
1) Full Data Migration – All data residing on the host will be migrated. Impact: Could take a long time to complete.
2) Ensure accessibility – VSAN ensures that all VMs will remain accessible by migrating the required data to other hosts. Impact: Potentially availability policies are violated.
3) No Data Migration – No data will be migrated. Impact: Depending on the “failures to tolerate” policy defined some VMs might become unusable.
The safest option is option 1; option 2 is the preferred and default option as it is faster to complete. I guess the question is why you are placing the host in maintenance mode and how fast it will become available again. Option 3 is a fallback, in case you really need to get into maintenance mode fast and don't care about potential data loss.
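For completeness, the same three options can be passed when entering maintenance mode from the command line (a sketch based on the 5.5 esxcli syntax):
esxcli system maintenanceMode set -e true -m ensureObjectAccessibility
where -m accepts ensureObjectAccessibility, evacuateAllData or noAction.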
- Are there any features of vSphere which aren’t supported/compatible with VSAN?
- Currently vSphere Distributed Power Management, Storage DRS and Storage IO Control are not supported with VSAN.
- How do I add a Virtual SAN / VSAN license?
- VSAN licenses are applied at the cluster level. Open the Web Client, click on your VSAN enabled cluster, click the "Manage" tab followed by "Settings". Under "Configuration" click "Virtual SAN Licensing" and then click "Assign License Key".
- How will Virtual SAN be priced / licensed?
- VSAN is licensed per socket; the price is $2,495 per socket or $50 per VDI user. Note that the license includes the Distributed Switch and VM Storage Policies, even when using a vSphere license lower than Enterprise Plus!
- If a host has failed and as such data is lost and all VMs were protected N+1, how long will it take before VSAN starts rebuilding the lost data?
- VSAN will identify which objects are out of compliance (those which had N+1 and were stored on that host) and start a time-out period of 60 minutes. It has this time-out period to avoid an unnecessary and costly full sync of data. If the host returns within those 60 minutes then only the differences will be copied to that host. When a VM has multiple mirrors it doesn't notice the failure; this 60 minute period is all about going back to full policy compliance, i.e. being able to satisfy additional failures should they occur.
- When a virtual machines moves around in a cluster will its objects follow to keep IO local?
- No, objects (virtual disks for instance) do not follow the virtual machine. Just imagine what the cost/overhead of moving virtual disks between hosts would be each time DRS suggests a migration. Instead IO can be done remotely. Meaning that although your virtual machine might run on host-1 from a CPU/Mem perspective, its virtual disks could be physically located on host-2 and host-3.
- When a Virtual Machine is migrated to another host, is the situation such that after a vMotion the SSD cache is lost (temporary performance hit) and the cache will be rebuilt over time?
- No, the cache will not be lost and there is no need to rebuild/warm up the cache again. The cache will be accessed remotely when needed.
- Does VSAN support Fault Tolerance aka FT?
- No, VSAN does not support Fault Tolerance in this release.
- The SSD in my host is being reported in vSphere as “non-SSD”. According to support this is a known issue with the generation of server I am using. Will this “mis-reporting” of the disk type affect my ability to configure a VSAN?
- Yes it will. You will need to tag the SSD using the command below (the example is what I use in my lab, your identifier will be different). In this case I claim it as being "local" and as "SSD".
esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba2:C0:T0:L0 --option "enable_local enable_ssd"
- It was mentioned that it will take 60 minutes after a failure before VSAN starts the automatic repair. Is it possible to shorten this time-out value?
- **disclaimer: Although I do not recommend changing this value, I was told it is supported**
Yes it is possible to shorten this time-out value by configuring the advanced setting named “VSAN.ClomRepairDelay” on every host in your VSAN cluster.
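As a sketch, the current value can be checked and changed per host with esxcli; the default is 60 (minutes) and the value below is purely an example:
esxcli system settings advanced list -o /VSAN/ClomRepairDelay
esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i 90
Remember to apply the same value on every host in the cluster.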
- Why can't I use the datastore heartbeat functionality in a VSAN-only cluster?
- There is no requirement for heartbeat datastores. The reason you do not have this functionality when you only have a VSAN datastore is that HA will use the VSAN network for heartbeats. So if a host is isolated from the VSAN network and cannot send heartbeats, it is safe to say that it will also not be able to update a heartbeat region remotely, making this feature pointless in a VSAN-only environment.
- Are there specific Best Practices around deploying View on VSAN?
- Yes there are, primarily around availability / caching and capacity reservations. Andre Leibovici wrote an article on this topic, read it!
- Can the VSAN VMkernel of hosts in a cluster be part of a different subnet?
- VSAN VMkernel interfaces need to be part of the same subnet. A different subnet for one (or multiple) hosts within a VSAN cluster is not supported. When using multiple VMkernel interfaces per host, each interface needs to be part of a different subnet!
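A quick way to check which VMkernel interface each host uses for VSAN, and therefore which subnet it sits in, is to combine these two commands per host:
esxcli vsan network list
esxcli network ip interface ipv4 get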
- Does VSAN support being stretched across multiple geographical locations?
- In the current version VSAN does not support "metro" clustering.
- Is there a difference between a host failing and a disk gradually failing?
- Yes, there is a difference. There are various failure states, and the state determines how fast VSAN will spin up a new mirror. The two failure states are "absent" and "degraded". Degraded is where a disk has failed and the system has recognized this as such and knows it isn't coming back. In this case VSAN recognizes the "degraded" state and will create a new mirror of the impacted objects immediately, as there is no point in waiting 60 minutes when you know the disk isn't coming back soon. The "absent" state means that VSAN doesn't know if it is coming back any time soon; this could be a host that has failed or, for instance, when you yank a disk. In this case the 60 minute time-out starts.
- Is there any explanation around how VSAN handles disk failures or host failures?
- Yes, I wrote an article on this topic. Please read “How VSAN handles a disk or host failure” for more details.
- What happens when an SSD fails in a VSAN cluster?
- An SSD sits in front of a disk group as the read cache / write buffer. When the SSD fails, the disk group and all the components stored on it are marked as degraded. VSAN will then instantiate new mirror copies where applicable and when sufficient disk capacity is available. For more details read this post.
- Does vSphere support TRIM for SSDs?
- No, TRIM is currently not supported/leveraged.
- What are the Maximum Numbers for Virtual SAN GA?
- 32 hosts per cluster
- 100 VMs per host maximum
- 3200 VMs per cluster maximum
- 2048 VMs HA protected per cluster maximum
- 2 million IOPS tested
- How do I size a VSAN datastore / cluster?
- I developed a sizing calculator which can be found here.
- How do I monitor VSAN performance?
- What's likely to affect VSAN performance?
- Performance is most likely affected by using cheap flash devices or incorrectly configured policies. If a workload is highly random and has a large "working set", many of the IOs will need to come from disk; this can also impact performance depending on the disk type used and the number of disk stripes.
- Why is Storage DRS not supported in VSAN?
- VSAN only provides a single datastore and has its own placement and balancing algorithms.
- What will happen when the whole environment goes down and powers back on again? Do we run some sort of integrity check?
- Is VSAN dependent on vCenter? Can I configure VSAN if vCenter is down?
- Could you have locality in VSAN? Does locality make sense at all compared to other solutions?
- By default VSAN does not have a “data locality” concept as I explained here. However, for View environments CBRC is fully supported and that provides a local read cache for desktops.
- Is vC Ops aware of the VSAN datastore?
- The current version of vC Ops has limited VSAN functionality. The upcoming version of vC Ops will include more statistics and ways of monitoring a VSAN datastore.
- How do you back up your VMs in VSAN? Just the usual existing backup procedures?
- VDP supports VSAN, and various storage vendors are testing/releasing new versions of their products as we speak. VMs stored on a VSAN datastore should not be treated differently than regular VMs.
- Does VSAN support any data reduction mechanisms like deduplication or compression?
- In the current version deduplication or compression is not included.
If you have a question, please don’t hesitate to ask… Over time I will add more and more to this list so come back regularly.
Derek Seaman says
Can fault tolerant VMs run on VSAN?
Duncan Epping says
I haven’t tested it if it will work, but it is definitely not supported in this first release.
Victor says
Can you detail how to test FT on VSAN storage?
Duncan Epping says
VSAN does not support Fault Tolerance in this release.
Joe says
Does vSphere support TRIM for the SSDs?
Duncan Epping says
No, Trim is currently not supported.
Kevin Kelling says
Understood that data on HDD does not stay with the host owning the VM. As for SSD, is the situation such that after a vMotion the SSD cache is lost (temporary performance hit) and the cache will be rebuilt over time?
Duncan Epping says
Cache is not lost, cache will remain where it sits… on the “remote” host.
Jeff Howard says
Favoring capacity over resiliency, what's the best case scenario & configuration for VSAN? Using RAID as an example, a RAID-1/10 is the worst-case scenario with parity consuming 50% of your raw available storage. Then on the polar-opposite side of the scale, an over-sized RAID-5 across an entire 24-disk shelf where parity only consumes 5% of your available raw storage.
So my question is, if you’re looking for capacity over resiliency, what’s the best case scenario for VSAN in terms of usable capacity of your underlying raw storage?
Thanks,
– Jeff
Duncan Epping says
Not sure I understand the question correctly… For availability you configure the “host failures” option within your VM policy.
"Failures to tolerate = 0" -> no additional copies of your object, a single host failure will result in data loss
"Failures to tolerate = 1" -> one additional copy of your object, a single host failure will not result in data loss
"Failures to tolerate = 2" -> two additional copies of your object, two host failures will not result in data loss
Jeff Howard says
Sorry, to clarify: say you have three hosts, each with 4x600GB HDDs and 1x400GB SSD. That yields:
600GB x 4 x 3 = 7,200GB raw HDD capacity, and 400GB x 3 = 1,200GB raw SSD capacity.
So you're saying the best-case scenario having the minimum level of protection is "Failures to tolerate = 1". So in a three-node cluster, is this analogous to a RAID-5 of three disks where you get 66% usable raw capacity and 33% is used for parity striping? Meaning your datastore would be 66% of 7,200 = 4,752GB of usable space?
Then logically expanding that out, does that mean in an 8-node cluster with “Failures to tolerate = 1” would yield 87.5% usable raw capacity and 12.5% reserved for parity striping? My end-goal is trying to understand what the “usable capacity” of X number of physical disks in X number of hosts would be if used in a VMWare vSAN.
Thanks!
Jeff Howard says
More eloquently said, I’m making the broader assumption (based off the limited info in the vSAN whitepaper) that “RAIN” is similar to a “nested” RAID-50 where each disk group is a RAID-0, then the disk groups are all combined into a nested N+number_of_failures_to_tolerate RAIN. I’m probably over-simplifying, but that was my take from this line of the Whitepaper:
“NOTE: Any disk failure on a single host is treated as a “failure” for this metric. Therefore, the object cannot persist if there is a disk failure on host A and another disk failure on host B when you have NumberOfFailuresToTolerate set to 1”
Hopefully that makes sense.
Duncan Epping says
No it wouldn’t be like RAID-5. Let me give a simple example:
3 host cluster
1x 100GB HDD per host
VM with 30GB
Now when you have failures to tolerate set to 1, two hosts will hold the 30GB VMDK and a third host will hold a "witness", which is a ~2MB file. So in total 60GB will be stored, when the VMDK is fully consumed of course.
Also, let it be clear… by default no data is striped across disks. If you want this, then you will need to define the stripe width.
Hope that helps,
Jeff Howard says
Makes sense, thanks. In my layman's terms, I would call this 50% raw storage utilization. Meaning, if Production's forecast growth next quarter is 5TB, then I would need to purchase (at a minimum) 10TB of raw storage to accommodate it. Our core product is a DSS application that stores hundreds of GBs per client, so "raw storage utilization" is a hot topic in terms of $/GB as it's usually over half our annual IT spend.
Thanks,
Daniel Johns says
The SSD in my host is being reported in vSphere as “non-SSD”. According to support this is a known issue with the generation of server I am using. Will this “mis-reporting” of the disk type affect my ability to configure a VSAN?
Duncan Epping says
Yes it will. You will need to tag the SSD using the command below (the example is what I use in my lab, your identifier will be different). In this case I claim it as being "local" and as "SSD".
esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba2:C0:T0:L0 --option "enable_local enable_ssd"
Daniel Johns says
Awesome, thanks Duncan!
Hakim Attari says
Is it as good as PernixData FVP?
Duncan Epping says
PernixData has a different implementation, including write-back caching. It is difficult for me to say vFRC is "as good as vendor X" as that will depend on:
1) what you are looking for in the solution (block vs file for instance)
2) what kind of workload you are running
3) what your budget is
I think it is fair to say that vendors like Flashsoft and PernixData have a more flexible implementation because they do not pre-format the SSD drive and create "cache files per VM" like vFRC does. I would suspect, but I have not tested this, that their implementation will result in better performance. Also, vFRC needs to be enabled on a per virtual machine/disk level, where most other solutions allow you to "accelerate" a full VM.
That said, both Flashsoft and PernixData will need to be licensed separately, meaning that budget is also a factor here. And there are of course caveats to every solution; for instance, PernixData today doesn't support VMs running on NFS datastores. I also haven't seen either of them yet on the list of supported solutions.
So is vFRC as good as Vendor X? It depends.
Duncan Epping says
Also, I am assuming you wanted to post this in the vFlash FAQ instead… as VSAN is a completely different concept than PernixData.
Hakim Attari says
Thanks for correcting. Yes, it should be under vFlash.
Well, budget is not an issue. I was looking for a solution for our file servers, as we are redirecting user profiles (500 VMs) and common shares to them.
Christian Hansen says
I know it is a bit overkill at the moment, but are Mellanox 40GbE adapters supported for VSAN configurations? Especially if you use high-speed PCIe SSDs I can see it has its uses.
erikbussink says
Hello Christian. I’m setting up my lab to test just that, a backbone IPoIB (IP over InfiniBand) for my VSAN config. While I won’t have a 40Gbps QDR switch, I will start with Dual 10Gbps SDR links. But a dual 40Gbps per host is very appealing…. (think of VSAN, vMotion, FT backplane speeds)
You can configure the Mellanox drivers with IPoIB just like a normal network.
Now the gotcha with vSphere 5.5, and InfiniBand, are as follow.
– vSphere 5.5 ships with the Mellanox 1.9.7 drivers.
– vSphere 5.5 does not come with the Mellanox’s implementation of OpenFabrics Enterprise Distribution (OFED) for vSphere yet. Seems the target is for 2013Q3 or 2014Q1.
A few are trying to get the previous InfiniBand stack that was working on 5.1 to work on 5.5.
Hope this helps a bit.
Erik
Tom says
I’d like to play with vSAN on a cluster of virtual (nested) ESXi hosts. Is there some way to emulate the SSD requirement?
Duncan Epping says
Yes, you can tell ESXi that a disk is an SSD by using:
esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba2:C0:T0:L0 --option "enable_local enable_ssd"
where mpx.vmhba2…. should be replaced with your disk identifier.
More detailed info to be found here: http://www.yellow-bricks.com/2013/09/02/testing-virtual-san-in-your-lab-with-vsphere-5-5/
Tom says
Ahh… I figured you had addressed that but couldn’t find the post. Thanks!
Frank Boeye says
It seems there is no TRIM support; however, is there (now or in the future) a possibility to overprovision the SSDs?
Consumer-level SSDs (I'd imagine the Samsung 840 EVO will be used a lot in small deployments/labs due to price) will have their performance tank quite heavily when full. Overprovisioning will help with this a lot.
Duncan Epping (@DuncanYB) says
There is no TRIM support today indeed. Can’t comment on any futures.
John Cronin says
One thought: Is SSD dedicated for cache use required if ALL internal disks are SSD?
An all-SSD solution would obviously not benefit much from having an SSD cache, though having a PCIe SSD cache and less expensive SAS/SATA SSDs for data might be a reasonable configuration. This should provide very good performance in conjunction with 10 gigabit Ethernet or Infiniband, and still be less expensive than using a SAN.
Duncan Epping (@DuncanYB) says
Today you need “magnetic disks” and SSDs for VSAN. However you can of course just label your SSDs as non-SSD, that works fine in my lab and gives me screaming performance 🙂
We’ve had requests for supporting all-flash VSAN, and it is something we are considering.
Tom says
Hello, would you please put a short FAQ about the disk controller requirements??
I remember seeing something somewhere about pass-through something or other being required…and it’s hard to relate this to different companies’ products…Thank you…
Duncan says
Pass through / HBA mode… Not sure how else to describe it. Just check the HCL before you buy / try.
Bjørn-Tore says
Does VSAN support WAN replication between 2 VSANs?
Duncan says
Using vSphere Replication: yes.
Georgios says
But is a WAN stretched-cluster scenario supported? Is it possible to have a full article on this?
Georgios says
I found this article in one of your articles 🙂
http://www.yellow-bricks.com/2013/10/31/vsphere-metro-storage-cluster-using-virtual-san-can/
I am not sure if this is still valid.
David says
Are jumbo frames recommended?
Also, for the non-SSD drives, should they all be the same make/model/size/type, or can they be mixed?
for example, could a single VSAN host contain one SSD, one SAS drive and one SATA drive?
Duncan Epping says
If you can ensure Jumbo Frames are implemented “end-to-end” then yes I would recommend them. VSAN can handle various drive types: SAS / SATA / Speeds / Capacity. However, for simplicity and predictability I would go with a consistent configuration every time.
nataraj says
I created a virtual environment with 3 ESXi hosts: nodea0, nodeb0, nodec0.
Disk allocation as below
25GB for OS
20GB SSD
50 GB for VSAN Disk
I followed the same for all 3 machines and created a VSAN datastore of 125GB, created a VM on it, and created a storage profile. Everything worked, but I wanted to increase the size of the datastore, so I increased the size of the disk from 50GB to 100GB on all the machines. Can I grow the volume?
Andreas Paulsson says
Interresting concept. Will VSAN support Metro Clusters?
For example, we have three datacenters. DC-A has half of the hosts/storage, DC-B has the other half. In DC-C we have our management (vCenters) and storage witness (we use Lefthand and FOM). How would we incorporate this setup in to VSAN? Will it be possible at all?
I haven't played around with VSAN except for the demo at VMworld Europe this autumn (which we had to leave early, it was freezing in the lab area!), so I am not sure how we can set policies for where to store copies of the data. If DC-A goes down, we would want the copy to exist at DC-B so HA can fail over all machines.
We do this with Lefthand today, it would be cool if we were able to do the same with VSAN as well. Less space to waste in the datacenters 🙂
Duncan Epping says
http://www.yellow-bricks.com/2013/10/31/vsphere-metro-storage-cluster-using-virtual-san-can/
Andreas Paulsson says
Ah, perfect! Somehow I missed that URL, my bad 🙂
Too bad about MSC support, hope for that in future releases then!
Mike says
Hello,
Customer and big VMware fan.
Just trying to sort out in my head a reasonable analogy between SAN/NAS and VSAN when it comes to putting a host in maintenance mode. Most example configs I have seen show failures to tolerate of 1. Out on a limb here, but if you are patching and updating your cluster, that is probably 2-4 times per year? I have never timed it, but I am guessing 5-15 minutes to update each host. So at best 2x5=10 or at worst 4x15=60 minutes per year of time where x number of VMs' data is not protected!
Is there an analogy for this on a SAN/NAS?
Is a number of failures to tolerate of one acceptable? Obviously it depends. Non-persistent VDI sure. But real production VMs, I think the minimum is 2.
Have not done any cost calculations yet. Server disks are a lot cheaper than SAN/NAS. I am curious to compare RAID5 4+1 losing 20% to parity to the cost of VSAN Failure to tolerate of 2 losing 66%. And that is probably not fair to VSAN. VSAN is yielding 66% of its space. But in theory that space will still contribute to uncached read performance???
So much to consider.
Duncan Epping says
Sure the analogy is upgrading an array with 2 controllers… This is a rolling upgrade, if your remaining controller fails you are done.
Also, FTT=1 has nothing to do with doing maintenance in a proper way. You can still do maintenance if you have 4 hosts and sufficient disk capacity. When you go into maintenance mode you just migrate all data off your host.
Eric Krejci (@ekrejci) says
Hi,
First let me thank you for all of your great posts that can be found on your blog.
I’ve a question regarding the mirroring, especially the path of an I/O from VM to its disk (vmdk).
As VSAN is using an SSD for cache in every disk group, how are replication and write acknowledgment handled in the mirroring?
When a write I/O is performed, when does the VM get its acknowledgment? When the I/O arrives in the SSD of 1 host, or of every host participating in the N+1 of its component? Can you describe this mechanism a little bit more?
Thank you
Eric
Duncan Epping says
http://www.yellow-bricks.com/2013/10/11/pretty-pictures-friday-vsan-edition/
Eric Krejci (@ekrejci) says
great! thank you
Doug says
The HCL for Cisco HDD drives only lists the 10k 900GB drive. Is this just because that is all they have tested or is this list definitive? I would much rather put 15k drives in if possible. I cannot imagine that VSAN would really care about the underlying drives, I can understand needing specific controllers though.
Duncan Epping says
Indeed… Just ask Cisco when / if they will certify those!
Dima says
Hi, Duncan
I have a test configuration with 3 physical hosts. Each host contributes 1 SSD and 1 HDD to a single disk group, and the network status is success. From a configuration point of view everything looks great.
But I can only run VMs on 2 of the 3 hosts. When I choose the third host to run the VM, an "A specified parameter was not correct." error appears.
I opened an SSH connection to my "wrong" host, and ls /vmfs/volumes/vsanDatastore returns multiple errors similar to "ls: vsanDatastore/8c574653-3499-bf06-07f3-002590c5857c: Function not implemented".
When I do the same on the other "normal" hosts, I'm able to browse the VSAN datastore and see all human readable folders (which actually are softlinks to GUID named folders).
Any ideas how to make it work?
Kelly says
Duncan,
Thanks for the great info. I’m considering implementing VSAN and I was told by a consultant that the CPUs in all the hosts need to be nearly identical but I am doubting this.
In my case I have two hosts that have Xeon E5-2660 chips with 8 cores per socket and I’m looking at ordering two more hosts that have Xeon E5-2670 v2 processors with 10 cores per socket. Will this make any sort of difference to VSAN? Does the amount of memory in each host matter either as long as each has the amount required by VSAN?
So far I have yet to read any VMware documentation that mentions this being an issue at all. My thinking is that as long as I make sure the storage configurations are the same in all the hosts then I should be fine. VSAN doesn’t care about CPU configurations in the hosts does it?
Thanks!
Duncan Epping says
That is not a requirement for VSAN at all… I would say that it is a "nice to have" for HA and DRS, but VSAN absolutely doesn't care.
aaronwsmith143 says
When "Add disks to storage" is set to automatic, and at least 1 disk group per host is already formed, if you later add new HDDs to the host, is automatic mode expected to integrate those new drives into an existing disk group (assuming the requirements for automatic consumption are met, e.g. no existing partitions, etc.)?
I would expect the answer to be “no”, but wanted to confirm. Otherwise, if a host already has 2 disk groups, how can VSAN know which group to add new disks to?
yousuf says
a) Describe a scenario in which you would prefer a write-back cache to a write-through one. Explain why a write-back cache should perform better.
You can describe the scenario in English, pseudo-code, actual code or any other way that would describe it clearly and unambiguously.
Duncan Epping says
Not sure I am following your question…
Santo says
Hi Duncan,
Need to get a clear picture on how HA works in a VSAN cluster. Does it still use the master and slaves mechanism? I understand that VSAN does not need a datastore heartbeat.
Thanks!
m0nk3yrun says
Hi All,
Great FAQ thank you.
Is there a way to have all-flash nodes, and is that solution supported by VMware? Looking at the comments, you can run a command to mark the non-caching SSDs as "non-SSD", which would allow you to add the disks into the cluster for data storage. Is there any other impact of marking the disks as non-SSD, e.g. the way data is written to the disks?
e.g. it would be 1x cache SSD and 4x data SSDs for storage per node.
Thanks,
Vaibhav says
Hi Frank,
Can you please confirm what the maximum size of an SSD and of a magnetic disk supported by VSAN is?
Thanks
Duncan Epping says
Not sure who you are asking, but let me (Duncan) answer it. The max size supported is the max size of the disks on the VMware VSAN HCL.
Bill C says
Hi Duncan,
Question on calculating available space or raw disk space required in VSAN when changing the stripe policy: is it treated the same as failures to tolerate, or is it a different equation? I am trying to calculate the total raw storage needed for VSAN when all VMs have FTT set to two and stripes set to two.
Duncan Epping says
FTT will make a difference, striping should not.
Shagbark says
Does the VDS that is included with the VSAN license only allow VSAN traffic? Reason being, we are trying to bring our cluster out of evaluation mode but can't because our vSphere license doesn't include VDS. We have already applied the purchased VSAN license.
Duncan Epping says
No it allows all sort of traffic!
Shagbark says
Great because we use all sorts! But we can’t get the licensing issue figured out? I found on another blog, that when you activate the VSAN license there is an “option” to enable VDS. Sound familiar? Maybe we missed activating it?
Thank you so much Duncan. I love your site.
4 Hours of fighting the VMware Licensing department over this has driven me up a wall.
Paul Abke says
Lots of great questions and replies. Has anyone adopted vSAN in production?
zazzn says
Excuse me for being new; I'm a complete newb to ESXi. I've been using VMware Workstation for about 10 years and VPC before it became Hyper-V, etc. Anyway, I'm trying to understand if a stripe set is what I want for disk resilience.
In this case, say I have 7 HDDs and 1 SSD; the SSD is a 1TB Samsung EVO (TLC, so probably VERY bad to be used in the VSAN as it will probably degrade the drive very quickly).
Anyway, my 7 HDDs are various sizes and my goal is to pool the storage while providing resiliency, sort of like MS Storage Spaces.
I have 1 ESXi host with an LSI SAS+RAID controller plus the 6Gb Intel controller built into the motherboard.
I simply want to make the local storage behave as a single datastore using VSAN, but I'd like the datastore to be resilient so that if one disk fails there will be a backup of the data on another drive.
Georgios says
Does a VM expand to more than one disk group in VSAN?