I was reading the Virtual SAN Data Locality white paper. I think it is a well-written paper and really enjoyed it. I figured I would share the link with all of you and provide a short summary. (http://blogs.vmware.com/vsphere/files/2014/07/Understanding-Data-Locality-in-VMware-Virtual-SAN-Ver1.0.pdf)
The paper starts with an explanation of what data locality is (also referred to as “locality of reference”) and describes the different types of latency experienced in Server SAN solutions (network, SSD). It then explains how Virtual SAN caching works, how locality of reference is implemented within VSAN, and why VSAN does not move data around: the cost of doing so is high compared to the benefit. It also demonstrates how VSAN delivers consistent performance, even without a local read cache.

The key phrase here is consistent performance, something that is not the case for all Server SAN solutions. In some cases a significant performance degradation is experienced for minutes after a workload has been migrated. As hopefully all of you know, vSphere DRS runs every 5 minutes by default, which means that migrations can and will happen multiple times a day in most environments. (I have seen environments where 30 migrations a day was not uncommon.)

The paper then explains where and when data locality can be beneficial, primarily when RAM is used and with specific use cases (like View), and how CBRC aka View Accelerator (an in-RAM deduplicated read cache) can be used for this purpose. (It does not explain in depth how other Server SAN solutions leverage RAM for local read caching, but I am sure those vendors will have more detailed posts on that, which are worth reading!)
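To get a feel for the paper's core argument, here is a minimal back-of-envelope sketch in Python. The latency figures are my own illustrative assumptions (roughly representative of flash, magnetic disk, and 10GbE), not measurements from the paper or from VSAN:

```python
# Back-of-envelope look at why the network hop matters far less than
# where a read is ultimately served from. All latency figures are
# illustrative assumptions, not numbers from the white paper.

SSD_READ_US = 100.0      # assumed flash read latency (microseconds)
HDD_READ_US = 10_000.0   # assumed magnetic disk read latency (cache miss)
NETWORK_RTT_US = 50.0    # assumed 10GbE round-trip latency

def avg_read_latency(hit_ratio: float, remote: bool) -> float:
    """Average read latency for a given flash read-cache hit ratio."""
    hop = NETWORK_RTT_US if remote else 0.0
    hit = SSD_READ_US + hop    # served from the flash read cache
    miss = HDD_READ_US + hop   # served from magnetic disk
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(f"local cache,  90% hits: {avg_read_latency(0.9, False):7.0f} us")
print(f"remote cache, 90% hits: {avg_read_latency(0.9, True):7.0f} us")
print(f"remote cache, 50% hits: {avg_read_latency(0.5, True):7.0f} us")
```

With these numbers the network hop adds roughly 5% to the average read, while dropping the hit ratio from 90% to 50% (what a cold cache after a migration looks like) makes reads several times slower. That is exactly the consistency argument the paper makes.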
There are a couple of real gems in this paper, which I will probably read a few more times in the upcoming days!
Doug B says
As someone who has dealt with storage for a long time, I was initially concerned about data locality with respect to VSAN.
However, once I made the connection that the synchronous writes required to ensure data consistency traverse the “storage network” — whatever network you define and configure for the “Virtual SAN Traffic” vmkernel interface — I realized that it technically does not matter which host in the cluster contains the blocks unless I am running without a replica of my data.
When I write a block to a VSAN datastore, if the host my VM is running on happens to have a replica, I might get a really fast write acknowledgement (ACK) from the local SSD to indicate that the write has been safely committed, but I still need to wait for the response from any remote hosts that contain a replica. This is by design to protect the integrity of my data, and is definitely not unique to VSAN. All synchronous replication technologies that I have encountered to date implement a very similar mechanism. So, either way, you take the “hit” of going across the network. This just tells me that I need to design that network to have low latency and enough bandwidth to handle my workloads.
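A minimal sketch of that write path, assuming a simple “wait for all replicas” model (the host names and latencies below are hypothetical, for illustration only; this is not VSAN's actual implementation):

```python
import asyncio
import random

# Hypothetical synchronous replicated write: the write is only
# acknowledged back to the VM once EVERY replica host has committed
# it, so write latency = max(replica latencies).

async def commit_on_host(host: str) -> float:
    """Simulate committing the write on one replica host's flash device."""
    latency_ms = random.uniform(0.1, 2.0)  # assumed SSD + network latency
    await asyncio.sleep(latency_ms / 1000)
    return latency_ms

async def synchronous_write(replica_hosts: list[str]) -> None:
    # Send the write to all replicas in parallel, but only ACK once
    # the slowest replica has committed.
    latencies = await asyncio.gather(*(commit_on_host(h) for h in replica_hosts))
    print(f"write ACK'd after {max(latencies):.2f} ms (slowest replica wins)")

asyncio.run(synchronous_write(["esxi-01", "esxi-02"]))
```

Even if esxi-01 is the local host and answers almost instantly, the ACK still waits on esxi-02, which is the point above: with synchronous replication you pay for the network hop regardless of locality.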
When you think about it, if I tried to implement a “lazy ACK” model to speed things up, that comes at the cost of data integrity. I am no longer within the realm of “synchronous” data replication, so there is a chance that my data is not consistent across my cluster. In this model, when a host goes offline without warning, I might lose data. This is not something we tolerate for local storage. For data that is replicated to a DR site, the laws of nature get involved and often require us to make concessions 🙂
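For contrast, a hypothetical “lazy ACK” variant of the same sketch acknowledges as soon as the first replica commits, which is exactly where the integrity risk creeps in:

```python
import asyncio
import random

# Hypothetical "lazy ACK" write: acknowledge after the FIRST replica
# commits and let the rest finish in the background. Faster ACKs, but
# no longer synchronous replication: if the only host that committed
# dies before the others catch up, the write is lost. Hosts and
# latencies are illustrative assumptions.

async def commit_on_host(host: str) -> str:
    await asyncio.sleep(random.uniform(0.0001, 0.002))  # assumed commit latency
    return host

async def lazy_ack_write(replica_hosts: list[str]) -> None:
    tasks = [asyncio.create_task(commit_on_host(h)) for h in replica_hosts]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    first = next(iter(done)).result()
    print(f"ACK'd after {first} committed; {len(pending)} replica(s) still in flight")
    # Window of inconsistency: if `first` crashes right now, the
    # acknowledged write exists nowhere else in the cluster.
    await asyncio.gather(*pending)

asyncio.run(lazy_ack_write(["esxi-01", "esxi-02"]))
```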
Stefan Gourguis says
Page 5:
“Every time application data is read by a virtual machine, Virtual SAN saves a copy of the data in the Read Cache portion of the flash device associated with the disk group where the copy of the data resides. Temporal locality implies that there is high probability that said data will be accessed again before long.”
But how does the vSAN RAID-1 algorithm work on reads?
Will data be accessed in a striped manner (like RAID-0 does), since the copy of the data resides in two places (disk groups)?
Will the read-cached data end up in both flash devices (of the disk groups that form the RAID-1)?
Best Regards
Stefan