
Yellow Bricks

by Duncan Epping


vSphere Metro Storage Cluster storage latency requirements

Duncan Epping · Feb 5, 2013

I received some questions today around the storage latency requirements for vSphere Metro Storage Cluster (vMSC) solutions. In the past the support limits were strict:

  • 5ms RTT for vMotion with an Enterprise license and lower, 10ms RTT for vMotion with Enterprise Plus
  • 5ms RTT for storage replication

RTT stands for Round Trip Time, by the way. The support limits have changed recently (I noticed today that I never blogged about this). For instance, EMC VPLEX supports up to 10ms RTT (not fully tested for stretched cluster / vSphere HA). It makes a lot of sense to have the storage limit aligned with the vMotion limits, as more than likely the same connection between sites is used for both storage replication and vMotion traffic.
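As a sanity check on why these numbers are so small: the speed of light in fiber alone bounds how far apart two sites can be for a given RTT budget. A back-of-the-envelope sketch (assuming ~200,000 km/s signal propagation in fiber, roughly two thirds of c, and ignoring all switch/router/array delays, so real-world distances are shorter):

```python
# Rough upper bound on inter-site distance for a given RTT budget.
# Assumes ~200,000 km/s propagation in optical fiber and ignores
# all equipment latency, so these are best-case numbers.

FIBER_SPEED_KM_PER_MS = 200.0  # ~200,000 km/s, expressed per millisecond

def max_site_distance_km(rtt_ms: float) -> float:
    """Maximum one-way fiber distance that fits in a round-trip-time budget."""
    one_way_ms = rtt_ms / 2
    return one_way_ms * FIBER_SPEED_KM_PER_MS

print(max_site_distance_km(5))   # 5 ms RTT  -> at most ~500 km of fiber
print(max_site_distance_km(10))  # 10 ms RTT -> at most ~1000 km of fiber
```

With a 5ms RTT budget the sites can be at most roughly 500 km of fiber apart, and in practice the latency of switches, routers, and the arrays themselves eats into that budget considerably.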

So I would recommend that anyone who is considering implementing (or architecting) a vMSC environment contact their storage vendor about the supported limits for storage latency.



Comments

  1. James Hess says

    6 February, 2013 at 02:42

    That’s useful… all too often, when people are talking about cold sites, geographical distribution, and backups, all while wanting not a second of data loss or more than a few seconds of failover time in case of a disaster, I find that lots of people totally ignore (or forget) issues such as latency, and concerns such as split-brain (a total break in datacenter connectivity between a two-site stretched cluster, with neither side holding quorum, is also disastrous). A stretched cluster “sounds so appealing” on the surface because the theoretical benefits are so great. If it weren’t for ugly little engineering limitations, including the speed of light and necessary failover delays, every enterprise should have one; that is, if the clustering concept itself weren’t a significant risk of failure and data loss.

    >5ms may be technically supported, but should it be recommended? Nope.
    Especially for virtualized applications, where the added-up IO queue latency, plus the synchronous write latency increase due to commit-on-replicate, could be significant.
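    To put a rough number on the commit-on-replicate point: with synchronous replication a write is only acknowledged once the remote site has confirmed it, so every write pays at least one inter-site round trip on top of its local service time. A back-of-the-envelope sketch (the 1 ms local write time is an illustrative assumption, not a measurement from any product):

```python
# Why inter-site RTT dominates synchronous write latency.
# All figures are illustrative assumptions, not vendor measurements.

def sync_write_latency_ms(local_write_ms: float, inter_site_rtt_ms: float) -> float:
    """A write commits only after the remote array acknowledges it, so
    every write pays at least one inter-site round trip on top of the
    local write service time."""
    return local_write_ms + inter_site_rtt_ms

# A 1 ms local write becomes 6 ms at 5 ms RTT and 11 ms at 10 ms RTT:
for rtt in (0, 5, 10):
    print(rtt, sync_write_latency_ms(1.0, rtt))
```

    In other words, even a fast all-flash local write is multiplied several times over once the remote acknowledgement is in the critical path.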

    I think folks rapidly forget some risks involved, and it becomes difficult to convince folks that 1, or 10 gigabits of bandwidth across the continent won’t work so well. For now I would say current stretched cluster options seem to be a cute hack at best. [A cool hack, but a hack.]

    Then there’s the matter of Layer 2 network extension resulting in inefficient routing, and the extension of certain failure domains across sites in stretched cluster designs — along with the horrendous, new-failure-mode-inducing complications added to routing in an attempt to correct it.

    People also think the network must be broken if they can’t FTP or rsync a file over TCP between datacenters at 30ms+ latency and get the transfer at the full 1000 Megabit rate of that private link (without realizing the need to adjust protocol choices or buy an expensive TCP optimization solution).
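    The FTP/rsync observation is just the TCP bandwidth-delay product at work: a single flow can never move more than one window of data per round trip, regardless of raw link bandwidth. A sketch (assuming the classic 64 KB un-scaled TCP receive window; with window scaling per RFC 1323 larger windows, and thus higher throughput, are possible):

```python
# A single TCP flow's throughput is capped by window_size / RTT,
# no matter how fat the pipe is. The 64 KB window below is the
# classic un-scaled TCP receive window (an assumption for illustration).

def max_tcp_throughput_mbit(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP flow's throughput, in megabits per second."""
    window_bits = window_bytes * 8
    rtt_s = rtt_ms / 1000
    return window_bits / rtt_s / 1_000_000

# A 64 KB window at 30 ms RTT caps a single flow far below 1 Gbit/s:
print(max_tcp_throughput_mbit(64 * 1024, 30))  # ~17.5 Mbit/s
```

    So on a 30ms link, that “full 1000 Megabit” private line delivers well under 2% of its capacity to a single un-tuned TCP transfer.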

    And then there’s a lot of thinking that disk drive performance is MB/sec — latency gets forgotten there too.
    “Wow, this must be the best 2TB SATA disk drive ever… I can get 500 MB/sec out of it.
    This will be perfect for virtualizing SQL servers… I’ll just put a few of them in RAID5, to improve performance and make sure the data is very safe.”
    “At what latency, though?”
    “Huh? Latency? I’m sure it doesn’t matter. I can copy a 5 Gig file to it in 10 seconds.”
    “How many mean random 4K write IOPS, and with what standard deviation?”
    “What’s a write IOP?”
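    The exchange above can be put in numbers: sequential MB/sec and random IOPS are almost unrelated figures. A sketch (the ~100 IOPS figure is an illustrative assumption for a 7200 RPM SATA disk, roughly seek plus rotational latency of ~10 ms per random IO):

```python
# Sequential streaming rate says nothing about random-write behavior.
# ~100 random IOPS for a 7200 RPM SATA disk is an illustrative
# assumption (seek + rotational latency of roughly 10 ms per IO).

def random_4k_throughput_mb_s(iops: float) -> float:
    """Throughput achieved by purely random 4 KiB operations."""
    return iops * 4096 / 1_000_000

def avg_service_time_ms(iops: float) -> float:
    """Mean per-IO service time implied by an IOPS figure."""
    return 1000 / iops

print(random_4k_throughput_mb_s(100))  # ~0.4 MB/s random, vs 500 MB/s sequential
print(avg_service_time_ms(100))        # ~10 ms per random IO
```

    The same disk that streams 500 MB/sec sequentially delivers well under 1 MB/sec of purely random 4K writes, which is exactly the workload a virtualized SQL server produces.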

    See… someone could probably write a book on the things that the vendor should have been asked about, but that you erroneously assumed weren’t important, or didn’t know to ask about. I would favor proactive vendors: a responsible vendor should inform _you_ of this before they let you buy their product for a stretched cluster 🙂


About the author

Duncan Epping is a Chief Technologist in the Office of CTO of the Cloud Platform BU at VMware. He is a VCDX (# 007) and the author of the "vSAN Deep Dive" and the “vSphere Clustering Technical Deep Dive” series.


Copyright Yellow-Bricks.com © 2021