I received a question a while back about the bandwidth requirements for long distance vMotion, aka live migration across distance. I was digging through some of the KB articles around stretched clusters and must say they weren't really clear, or at least not consistent…
Thanks everyone. Is Long Distance vMotion still requiring a minimum of 622 (1Gb) in current versions? /cc @duncanyb
— Kurt Bales (@networkjanitor) October 3, 2012
I contacted support and asked them for a statement but have had no clear response yet. The following statements are what I have been able to validate when it comes to "long distance vMotion". These are not official VMware support statements, just my observations:
- Maximum latency of 5 milliseconds (ms) RTT (round-trip time) between hosts participating in vMotion, or 10ms RTT between hosts when licensed with Enterprise Plus (the Metro vMotion feature).
- <update>As of 2013 the officially required bandwidth is 250Mbps per concurrent vMotion</update>
- Source and destination vSphere hosts must have a network interface on the same IP subnet and broadcast domain.
There are no longer any direct bandwidth requirements as far as I have been able to validate. The only requirements VMware seems to have are the ones mentioned above around maximum tolerated latency and layer 2 adjacency. If this changes I will update this blog post accordingly.
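To make these checks concrete, below is a minimal sketch in Python, not an official VMware tool, that approximates the RTT to a destination host by timing TCP handshakes against the vMotion port (TCP 8000) and works out the aggregate bandwidth for a number of concurrent vMotions using the 250Mbps-per-migration figure from the update above. The hostname and concurrency count are placeholders:

```python
import socket
import time

DEST_HOST = "esx-dst.example.com"   # placeholder: destination vSphere host
DEST_PORT = 8000                    # vMotion traffic uses TCP port 8000
SAMPLES = 10
ENTERPRISE_PLUS = True              # Metro vMotion raises the supported RTT to 10ms
CONCURRENT_VMOTIONS = 4             # placeholder: planned concurrent migrations

def tcp_rtt_ms(host: str, port: int) -> float:
    """Approximate the RTT by timing a single TCP handshake, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=2):
        pass
    return (time.perf_counter() - start) * 1000.0

rtts = [tcp_rtt_ms(DEST_HOST, DEST_PORT) for _ in range(SAMPLES)]
avg_rtt = sum(rtts) / len(rtts)
limit_ms = 10.0 if ENTERPRISE_PLUS else 5.0

print(f"average RTT {avg_rtt:.2f}ms against a {limit_ms:.0f}ms support limit")
if avg_rtt > limit_ms:
    print("WARNING: latency exceeds the supported maximum for vMotion")

# 250Mbps per concurrent vMotion (the 2013 figure from the update above)
print(f"{CONCURRENT_VMOTIONS} concurrent vMotions need "
      f"{250 * CONCURRENT_VMOTIONS}Mbps of headroom")
```

A TCP handshake timed from a workstation is only a rough proxy for the latency the VMkernel interfaces see, so treat the output as a sanity check rather than a certification.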
PS: There are various KBs that mention 622Mbps, but there are also several that don't list it. I have asked our KB team to clear this up.
Massimo Re Ferre' says
As far as I remember 622Mbps was the requirement called out in the IBM solution for vSphere stretched clusters. Similar docs from NetApp or EMC didn't call out a minimum bandwidth requirement.
Duncan Epping says
Old VPLEX and Metro docs call it out Massimo, the new ones don’t. Ambiguity all over the place.
Harry says
Is the enhanced vMotion feature not available in vSphere Standard and Enterprise?
Duncan Epping says
Sorry that should have said “Metro vMotion” which is only part of Enterprise Plus!
Michael McNamara says
Hi Duncan,
While it's probably not supported by VMware, vMotion between hosts via Layer 3 is possible, correct?
Thanks,
Mike
Duncan Epping says
It is possible indeed. I have done this in the past, and it has worked okay for me.
James says
Why the same-subnet requirement? I have a 1Gb connection spanning two offices 40 miles or so apart with 1-2ms latency, and this would be fun to try. My MPLS is using numbered interfaces as of now and I'm not sure I can get that changed.
Duncan Epping says
This is what has been tested and certified by VMware. It doesn't mean going across multiple L2 domains won't work. I have tested it various times and it worked fine with vMotion / DRS. But features like DPM (which relies on Wake-on-LAN) might not work as expected.
Dmitri Kalintsev says
Hi Duncan,
Would it be correct to assume that the 5/10ms refers to the round-trip time (RTT)? It would also be nice to have a statement on the packet size for the 5/10ms figure, but that would be splitting hairs somewhat…
Duncan Epping says
RTT indeed. Let me clarify that in the post.
Ravindra Neelakant says
vMotion in vSphere 4.1 was limited to 5ms RTT (~400 km, assuming ~1ms of the RTT is consumed within the two data centers or locations) between ESX hosts, and to an OC-12 (622 Mbps) of bandwidth per vMotion.
In vSphere 5.0 these limits were changed to 10ms RTT (~950 km) and the bandwidth requirement was reduced to 250 Mbps per vMotion. This is a licensed feature under vSphere 5.0 Enterprise Plus.
These requirements come on top of the existing requirement that the VM being migrated stays in the same L2 domain.
In order to use this in a realistic manner, there are four components that need to be considered to ensure Metro vMotion works properly.
Component 1: Storage needs to be shared across the distance at 10ms RTT. As of now VPLEX Metro will support this. Officially VPLEX Metro supports only 5ms RTT, but for the Metro vMotion use case the EMC rep will get an RPQ approved for 10ms RTT.
Component 2: The migration network, which can be an L3 network as long as the vSphere hosts can communicate on it.
Component 3: The IP network on which the migrating VM resides should be available to both vSphere hosts.
Component 4: Ensuring that client connections established to one data center before migration are not terminated after migration, and that new connections are routed intelligently to the new data center at the network layer. This is by far the most complicated part of the solution.
VMware has tested this solution end-to-end with several vendors and has solution white papers to show how the four components of the solution can be implemented.
There were questions about packet size. As long as the bandwidth for migration is available and the migration does not time out, it does not matter whether you use a standard 1500-byte packet size or jumbo frames. The recommendation is the standard packet size of 1500 MTU.
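For readers wondering where the ~400 km and ~950 km figures above come from: light propagates through fibre at roughly 200 km per millisecond (about two thirds of c), so an RTT budget translates fairly directly into distance. A quick back-of-the-envelope sketch in Python, assuming, as Ravindra describes, that ~1ms of the RTT is spent inside the two data centers:

```python
# Convert an RTT budget into a rough one-way fibre distance.
# Assumes light in fibre propagates at ~200 km per millisecond
# (about two thirds of c) and that ~1ms of the RTT budget is
# consumed by equipment inside the two data centers.
FIBRE_KM_PER_MS = 200.0   # approximate one-way propagation speed in fibre
DC_OVERHEAD_MS = 1.0      # RTT consumed inside the two data centers

def max_distance_km(rtt_budget_ms: float) -> float:
    one_way_ms = (rtt_budget_ms - DC_OVERHEAD_MS) / 2
    return one_way_ms * FIBRE_KM_PER_MS

for rtt in (5, 10):
    print(f"{rtt}ms RTT -> roughly {max_distance_km(rtt):.0f} km of fibre")
# 5ms  -> ~400 km, matching the vSphere 4.1 figure
# 10ms -> ~900 km, in the same ballpark as the ~950 km quoted for 5.0
```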
Duncan Epping says
To be clear, not just EMC VPLEX. There are various vendors out there like NetApp and HP who have been offering stretched storage for a long time now.
mB says
thanks for the very detailed explanation, ravindra!
and thanks duncan, for this useful post.
regards.
mb says
How does vMotion measure latency?
Let's say you start a vMotion on a stretched layer 2 link and the latency is 1ms when starting, but the vMotion itself pushes the latency to over 10ms. Will the vMotion fail or be canceled by vCenter?
And how does that Enterprise Plus license feature that enables going over 5ms behave? Will vCenter deny a vMotion if the link is measured at over 5ms? It seems kind of weird to have a license restriction on network latency :-S
Duncan Epping says
There is no license restriction on latency, but the vMotion process may fail. If you have Enterprise Plus, the Metro vMotion functionality can kick in and change the size of the socket buffer to allow the migration to complete successfully.
Note that 5ms and 10ms are support statements, not hard stops.
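To illustrate why the socket buffer size matters: TCP can have at most one window of unacknowledged data in flight per round trip, so sustaining a given rate over a higher-latency link needs a buffer at least as large as the bandwidth-delay product. A rough sketch with illustrative numbers, not VMware's actual internal buffer sizing:

```python
# Bandwidth-delay product: the minimum TCP window needed to sustain
# a given rate at a given RTT. Illustrative numbers only, not
# VMware's actual socket buffer sizing.
def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    return (bandwidth_mbps * 1_000_000 / 8) * (rtt_ms / 1000)

for rtt_ms in (5, 10):
    window_kb = bdp_bytes(250, rtt_ms) / 1024
    print(f"250Mbps at {rtt_ms}ms RTT needs a window of ~{window_kb:.0f} KB")
# 5ms -> ~153 KB, 10ms -> ~305 KB: double the latency, double the
# buffer needed, which is what the larger socket buffer provides.
```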
mb says
Thanks for clearing that up 🙂
It’s good to know it isn’t a “hard limit”