When I was writing my “Configuring VXLAN” post I was trying to dig up some details around VXLAN requirements and recommendations to run a full “VMware” implementation. Unfortunately I couldn’t find much, or at least not a single place with all the details. I figured I would gather everything I could find and throw it into a single post to make it easier for everyone.
Virtual:
- vSphere 5.1
- vShield Manager 5.1
- vSphere Distributed Switch 5.1.0
- Portgroups will be configured by vShield Manager; it is recommended to use either “LACP Active Mode”, “LACP Passive Mode” or “Static Etherchannel”
- When “LACP” or “Static Etherchannel” (Cisco only) is configured, note that a port channel / EtherChannel will need to be created on the physical side
- “Fail Over” is supported, but not recommended
- You cannot configure the portgroup with “Virtual Port ID” or “Load Based Teaming”; these are not supported
- Requirement for MTU size of 1600 (Kamau explains why here)
Physical:
- It is recommended to have DHCP available on the VXLAN transport VLANs, although fixed IP addresses also work!
- VXLAN port (UDP 8472) is opened on firewalls (if applicable)
- Port 80 is opened from vShield Manager to the hosts (used to download the “VIB / agent”)
- For Link Aggregation Control Protocol (LACP), 5-tuple hash distribution is highly recommended but not a hard requirement
- MTU size requirement is 1600 (see the arithmetic sketch after this list)
- It is strongly recommended to have IGMP snooping enabled on the L2 switches to which VXLAN participating hosts are attached. An IGMP querier must be enabled on the router or L3 switch with connectivity to the multicast enabled networks when IGMP snooping is enabled.
- If VXLAN traffic is traversing routers, multicast routing must be enabled
- The recommended multicast protocol to deploy for this scenario is Bidirectional Protocol Independent Multicast (PIM-BIDIR), since the hosts act as both multicast speakers and receivers at the same time.
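For those wondering where the 1600 number comes from: VXLAN wraps the complete inner Ethernet frame in an extra set of headers, which adds roughly 50 bytes. The quick sketch below shows the arithmetic; the header sizes are the commonly quoted ones (IPv4 without options, no outer VLAN tag assumed), so treat it as an illustration rather than an exact spec.

```python
# Back-of-the-envelope check of why MTU 1600 is recommended on the VXLAN
# transport network. Header sizes are the commonly quoted values; this is
# a sketch (IPv4 without options, no outer 802.1Q tag assumed).

INNER_IP_MTU   = 1500  # MTU inside the guest / virtual wire
INNER_ETHERNET = 14    # inner Ethernet header (VXLAN carries the whole frame)
VXLAN_HEADER   = 8     # carries the 24-bit segment ID
OUTER_UDP      = 8     # UDP, destination port 8472 in this vCNS 5.1 implementation
OUTER_IPV4     = 20    # outer IPv4 header

# Size of the outer IP packet the physical network has to carry
# (an interface MTU normally excludes the outer Ethernet header itself).
outer_packet = INNER_IP_MTU + INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4
print(outer_packet)         # 1550 -> too big for MTU 1500, fits easily in 1600
print(1600 - outer_packet)  # 50 bytes of headroom (e.g. for an inner 802.1Q tag)
```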
That should capture most requirements and recommendations. If anyone has any additions please leave a comment and I will add it.
** Please note: Proxy ARP is not a requirement for a VXLAN / VDS implementation; it is only a requirement when the Cisco Nexus 1000v is used **
References:
VXLAN Primer by Kamau
vShield Administration Guide
Internal training ppt
KB 2050697 (note my article was used as the basis for this KB)
Justin G. says
Is there any documentation available that specifies why Virtual Port ID and LBT are not supported?
Duncan Epping says
Not that I have seen. By the way, when configuring VXLAN they are not even presented as an option.
Justin G. says
Right. I think the confusion lies in the fact that LACP is tied to the Uplink Port Group, whereas the failover mode for the VXLAN port group is initially set to “Use explicit failover order” when created. If I enable LACP on the Uplink Port Group, then I’d want to update the VXLAN port group to use “Route based on IP Hash”.
I’m curious whether LACP is absolutely required on the Uplink Port Profile, because so far there have been no warnings saying this needs to be enabled when setting up vCloud.
Kamau Wanguhu says
@JustinG: Not supported (as in not an option) as Virtual Port ID and LBT only apply when you have more than one active NIC attached to a portgroup. You do have to run IP Hash load balancing when using teamed NICs though to get a good distribution of traffic across your team.
Karam Abuataya says
@Duncan, what I don’t understand yet is that Enterprise Plus is listed as a major requirement for all the vCloud Networking and Security editions. Some customers might have the Enterprise Plus edition to make the software services work, but other customers might have Standard or Enterprise editions. I am saying this because all of the documentation talks about the Distributed Switch requirement and setup.
I am just thinking about the support compatibility.
Duncan Epping says
A Distributed vSwitch is a requirement for VXLAN, so you need Enterprise Plus. Not sure what the confusion is?
Patrick Fagen says
Do you know what the fallout/consequences may be of upping the MTU to 1600 on all switches? Consider a flat network with only 1 VLAN and subnet. You’d have to up the MTU on all production switches and firewall routers on both sides (each datacenter) – I feel like this may cause some packet fragmentation with all the other devices (PCs, thin clients, printers) that share some of these switches/firewalls.
Duncan Epping says
I am not sure I am following it. But you can set MTU on a port level. I also think it is very unlikely that a datacenter with a flat VLAN and a single subnet is going to implement VXLAN. VXLAN is targeted at large environments.
Some more details about fragmentation can be found here: https://supportforums.cisco.com/thread/2062337
Patrick Fagen says
Well the gateway port on the firewall coming into the network will have to be 1600 – a lot of data passes through this port, right? Both originating from user traffic and VXLAN traffic. We are a small shop with a lot of sensitive data, so this is kind of our setup and we will be implementing VXLAN.
Ray Budavari says
Another requirement, at least today, if VXLAN is traversing routers (providing L2 adjacency over L3 networks), is that you enable Proxy ARP on the first-hop routers. This is because VXLAN does not use the host routing table to ensure the VXLAN vmknic is used for traffic between VTEPs.
Ray Budavari says
After some internal discussion and testing I have to update the statement above. Proxy ARP is only required in the Nexus 1000v implementation of VXLAN. VXLAN on a VMware VDS does not require Proxy ARP if VTEPs are in different L2 networks.
Duncan Epping says
Thanks Ray!
Patrick Fagen says
I guess a better question is – can this even be used over a VPN?
Duncan Epping says
I am not sure I am following it. You want to do VXLAN from where to where across what using what?
Patrick Fagen says
Assume my stretched cluster was from OfficeA to a colo over a 100 Mbit link – but the link is not Metro-E, it’s a VPN.
VXLAN offers the following benefits:
Flexibility: Datacenter server and storage utilization and flexibility are maximized through the support of “stretched clusters” that cross switching and pod boundaries.
Streamlined Network Operations: VXLAN runs on standard Layer 3 IP networks, eliminating the need to build and manage a large Layer 2 underlying transport layer.
Investment Protection: VXLAN runs over standard switching hardware, with no need for software upgrades or special code versions on the switches.
Ron says
Hi Duncan;
Correct me if I’m wrong but the recommendation to have DHCP on the VXLAN transport VLANs only applies to the VMkernel interfaces (vmknics) that get auto-magically added to each ESXi host participating in the VXLAN. Correct?
Also, the VXLAN attributes, such as “LACP”, “Static Etherchannel” and “Failover”, affect the “vDS” Teaming Policy (not Port group) and are configured within vShield Manager, whereas the requirement to use “explicit failover order” or “route based on IP hash” is really part of the “Port Group” Teaming and Failover Policy and is configured via the Web/vSphere Client interface only.
Thx
Duncan Epping says
Hi Ron,
With regards to DHCP you are correct, this is for the vmknics that carry the VXLAN traffic. They are configured with DHCP by default, but static also works… it is just a matter of correcting the config at that point.
With regards to the teaming policy, the selected teaming policy (lacp / static / failover) dictates (as far as I have seen) how the VDS is configured. I will validate that tomorrow when I have the time.
Brandon says
Meaning you fix it directly on the vmkernel interface… right? Just checking — I’m configuring this now, and want to make sure I understand. We’re not going to do cross-site VXLAN (yet) or probably ever — so to get all these things talking, it’s probably best to use a /8 network since we have the freedom of doing it that way. In our case, it doesn’t need to be routable…
Ron says
Hi again Duncan;
I have one more question related to VXLAN requirements that I’m hoping you can answer as well.
When values for the “Pool of Segment IDs” and “Multicast Address Range” are identified within vShield Manager, can this be an arbitrary pool and range or do “static/pre-determined” entries need to be identified by the Network Ops Team? Reason I ask is because we, as vCloud Admins, do not have visibility or access to the physical network and merely want to ensure we are aware of and communicate all backend requirements in advance.
Thx.
Lee says
Hi Duncan, another invaluable article, thanks.
I’ve just configured VXLAN on our setup, a couple of observations/comments:
1. We are using a pair of 10Gb adapters for our dVS which also carry iSCSI. As the requirement on an iSCSI vmk is to only have a single NIC, this means our dVS/adapters cannot participate in any kind of switch adapter teaming. Therefore the *only* possible configuration is “Fail Over”. I do hope it stays as supported 🙂
2. Why DHCP on the new vmk used for transport? DHCP doesn’t really go hand in hand with datacentre infrastructure, and what if your DHCP server was unavailable? I have configured static IP; again, I do hope I don’t get caught out!
Duncan Epping says
1) as far as I am aware “failover” will stay supported
2) this is the default they selected, but if no DHCP is available fixed is fully supported / tested and should not give any problems
Lee says
Many thanks. I just found an update from Kendrick “I had some DHCP lease issues today. Changing them to Static fixed the issue right up.”.
Static addressing FTW !
Kulin says
Great post Duncan. Just a couple of points I wanted to throw out there. An all software based VXLAN, certainly doable, does cause a level of degradation in performance (look for the VXLAN VMware ESX performance evaluation paper by VMware) and hence in some cases it makes perfect sense to introduce hardware VTEPs. Arista 7150 series switches support VXLAN & provide twofold benefits: 1. offloading compute resource consumption from ESX hosts & 2. bringing bare metal servers, appliances etc. that are non-VXLAN aware into the VXLAN domain by serving as a gateway.
Interesting joint demo between VMware & Arista at VMworld (http://www.aristanetworks.com/media/system/pdf/VMworld_Demo_Brief.pdf)
Lee says
Hi Duncan.
Here’s a good’un. When you create a virtual wire you’re allowed to name it “nicely”, then a dvs portgroup is created based on that name. You can later on rename the virtual wire if you wish inside VCNS. I have been asked whether it’s OK to rename the dvs portgroups though. Certainly in my testing I’ve found no issues with renaming them, and it makes life easier for vSphere Admins when choosing networks to hook vNICs up to. But I wondered if you had any thoughts on this like “Hell no!! Leave them alone!!” 🙂
cuongds says
Hi Bricks, thanks for your great post. I have one strange issue with VXLAN communication via vShield Edge even though everything meets this post’s requirements.
This is my scenario:
vSphere 5.1
vShield Manager 5.1
vSphere Distributed Switch 5.1.0
vCloud Director 5.1
One organization has two VMs and one Edge on the same virtual wire. MTU is set to 1600 on both the virtual network in vCenter and the Cisco UCS blade vNIC.
When pinging with the default packet size: VM1 and VM2 can communicate fine, whether VM1 and VM2 sit on the same host or on different hosts (Cisco UCS blades), no matter where the Edge sits.
When the ping packet size is over 1422: VM1 and VM2 on the same host are OK, but if VM1 and VM2 (or the Edge) sit on different hosts, all packets are dropped (the ping times out).
Do you know what is happening? Thanks.
Regards,
Kamau says
You need to make sure that the physical switches the blades are attached to also have a minimum MTU of 1600. The MTU change needs to happen along the whole path that the VM packets will traverse. The symptom is usually that you can send small packets just fine but any sizable data is lost. An example would be that you can start an SSH session and log in, but it locks up if you try to do a simple thing like “ls -alF” on a directory.
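For completeness: the 1422-byte threshold mentioned above lines up exactly with a physical path that is still at the default MTU of 1500. A quick sketch of the arithmetic (assuming IPv4 without options and no inner VLAN tag), which supports Kamau’s point that the whole path needs to carry the larger frames:

```python
# Why pings start failing above a 1422-byte payload when part of the physical
# path is still at MTU 1500 (sketch; IPv4 without options, no inner VLAN tag).
PATH_MTU       = 1500  # a switch or uplink still at the default MTU
OUTER_IPV4     = 20
OUTER_UDP      = 8     # VXLAN rides on UDP (port 8472 in this implementation)
VXLAN_HEADER   = 8
INNER_ETHERNET = 14
INNER_IPV4     = 20
INNER_ICMP     = 8

overhead = (OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER
            + INNER_ETHERNET + INNER_IPV4 + INNER_ICMP)
print(PATH_MTU - overhead)  # 1422 -> the largest ICMP payload that still fits
```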
cuongds says
Hi, has anyone experienced this issue? I found a topic on Google about ICMP multicast packets; maybe this is the root cause of the problem? https://supportforums.cisco.com/thread/2013429 Do you think so?
Regards,
cuongds says
Hi Kamau,
Thanks for your help. But as I posted in my first comment, the Cisco UCS vNIC (acting as a layer 2 switch) is also set to 1600.
Regards,
David Pasek says
Hi Duncan. Very informative blog post, but in the VXLAN requirements for physical switches you had the strict requirement “5-tuple hash must be enabled”. This is IMO a very strict requirement, not achievable on most access switches. AFAIK this hash algorithm is available on core switches like the Nexus 7k or Force10 E Series. It confused me especially because there is also a KB article about it based on your blog post. Can you revalidate and update it to avoid confusing other folks? Thanks.
Lee Christie says
David,
Duncan has simply copied these from the VMware requirements KB2050697.
5-tuple hashing is a requirement of LACP. However you do not need to run LACP in order to use VXLAN. We are using VXLAN with Failover policy just fine on simple “access” switches. I don’t see any confusion here, just match the requirements/implementation to your infrastructure.
Lee.
Duncan Epping says
Euuh, I did not simply copy anything from the KB. The KB folks have leveraged my material as a source for their KB article.
Lee Christie says
Ah my bad, apologies. I had assumed that an official VMware KB article would be the authority not the copy!
Duncan Epping says
As a VMware employee many of my articles (same applies to Cormac Hogan for instance) are used to create new KB entries. No problem, I can see where the confusion comes from 🙂
Duncan Epping says
Just commenting here for completeness, 5-tuple hash is not a requirement but a recommendation. Sorry for the confusion.
David Pasek says
I’ve got the same information from @fojta and I was happy with that response, but the same information about the 5-tuple hash is in the VMware vShield Administration Guide (http://www.vmware.com/pdf/vshield_51_admin.pdf) on page 48:
“For Link Aggregation Control Protocol (LACP), 5-tuple hash distribution must be enabled”
That’s probably where you found it initially, so you are not guilty 🙂
Is it a bug in the doc? If so, then the bug in the doc started the confusion. Correct?
I have engaged the customer’s VMware TAM to get me some official VMware statement. Sorry for making trouble, but I hope you understand I want to be safe and give my customer the value he is paying me for 🙂
Hope this will help other folks designing VXLANs and avoid confusion.
Duncan Epping says
I don’t know where this piece of info came from; it could have been from the doc or a different internal doc. Typically all of these come from a common set of engineering deepdives / presentations.
Anyway, we have filed a bug for the documentation and it will be updated over time. As I am a VMware employee also and have validated this with PM, feel free to have the TAM contact me directly via email.
David Pasek says
Lee and Duncan, thanks for quick responses.
But to be honest I’m still confused.
I was writing the follow-up comment directly here but it was somehow accidentally deleted 🙁
It inspired me to rewrite it in VCDX-like style 🙂 and also publish it in a better format on my blog at http://blog.igics.com/2013/09/what-type-of-nic-teaming-loadbalancing.html
Here we go …
VMware VXLAN Information sources:
S1/ VMware vShield Administration Guide
S2/ VMware KB 2050697
S3/ Duncan Epping blog post here.
S4/ VMware VXLAN Deployment Guide
Design decision point:
What type of NIC teaming, loadbalancing and physical switch configuration to use for VMware’s VXLAN?
Requirements:
R1/ Fully supported solution
R2/ vSphere 5.1 and vCloud Director 5.1
R3/ VMware vCloud Network & Security (aka vCNS or vShield) with VMware distributed virtual switch
R4/ Network Virtualization and multi-tenant segmentation with VXLAN network overlay
R5/ Leverage standard access datacenter switches like CISCO Nexus 5000, Force10 S4810, etc.
Constraints:
C1/ LACP 5-tuple hash algorithm is not available on current standard access datacenter physical switches mentioned in requirement R5
C2/ VMware Virtual Port ID loadbalancing is not supported with VXLAN – Source: S3
C3/ VMware LBT loadbalancing is not supported with VXLAN – Source: S3
C4/ LACP must be used with 5-tuple hash algorithm – Source: S1 on Page 48
Available Options:
Option 1/ Virtual Port ID
Option 2/ Load based Teaming
Option 3/ LACP
Option comparison:
Option 1: not supported because of C2
Option 2: not supported because of C3
Option 3: not supported because of C1 & C4
Design decision and justification:
Based on available information neither option complies with requirements.
Other alternatives not compliant with all requirements:
1/ Use physical switches with 5-tuple hash loadbalancing. That means high-end switch models like Nexus 7000, Force10 E Series, etc.
2/ Use CISCO Nexus 1000V with VXLAN. They support LACP with any hash algorithm. 5-tuple is also recommended but not required.
I hope some information in constraints C2, C3, and C4 are wrong and will be clarified by VMware.
Lee Christie says
So snipping away at the above, you’re asking “What type of configuration to use with my access switches (that don’t support 5-tuple hash)?”
VXLAN teaming policy can be
a) Fail Over
b) Static Etherchannel
c) LACP (Active or Passive)
As you cannot use LACP, if NIC teaming is still a requirement, use Static Etherchannel. This is a fairly basic bit of functionality, although cross-switch etherchannel starts to get a bit tricky depending on your vendor. I’d thoroughly recommend Arista to anyone reading, since their 10Gb performance and latency are excellent, cross-switch etherchannelling is implemented via MLAG so you still get separate control planes per switch, and they aren’t at all expensive!
In my implementation because we use a pair of 10Gb adapters for all kinds of traffic but specifically iSCSI, the NICS cannot be teamed (iSCSI wants separate paths, not teaming), so we went with Failover.
Your post currently reads as if there are no options available for VXLAN unless you have high end switches, which simply isn’t the case so I’d recommend a re-wording.
Lee.
David Pasek says
Lee,
First of all, Duncan already stated that any hash algorithm can be used for LACP. “5-tuple hash” is highly recommended because it is obviously the best load balancing algorithm, but it is not a strict requirement.
So the next lines are a little bit off topic, but maybe I’m missing something and you can explain something I don’t know.
To be honest I don’t understand what the benefits of static etherchannel are in this particular situation. I’m very familiar with CISCO vPC and Force10 VLT. I can imagine how Arista MLAG or Juniper Virtual Chassis work.
With static etherchannel I need a hash algorithm anyway because it is a channel of links. If I have switches where I can use static etherchannel, then the switches usually also support LACP. LACP has a lot of benefits over a static port channel.
I have to use the virtual distributed switch 5.1 for VXLAN anyway and it supports LACP. So I don’t see how static etherchannel would benefit me over LACP, or why I would choose this option.
Thanks for this discussion anyway.
Lee Christie says
I’m not sure there are many differences/benefits to either to be honest. They are different, but the end result is nearly the same. You won’t get better throughput or balancing I think.
http://wahlnetwork.com/2012/05/09/demystifying-lacp-vs-static-etherchannel-for-vsphere/
The decision on which to use probably has more to do with your current setup and your network administrators’ experience. LACP is relatively new to VMware I think, so some might just say “stick to what’s tried and tested”. Others say that etherchannelling is easy to get wrong and create switching loops. Some might say that LACP is an added overhead or something extra to go wrong.
Horses for courses and YMMV territory.
Duncan Epping says
5-tuple hash is not a requirement but a recommendation. I have updated my post to reflect that. I have also asked one of our VXLAN/VCD gurus to respond to this thread. He is part of the Network BU as a PM, an authoritative source, and he is going to get all docs updated accordingly.
Ray Budavari says
Duncan is correct – the use of 5-tuple hashing with LACP is a recommendation for VXLAN but definitely not a technical requirement. It is recommended because it will provide improved distribution across all uplinks by using the additional entropy of the source UDP port rather than just the source and destination IP with a 3-tuple hash.
Thanks for highlighting this inconsistency, the official product documentation and KB articles will be updated as a result.
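To make the entropy argument above concrete, here is a toy sketch (not the hash any real switch implements, and the VTEP addresses and port range are made up for the example): with only two VTEP IP addresses in play, an IP-only hash pins everything between those VTEPs to one uplink, while a hash that also includes the outer source UDP port, which the VTEP varies per inner flow, spreads the load across uplinks.

```python
# Toy illustration of the IP-only vs 5-tuple distribution argument.
# The VTEP addresses, port range and hash function are stand-ins for the
# sketch; real switches implement their own hashing.
import random

UPLINKS = 4
SRC_VTEP, DST_VTEP = "10.0.0.1", "10.0.0.2"   # hypothetical VTEP addresses
VXLAN_PORT = 8472                             # outer destination UDP port

def pick_uplink(*fields):
    """Stand-in for a switch hash: map the chosen header fields to an uplink."""
    return hash(fields) % UPLINKS

# Per inner flow, the VTEP derives a different outer source UDP port.
outer_src_ports = [random.randint(49152, 65535) for _ in range(1000)]

ip_only = {pick_uplink(SRC_VTEP, DST_VTEP) for _ in outer_src_ports}
five_tuple = {pick_uplink(SRC_VTEP, DST_VTEP, sport, VXLAN_PORT, "udp")
              for sport in outer_src_ports}

print("uplinks used with an IP-only hash:", len(ip_only))     # always 1
print("uplinks used with a 5-tuple hash :", len(five_tuple))  # typically all 4
```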