I was asked a question on LinkedIn about the different virtualization networking strategies from a host point of view. The question came from someone whose data center recently had 10GbE infrastructure introduced; the network was originally architected with 6 x 1Gbps NICs carved up into three bundles of 2 x 1Gbps. Three types of traffic each use their own pair of NICs: Management, vMotion and VM. 10GbE was added to the existing infrastructure, and the question that came up was: should I use 10GbE while keeping my 1Gbps links for things like management, for instance? The classic model has a nice separation of network traffic, right?
Well, I guess from a visual point of view the classic model is nice as it provides a lot of clarity around which type of traffic uses which NIC and which physical switch port. However, in the end you typically still end up leveraging VLANs, so on top of the physical separation you also provide a logical separation. This logical separation is the most important part if you ask me. Especially when you leverage Distributed Switches and Network IO Control, you can create a great, simple architecture which is fairly easy to maintain and implement both from a physical and a virtual point of view. Yes, from a visual perspective it may be a bit more complex, but I think the flexibility and simplicity that you get in return definitely outweigh that. I would definitely recommend, in almost all cases, keeping it simple. Converge physically, separate logically.
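To make the "separate logically" part a bit more concrete, here is a rough back-of-the-napkin sketch (not NIOC's actual implementation, and the share values are just example numbers) of how Network IO Control shares on a converged 10GbE uplink translate into guaranteed bandwidth when traffic types contend:

```python
# Rough sketch of how NIOC-style shares divide a converged uplink under
# contention. The share values are made-up examples, not VMware defaults.

UPLINK_GBPS = 10.0

shares = {
    "Management": 20,
    "vMotion": 50,
    "VM": 100,
}

total_shares = sum(shares.values())

for traffic_type, share in shares.items():
    # Each traffic type is only held to its proportional slice when the
    # uplink is contended; otherwise it can burst up to the full 10Gbps.
    guaranteed_gbps = UPLINK_GBPS * share / total_shares
    print(f"{traffic_type:>10}: ~{guaranteed_gbps:.1f} Gbps guaranteed under contention")
```

The point is that the logical separation (VLANs plus shares) gives each traffic type a floor during contention while still letting it burst, which is something a hard physical split into 2 x 1Gbps pairs can never do.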
Darren says
I always reply that I’ll take whatever network ports you’re willing to give to my hosts, as extra network ports are never a bad thing. That is to say, you can never have too much bandwidth. All my hosts are at remote sites, and some are at locations where a regional technician might have to drive a couple hundred miles to reach them. So having a 1Gb network just as standby is helpful in case of a failed or misconfigured 10Gb switch, or for temporary use during major network migrations, etc. The 1Gb ports generally don’t add much cost if your hosts have the capacity for them. Newer dense blade chassis seem to be going all 10Gb, however.
Duncan Epping says
In terms of bandwidth, have you monitored your daily usage? I don’t see too many customers reaching their limits, or even 10% of their capacity 🙂
John Nicholson. says
Duncan, you’re forgetting that networking teams have capital budgets to spend. You can’t call them out on 2% utilization! On a more serious note, it’s bizarre how many UCS environments I see with 3-5 hosts designed around trying to justify buying a pair of Nexus 7Ks. The switch vendors’ stock prices would be cut in half overnight if senior management realized how badly overcapitalized networking infrastructure is. Hopefully SDN and NFV will fix this, but I’m not very hopeful, as this is really a people and management visibility problem.
Duncan Epping says
LOL
Lee says
We use 2 x 1G for management on a basic vSwitch, single access VLAN. We then use 2 x 10G for everything else through vDS with bells and whistles enabled. Although we’ve never needed that safety net, we kinda like the fact that we will always be able to get access to a host or vCenter even if the vDS plays up in some way.
mb says
We do the exact same thing, even though many vSphere bloggers say it’s safe to use vDS all the way…
Duncan Epping says
Yes, if you are concerned about that, then using a “2x1Gbps” pair just for management is a good solution.
DWP says
The real question 😉 is what to do with our new blade farm with 4x 20Gb. Due to regulations, we must use 2 for all management traffic and 2 for VM traffic. With a nice 420Gb leaf/spine behind it, we are having difficulty pushing line rate in tests. For the time being, highly regulated environments will maintain “physical” separation between management and VM traffic.
mwilmsen says
I totally agree with you. But what about LAG? Will you go for 2 x 10 Gbps or 1 x 20 Gbps?
John Nicholson. says
Link Aggregation is a lot dumber than people realize. A single TCP session end to end will never use 2 links.
Ariel says
Well, it depends… LACP and static LAG will give you load balancing across the aggregated NICs based on IP or MAC. When you have many guest OSes this works in its favor.
So you can calculate based on the algorithm.
You can employ strategies to give you balance across the aggregated NICs.
However, I think it balances out in the end. And as we know, the links are not usually saturated. The main benefit is resiliency and availability.
But you’re right, people assume that they add up the bandwidth of all the NICs.
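A quick sketch of the idea (simplified, and not any vendor’s actual hash algorithm): a source/destination-based hash pins every flow to exactly one member link, which is why a single TCP session can never go faster than one NIC, while many flows spread out across the LAG.

```python
# Simplified illustration of IP-hash style link selection in a 2-link LAG.
# Real hosts and switches use their own hash functions; this just shows
# why one flow always lands on the same physical link.
import ipaddress

NUM_LINKS = 2

def select_link(src_ip: str, dst_ip: str) -> int:
    # XOR the two addresses and take the result modulo the number of links.
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % NUM_LINKS

flows = [
    ("10.0.0.11", "10.0.0.50"),  # guest A -> target
    ("10.0.0.12", "10.0.0.50"),  # guest B -> same target, may pick the other link
    ("10.0.0.11", "10.0.0.50"),  # guest A again -> always the same link as before
]

for src, dst in flows:
    print(f"{src} -> {dst}: uplink {select_link(src, dst)}")
```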
richpo says
Duncan,
I wonder if that LinkedIn message was from one of the folks I was having a discussion with about why they needed to change their design 😉
Last month I started a discussion with one of our “partners” about their networking design, which not only mixed 1Gb with 10Gb but also insisted on physical separation of Test/Dev, Prod and Backup traffic. It was much worse than just the management network going on 1Gb. They wanted to put vMotion on a 1Gb network. With these hosts having 1 TB of memory, can you imagine how long it would take to vMotion the VMs on a single 1Gb link? After much resistance I was finally able to get them to accept tagging all the traffic on all ports, with the exception of the backup network. (I was overruled by someone above me, which didn’t make any sense because they will be backing up the VMs over fibre.) It was really sad to see someone pitch a networking design like that with really no understanding or justification and no way to defend why they were doing it that way.
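For a rough sense of scale (ignoring vMotion overhead, dirty-page re-copies and compression), the raw copy time for that much memory looks something like this:

```python
# Back-of-the-envelope transfer time for moving 1 TB of VM memory,
# ignoring vMotion overhead, dirty-page re-copies and compression.

memory_bytes = 1 * 1024**4  # 1 TiB of RAM

for link_gbps in (1, 10):
    link_bytes_per_sec = link_gbps * 1e9 / 8
    minutes = memory_bytes / link_bytes_per_sec / 60
    print(f"{link_gbps:>2} Gbps link: roughly {minutes:.0f} minutes")
```

That is well over two hours on a single 1Gb link versus roughly a quarter of an hour on 10Gb, before you even factor in re-copying dirtied pages.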
There are still so many BAD networking designs that people do for VMware. If you are using 10Gb and there are no silly physical separation policies (and there are none where I work; if there are, fight them if you can), tag all the networks that you need and add them to all the 10Gb ports. As Duncan said, this simplifies your design and makes it easier for the networking guys to do the work correctly in the first place. Use the VDS (Enterprise Plus) and turn on the health check to verify all VLANs are tagged correctly on each port. It also makes it easier for the next guy who has to take over someday. Obviously there’s more to it than this, but setting up the physical network this way gives you the flexibility to logically redesign the network without having to “re-wire” the ESX servers.
Wish me luck on my next discussion on why we need separate VMware farms for DMZ access.
-Rich
DWP says
Good luck. Fought the DMZ fight before – many scars and nightmares.