Yellow Bricks

by Duncan Epping

Distributed vSwitches and vCenter outage, what’s the deal?

Duncan Epping · Feb 8, 2012 ·

Recently my colleague Venky Deshpande released a white paper on VDS Best Practices. It describes various architectural options when adopting a VDS-only strategy, a strategy of which I can see the benefits. On Facebook multiple people commented that this would be a bad practice rather than a best practice. Here are some of the comments:

“An ESX/ESXi host requires connectivity to vCenter Server to make vDS operations, such as powering on a VM to attach that VM’s network interface.”

“The issue is that if vCenter is a VM and changes hosts during a disaster (like a total power outage) and then is unable to grant itself a port to come back online.”

I figured the best way to debunk all these myths was to test it myself. I am confident that it is no problem, but I wanted to make sure that I could convince you. So what will I be testing?

  • Network connectivity after Powering-on a VM which is connected to a VDS while vCenter is down.
  • Network connectivity restore of vCenter attached to a VDS after a host failure.
  • Network connectivity restore of vCenter attached to a VDS after HA has moved the VM to a different host and restarted it.

Before we start, I think it is useful to rehash the different types of port binding for dvPortgroups, which are described in more depth in this KB:

  • Static binding – A port is immediately assigned and reserved for the VM when it is connected to the dvPortgroup through vCenter. This happens during the provisioning of the virtual machine!
  • Dynamic binding – Port is assigned to a virtual machine only when the virtual machine is powered on and its NIC is in a connected state. The Port is disconnected when the virtual machine is powered off or the virtual machine’s NIC is disconnected. (Deprecated in 5.0)
  • Ephemeral binding – Port is created and assigned to a virtual machine when the virtual machine is powered on and its NIC is in a connected state. The Port is deleted when the virtual machine is powered off or the virtual machine’s NIC is disconnected. Ephemeral Port assignments can be made through ESX/ESXi as well as vCenter.
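
To make the difference between the three binding types concrete, here is a minimal Python sketch. It is a toy model of the rules listed above, not VMware code: the names `Binding` and `gets_port_at_power_on` are made up for illustration, and the rule that dynamic binding needs vCenter at power-on is my reading of the KB rather than something spelled out in the list.

```python
from enum import Enum


class Binding(Enum):
    """The three dvPortgroup port binding types described above."""
    STATIC = "static"
    DYNAMIC = "dynamic"
    EPHEMERAL = "ephemeral"


def gets_port_at_power_on(binding, vcenter_up, provisioned_before_outage=True):
    """Toy model: does a VM end up with a dvPort when it is powered on?

    Hypothetical helper for illustration only, not a VMware API.
    """
    if binding is Binding.STATIC:
        # The port was reserved at provisioning time through vCenter, so a VM
        # provisioned before the outage already owns its port.
        return provisioned_before_outage or vcenter_up
    if binding is Binding.DYNAMIC:
        # Assumption: the port is handed out at power-on and that requires vCenter.
        return vcenter_up
    if binding is Binding.EPHEMERAL:
        # Ephemeral ports can be created through ESX/ESXi as well as vCenter.
        return True
    raise ValueError(f"unknown binding: {binding!r}")


if __name__ == "__main__":
    for binding in Binding:
        result = gets_port_at_power_on(binding, vcenter_up=False)
        print(f"{binding.value:>9} binding, vCenter down -> port assigned: {result}")
```

In this model the only case that fails with vCenter down is dynamic binding, or a static-binding VM that was never connected to the dvPortgroup before the outage.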

Hopefully this makes it clear straight away that there should be no problem at all. “Static Binding” is the default, and even when vCenter is down a VM that was provisioned before vCenter went down can easily be powered on and will have network access. I don’t mind spending some lab hours on this, so let’s put it to the test. Let’s use the defaults and see what the results are.

First I made sure all VMs were connected to a dvSwitch. I powered off a VM and checked the “Network settings”, and this is what it revealed: a port already assigned, even with the VM powered off.

This is not the only place you can see port assignments; you can also verify them on the VDS’s “Ports” tab.

Now let’s test this, as that is ultimately what it is all about. First test: network connectivity after powering on a VM which is connected to a VDS while vCenter is down:

  • Connected VM to dvPortgroup with static binding (the default and best practice)
  • Power off VM
  • Power off vCenter VM
  • Connect vSphere Client to host
  • Power on VM
  • Ping VM –> Positive result
  • You can even see on the command line that this VM uses its assigned port:
    esxcli network vswitch dvs vmware list
    Client: w2k8-001.eth0
    DVPortgroup ID: dvportgroup-516
    In Use: true
    Port ID: 137
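
If you want to check this for more than one VM, the output can be parsed with a few lines of Python. This is only a sketch: it assumes the simple "key: value" layout of the excerpt above (the real command prints more fields per port), and the helper names `parse_port_records` and `find_port` are hypothetical.

```python
# Excerpt copied from the test above; real output contains more fields.
SAMPLE = """\
esxcli network vswitch dvs vmware list
Client: w2k8-001.eth0
DVPortgroup ID: dvportgroup-516
In Use: true
Port ID: 137
"""


def parse_port_records(text):
    """Group 'key: value' lines into one dict per port.

    A new record starts whenever a key shows up that the current record
    already has, so the order of the fields does not matter.
    """
    records = []
    current = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the command line itself and blank lines
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in current:
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


def find_port(text, client):
    """Return the record whose Client field matches, or None."""
    for record in parse_port_records(text):
        if record.get("Client") == client:
            return record
    return None


if __name__ == "__main__":
    port = find_port(SAMPLE, "w2k8-001.eth0")
    print(port["Port ID"], port["In Use"])
```

On a host you would feed it the captured output of the command instead of `SAMPLE`.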

Second test, Network connectivity restore of vCenter attached to a VDS after a host failure:

  • Connected vCenter VM to dvPortgroup with static binding (the default and best practice)
  • Power off vCenter VM
  • Connect vSphere Client to host
  • Power on vCenter VM
  • Ping vCenter VM –> Positive result

Third test, Network connectivity restore of vCenter attached to a VDS after HA has moved the VM to a different host and restarted it.

  • Connected vCenter VM to dvPortgroup with static binding (the default and best practice)
  • Yanked the cable out of the ESXi host on which vCenter was running
  • Opened a ping to the vCenter VM
  • HA re-registered the vCenter VM on a different host and powered it on
    • The re-register / power-on took roughly 45 – 60 seconds
  • Ping vCenter VM –> Positive result
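
For those who want to put a number on the downtime in a test like this, the gap in the ping replies can be computed from timestamped samples. The sketch below assumes you have logged each ping as a (timestamp in seconds, replied) pair; the function name `longest_outage` and the sample values are hypothetical and not taken from my lab.

```python
def longest_outage(samples):
    """Return the longest gap in seconds between the first lost ping and
    the next successful reply.

    `samples` is an iterable of (timestamp_seconds, replied) pairs sorted
    by time. An outage that is still ongoing at the end is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, replied in samples:
        if not replied and outage_start is None:
            outage_start = timestamp  # first lost ping of this outage
        elif replied and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest


if __name__ == "__main__":
    # Made-up samples: replies stop at t=1 and resume at t=51.
    samples = [(0, True), (1, False), (2, False), (50, False), (51, True)]
    print(f"longest outage: {longest_outage(samples)} seconds")
```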

I hope this debunks some of those myths floating around. I am the first to admit that there are still challenges out there, and these will hopefully be addressed soon. But I can assure you that your virtual machines will regain their connection as soon as they are powered on, through HA or manually… yes, even when your vCenter Server is down.

 

Comments

  1. Steffan Røstvig says

    8 February, 2012 at 16:00

    Using static binding. vCenter crashes. ESXi boots without access to vCenter. Will you be able to powerup your VMs and have network access?

  2. forbsy says

    8 February, 2012 at 16:16

    Nice article. One caveat would be with linked clones. It’s a best practice to set the dvPortGroup Port Binding to either Dynamic or Ephemeral (no binding) – or you could end up with errors on refresh, recompose or rebalance:

    http://myvirtualcloud.net/?p=1012

    If an event occurs that brings down vCenter and the virtual desktops, just fire up vCenter and then the virtual desktops (since vCenter has a static binding).

  3. Carlos says

    8 February, 2012 at 16:25

    A great article. I’ve had this debate many times before and I feel most VMware admins stick to the standard switch for MGMT because they don’t fully understand the vDS. In my opinion, why implement something you obviously don’t fully trust!

  4. Tomi Hakala says

    8 February, 2012 at 16:31

    Duncan, excellent topic! I just completed a vSphere 5 What’s New training delivery and we had a discussion about this exact topic.

  5. Duncan Epping says

    8 February, 2012 at 16:44

    @Steffan: Yes. Even if the ESXi host went down, vCenter is still down, and the ESXi host comes back up, your VMs will regain their network connection when you power them on.

    @Forbsy: vCenter is not needed for Ephemeral.

  6. John says

    8 February, 2012 at 16:44

    I have seen this issue with dVswitches and have been concerned about it to the point that I only put vCenter onto a standard vswitch. The issue arises with a catastrophic failure such as a power outage where all hosts go down. The dVswitch configuration is cached on each host and unless vCenter is up and running the hosts cannot pull down the dVswitch config. So if vCenter is on a dVswitch port group it does not regain network access unless you connect directly to a host and place it on a standard vswitch. Is it possible that there have been changes made in vSphere 5.0? I have not yet seen this issue on 5.0 but only have a few implementations completed and I’ve always put vCenter on a standard switch. I have seen this issue several times with 4.1.

  7. Jason Nash says

    8 February, 2012 at 16:45

    Good stuff Duncan. This WAS an issue in vSphere 4.1. I had my lab be unable to recover after a total outage a couple of times with the vDS. All networking on the VMs, including vCenter, would show “Invalid Backing” and would not boot until flipped to a standard vSwitch. I’ve tried several times to reproduce this with 5.0 and haven’t been able to do it. Not sure if it was truly a problem or something in my environment but I, and several others I know, hit this in 4.1. Now with 5.0 I have no problems recommending virtual vCenter and vDS.

  8. Gabrie van Zanten says

    8 February, 2012 at 17:04

    dvSwitch config is ALSO stored on the datastore accessible to the ESXi host

  9. Duncan Epping says

    8 February, 2012 at 17:10

    I have also seen that in the past Jason, but also have never been able to reproduce this with 5.0.

  10. Thomas Ruth says

    8 February, 2012 at 19:09

    To properly test, you need to do the following:

    1) Power off vCenter Server
    2) Connect vSphere client to the host that has the vCenter Server registered on it.
    3) Remove the vCenter Server from inventory
    4) Connect the vSphere client to a different host in the cluster.
    5) Add the vCenter Server to inventory
    6) Attempt to power on the vCenter Server.
    7) See if it works.

    This will simulate a host failure in which HA re-registers vCenter on a new host.

  11. Duncan Epping says

    8 February, 2012 at 19:31

    @Thomas: I even pulled the power cord from my host that was running vCenter… it came back up within a minute literally, with full network connection. No problem at all.

  12. Ben Thomas says

    8 February, 2012 at 19:33

    As Thomas mentions the issue is when vCenter moves to another host (via HA or manually). Provided that it is staying on the same host then there will not be an issue.

  13. forbsy says

    8 February, 2012 at 19:57

    Duncan. What I’m saying is that it’s a best practice to have Ephemeral ports for linked clones, according to that blog. If you decide to have Ephemeral ports and an event occurs that brings your linked clones AND vCenter down, how will those linked clones get restarted?
    You stated at the top of your blog:

    An ESX/ESXi host requires connectivity to vCenter Server to make vDS operations, such as powering on a VM to attach that VM’s network interface.

    If vCenter is down how will the linked clones get powered on if the ephemeral port(s) have been deleted during the poweroff?

    While I understand vCenter is not required to create Ephemeral ports, my point had to do with what happens if your VM (linked clones in this case) has the Ephemeral port and gets powered off – along with vCenter.

  14. Loren says

    8 February, 2012 at 20:51

    I’m still curious about what happens if someone modifies the VLAN # on the distributed port group connected to the ESXi hosts’ management interface.

  15. Joerg Bold says

    8 February, 2012 at 21:10

    Duncan,
    good article. Is the new advanced option “autoExpand” described in the KB-Article experimental or could it be used in production?

  16. Duncan Epping says

    8 February, 2012 at 21:17

    @Ben: Just tested that as well, 3 times, by literally killing the host which has vCenter running on it. Works like a champ.

  17. Duncan Epping says

    8 February, 2012 at 21:31

    @Forbsy: That is not my text, it is a comment from Facebook. I am saying that what they are claiming is simply not true. Ephemeral ports CAN be created on an ESXi layer. Eric Gray wrote about that one also a while back: http://www.vcritical.com/2011/05/the-secret-of-ephemeral-port-groups/

  18. Mike says

    8 February, 2012 at 22:12

    We never had the debate but assumed it is fine. We did however have an outage and we did end up with a problem. vDS and port groups were created using default setting (apart from changing the failover to IP Hash due to us using channel groups on Cisco switches) but when the host with the vCenter crashed we ended up with network connectivity problems. VMs could not be powered on as for some reason multiple VMs ended up sharing the same port and had to be changed manually in order to power on those VMs. Might be just one of those things but for now the management decided to run vcenter on a physical server as a result. We even re-tested on a new cluster and vcenter in conjunction with vcloud director. We even lost vApps and experienced the same ‘port’ issue.

  19. forbsy says

    8 February, 2012 at 23:25

    Thanks for clearing that up Duncan. Looking at Eric Gray’s post (as you mentioned above) Ephemeral ports should also be a valid selection – but would require a bit more work to bring the VM back online.

  20. Dave Convery says

    9 February, 2012 at 00:10

    WOW! Two lifesavers in one week! Ever since reading Jason Boche’s post a few years ago ( http://www.boche.net/blog/index.php/2009/10/09/virtualizing-vcenter-with-vds-catch-22/ ) and testing his scenario, I have recommended vSS for VMK ports.

    I have always wanted to test it in vSphere 5, but didn’t get around to it.

    Thanks again!
    Dave

  21. Dave Convery says

    9 February, 2012 at 00:13

    @forbsy

    I always recommend vCenter Heartbeat for View if Composer is being used. It would eliminate any vDS issues. I also recommend it with vCloud Director and anything else that depends on vCenter being up to serve the end users in any fashion.

    Dave

  22. forbsy says

    9 February, 2012 at 00:22

    Dave. Heartbeat seems a little over the top to me if the intent is to ensure the Composer service (running on vCenter) is always up. To me, HA should be good enough to restart vCenter on another host and the composer service would only be down for the time it takes for vCenter to restart.
    Shouldn’t be a big deal for VMware View though. It would just mean you cannot create linked clones or do recomposes or refreshes as long as the composer service is down – but the desktops will still be running fine.

  23. Matt Meyer says

    9 February, 2012 at 02:31

    I have been able to reproduce the “Invalid Backing” on ESXi 4.1 and 5.0 but you have to lose storage as well.

    To reproduce:
    Power off the entire infrastructure (including the SAN)
    Power on the ESXi servers
    Power on the storage
    vCenter VM should have an invalid backing

    No matter what you do, you cannot connect the vNIC to the vDS
    Now if you reboot the ESXi host AFTER the storage is available, everything works just fine.

    It seems like the ESXi host only tries to sync the vDS state with the storage at boot and never again afterward. You would think that it would keep trying, but it does not.

    Moral of the story, if you have a site outage, bring up the storage before the hosts. This was just my testing, but I don’t see above where the storage was taken offline as well.

  24. Duncan Epping says

    9 February, 2012 at 08:28

    @Matt: Maybe it is me… but why on earth would anyone turn on their infrastructure in the wrong order after a catastrophic power outage like that? Yes, VDS would probably have a problem then :-s

  25. Michael Webster says

    9 February, 2012 at 11:14

    I’ve experienced the same issue as Matt. There are multiple reasons why storage would be down at boot time or at the time HA kicks in, such as with iSCSI NIC failure across entire clusters. Which I have seen multiple times with certain hardware. There are also plenty of configuration and change issues that could cause storage to not be available to the host at boot. So the moral of the story is you must have storage at boot time else vDS will not function if vCenter is also down. This is unless VM is using ephemeral ports. Can also be easily resolved by fixing storage and then rebooting the hosts again. No need to go back to vSS. Once you know this issue it’s easy to resolve, design around and limit. It should be very rare to have this perfect storm of faults.

  26. Tom Miller says

    9 February, 2012 at 20:30

    Thanks Duncan, always appreciate your work. I have done these tests and agree with your results. My concern with vDS as a whole is a total dependency on a healthy vCenter DB. Yes, vCenter can be down for a while and VMs can shut down, reboot, and reconnect to the network. That’s how we do an in-place vCenter upgrade from 4.1 to 5, right? The problem is that if the vCenter DB gets corrupted we are in trouble. I know, we should have DB backups and I’m sure some folks do. But if we are honest, how many of us perform vCenter DB backups? If vCenter needs to be rebuilt because of DB corruption, I think we are going to have big issues. To me the important issue is vCenter DB backups.

    Interested in your thoughts?
    Tom

  27. Jeremy Pollack says

    9 February, 2012 at 21:15

    Thanks Duncan, appreciate you posting your results.

    The one case that I have experienced that you did not test for was where we had a host fail that was hosting vCenter, and HA was not enabled. After manually bringing the guest back by opening the vmx on another host, I was then unable to get connectivity because the newly added guest could not find/add the vDS network.

    This was in vCenter 4.1, and I have not attempted to replicate this test in 5.0, or with any of the newer redundant solutions/options.

  28. Ray says

    9 February, 2012 at 23:51

    Why would one NOT backup his database or even better use a DB cluster. Why would one NOT enable HA? If there is an issue with your storage, wouldnt that be the first problem to solve?

  29. Matt Meyer says

    9 February, 2012 at 23:59

    @Duncan. Absolutely agreed. A simple operational procedure would prevent such an issue. I’ve had the issue come up only once ever myself.

  30. Usulsuspct says

    10 February, 2012 at 01:34

    So what’s happening here that allows vms to power on and receive a vds port while not having access to vcenter? (assuming static binding)

    Is it that the vds info and port group assignment that occurred at initial vm provisioning is saved to the data store?

    Is this a new function of vsphere 5?

    Can anyone point to documentation of the autoexpand property of static vds port groups and whether or not it is officially supported? Wary of enabling this in prod without it being a blessed option (although I can see benefit in my environment to having that functionality).

    Thanks Duncan – applaud your contributions!

  31. yipperzz says

    10 February, 2012 at 04:54

    Is it any different for the 1000v? We took vCenter down and couldn’t assign a VM to any networks from the 1000v. They were all missing. We’re still on 4.1 update 2. I found it funny that it happened today since I read your entry just the other day.

  32. Duncan Epping says

    10 February, 2012 at 08:31

    @ Usulsuspct: I will answer that in a blog article soon. Working on it as we speak. ( static binding)

    I will ask about the auto-expand feature if this is supported or not. Will also answer that in the blog article.

  33. Andy says

    10 February, 2012 at 16:45

    Nice article Duncan…

    But does this change the fact that you still need an ephemeral port group for disaster recovery?
    If you have to recreate your vCenter, or only have to re-register it, you will not be able to get it online again (without creating a vSwitch).

    Am I still right about this on 5.0?

  34. kepler says

    11 February, 2012 at 23:22

    I’ve had a similar issue as Jeremy Pollack, in my Test lab, where the vCenter was trashed and attempting to create a new VM to replace it hit a brick wall as I wasn’t able to add the VM to the vDS. I didn’t have any other option but to switch back to a vSS to allow me to get the VM network connectivity. Unless there is some other option I didn’t consider at the time?!?!?

  35. kepler says

    11 February, 2012 at 23:31

    I forgot to mention, I was NOT using ephemeral ports when the vCenter failure occurred

  36. Katelin says

    24 February, 2012 at 06:43

    Good article but I have run into this problem with version 5. Perhaps you can tell me what is wrong.

    vCenter is a VM and I have a two node cluster and everything is set to static binding just like in your article.

    vCenter is shutdown and all of the ESXi hosts were powered down, as we could do it on the weekend without problem.

    However when the ESXi hosts came back up it was impossible to get vCenter or any other VM to come up on the network.

    We had to create a regular vSwitch, boot vCenter, and then move it back onto the distributed switch.

    What happens if you shut down vCenter and then power down all of the ESXi hosts? Can you get vCenter or other VMs to come up in your lab?

  37. Chris says

    2 May, 2012 at 18:36

    So.. This article is geared specifically for vSphere 5.0? Or does this apply to 4.1( Update 2 (ESXi hosts))?

    We run our vCenters as vm’s. I’d like to move to physical if I have to worry about an outage that cannot be rectified by HA in a few minutes.. We are running hybrid.. mgmt and vmotion are on standard switches and vm traffic on vDS.

    Thanks for this article!

  38. Davide Sitta says

    24 September, 2012 at 00:06

    great article! but now we are facing that issue: what about this:

    http://www.virtualizationteam.com/virtualization-vmware/vsphere-virtualization-vmware/virtual-machines-vds-network-interfaces-configuration-in-vmx-file-is-lost-upon-removing-it-from-inventory.html

    In other words is there a workaround for the missed vDS in vmx deal?
    any idea???
    bye
    vSitta

    • Duncan says

      24 September, 2012 at 01:18

      So you have a scripted DR solution? If so, what kind of script language?

  39. Cory says

    23 October, 2012 at 22:55

    @Duncan: How would things proceed with VDS, if say your VC DB got totally hosed and you didn’t have a backup? Situation is client running VC with sql_express and VM got trashed with no backup and they were using VDS.

  40. Gene says

    31 October, 2012 at 18:10

    Great write up and test! My impression was that when using VDS, it still creates hidden standard switches on each host, so technically VMs would still be able to power on and get onto the network. Is that a true statement?

  41. Eiad Al-Aqqad says

    15 November, 2012 at 21:24

    @Davide Sitta I have just noticed your comment about my blog post. I thought just to make sure to point out that vCenter Site Recovery Manager was built to handle that. If you are using SRM then you got nothing to worry about. Though if you are doing a manual scripted failover then you have to take care of it in your script.

    Thanks,
    Eiad

  42. Vikas Dubey says

    27 August, 2013 at 11:29

    Hi ,

    I am facing some issue with DVswitch.
    1: I have two node cluster ESXi 5.1 Cluster.
    2: vCenter 5.1 is running on VM.
    3: I restart the ESXi Host in which Vcenter & other VMx are running
    4: I got the ping of all my VMs Except the Vcenter.

    Please suggest any possible solution.or shall i use vSwitch

  43. Raj says

    18 September, 2013 at 16:30

    Hi

    I am planning to move a vCenter 5.0 running on a physical server to a Virtual Machine , keeping the same IP address, we have v1000 for all the networking. I am planning to restore the vCenter Database to the new machine, including ADAM database , and install same version of vCenter software. Will the nexus v1000 connect to the new vCenter ( same IP, same version, same database ) without any issues? Please advise.

    Thanks,

  44. Ray Garcia says

    8 October, 2013 at 18:45

    Duncan, this is a great read. It sort of mimics what is happening to us now. We have 4 hosts running ESX 4.0. Recently our vCenter server crashed. We removed it from inventory and re-added the vmx from a restore. Now there is no way to add a Network Label. I have even moved it to a different host with the same results. The latest attempt was to add a new VM to another host, and when we get to the Networking step it also states “the host does not have any virtual machine networks.” All other VMs are working fine, and I can ping the other hosts from each host. Any ideas?

  45. Aivariokas says

    18 February, 2014 at 12:54

    Hello, I also have a question about vDS, but it is more about migration. Let’s say I don’t want to migrate vCenter with its old SQL database (the database is on Express 2005). I want to install a fresh, clean server. If I disconnect ESX from the old vCenter and connect it to the new one, will the VMs still work? Do I need to reconfigure the vDS switch?

  46. Bist says

    18 March, 2014 at 13:35

    This is exactly what happens on a Nutanix block. The CVM doesn’t come online because there is no way to connect to the dvs. If the CVM doesn’t come online there is no shared storage on the Nutanix block, so port assignment on the dvs is not possible. So yes, there exists a situation where the “san” is not online when vCenter starts.

    • @vcdxnz001 says

      18 March, 2014 at 22:08

      Disclosure: I’m a Solutions and Performance Engineer for Nutanix. This bug has now long since been fixed by VMware. The communications between hosts and the CVM is on a private network with no uplinks, so the dvs is not going to impact that. The links between the other CVM’s however are on the storage network. From vSphere 5.0 U2 and 5.1U1 this bug was fixed and it is no longer the case that a loss of access to storage is going to nuke your VM’s connecting to the network. If you were concerned about it still for some reason you could use ephemeral port binding for the CVM’s instead of static port binding, for the ports that are on the dvs. Also ephemeral port binding may be recommended for the vCenter itself, if it’s running in the same cluster.

      So in a Nutanix environment you’re not going to lose the ability for your VMs to communicate on the network, even if the entire cluster is stopped and restarted. I tested this extensively and I run dvs exclusively in my lab environment. If you have a specific error or situation where you think there is something related to a dvs then you need to raise that as a support request.

      • Bist says

        20 March, 2014 at 17:30

        hi, indeed. I should have stated that we are running 5.1. Sorry for the confusion. And thanks for the answer 😉

  47. Matt Tehonica (@vmMattT) says

    10 April, 2014 at 19:17

    I know this is an old article, but it’s still very pertinent. We have been able to consistently reproduce the “invalid backing” issue that Jason described above in our 5.1 environment. Backing up and restoring a VDS from Windows vCenter Server 5.1 to Linux vCenter Appliance 5.5 causes the VDS to show no uplinks connected to it from the hosts, and VMs show invalid backing, but everything communicates fine and no outage is caused. It also causes an error relating to proxy switches not existing.

  48. Michael Cook says

    28 April, 2014 at 17:24

    I have been able to reproduce the invalid backing unfortunately not in a test environment. We had to shut down all the hosts in our cluster for a datacenter outage. When we brought them back up the host that vCenter was on before the power outage failed to boot. We have HA enabled but it did not move the VMs off that failed host when the other hosts booted up. I am assuming this is due to HA requiring vCenter if hosts were offline. We had to add the vCenter vmx to inventory on a healthy server but at that point the network adapter shows the correct dvSwitch and a port but is disconnected. Trying to connect it causes an invalid backing error. All the VMs on the live hosts can connect via the dvSwitches but not the vCenter VM that was manually moved. I think this is a unique failure scenario but one that needs to be planned for.

  49. Mitra says

    30 July, 2014 at 08:26

    Dear Sir ;

    When we remove ESXi Host from vCenter Inventory , the Host is also removed from vDistributed Switch configuration , after readding Host in vCenter Inventory the Host was unable to connect with previous vDS Switch configuration automatically ..

    Please shared Any solution to avoid this

    Regards;
    Mitrajeet H.

  50. Johannes says

    11 September, 2014 at 16:15

    Hi,

    We have the same problem here with 5.5 Update1: full power shutdown of all servers and no dvPort can connect after starting the environment again.

    Has anyone opened a service request?

    BR Johannes

  51. Michael Cook says

    11 September, 2014 at 17:24

    The solution we found was to create a new dvportgroup with ephemeral ports. To this dvportgroup we attached the vcenter, DCs and SQL servers hosting vcenter databases. We tested this in our lab environment by replicating the issue I mentioned in my comments above and then proving ephemeral ports resolves this issue.

About the author

Duncan Epping is a Chief Technologist in the Office of CTO of the Cloud Platform BU at VMware. He is a VCDX (# 007), the author of the "vSAN Deep Dive", the “vSphere Clustering Technical Deep Dive” series, and the host of the "Unexplored Territory" podcast.

Copyright Yellow-Bricks.com © 2023