Revised: Service Console Redundancy

Several people have asked me to update my original Service Console Redundancy article. Personally, I still think the three options in that article are valid, but I have rewritten them and dropped one, as nowadays most companies have a decent infrastructure with VLANs.

Both remaining configurations are supported by VMware, and each has its own pros and cons:

  1. Requirements: 2 physical NICs, VLANs and VLAN trunking:
    • vSwitch0: 2 physical NICs (vmnic0 & vmnic2)
    • 2 portgroups (Service Console & VMkernel)
    • Service Console active on vmnic0 and standby on vmnic2
    • VMkernel active on vmnic2 and standby on vmnic0

    Each portgroup has a VLAN ID assigned and runs on its own dedicated physical NIC; only when a fault occurs is it switched over to the standby NIC. It returns to the original NIC once that NIC is up and running again. This is achieved by setting “Failback” to Yes.

    Pros: Only 2 NICs are needed in total for the Service Console and VMkernel, which is especially useful in blade environments. Simple setup.

    Cons: If the connection drops several times, the NIC fails over each time, which can eventually cause HA to kick in. You will need to set the Failure Detection Time (das.failuredetectiontime) to 60 seconds as opposed to the default of 15 seconds, according to VMware best practices.
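The failover and failback behaviour of option 1 can be sketched as a small Python model. This is only an illustration of the teaming logic described above, not VMware code: the `PortGroup` class and the `place` and `apply_link_state` functions are hypothetical names made up for this sketch, and real failover also depends on how the host detects link failure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PortGroup:
    """Hypothetical model of a portgroup with one active and one standby uplink."""
    name: str
    active: str                    # preferred physical NIC
    standby: str                   # used only while the active NIC is down
    current: Optional[str] = None  # NIC currently carrying the traffic

    def __post_init__(self) -> None:
        self.current = self.active


def place(pg: PortGroup, link_up: Dict[str, bool], failback: bool = True) -> Optional[str]:
    """Pick the NIC a portgroup should use for the given link states."""
    if failback:
        # Failback = Yes: always return to the active NIC when it is up.
        candidates = [pg.active, pg.standby]
    else:
        # Failback = No: stay where we are for as long as that link is up.
        candidates = [pg.current, pg.active, pg.standby]
    for nic in candidates:
        if nic is not None and link_up.get(nic, False):
            return nic
    return None  # no usable uplink left


def apply_link_state(portgroups: List[PortGroup], link_up: Dict[str, bool],
                     failback: bool = True) -> Dict[str, Optional[str]]:
    """Update every portgroup and report which NIC each one is using."""
    for pg in portgroups:
        pg.current = place(pg, link_up, failback)
    return {pg.name: pg.current for pg in portgroups}


# Option 1 from the article: crossed active/standby on vmnic0 and vmnic2.
option1 = [
    PortGroup("Service Console", active="vmnic0", standby="vmnic2"),
    PortGroup("VMkernel", active="vmnic2", standby="vmnic0"),
]

normal = apply_link_state(option1, {"vmnic0": True, "vmnic2": True})
failed = apply_link_state(option1, {"vmnic0": False, "vmnic2": True})
restored = apply_link_state(option1, {"vmnic0": True, "vmnic2": True})

print(normal)
print(failed)
print(restored)
```

In this model both portgroups share vmnic2 while vmnic0 is down, and the Service Console moves back to vmnic0 as soon as the link returns. A flapping link therefore moves the Service Console twice per flap, which is why a higher failure detection time is advised for this option.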

  2. Requirements: 3 physical NICs, VLANs and VLAN trunking:
    • vSwitch0 – 1 physical NIC (vmnic0) – 1 portgroup (Service Console)
    • vSwitch1 – 2 physical NICs (vmnic1 & vmnic3)
    • 2 portgroups (VMkernel & Secondary Service Console)

    The primary Service Console runs dedicated on a physical NIC, vmnic0, with a VLAN assigned on either the physical switch port or the portgroup. (I prefer the portgroup, for consistency.)

    The second vSwitch, vSwitch1, runs the VMkernel active on vmnic1 and standby on vmnic3. The secondary Service Console is active on vmnic3. For the secondary Service Console I prefer to set vmnic1 to “unused”; this way you can be sure that the Service Consoles only ever run on vmnic0 and vmnic3.

    Pros: A lower failure detection time (das.failuredetectiontime) can be defined, because the secondary Service Console is already active; here it can be set to 20 seconds. Furthermore, no Spanning Tree problems will occur, as the setup contains two Service Consoles and therefore two MAC addresses. (This is also the reason why I would set vmnic1 to unused for this portgroup.)

    Cons: Extra isolation addresses (das.isolationaddress) need to be set, and the secondary Service Console should preferably be in a different subnet.
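As a rough illustration of option 2, the layout and the matching HA advanced options can be written down as plain Python data. This is a sketch under assumptions, not an export from VirtualCenter: the dictionary structure and the `usable_uplinks` helper are made up for this example, the two isolation addresses are placeholder IPs, and the value of das.failuredetectiontime is written in milliseconds, which to my knowledge is the unit the option expects (the article itself only states seconds).

```python
# Option 2 teaming per portgroup: which uplinks are active, standby or unused.
OPTION2 = {
    "Service Console": {
        "vswitch": "vSwitch0", "active": ["vmnic0"], "standby": [], "unused": [],
    },
    "VMkernel": {
        "vswitch": "vSwitch1", "active": ["vmnic1"], "standby": ["vmnic3"], "unused": [],
    },
    "Secondary Service Console": {
        "vswitch": "vSwitch1", "active": ["vmnic3"], "standby": [], "unused": ["vmnic1"],
    },
}


def usable_uplinks(portgroup: str) -> set:
    """NICs a portgroup may ever use: active plus standby, never unused."""
    cfg = OPTION2[portgroup]
    return set(cfg["active"]) | set(cfg["standby"])


primary = usable_uplinks("Service Console")
secondary = usable_uplinks("Secondary Service Console")

# The two Service Consoles should never be able to land on the same NIC.
consoles_are_separate = primary.isdisjoint(secondary)
console_uplinks = primary | secondary

# HA advanced options for this layout. The IP addresses are placeholders
# (documentation ranges); use a pingable address in each console's subnet.
ha_options = {
    "das.failuredetectiontime": str(20 * 1000),  # 20 seconds, in milliseconds
    "das.isolationaddress": "192.0.2.1",         # hypothetical, primary console subnet
    "das.isolationaddress2": "198.51.100.1",     # hypothetical, secondary console subnet
}

print(sorted(console_uplinks))
print(consoles_are_separate)
for key, value in ha_options.items():
    print(f"{key} = {value}")
```

Marking vmnic1 as unused for the secondary Service Console is what keeps the set of console uplinks limited to vmnic0 and vmnic3 in this model.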

Thanks go out to Mr. Tom Howarth for a sanity/grammar check.

    Comments

    1. Tom says

      Would this config work well with 4 pNICs??

      Service Console, vmotion, vmkernel portgroups to pnic0 to vswif0 to vswitch0, 172.16.10.x/24 subnet, its own VLAN number for VST

      Service Console, vmotion, vmkernel portgroups to pnic1 to vswitch0, 172.16.11.x/24 subnet, its own VLAN number for VST

      The above is for failover etc.

      AND

      VM network to pnic3 to vswitch1
      VM network to pnic4 to vswitch1
      pnic3 and pnic4 teamed up in ESX
      assumed to be on the HP switch’s Default_VLAN

      What are your recommendations in this configuration for the various HA settings that you mention above??

      I’m still trying to figure out the HP switch trunking etc., hopefully that will work itself out with trial and error?????

      Thank you…

    2. Jim says

      What does option #1 look like in the Virtual Center networking section per se??

      I daresay it would be easier for people new to ESX if you posted diagrams of the setup.

    3. Hannes says

      Option 1:
      What is the advantage of having SC active on vmnic0 and VMkernel on vmnic2? Why not make both vmnics active for both portgroups?

      I cannot currently test this, but I remember from an earlier test that when a NIC fails in an active/active setup the failover was transparent (i.e. no ping lost). In an active/passive scenario, however, you could always see packet loss when pinging until the standby NIC had taken over, which could be enough for HA to kick in …

    4. says

      If you set it to active/active you could end up with both on the same vmnic, which I wouldn’t prefer.

      If you miss a ping it will trigger HA. It will retry to ping the isolation address. And if you set das.failuredetectiontime to 60 seconds, which is the best practice, the chance of having a false positive is really small.

    5. Helle Thomson says

      Additional information regarding the article written by Med Yones, a global business advisor, states that according to American management association “about 50% of businesses that suffer from a major disaster without a disaster recovery plan in place never reopen for business”. It shows that there should be better practices on BC/DR. The writer proposes “sponsor and launch BC/DR project management initiative to guarantee alignment, integration and quality results of your BC/DR activities with business goals” as one method of BC/DR Management best practices. More proposals can be found at

      http://www.iim-edu.org/executivejournal/WhitePaperBusinessContinuityDisasterRecoveryBCDRBestPractices.htm

    6. J Stone says

      VMware VirtualCenter 2.5 Update 3 Release Notes

      Suppress “No management Network Redundancy” warning

      This release introduces an option to suppress the warning message “Host xxx currently has no management network redundancy” for a host configured as a node in an HA cluster. Set the advanced option das.ignoreRedundantNetWarning to “true” to suppress the warning on hosts not configured in an HA cluster. If the warning appears for host already configured in a cluster, set the option and reconfigure HA on that host to clear the configuration issue.

    7. esx4u says

      Duncan,

      What address can I use for das.isolationaddress2?

      I want to configure a second service console on a separate physical switch.
      First I want to know if I have to configure a das.isolationaddress1 for the first service console. I’ve read that if you configure a das.isolationaddress2 you automatically have to configure a das.isolationaddress for the first service console. By default HA pings the default gateway of the first service console. Does the address of the second service console have to be in the same subnet as the first service console address?

    8. says

      In the case where we’re using iSCSI and NFS, does it make sense at all to ping the SAN IP as the second das.isolationaddress? In this case, if we can’t see the storage, the locks expire and we restart on another host. FYI, watch out for the old NetApp practice of disabling NFS locking!

    9. says

      Yes, it would make sense, especially when you’ve already got a secondary service console set up for iSCSI purposes!

    10. Cor Meurs says

      Duncan,

      We are having problems with the following configuration

      nic0 –>VSwitch0 –> ServiceConsole1

      nic1 +nic2—->vSwitch1—> Virtual machine portgroups vlans 2000- 4000 (trunk based on IP-hash)

      nic3 +nic4 +nic5 —–> vSwitch2 —IPStorage trunk based on IP-Hash

      For redundancy I added an extra Service Console2 to vSwitch1

      But as soon as we disable nic0 of Service Console1 the connection to the host is lost.

      I have added the das.isolation2 to the advanced properties. I have changed the native VLAN for the Service Console, but still no success.

      It looks like the second service console is unavailable

      Do you have an idea where the problem is?

      Cheers

      Cor

      • says

        What do you mean by “the connection is lost”? From a vCenter perspective? That would be correct: the secondary service console is only used for HA purposes and not to reconnect to vCenter.