Revised: Service Console Redundancy

Duncan Epping · Feb 17, 2009

Several people have asked me to update my original Service Console Redundancy article. Although I still think the three options described in that article are valid, I have rewritten them and dropped one, because nowadays the majority of companies have a decent infrastructure with VLANs.

Both configurations are supported by VMware and each has its own pros and cons:

Option 1 requirements: 2 physical NICs, VLANs and VLAN trunking:
- vSwitch0: 2 physical NICs (vmnic0 & vmnic2)
- 2 Portgroups (Service Console & VMkernel)
- Service Console active on vmnic0 and standby on vmnic2
- VMkernel active on vmnic2 and standby on vmnic0
Each portgroup has a VLAN ID assigned and runs dedicated on its own physical NIC; only in the case of a fault does it switch over to the standby NIC. It returns to the original NIC when that physical NIC is up and running again. This is achieved by setting “Failback” to Yes.
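
The active/standby behaviour with failback described above can be sketched in a few lines of Python. This is only an illustrative model, not VMware code: the `PortGroup` class, its `select_uplink` method and the `link_up` dictionary are hypothetical names, and only the vmnic and portgroup names come from the setup above.

```python
from dataclasses import dataclass, field


@dataclass
class PortGroup:
    """Hypothetical model of a portgroup with one active and one standby uplink."""
    name: str
    active: str
    standby: str
    failback: bool = True
    current: str = field(init=False)

    def __post_init__(self):
        # A portgroup starts out on its active uplink.
        self.current = self.active

    def select_uplink(self, link_up):
        """Return the uplink in use, given a dict of vmnic name -> link state."""
        if link_up.get(self.active, False) and (
            self.failback or self.current == self.active
        ):
            # Active NIC is healthy: stay on it, or fail back to it.
            self.current = self.active
        elif not link_up.get(self.current, False):
            # The uplink in use is down: move to the other one if it is up.
            other = self.standby if self.current == self.active else self.active
            if link_up.get(other, False):
                self.current = other
        return self.current


# Option 1 from the article: both portgroups share vmnic0 and vmnic2.
service_console = PortGroup("Service Console", active="vmnic0", standby="vmnic2")
vmkernel = PortGroup("VMkernel", active="vmnic2", standby="vmnic0")

for state in ({"vmnic0": False, "vmnic2": True}, {"vmnic0": True, "vmnic2": True}):
    print(state, service_console.select_uplink(state), vmkernel.select_uplink(state))
```

With “Failback” modelled as `failback=True`, the Service Console moves to vmnic2 while vmnic0 is down and returns to vmnic0 once the link is back.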

Pros: Only 2 NICs are needed in total for the Service Console and VMkernel, which is especially useful in blade environments. Simple setup.

Cons: If the connection is dropped several times, the NIC will fail over each time, which can eventually cause HA to kick in. You will need to set the Failure Detection Time (das.failuredetectiontime) to 60 seconds as opposed to the default of 15 seconds (according to VMware best practices).

Option 2 requirements: 3 physical NICs, VLANs and VLAN trunking:
- vSwitch0 – 1 physical NIC (vmnic0) – 1 Portgroup (Service Console)
- vSwitch1 – 2 physical NICs (vmnic1 & vmnic3)
- 2 Portgroups (VMkernel & Secondary Service Console)
The primary Service Console runs dedicated on a physical NIC, vmnic0, with a VLAN assigned on either the physical switch port or the portgroup. (I would prefer the portgroup for consistency.)

The second vSwitch, vSwitch1, will run the VMkernel active on vmnic1 and standby on vmnic3. The secondary Service Console will be active on vmnic3. For the secondary Service Console I would prefer to set vmnic1 to “unused”; this way you are 100% sure that the Service Console will run only on vmnic0 and vmnic3.

Pros: A lower failure detection time (das.failuredetectiontime) can be defined, because the secondary Service Console is already active. Here the Failure Detection Time can be set to 20 seconds. Furthermore, no Spanning Tree problems will occur, as the setup contains two Service Consoles and consequently two MAC addresses. (This is also the reason why I would set vmnic1 to unused for this portgroup.)

Cons: You need to set extra isolation addresses (das.isolationaddress), and the secondary Service Console should preferably be in a different subnet.
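
To make the HA settings for the two options concrete, here is a small Python sketch that builds the advanced option values. It is a hedged illustration, not a VMware API: `ha_advanced_options` is a hypothetical helper, the address 192.0.2.1 is a placeholder from the documentation range, and the numbered key `das.isolationaddress2` follows the form used in the comments, so verify the exact key names for your version. The one firm detail is that das.failuredetectiontime is expressed in milliseconds, so 60 seconds becomes 60000.

```python
def ha_advanced_options(detection_seconds, extra_isolation_addresses=()):
    """Build a dict of HA advanced options (hypothetical helper, not a VMware API)."""
    # das.failuredetectiontime is specified in milliseconds.
    options = {"das.failuredetectiontime": str(detection_seconds * 1000)}
    # Assumption: extra isolation addresses are numbered starting at 2,
    # the default gateway of the primary Service Console being the first.
    for n, address in enumerate(extra_isolation_addresses, start=2):
        options[f"das.isolationaddress{n}"] = address
    return options


# Option 1: single Service Console with a standby NIC, 60 seconds.
option1 = ha_advanced_options(60)

# Option 2: secondary Service Console, 20 seconds plus an extra
# isolation address (placeholder IP) reachable from its subnet.
option2 = ha_advanced_options(20, ["192.0.2.1"])

for key, value in {**option1, **option2}.items():
    print(key, "=", value)
```

These key/value pairs would then be entered by hand in the cluster's HA advanced options.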

Thanks go out to Mr. Tom Howarth for a sanity/grammar check.

Comments

Tom says

17 February, 2009 at 22:49

Would this config work well with 4 pNICs??

Service Console, vmotion, vmkernel portgroups to pnic0 to vswif0 to vswitch0, 172.16.10.x/24 subnet, its own VLAN number for VST

Service Console, vmotion, vmkernel portgroups to pnic1 to vswitch0, 172.16.11.x/24 subnet, its own VLAN number for VST

The above is for failover etc.

AND

VM network to pnic3 to vswitch1
VM network to pnic4 to vswitch1
pnic3 and pnic4 teamed up in ESX
assumed to be on the HP switch’s Default_VLAN

What are your recommendations in this configuration for the various HA settings that you mention above??

I’m still trying to figure out the HP switch trunking etc., hopefully that will work itself out with trial and error?????

Thank you…
Jim says

17 February, 2009 at 23:02

What does option #1 look like in the Virtual Center networking section per se??

I daresay it would be easier for people new to ESX if you posted diagrams of the setups.
Duncan says

17 February, 2009 at 23:10

@Jim : I will post a couple diagrams soon

@Tom : I will email you, that is easier…
Hannes says

18 February, 2009 at 16:54

Option 1:
What is the advantage of having SC active on vmnic0 and VMkernel on vmnic2? Why not make both vmnics active for both portgroups?

I can not currently test this, but I can remember from an earlier test, that when a NIC fails in an active/active setup the failover was transparent (i.e. no Ping lost). In an active/passive scenario however you could always see packet loss when pinging until the standby NIC has taken over, which could be enough for HA to kick in …
Duncan says

18 February, 2009 at 17:47

If you set it to active/active you could end up with both on the same vmnic, which I wouldn’t prefer.

A single missed ping will not trigger HA right away; it will retry pinging the isolation address. And if you set das.failuredetectiontime to 60 seconds, which is the best practice, the chance of having a false positive is really small.
Helle Thomson says

24 February, 2009 at 17:06

Additional information regarding the article written by Med Yones, a global business advisor, states that according to American management association “about 50% of businesses that suffer from a major disaster without a disaster recovery plan in place never reopen for business”. It shows that there should be better practices on BC/DR. The writer proposes “sponsor and launch BC/DR project management initiative to guarantee alignment, integration and quality results of your BC/DR activities with business goals” as one method of BC/DR Management best practices. More proposals can be found at

http://www.iim-edu.org/executivejournal/WhitePaperBusinessContinuityDisasterRecoveryBCDRBestPractices.htm
Hannes says

18 March, 2009 at 13:47

Just stumbled over Ken’s view on this topic at http://kensvirtualreality.blogspot.com/2009/03/when-is-it-ok-to-default-on-your-vi.html …
J Stone says

31 March, 2009 at 19:49

VMware VirtualCenter 2.5 Update 3 Release Notes

Suppress “No management Network Redundancy” warning

This release introduces an option to suppress the warning message “Host xxx currently has no management network redundancy” for a host configured as a node in an HA cluster. Set the advanced option das.ignoreRedundantNetWarning to “true” to suppress the warning on hosts not configured in an HA cluster. If the warning appears for host already configured in a cluster, set the option and reconfigure HA on that host to clear the configuration issue.
Duncan Epping says

31 March, 2009 at 20:26

Cool, thanks.

Still, I wouldn’t use it though.
esx4u says

15 April, 2009 at 14:44

Duncan,

What address can I use for das.isolationaddress2?
esx4u says

15 April, 2009 at 14:52

Duncan,

What address can I use for das.isolationaddress2?

I want to configure a second service console on a separate physical switch.
First I want to know if I have to configure a das.isolationaddress1 for the first service console. I’ve read that if you configure a das.isolationaddress2 you automatically have to configure a das.isolationaddress for the first service console. By default HA pings the default gateway of the first service console. Does the address for the second service console have to be in the same subnet as the first service console address?
adsouthpaw says

18 April, 2009 at 03:12

In the case where we’re using iSCSI and NFS, does it make sense at all to ping the SAN IP as the second das.isolationaddress? In this case, if we can’t see the storage, the locks expire and we restart on another host. FYI, watch out for the old NetApp practice of disabling NFS locking!
Duncan says

18 April, 2009 at 09:59

Yes, it would make sense, especially when you’ve already got a secondary service console set up for iSCSI purposes!
james@router says

6 March, 2010 at 14:11

Does my MAC address change if I upgrade my computer with some other hardware? For example, if I change the graphics card?
Cor Meurs says

15 February, 2011 at 10:19

Duncan,

We are having problems with the following configuration

nic0 –>VSwitch0 –> ServiceConsole1

nic1 +nic2—->vSwitch1—> Virtual machine portgroups vlans 2000- 4000 (trunk based on IP-hash)

nic3 +nic4 +nic5 —–> vSwitch2 —IPStorage trunk based on IP-Hash

For redundancy I added an extra Service Console2 to vSwitch1.

But as soon as we disable nic0 of Service Console1, the connection to the host is lost.

I have added das.isolation2 to the advanced properties. I have changed the native VLAN for the Service Console, but still no success.

It looks like the second service console is unavailable.

Do you have an idea where the problem is?

Cheers

Cor
- Duncan Epping says
  
  15 February, 2011 at 10:55
  
  What do you mean by “the connection is lost”? From a vCenter perspective? That would be correct. The secondary service console is only used for HA purposes and not to reconnect to vCenter.
