Service Console redundancy

Duncan Epping · Jan 14, 2008

Over the last couple of weeks, more blog posts and forum topics have appeared about the warning VirtualCenter gives when there’s no Service Console redundancy. Several people posted a workaround to clear this warning. The workaround is very easy: temporarily assign an additional nic to the Service Console vSwitch and reconfigure HA. Notice that I wrote “workaround”, because I definitely don’t see this as a solution to the problem. With the current technology there’s not much reason not to have a redundant Service Console in my opinion, especially when you are using HA. I know a nic hardly ever breaks, but in this case probably more than 8 VMs rely on this nic, the physical switch and the network cable it’s attached to. When I do VMware implementations, it depends on the customer and the hardware which of the following three options I use. All are supported by VMware and each has its own pros and cons:

1. vSwitch0 – 2 physical nics (vmnic0 & vmnic2) – 2 portgroups (Service Console & VMkernel)
Service Console active on vmnic0 and standby on vmnic2.
VMkernel active on vmnic2 and standby on vmnic0.
Each portgroup has a VLAN assigned and runs dedicated on its own nic. Only in the case of a fault does it switch over to the standby nic, and it returns to the original nic when the connection is up again. This is achieved by setting Rolling Failover to No. In 3.5 this setting is named “Failback” and has to be set to Yes.

Pros: Only 2 nics are needed in total for the Service Console and VMkernel, which is especially handy in blade environments.
Cons: If the connection drops several times, the nic will fail over a lot, which can cause HA to kick in. You need to set the Failure Detection Time to 60 seconds, as opposed to the 20 seconds in option 3, and you need to have VLANs set up.

2. vSwitch0 – 2 physical nics (vmnic0 & vmnic2) – 1 portgroup (Service Console)
Service Console active on vmnic0 and vmnic2 with “virtual port id” load balancing.
vSwitch1 – 2 physical nics (vmnic1 & vmnic3) – 1 portgroup (VMkernel)
VMkernel active on vmnic1 and vmnic3 with “virtual port id” load balancing.
Each portgroup can have a VLAN ID assigned, but you can also set up the VLANs on the side of the physical switch.

Pros: When network engineers want to keep the VLAN configuration on the physical switch, that is possible with this setup. You can set Rolling Failover to Yes (or Failback to No), so the portgroup will not start “flapping”. Portgroups are active on both nics to keep the switchover time as low as possible.
Cons: Needs extra nics, and is less flexible with VLANs if the traffic is not tagged by VMware. Best practice is to set the Failure Detection Time to 60 seconds.

3. vSwitch0 – 1 physical nic (vmnic0) – 1 portgroup (Service Console)
Service Console active on vmnic0.
vSwitch1 – 2 physical nics (vmnic1 & vmnic3) – 2 portgroups (VMkernel & Secondary Service Console)
VMkernel active on vmnic1 and vmnic3 with “virtual port id” load balancing.
Secondary Service Console active with an IP on the same subnet as the VMkernel, but on a different subnet from the primary Service Console.

Pros: You can define a lower Failure Detection Time because the secondary Service Console is already active and doesn’t need to kick in. The Failure Detection Time can be set to 20 seconds. No Spanning Tree problems will occur for the Service Console because it has two vswifs, and therefore 2 MAC addresses.
Cons: You need to set an extra isolation address, and the secondary Service Console needs to be in a different subnet, because if you use the same subnet as the primary Service Console both IP addresses would resolve to the same MAC. (See the Theether link below for more info on that one.)
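
To make the difference between the Failback settings in options 1 and 2 a bit more concrete, here is a minimal Python sketch of an active/standby portgroup on a flapping link. This is not VMware code: the `PortGroup` class and the `simulate` and `switchovers` functions are made up for illustration, and the model assumes the standby nic stays up the whole time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortGroup:
    """Hypothetical model of a portgroup with an explicit failover order."""
    name: str
    active: str      # preferred uplink, e.g. "vmnic0"
    standby: str     # fallback uplink, e.g. "vmnic2"
    failback: bool   # 3.5 "Failback"; the inverse of "Rolling Failover"


def simulate(pg: PortGroup, preferred_link_up: List[bool]) -> List[str]:
    """Return the uplink carrying traffic after each link-state sample.

    preferred_link_up[i] is the state of the preferred (active) nic at
    step i. The standby nic is assumed to stay up for the whole run.
    """
    current = pg.active
    history = []
    for up in preferred_link_up:
        if current == pg.active and not up:
            current = pg.standby          # failover to the standby nic
        elif current == pg.standby and up and pg.failback:
            current = pg.active           # failback to the preferred nic
        history.append(current)
    return history


def switchovers(history: List[str], start: str) -> int:
    """Count how often the carrying uplink changed."""
    count, previous = 0, start
    for nic in history:
        if nic != previous:
            count += 1
        previous = nic
    return count


# A flapping link on vmnic0: down, up, down, up, down, up.
flapping = [False, True, False, True, False, True]

with_failback = PortGroup("Service Console", "vmnic0", "vmnic2", failback=True)
without_failback = PortGroup("Service Console", "vmnic0", "vmnic2", failback=False)

print(switchovers(simulate(with_failback, flapping), "vmnic0"))     # 6
print(switchovers(simulate(without_failback, flapping), "vmnic0"))  # 1
```

With Failback enabled the portgroup follows every flap of the preferred nic, which is the behaviour that can make HA kick in under option 1. Without it the portgroup moves once and stays on the standby nic.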

I’ve implemented option 2 a lot, but it’s very prone to physical switch errors and spanning tree problems. That made me reconsider, and I now think option 3 is the least error prone. In case of a failover, or when HA needs to kick in, it will do so within 20 seconds.
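
The addressing rules for option 3 are easy to get wrong, so here is a small Python sketch that checks an address plan against them. It only encodes the two rules stated above (secondary Service Console on the VMkernel subnet, and not on the primary Service Console subnet). The function name `check_option3` and the example addresses are made up for illustration.

```python
import ipaddress
from typing import List


def check_option3(primary_sc: str, secondary_sc: str, vmkernel: str) -> List[str]:
    """Return a list of problems with an option 3 style address plan.

    Each argument is an interface address in CIDR form,
    e.g. "192.168.1.10/24".
    """
    primary = ipaddress.ip_interface(primary_sc)
    secondary = ipaddress.ip_interface(secondary_sc)
    vmk = ipaddress.ip_interface(vmkernel)

    problems = []
    if secondary.network == primary.network:
        problems.append("secondary Service Console shares a subnet with the primary")
    if secondary.network != vmk.network:
        problems.append("secondary Service Console is not on the VMkernel subnet")
    if secondary.ip == vmk.ip:
        problems.append("secondary Service Console and VMkernel use the same IP")
    return problems


# Example addresses are invented for this sketch.
print(check_option3("192.168.1.10/24", "10.0.0.12/24", "10.0.0.2/24"))     # []
print(check_option3("192.168.1.10/24", "192.168.1.11/24", "10.0.0.2/24"))  # two problems
```

An empty list means the plan follows both rules. The second call breaks both at once, because the secondary Service Console sits on the primary subnet instead of the VMkernel subnet.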

For more info check out these links:
VMware KB Article on Redundant SCs
VMware KB Article on Isolation Addresses
VMware KB Article on HA best practices
Theether article on a secondary Service Console
Help on vSwitch settings for 3.5

Comments

John Middleton says

28 January, 2008 at 18:22

Need a little clarification on ‘Service Console redundancy’ in a blade environment.

My blades only have 2 NICs in total. How can I set up this redundancy (VMkernel + SC) and still get my VM Network on a vSwitch?

Reading through the solution above, it appears to put SC + VMkernel on one vSwitch with the physical NICs teamed to it…

Being somewhat new to VMware, I am not seeing how to set up my limited 2 physical NICs to get everything we need. Right now the only solution I see is to create two vSwitches, one with SC + VM Network and a second with SC + VMkernel?

Pardon my ignorance. – John
Duncan Epping says

28 January, 2008 at 18:46

Not ignorance… 2 nics, you probably have Dell blades?

I would suggest bundling the 2 nics into 1 vSwitch and creating portgroups for the VMkernel, the Service Console and the different VLANs you need. I did the same setup about a year ago for a customer with Dell 1855 blades. It works perfectly, provided the VLANs are in place!
Joao Pacheco says

27 February, 2008 at 19:40

Hi,

Reading the “Service Console redundancy, January 14th, 2008” article, its 3rd option is not so clear about how the Secondary Service Console should be set up regarding the nic configuration. Is it to be set as active on both nics, or active/passive?

Could you please shed some more light on this subject and describe this config in more detail?

Thank you in advance,
JNBP

Reply to: jpacheco@dre.pt
Habibalby says

4 June, 2008 at 16:43

Hello Guys,

I came across this site while googling how to set up my pNICs properly for Service Console redundancy.

I have 2 blade servers connected to an MSA 1000 SAN storage, each with 1 dual-port QL 2423 HBA, 2 default pNICs and 4 pNICs via mezzanine cards, so 6 pNICs in total. Each server has 20 GB of RAM.

These two servers will be in a cluster, and HA and DRS will be used to provide failover.
I have decided to use the pNICs as follows:

1- 2 pNICs for Production VMs.
2- 2 pNICs for DMZ VMs.
3- 1 pNIC for VMkernel VMotion.
4- 1 pNIC for Service Console.

Q: How will I maintain redundancy for the SC? Or for the other networks?

habibalby@gmail.com
Duncan says

4 June, 2008 at 21:14

You could always add a secondary service console on the vmkernel vswitch! This is probably the best solution for your environment.
Habibalby says

5 June, 2008 at 16:39

Hello Duncan,

Thank you for your reply.

In my scenario I will have vSwitch0 containing the SC (vmnic0) and the VMkernel (vmnic1), separated via VLANs, each on its own portgroup and its own IP scheme.

I.e., if vSwitch0 contains the SC portgroup on the 128.104.30.0 segment (vmnic0) and 10.0.0.2 for the VMkernel portgroup (vmnic1), I won’t be able to add another SC to the same vSwitch0, right?

Or should I make a separate vSwitch for each network, SC and VMkernel?

In that case the result will be vSwitch0 containing the SC portgroup on 128.104.30.0 (vmnic0) and vSwitch1 containing the VMkernel portgroup on 10.0.0.0 (vmnic1).

If I add an additional SC to vSwitch1, which is on network 10.0.0.0, while my original SC is on the 128.104.30.0 network, what IP address should I give the second SC on vSwitch1? Is it a 10.0.0.x IP, and should this IP be able to reach the 128.104.30.0 network, where my production sits as well as the VC?

Or, since the SC will be on the same IP scheme as production, should I make:

vSwitch0 contain only the SC on vmnic0 on 128.104.30.0, and
vSwitch2 contain the Production VMs portgroup on vmnic2 and vmnic3 on the 128.104.30.0 network, adding another SC portgroup on 128.104.30.0 to vmnic2 and vmnic3?

Thanks for your suggestion
Habibalby says

7 June, 2008 at 07:50

Hello,
So, to summarize this:

vSwitch0 contains SC and VMotion, each on its own portgroup and vmnic.

vmnic0 = Service Console in the IT management VLAN that’s separated from the production network

vmnic1 = VMotion in its own non-routed network

In the vSwitch properties –> NIC Teaming –> checked “Override vSwitch failover order”:

In the Service Console portgroup, vmnic0 is the Active Adapter and vmnic1 the Standby Adapter

In the VMotion portgroup, vmnic1 is the Active Adapter and vmnic0 the Standby Adapter

Rolling Failover set to: Yes

Q: If the SC is on its own administrative segment (VLAN), and only IT and VC have access to it via ACL from the production segment, and vmnic0 fails, the remaining nic is vmnic1, which is dedicated to the VMotion network. Must any routing between vmnic1 and vmnic0 be in place in order for the IT administrators and VC to reach the SC via vmnic1?

Best Regards,

Hussain Al Sayed
pautuanny says

3 September, 2008 at 01:33

hey 🙂
its very point of view.
Good post.
realy gj

thx 🙂
BrianD says

25 November, 2008 at 22:56

With regard to option 3, I’ve always thought that your kernel network should be totally isolated from your SC and your production NICs. But your steps seem to imply having an SC and the kernel on the same VLAN. Are there any pros or cons to having the kernel VLAN isolated? Or if you’re saying kernel and secondary SC on different VLANs, then I assume port trunking on the physical switch is required?
Duncan Epping says

25 November, 2008 at 23:20

Yes, port trunking would indeed be required in that case. But it’s not necessary. I don’t see why one wouldn’t use the same nic for the VMkernel and the SC. I know that technically speaking one could consider this a security breach, but as soon as you’ve got access to the Service Console you can access VM data anyway (VMDKs, memory dumps via snapshots, etc.).

I was going to revise the document anyway, might do it tomorrow if I can find the time.
habibalby says

1 December, 2008 at 20:11

Hello Folks,

I want to configure my ESX servers to work with VLANs on Nortel 4542 GT switches in stack mode.

Server Configuration:
2- DL380 G5, each with Single Port HBA, 6 pNICs, 2 pCPU Dual-Proc.
2- BL460 G1, Each with Dual-port HBA, 6 pNICs, 1 pCPU Dual-Proc.

Setup:
vSwitch0 = ESX Networks: Service Console “172.16.20.0/24” && VMotion “10.1.0.0/24” using VLANs.
vSwitch1= Production Network: 128.104.0.0/16
vSwitch2 = DMZ Network: 192.168.1.0/24

Private Network for ESX:
vSwitch0 with 2 pNICs connected vmnic0 & vmnic1 Teamed on the vSwitch Level.
2 Portgroups.
1 Service Console
1 VMotion

In the portgroup Setting for S.C –> Nic Teaming is vmnic0 Active and vmnic1 Standby
In the portgroup setting for VMotion –> Nic Teaming is the vmnic1 Active and vmnic0 Standby.

vmnic0 connected to pSwitch on port configured with VLAN 2
vmnic1 connected to pSwitch on port configured with VLAN 3

Production Network:

vSwitch1 with 2 pNICs connected vmnic2 & vmnic3 Teamed on the vSwitch Level.
1 Portgroup.
Production VMs

vSwitch2 with 2 pNICs connected vmnic4 & vmnic5 Teamed on the vSwitch Level.
1 Portgroup
DMZ VMs

===============================================================================

If I assign an IP address to the S.C from the range that is configured on the VLAN, without assigning the VLAN ID in the portgroup, I can reach the other ESX host’s Service Console through the pServer, because both of them are on the same VLAN.

As soon as I assign the VLAN ID on the portgroup of the S.C, I lose connectivity to the server, and I had to troubleshoot vswif0 and create another Service Console network in order to access the ESX host. The same applies to the VMotion network.

From the pSwitch, both VLANs are reachable: 172.16.20.0/24 for the Service Console and 10.1.0.0/24 for the VMotion network.

I want the Service Console network to be able to talk to the VMotion network, and vice versa, to get VMotion working.

Service Console:
IP:172.16.20.2/24
D.G: 172.16.20.1
DNS: 172.16.20.57 “This host is connected to the same VLAN where the ESX hosts connected”. It’s a VC and DNS Server.

VMkernel:
IP:10.1.0.2/24
D.G: 10.1.0.1

From within the ESX host, I’m unable to reach the default gateway of the VMotion network using vmkping. Nor is the Service Console able to reach the VMotion network.

Moreover, I want to reach the Service Console network 172.16.20.0 from the 128.104.0.0 network to do my administrative tasks. In this case, do I have to add a static route in the Service Console in order for the VI Clients to reach it from the production network?

Further Testing:

I have untagged the ports for both VLANs and set up both portgroups, S.C and VMkernel, without a VLAN ID.
One host can now ping the VMkernel portgroup on the other host via COS ping. From the same host I also tried vmkping to the S.C IP and the default gateway, and it was successful.

However, from the other host I can reach the first host’s S.C IP but not its VMkernel. Nor is the VMkernel able to reach its default gateway.

Since both VLANs are reachable within the pSwitch, do I have to use port trunking and assign a different VLAN ID (the trunked VLAN) in each portgroup, S.C and VMotion?

In addition to what I mentioned earlier regarding the NIC teaming:
Both vmnic0 and vmnic1 are assigned to vSwitch0, and in the NIC Teaming settings of vSwitch0 both are Active/Active. Within each portgroup, S.C = vmnic0 Active and vmnic1 Standby, and VMkernel = vmnic1 Active and vmnic0 Standby. Could this setting be preventing the VLANs from working properly?

Further troubleshooting I’m going to do:
1. Remove the NIC teaming from the portgroups.
2. Configure vSwitch0 with only vmnic0 on both hosts, assuming a pNIC failure.
3. Test that both hosts can ping each other’s S.C and the default gateway 172.16.20.1.
4. Configure the VMkernel with the prospective VLAN IP scheme and test with vmkping whether it can reach the S.C IP and its default gateway.
5. If that succeeds, configure the same on the other host and test the connectivity between the hosts.

If not, do I have to configure trunking on the pSwitches and make both VLANs 3 and 4 members of the trunked VLAN?

Further Testing:

On the pSwitch, I have set port 5, where vmnic0 is connected, to TagAll.

Result:
1. I lost connectivity to vswif0 (the Service Console IP). But within this vSwitch0 I have a VM Network portgroup, and one of the virtual machines has an IP on the same VLAN as the Service Console; it is reachable.

This was without a VLAN ID specified on any portgroup.

2. While pSwitch port 5 is set to TagAll, I specified a VLAN ID on both the Service Console and VM Network portgroups, and I got connectivity back on the Service Console as well as on the VM Network.

Now I have vmnic0 connected to port 5 (VLAN 3, IP 172.16.20.0) on the pSwitch, and vmnic0 is linked to vSwitch0. Also, vmnic1 is connected to port 6 (VLAN 4, IP 10.1.0.0) on the pSwitch and is linked to vSwitch0 as well.

Question: How do I get VMotion to work, since the Service Console and VMotion sit on different networks?

Do I have to specify a static route on the ESX server in order for the VMkernel network to see the Service Console network?

Thanks,
Duncan Epping says

1 December, 2008 at 21:05

To start with, you should set up both connections to vSwitch0 as a VLAN trunk. Both VLANs should be published on both links, because when a nic failover occurs you want to be able to use the other nic directly instead of having to reconfigure.

The VMotion network doesn’t need to be a routable network. It can be a completely isolated network on a separate switch without an uplink to the Service Console network, as long as the VMkernel ports of all ESX hosts can “vmkping” each other. So that’s what you should test, not a regular ping!

Start with this and then let me know the results.
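
As a rough illustration of the trunking point, here is a hypothetical Python sketch (not an ESX tool) that reports which VLANs are missing on each uplink of a vSwitch. The function and dictionary names are made up, and the VLAN IDs 2 and 3 are the ones from the port setup described in the question.

```python
from typing import Dict, List, Set


def missing_vlans(portgroup_vlans: Dict[str, int],
                  uplink_vlans: Dict[str, Set[int]]) -> Dict[str, List[int]]:
    """Map each uplink to the portgroup VLANs its switch port does not carry."""
    needed = set(portgroup_vlans.values())
    gaps = {}
    for nic, carried in uplink_vlans.items():
        missing = needed - set(carried)
        if missing:
            gaps[nic] = sorted(missing)
    return gaps


portgroups = {"Service Console": 2, "VMotion": 3}

# One VLAN per physical port, as in the setup described in the question.
access_ports = {"vmnic0": {2}, "vmnic1": {3}}
# Both VLANs published on both links, as suggested above.
trunked_ports = {"vmnic0": {2, 3}, "vmnic1": {2, 3}}

print(missing_vlans(portgroups, access_ports))   # {'vmnic0': [3], 'vmnic1': [2]}
print(missing_vlans(portgroups, trunked_ports))  # {}
```

Any uplink that shows up in the result cannot take over for the other nic after a failover, because the portgroup that moves to it would land on a port that does not carry its VLAN.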
habibalby says

2 December, 2008 at 09:35

Hello,

Just to give you a clearer picture of what I have done:

pSwitch1, Port 5 = VLAN3
pSwitch1, Port 6 = VLAN4
IP Address: 172.16.20.0/ 24 D.Gateway: 172.16.20.1
pSwitch2, Port 5 = VLAN3
pSwitch2, Port 6 = VLAN4
IP Address: 10.1.0.0/24 D.G: 10.1.0.1

I have trunked those ports in the pSwitch:
Trunk Name: ESX
Unit: Port:
1 5
1 6
2 5
2 6

I have set the PVID of both ports to the same value and made both ports members of each other’s VLANs. The trunk was enabled successfully.

I have set a VLAN ID in the Service Console and VMotion portgroups, and tried the pSwitch ports with both TagAll and UnTag. But I’m unable to get any connectivity to the Service Console or to the VMkernel between the hosts.

I then disabled the trunk, made the VLAN ports members of each other’s VLANs and set port 5, where vmnic0 is connected, to TagAll. I got the connectivity to the hosts back, and the VMkernel ports can ping each other. But vmkping cannot reach its default gateway.

Any further help?

Thanks,
habibalby says

3 December, 2008 at 14:40

Hello,

The problem has been solved by the following steps

1. Disabled the trunking ports on the pSwitches.
2. On the pSwitch, made VLAN3 and VLAN4 members of each other.
3. Set the VLAN ID in the Service Console and VMkernel portgroups.
4. On the pSwitch, set the tagging to TagAll for both ports 5 and 6 on both pSwitches.
5. PVIDs are the same on both ports on both pSwitches.
6. For the second host, did the same steps, entering the VLAN ID in the corresponding portgroup.

Result:

* Both hosts can reach each other.
* Both hosts can reach the configured default gateway.
* Both hosts can reach the VMkernel IP address of the other host using the vmkping command.
* Test vmnic0 failure = I can still reach the other host’s IP address and hostname, the default gateway, and the VMkernel IP address of the other host in the cluster.
* Test vmnic1 failure = I can still reach the other host’s IP address and hostname, the default gateway, and the VMkernel IP address of the other host in the cluster.

Now, when I tried to reconfigure VMware HA, I got these errors:

HA agent on hosts.esx.local in cluster ESX in Development Network has an error: No Active Primaries Found “Means no Primary Host found in the Cluster”
CMD startagent failed: Internal AAM Error – Agent could not start

Steps followed to overcome this problem:

1. Put the hosts in Maintenance Mode.
2. Disconnected the hosts from the Virtual Center.
3. Removed the hosts from the Virtual Center
4. Restarted the Virtual Center, just in case to get everything cleared.
5. Added both hosts to the cluster and enabled VMware HA. Wow, everything now works as expected.
brex says

15 January, 2009 at 11:10

Hi all,

I’m preparing a VI3 environment with 2 IBM x3650 both with 4 pNIC (2 onboard + 2 with expansion card).
I’ll have 6 VM (actually) and one of them is a mail server which has to be on DMZ.
What configuration do you suggest? I’ve read a lot of docs and posts here and there and I ended up with this possibility (due to the low pNIC availability in this setup):

VSWITCH0 (2 pNIC – vmnic0 & vmnic2):
Port Group1: Service Console (vmnic0 Active – vmnic2 Standby)
Port Group2: VMotion Network (vmnic2 Active – vmnic0 Standby)

VSWITCH1 (2 pNIC – vmnic1 & vmnic3):
Port Group1: VM Production Network (tagged with a specific VLAN ID)
Port Group2: VM DMZ Network (tagged with a specific VLAN ID)

I will then be able to offer Service Console redundancy through a standby pNIC that is actually used by the VMotion network, and vice versa.
Having only 2 spare pNICs and needing some fault tolerance (and trunking) for the VM networks, I’ll have to put the Production and DMZ VMs on the same vSwitch, separating them into 2 different portgroups and using different subnets or VLANs (which is better?).

Is this safe? I know that a dedicated pNIC only for the DMZ would be a better solution, but that would give me no pNIC redundancy for either VM network.

Please give me any advice or suggestions on better design.
lynxbat says

17 February, 2009 at 18:42

Great article.
ryan@lan says

6 March, 2010 at 14:26

Does my MAC address change if I upgrade my computer with some other hardware? For example, if I change the graphics card?
hypersean says

25 March, 2010 at 06:27

Is it possible to configure both service consoles on the same subnet, for example:
Service Console1: 192.168.1.1 vmnic1 vSwitch1 –> Physical Switch A
Service Console2: 192.168.1.2 vmnic2 vSwitch2 –> Physical Switch A
Martin Flamio says

3 August, 2010 at 18:57

Exactly where can it be, i’d like to read more about this particular posting, thank you.
Jerred says

15 July, 2011 at 21:46

In solution three, if the servers are controlled by vCenter, how do you coordinate between the two separate Service Console IPs? I want to set up something similar to this but am trying to figure out the logistics. As far as I know you can’t point the vCenter server at two different IPs per host, correct?
- Duncan Epping says
  
  15 July, 2011 at 22:56
  
  Not sure I understand your question. But when you have vCenter just add the two hosts to vCenter and that is it…
Dave Bowman says

5 February, 2012 at 12:03

I just came across this interesting post and I find it very helpful. There’s just one thing that is not clear to me regarding solution 1, where you say “And need to have VLAN’s setup”.
Why is a VLAN needed?
Supposing the Service Console is on net 192.168.1.x/24 and the VMkernel (for VMotion) runs on 192.168.100.x/24, is a VLAN really mandatory (except for the security reason of protecting the SC management network)?

Also: I set up the two portgroups as described (overriding the vSwitch0 settings), but does it matter how vSwitch0 itself stays configured? Can I leave it with both NICs active, or is that setting simply ignored since the portgroup settings take precedence?

I found that solution 1 best suits my setups, where hosts have 4 NICs, with vmnic0 and vmnic2 configured as per solution 1 and vmnic1 and vmnic3 load balanced for the VM portgroup.

Thanks for your support!
ITServ says

31 May, 2012 at 13:25

Very good article, but when I want to integrate physical switch redundancy I’m getting confused.

In my scenario the two pNICs are connected to separate pSwitches, which should not be connected to each other directly (only over the backbone switch).

So on all ESXi hosts,
vMotion-kernel1 is connected to pSwitch1 and
vMotion-kernel2 is connected to pSwitch2.

How can I make sure that only vMotion kernels on the same switch talk to each other?
Can I configure a different IP range and VLAN on vMotion-kernel2 without causing problems?
Maybe this way?
vMotion-kernel1 – 192.168.10.x – vlan10
vMotion-kernel2 – 192.168.20.x – vlan20
ITServ says

31 May, 2012 at 13:26

oh dear, sorry this was the wrong tab. =)
