HA Futures: VMCP for Networking – Part 3 of 4 – (Please comment!)

Duncan Epping · Oct 30, 2018

VMCP, or VM Component Protection, has been around for a while. Many of you are probably using it to mitigate storage issues. However, what if the VM network fails? Well, that is a problem right now: if the VM network fails, there’s no response from HA. Many customers consider this to be a problem. So what would we like to propose? VM Component Protection for Networking!

How would this work? Well, the plan would be to allow you to enable VM Component Protection for Networking for any network on your host. This could be the vMotion network, different VM networks, etc. On this network, HA would of course need an IP address it could check “liveness” against, very similar to how it uses the default gateway to verify “host isolation”.

On top of that, besides validating liveness through an IP address, HA should also monitor the physical NIC. If either of the two checks fails, HA should take action immediately. What this action will be depends on the type of failure that has occurred. We are considering the following two types of responses to a failure:

If vMotion still works, migrate the VM from impacted host to a healthy host
If vMotion doesn’t work, restart the impacted VM on a healthy host
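
The two responses above can be sketched as a small decision function. This is only an illustration of the proposal, not how HA implements it: the names `NetworkHealth`, `Response`, and `choose_response` are hypothetical, and the three boolean inputs stand in for whatever checks HA would actually run.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    NONE = "none"        # network is healthy, nothing to do
    MIGRATE = "migrate"  # vMotion the VM to a healthy host
    RESTART = "restart"  # restart the VM on a healthy host


@dataclass
class NetworkHealth:
    """Hypothetical health snapshot of one protected network on one host."""
    liveness_ip_reachable: bool  # did the configured liveness address respond?
    physical_nic_up: bool        # is the physical NIC behind the network healthy?
    vmotion_available: bool      # can VMs still be migrated off this host?


def choose_response(health: NetworkHealth) -> Response:
    """Pick a response following the two rules proposed in the post."""
    # If either check fails, the network is treated as failed.
    if health.liveness_ip_reachable and health.physical_nic_up:
        return Response.NONE
    # Prefer a live migration when vMotion still works.
    if health.vmotion_available:
        return Response.MIGRATE
    # Otherwise fall back to restarting the VM elsewhere.
    return Response.RESTART
```

For example, a host whose physical NIC is down but which can still vMotion would get `Response.MIGRATE`, while the same failure with vMotion unavailable would get `Response.RESTART`.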

In addition to monitoring the health of the physical NIC, HA could also use in-guest/VM monitoring techniques to monitor the network route from within the VM to a certain address/gateway. Would this technique be useful?

What do you think? Please provide feedback/comments below, even if it is just a “yes, please!” Please help shape the future of HA!

Comments

Michael White says

30 October, 2018 at 15:10

Yes, Please!
Donny says

30 October, 2018 at 15:10

Curious to know what the experience has been with this risk actually being realized? I could see monitoring the physical network, but I have never had a virtual network issue (outside the 1000v, yuck).
rctunisi says

30 October, 2018 at 16:20

When will it be available? =)
Darshan Kolambkar says

30 October, 2018 at 17:00

Indeed, monitoring the physical NIC will help. The design architect still has to make sure the network design uses two different source switches and different power sources, with the required configuration at the VM NIC level.
David Pasek says

30 October, 2018 at 22:30

yes, please!
Anton Davidovskiy says

31 October, 2018 at 13:20

That’s a great new feature.
I have a few questions though:
– if we use this feature for a regular VLAN-based VM network, does it mean that every ESXi host in the cluster will have to have an IP address in this network?
– any chance this will ever be integrated with NSX at the VM level? What I mean is that checking VTEP network connectivity is one thing, but there could be control plane issues affecting VM network connectivity.
- David Pasek says
  
  31 October, 2018 at 16:33
  
  Q: If we use this feature for a regular VLAN-based VM network, does it mean that every ESXi host in the cluster will have to have an IP address in this network?
  
  A (my humble opinion): Yes, it seems logical that every ESXi host will need to have an IP address in the particular “heartbeat” network. On the other hand, Duncan wrote that the vMotion network could be leveraged for such heartbeating, so in that case you can leverage the existing IP addresses already used for vMotion. If you, for whatever reason, decide not to use vMotion, you will probably need to create additional VMkernel interfaces on each ESXi host, each with an IP address from the other “VM network” (aka portgroup / VLAN).
  
  Duncan, please correct me if my assumptions are wrong.
Stephan K. says

31 October, 2018 at 14:05

Maybe the data from the VLAN/MTU Health Check could be considered.
If a host has a missing VLAN, HA could move all VMs in the corresponding portgroup from this host to another.
Stephan says

2 November, 2018 at 13:23

Nice one!
Christian says

5 November, 2018 at 14:46

We would like to see a feature like this in future releases!

One of our environments has the backend on one NIC and the frontend on the other card. This design was implemented because there were issues regarding FCoE over different CNAs at the time of implementation. Now we have to deal with it, and none of the other departments (the compute guys, the network team) want to fix this, because “they are clean”. When the frontend NIC hangs, the VMs have no network anymore, but HA is not triggered. Being able to deal with such situations just by enabling a feature like this would be very nice.

Because this environment is our DMZ with a few firewalls, it would also be cool if the implementation were transparent and independent of the VLANs (-> we do not want to ping the gateway from vmkernel / ESXi). Maybe it would be possible to ping the gateway via VMware Tools from inside the VM?
Boor says

5 November, 2018 at 17:04

Yes, please! If there is problem detection at the NIC chip level, it will be wonderful!
llseven says

13 November, 2018 at 01:11

Yes, please…yes!!! This is the feature I have been looking for for a while
llseven says

13 November, 2018 at 01:15

When will it be available??

Soon…I hope
