On Slack, someone asked why the vMotion check in the vSAN 6.6 Health Check was constantly failing. It is easy to reproduce when you use the vMotion IP stack on your vMotion VMkernel interface. I tested it in my lab, and indeed this was the case. I looked around and then noticed the following in the vSAN 6.6 release notes:
vMotion network connectivity test incorrectly reports ping failures
The vMotion network connectivity test (Cluster > Monitor > vSAN > Health > Network) reports ping failures if the vMotion stack is used for vMotion. The vMotion network connectivity (ping) check only supports vmknics that use the default network stack. The check fails for vmknics using the vMotion network stack. These reports do not indicate a connectivity problem.
Workaround: Configure the vmknic to use the default network stack. You can disable the vMotion ping check using RVC commands. For example: vsan.health.silent_health_check_configure -a vmotionpingsmall
I guess that clarifies things, so I figured I would test it. Here’s what it looked like before I disabled the checks:
I used RVC to disable the checks. Let me show two methods:
vsan.health.silent_health_check_configure -a vmotionpingsmall /localhost/VSAN-DC/computers/VSAN-Cluster
Note that you will need to replace “VSAN-DC/..” with your own datacenter and cluster names. This disables the small-packet vMotion ping test. The other method is to run the command in interactive mode. It first lists all tests and then lets you simply enter the number of the specific test that needs to be disabled.
vsan.health.silent_health_check_configure -i /localhost/VSAN-DC/computers/VSAN-Cluster
The vMotion tests are about halfway down the list:
44: vMotion: Basic (unicast) connectivity check
45: vMotion: MTU check (ping with large packet size)
Of course, this doesn’t only apply to the vMotion tests: with vSAN 6.6 (vCenter 6.5.0d) you can disable any of the other tests as well. Just use the interactive mode and disable whatever you want or need to disable.
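If you need to silence the same checks on several clusters, typing the full inventory path each time gets repetitive. The following is a minimal Python sketch, under stated assumptions, that only assembles the non-interactive RVC command strings so you can paste them into an RVC session. It does not connect to vCenter. The helper names `cluster_path` and `silence_commands` are hypothetical; the `/localhost/<datacenter>/computers/<cluster>` pattern comes from the commands shown earlier, and the `localhost` root is assumed to match how your RVC session is connected. The check ID `vmotionpingsmall` comes from the release notes, while `vmotionpinglarge` comes from a reader comment, so confirm both against the list that interactive mode (`-i`) prints on your build.

```python
# Hypothetical helper: builds RVC command strings only, it does not run them.

def cluster_path(datacenter: str, cluster: str, root: str = "localhost") -> str:
    """Build the RVC inventory path for a cluster.

    Assumes the /<root>/<datacenter>/computers/<cluster> layout used in the
    examples in this post; 'localhost' is an assumed default root.
    """
    return f"/{root}/{datacenter}/computers/{cluster}"


def silence_commands(datacenter: str, cluster: str, check_ids: list[str]) -> list[str]:
    """Return one non-interactive 'silent_health_check_configure -a' command per check ID."""
    path = cluster_path(datacenter, cluster)
    return [
        f"vsan.health.silent_health_check_configure -a {check_id} {path}"
        for check_id in check_ids
    ]


# Check IDs: 'vmotionpingsmall' is from the release notes; 'vmotionpinglarge'
# is taken from a reader comment (assumption, verify with '-i' first).
VMOTION_CHECKS = ["vmotionpingsmall", "vmotionpinglarge"]

if __name__ == "__main__":
    # Replace the example names with your own datacenter and cluster.
    for cmd in silence_commands("VSAN-DC", "VSAN-Cluster", VMOTION_CHECKS):
        print(cmd)
```

Generating the strings rather than scripting RVC directly keeps the sketch harmless: you still review each command before running it in RVC.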
Note that you can now also disable health checks in the UI, as shown in the GIF below.
Jon Retting says
Thanks! But too late… I noticed the issue in my lab and came to the conclusion it was vMotion on a non-default stack. Thanks for the RVC intel 🙂
Tet Kyaw (@htetk) says
Thanks for the RVC trick. I just saw this issue in my lab after upgrading to 6.6.
Bryce Catten says
Hey Duncan, could you speak about Disk Congestion errors? We are running all-SSD VSAN 6.0 U3. When moving large amounts of data (i.e., putting a host in maintenance mode), we get one or two disk congestion errors on our flash disks. The errors are temporary and go away, but we are concerned about the long-term health of the drives. Or is this to be expected?
Denis Losakov says
Thanks for the post! Only one question (I’m new to vSAN): is it safe to disable/enable some health checks in a production environment? (I have no lab to try it.)
Alasdair Carnie says
I have run these commands and the small ping has disappeared, but the large packet ping is still there and failing. Any suggestions?
Alasdair Carnie says
Ignore this. I was being an idiot.
Sajjad Siddicky says
I think both of these need to be disabled if the interactive method is not used:
vsan.health.silent_health_check_configure -a vmotionpingsmall
vsan.health.silent_health_check_configure -a vmotionpinglarge