I received a question today that I couldn’t answer myself, so I reached out to one of the developers. The person asking had found the following statement in the ESXi documentation and wanted to know what running ESXi in degraded mode actually means, and what the impact is.
If a local disk cannot be found, then ESXi 7.0 operates in degraded mode where certain functionality is disabled and the /scratch partition is on the RAM disk, linked to /tmp. You can reconfigure /scratch to use a separate disk or LUN. For best performance and memory optimization, do not run ESXi in degraded mode.
In other words, “degraded mode” is a situation where you are running ESXi with an undesirable boot disk configuration. In this case, the boot disk configuration (size, etc.) means that /scratch is not stored on persistent media but in RAM, so its contents are lost at reboot. This could lead to various problems, hence it is called degraded mode or state. Note that although the host keeps running in “degraded” mode, this configuration could prevent you from upgrading in the future.
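To make the “linked to /tmp” part concrete, here is a minimal sketch of how you could check where /scratch ends up. The helper name `scratch_looks_volatile` is made up for illustration, and the check only encodes the statement quoted above (that /scratch is linked to /tmp in degraded mode). It is not an official VMware test.

```python
import posixpath


def scratch_looks_volatile(resolved_scratch: str) -> bool:
    """Hypothetical helper: True if the resolved /scratch target sits under /tmp.

    Assumption: in degraded mode /scratch is linked to /tmp (RAM disk),
    as the quoted documentation states. Any other target is treated as
    persistent, which is a simplification.
    """
    path = posixpath.normpath(resolved_scratch)
    return path == "/tmp" or path.startswith("/tmp/")
```

On a host you would pass in the resolved link target, for example `scratch_looks_volatile(os.path.realpath("/scratch"))` after `import os`. A persistent scratch location typically resolves to a path under /vmfs/volumes instead.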
So how do you resolve this problem? Follow the recommendations VMware provides for the ESXi configuration:
- An 8 GB USB or SD and an additional 32 GB local disk. The ESXi boot partitions reside on the USB or SD and the ESX-OSData volume resides on the local disk.
- A local disk with a minimum of 32 GB. The disk contains the boot partitions and ESX-OSData volume.
- A local disk of 142 GB or larger. The disk contains the boot partitions, ESX-OSData volume, and VMFS datastore.
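The three options above, plus the degraded case from the documentation quote, can be sketched as a small lookup. This is only an illustration of the sizing rules as listed in this post. The function name, the return strings, and the way I treat combinations the list does not mention are my own assumptions, not something ESXi itself exposes.

```python
from typing import Optional

# Sizes in GB, taken from the recommendations listed above.
MIN_USB_SD_GB = 8
MIN_OSDATA_DISK_GB = 32
MIN_DISK_WITH_VMFS_GB = 142


def describe_boot_layout(usb_sd_gb: Optional[int], local_disk_gb: Optional[int]) -> str:
    """Hypothetical helper: map device sizes to one of the listed layouts.

    Pass None when a device is absent. Combinations that the list does
    not mention are reported as "not covered" rather than guessed at.
    """
    has_usb_sd = usb_sd_gb is not None and usb_sd_gb >= MIN_USB_SD_GB
    has_local = local_disk_gb is not None and local_disk_gb >= MIN_OSDATA_DISK_GB

    if has_usb_sd and has_local:
        return "boot partitions on USB/SD, ESX-OSData on local disk"
    if has_usb_sd and local_disk_gb is None:
        # No local disk found: the case the documentation calls degraded mode.
        return "degraded mode: no local disk, /scratch on RAM disk"
    if usb_sd_gb is None and has_local:
        if local_disk_gb >= MIN_DISK_WITH_VMFS_GB:
            return "boot partitions, ESX-OSData and VMFS datastore on local disk"
        return "boot partitions and ESX-OSData on local disk"
    return "not covered by the listed recommendations"
```

For example, `describe_boot_layout(32, None)` describes the USB/SD-only host that ends up in degraded mode, while `describe_boot_layout(None, 256)` describes a single local disk that is large enough to also hold a VMFS datastore.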
Although it is not a requirement, I would urge you to read and follow these sections from the documentation:
- Although an 8 GB USB or SD device is sufficient for a minimal installation, you should use a larger device. The additional space is used for an expanded core dump file and the extra flash cells of a high-quality USB flash drive can prolong the life of the boot media. Use a 32 GB or larger high-quality USB flash drive.
- If you install ESXi on M.2 or other non-USB low-end flash media, delete the VMFS datastore on the device immediately after installation.
If you want to mitigate the situation after upgrading to ESXi 7.0, you can add a new local disk, set “autoPartition=TRUE”, and reboot. At reboot, the disk will be partitioned and populated for use. This advanced setting, and others that relate to ESXi 7.0, are described in this KB article here.
For those wondering, “ESX-OSData” is the partition where we now store what was previously stored in “scratch”, “core”, and “locker”. Niels wrote a deep-dive on the vSphere blog here, go check that out.
AD says
Last time I looked, many of the VSAN ready node configs violate this, since they use USB/SD based boot devices which put /scratch onto ramdisk. I was quite disappointed to spend a few $M on clusters only to have to buy additional disks after the fact and work out the logistics to get them added to the servers.
Is there any chance that future ready node configs could include a separate device for scratch, or mandate boot devices as SATA DOM or SATA/SAS/NVME disk only?
Duncan Epping says
Let me contact the engineering team, not sure what to say to be honest.
Pin Pin Poola says
Hi Duncan. Ref “If you install ESXi on M.2 or other non-USB low-end flash media, delete the VMFS datastore on the device immediately after installation”. Please could you help clarify why VMware are recommending immediately deleting the local VMFS datastore? Is this mitigation against future data loss due to the fact it IS ‘low-end’ M.2/flash boot media or is there another reason that might apply to any type of flash/nvme boot media (low/QLC, medium/TLC or high/MLC)? Thank you. Pin
Duncan Epping says
this is just so that people don’t run VMs on those devices… No other reason for it.