I noticed that a couple of people reported this problem over the last two months, so I figured a blog post would be useful. This thread on VMTN triggered this article. If your ESXi host is disconnected from vCenter (even 5.0 and 5.1 appear to be impacted by this) and you see error messages about free space in your log files, like these:
WARNING: VisorFSObj: xxxx: Cannot create file /var/spool/snmp/xxxxxxxx_x_x_xxxx.trp for process hostd-worker because the inode table of its ramdisk (root) is full.
VmkCtl Locking (/etc/vmware/esx.conf) : Unable to create or open a LOCK file. Failed with reason: No space left on device
This can be caused by ESXi running out of inodes. You can easily check this on the command line with the following command:
stat -f /
The outcome of this command will look as follows:
File: “/”
ID: 1 Namelen: 127 Type: visorfs
Block size: 4096
Blocks: Total: 449852 Free: 324368 Available: 324368
Inodes: Total: 8192 Free: 55
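If you want to check this from a script rather than by eye, the inode line can be summarised in one go. A minimal sketch, assuming the output contains an `Inodes: Total: N Free: N` line like the sample above; the function name `inode_usage` is hypothetical and not part of ESXi:

```shell
#!/bin/sh
# Hypothetical helper: reads `stat -f` output on stdin and summarises the
# inode line ("Inodes: Total: N Free: N") as free count, total and % used.
# Exits with status 2 if no inode line is present.
inode_usage() {
    awk '
        /^ *Inodes:/ { total = $3; free = $5 }
        END {
            if (total == 0) { print "no inode information found"; exit 2 }
            printf "%d free of %d inodes (%.1f%% used)\n", free, total, (total - free) * 100 / total
        }'
}
```

Usage: `stat -f / | inode_usage`. Fed the sample output above, it prints `55 free of 8192 inodes (99.3% used)`.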
As you can see, the number of free inodes is low, and this is what causes the issues described above. In some cases it has been reported (by vdsyn in this case) that “/var/spool/snmp/” is full and needs to be cleaned out. This KB article explicitly calls out “/var/run/sfcb/” and explains what you can delete and how. So make sure to look at those two directories when an ESXi host is disconnected from vCenter.
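To see which of those two directories is actually eating the inodes, counting the entries below each one is usually enough, because every file and directory uses an inode no matter how small it is. A minimal sketch (the function name `count_entries` is hypothetical); it only counts and deletes nothing, so follow the KB article for what is safe to remove:

```shell
#!/bin/sh
# Hypothetical helper: prints "<entries> <directory>" for each directory given,
# counting every file and subdirectory below it (each one consumes an inode).
# Read-only: it never deletes anything.
count_entries() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # find also prints the directory itself, so subtract one.
            n=$(find "$dir" | wc -l)
            echo "$((n - 1)) $dir"
        else
            echo "missing $dir"
        fi
    done
}
```

Usage: `count_entries /var/spool/snmp /var/run/sfcb`. A directory holding thousands of entries on a ramdisk with an 8192-entry inode table is the obvious suspect.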
JunTan says
Hi Epping,
you can also use the command df -i to check the inode usage of the filesystem.
In the *nix environment, the inode allocation is based on the filesystem.
Duncan Epping says
Euuh, there is no “df -i” on ESXi?
Pat Erler says
Hi,
since you are covering rare problems with ESXi: on several servers I am unable to connect to the ESXi console via SSH. The problem is that I can find almost no reference to it in VMware’s forums (strangely, there are some references in CentOS forums and at HP, not related to ESX at all). Here is what happens:
ssh yourserver.com
Server refused to allocate pty
You can log in with
ssh -T yourserver.com
though (you don’t get a prompt, but you can issue commands and receive the results).
I somehow suspect that this may be related to PuTTY and the strange hang-ups I got when using PuTTY with a management tool (http://www.visionapp.com/germany/solutions/asg-remote-desktop.html), which I only recently solved by specifying http://screencast.com/t/veLMANQM
What is annoying is that you can’t work around this problem without rebooting the ESXi host.
If you have some time on your hands, please have a look into it. Contact me if you need more information or access to an affected system.
Andy says
I’ve got two hosts that are running VMs and have disconnected from vCenter.
I also get the “Server refused to allocate pty” error, and have worked around the SSH issue by using WinSCP and issuing commands through its terminal.
I have attempted to restart the management agents (services.sh restart) and to forcefully kill and restart the hostd service.
I do not appear to be running out of inodes.
Just like the users who have posted in this thread http://communities.vmware.com/message/2071902, I had a change in paths and had issued a rescan before the hosts disconnected.
I am looking for a way to restart the management agents successfully without rebooting the hosts.
BradJ says
In ESXi, just type
services.sh restart
As long as your VMs are not set to “Start/Stop with the host” (which is not the default), your VMs will be unaffected.
Or Arnon says
Hi Duncan,
We have encountered another issue regarding the “ramdisk full” error.
Our vCenter DB has grown to 50GB due to repeated “ramdisk full” events and HP CIM events. We did NOT lose connection to the ESXi servers. However, we found another file that had reached 30MB of the 32MB available on the root ramdisk: /var/log/hpHelper.log
This is a known issue that should be fixed by HP. Still, it is a file worth checking on HP Gen8 servers.
Can you elaborate on the concept of a ramdisk and how to avoid the ramdisk full issue? (We configured a different scratch location from day one, but that does not seem to be enough.)
Thank you
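A single runaway log such as /var/log/hpHelper.log fills the ramdisk by size rather than by inode count, so it is worth listing the biggest files as well. A minimal sketch, assuming `du -k` reports disk usage in KB (true for both BusyBox and GNU `du`); the function name `largest_files` and the default of 10 results are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: lists the largest regular files below a directory,
# biggest first, as "<size in KB><TAB><path>".
# Usage: largest_files DIR [COUNT]   (COUNT defaults to 10)
largest_files() {
    dir=$1
    count=${2:-10}
    # -type f skips directories, so a big parent directory does not
    # hide the individual file that is actually growing.
    find "$dir" -type f -exec du -k {} \; | sort -rn | head -n "$count"
}
```

Usage: `largest_files /var/log 5`. It only reads sizes; whether a large file can be truncated or removed is a separate decision.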
JasonV says
I ran into this article but had fits getting the host reconnected to vCenter. I used the kill -9 command on the hostd processes, then gave them time to die out. Then I followed this KB article to clean up and stop it from happening again:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2037798
If you have the RAMDISK full issue from HP, there is an offline pack that you can install to fix the issue on Gen8 hardware.
Steffen Plotner says
We are seeing this problem on a weekly basis on one host of a larger cluster running ESXi 5.5.
VMware must not be doing any quality checking anymore; we have had VMware ESXi 4.1 running for years without a problem.
Today I had to cold-reboot another host just to make it work again. vCenter thinks the host is not responding, which is incorrect: I can ping it. On the yellow console screen, the agent restart is stuck and WON’T stop (who allows that to happen?). We are running on decent Dell blade hardware; this is not acceptable.
I promise VMware that this is the last version we will use; we will most likely be forced to switch. IT heads are not going to tolerate these frantic reboot situations.
I don’t normally complain this much; however, this is wasting everyone’s time.
Steffen
Duncan Epping says
Sorry to hear you are facing these kinds of problems. I recommend you contact support and let them do a root-cause analysis of why this happens. Considering it is one host in a large cluster, I suspect that there is something wrong with the hardware, or maybe a driver/hardware combination. It is difficult to say without access to the environment and advanced troubleshooting tools and knowledge. Our support team should be able to provide you with just that.
If support does not provide you with the answer you are looking for, I suggest escalating the support call through your VMware sales/pre-sales representative or your technical account manager.
Ashley Smoot says
I had the same symptoms: the host was unavailable, but its VMs were running fine. It was down to 4 free inodes and out of RAM disk space. I couldn’t vMotion the VMs off, so I had to find the problem. It turned out to be a syslog shipping problem: about a month before, I had used the ship-to-logdir setting, which didn’t seem to work and sent the logs to the local root disk instead. It was just a one-time dump, but the files stayed there and were a ticking time bomb. Note to self: check the RAM disk every now and then, or set up some kind of alert. This post helped me find the solution, so thanks!
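For the “set up some kind of alert” part, one option is a small check that looks at both resources that ran out here, free RAM disk blocks and free inodes, and complains when either drops below a threshold. A minimal sketch that parses the same `stat -f` output shown in the article; the function name `ramdisk_alert` and the 10% default are hypothetical, and how you schedule it (cron, a monitoring agent) and make that schedule survive an ESXi reboot is left out and would need checking on your build:

```shell
#!/bin/sh
# Hypothetical helper: reads `stat -f <mountpoint>` output on stdin and prints
# a WARNING line for each resource (blocks, inodes) whose free share is below
# the threshold in percent (default 10). Exits 1 if any warning was printed.
ramdisk_alert() {
    threshold=${1:-10}
    awk -v t="$threshold" '
        # "Blocks: Total: N Free: N Available: N" -> compare Available to Total.
        /^ *Blocks:/ { btotal = $3; bavail = $7 }
        # "Inodes: Total: N Free: N" -> compare Free to Total.
        /^ *Inodes:/ { itotal = $3; ifree = $5 }
        END {
            rc = 0
            if (btotal > 0 && bavail * 100 / btotal < t) {
                printf "WARNING: only %d of %d blocks available\n", bavail, btotal
                rc = 1
            }
            if (itotal > 0 && ifree * 100 / itotal < t) {
                printf "WARNING: only %d of %d inodes free\n", ifree, itotal
                rc = 1
            }
            exit rc
        }'
}
```

Usage: `stat -f / | ramdisk_alert 10`. With the numbers from the article, the blocks are fine (about 72% available) but the inode check fires.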