3.5

FT_HOSTS, where is it in ESX 3.5 U2?

Duncan Epping · Sep 8, 2008 ·

This seems to be High Availability day! I was just testing what I wrote in my previous blog post when I discovered a weird DNS or hosts file related error. So I opened up my console and typed “vi /etc/FT_HOSTS”. “vi” opened a blank file and reported “new file”. What the heck, so I did a find and found that the FT_HOSTS file has been relocated to /etc/opt/vmware/aam/. So if you’re looking for FT_HOSTS….

And for ESXi, by the way, you should be looking in “/var/run/vmware/aam/”.
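If you want to check both locations in one go, here is a minimal sketch. It assumes Python 3 is available wherever you run it (on the Service Console itself a plain find, as I used above, is the safer bet). The function name and the `root` parameter are my own, purely for illustration; `root` lets you point it at a copied or mounted file tree.

```python
from pathlib import Path

# Directories mentioned in this post, checked in order.
CANDIDATE_DIRS = [
    "/etc/opt/vmware/aam",  # ESX 3.5 U2
    "/var/run/vmware/aam",  # ESXi
    "/etc",                 # where I expected the file to be
]


def find_ft_hosts(dirs=CANDIDATE_DIRS, root="/"):
    """Return the first existing FT_HOSTS path under root, or None."""
    for directory in dirs:
        candidate = Path(root) / directory.lstrip("/") / "FT_HOSTS"
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_ft_hosts()` with no arguments checks the live file system and returns `None` when the file is in none of the three places.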

HA isolation response “shutdown guest”

Duncan Epping · Sep 8, 2008 ·

So if you’re like me, better safe than sorry, then you’ve probably set the isolation response of your ESX 3.5 U2 HA cluster to “shutdown VM” instead of “Power off VM” or “Leave VM powered on”. By now most of you have probably noticed that when an isolation occurs, HA gives the VM 5 minutes to shut down cleanly. When the 5 minutes have passed, HA will power off the VMs no matter what.

But for some of you 5 minutes (300 seconds) might just not be long enough, or if you have an ultra fast environment 5 minutes might just be too damn long….

So what can you do to shorten or extend it? It’s easy: open up your HA cluster settings, click on advanced options and add the following:

das.isolationShutdownTimeout – values in seconds, default is 300
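Since the value has to be entered in seconds, a tiny helper can save you from typing minutes by accident. This is just a sketch: the function name is made up, and it only builds the option/value pair for you to enter in the advanced options dialog. It doesn’t talk to VirtualCenter.

```python
OPTION_NAME = "das.isolationShutdownTimeout"
DEFAULT_SECONDS = 300  # default mentioned above


def isolation_shutdown_option(minutes):
    """Return the (option, value) pair for a timeout given in minutes."""
    seconds = int(round(minutes * 60))
    if seconds <= 0:
        raise ValueError("timeout must be longer than zero seconds")
    return OPTION_NAME, str(seconds)
```

For example, `isolation_shutdown_option(10)` gives the pair to enter for a 10 minute timeout.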

I’ve also updated my HA advanced settings blog! If anyone has more advanced settings that aren’t on the list let me know!

VCB errors

Duncan Epping · Sep 3, 2008 ·

I’ve been doing VMware Consolidated Backup troubleshooting for the last couple of days. A customer ran into problems that I can’t comment on at this moment. But after an upgrade from VCB 1.1 to VCB 1.5 the customer ran into a new limitation of VCB. After 30 VMs the script stopped working and the following error was thrown at us:

‘vcbMounter’ 5648 error] Error: Cannot mount volume 1, service not accepting new devices.

After some searching I noticed the following in the documentation, which is clearly a new limit in VCB 1.5:

NOTE Consolidated Backup supports a maximum of 60 concurrently mounted virtual machines. For example, you can concurrently mount 60 virtual machines that have a C: drive, or 30 virtual machines that have a C: and a D: each.

In other words, no more than 60 vmdks may be mounted concurrently. This limit wasn’t in 1.1, well not hard coded anyway… but 1.1 still has its limitations!
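The arithmetic behind that note is simple enough to sketch. The snippet below assumes, like the examples in the documentation, that every disk of a mounted VM counts as one against the limit of 60. The function names are mine and not part of VCB.

```python
MAX_MOUNTED_DISKS = 60  # limit quoted from the VCB 1.5 documentation


def max_concurrent_vms(disks_per_vm, limit=MAX_MOUNTED_DISKS):
    """How many VMs with the same number of disks fit under the limit."""
    if disks_per_vm < 1:
        raise ValueError("a VM needs at least one disk")
    return limit // disks_per_vm


def fits_limit(disk_counts, limit=MAX_MOUNTED_DISKS):
    """True if a mixed set of VMs (one disk count per VM) stays within the limit."""
    return sum(disk_counts) <= limit
```

With two disks per VM this gives 30, which matches the point where the customer’s script stopped working.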

I know that running more than 5 concurrent VCB dumps isn’t a best practice, but for this customer it’s what they want and need. I strongly advise against it for any environment though! Follow the best practice of a maximum of 5, and set it up in a way that it involves 5 different datastores!

We are currently investigating other options and trying to find out what the maximum number of concurrent connections should be within the environment of this specific customer, taking all kinds of different factors into consideration, like “vmfs locking”, “scsi reservations”, stress on the vmkernel and/or service console, disk space usage combined with fast growing snapshots, etc.

I’ve been looking into VMFS locking associated with snapshots. VMFS locking occurs when metadata changes; in other words, it happens with one of the following actions: a snapshot file growing, a VM starting (because the file is being locked for read/write), file creation, etc.

VMFS locking means that only 1 host is able to access the VMFS until the lock is released. So you can imagine what happens when there are 5 VMs on the same VMFS on five different ESX hosts with snapshots that are growing! It will be like a Monday morning traffic jam! So please don’t overdo it.
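To make the “maximum of 5, on 5 different datastores” advice a bit more concrete, here is a rough sketch of how you could split a list of VMs into backup batches. It is only an illustration under my own assumptions: the function name and the (VM, datastore) input format are made up, and it knows nothing about VCB itself.

```python
from collections import defaultdict, deque


def plan_batches(vms, max_concurrent=5):
    """Split (vm_name, datastore) pairs into batches of at most
    max_concurrent VMs, with no datastore appearing twice in a batch."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")

    # One FIFO queue of VM names per datastore.
    queues = defaultdict(deque)
    for name, datastore in vms:
        queues[datastore].append(name)

    batches = []
    while any(queues.values()):
        # Take from the fullest datastores first so they drain evenly.
        busiest = sorted(
            (ds for ds in queues if queues[ds]),
            key=lambda ds: (-len(queues[ds]), ds),
        )
        batch = [(queues[ds].popleft(), ds) for ds in busiest[:max_concurrent]]
        batches.append(batch)
    return batches
```

Each batch can then run concurrently, and the next one starts when the previous one has finished. If all your VMs live on a single datastore you end up with one VM per batch, which is exactly the point.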

I’ve also got the feeling that VCB is probably the most underrated and misunderstood product out there. I’ll be the first to admit that “file level” backups with VCB aren’t always as convenient as they should be, but this is also due to the fact that not every backup vendor has developed a decent integration module. CommVault Galaxy, for instance, has a special agent for VCB file level backups. This agent makes it possible to do a file level backup via VCB and restore directly to the VM via the agent! Check this PDF for more info on their solution. Full image backups on the other hand are very useful for DR purposes, but can also be used to restore single files. You can mount the VMDK and browse the folders for the file. You can also use Vizioncore’s vRanger or Veeam’s “Veeam Backup” as a third party add-on to VCB. Both products are definitely worth checking out, and are a great extension to an often overlooked product!

Talking about full image level backups, be sure to read this article. It will save you disk space on your “holding tank” and tape library!

ESX vs ESXi

Duncan Epping · Sep 2, 2008 ·

I’ve had this question about a kazillion times by now: what’s the difference between ESX and ESXi? How do they compare? Can I do this with ESX, can I do that with ESXi?

Here’s the answer! This KB article contains a table with features and a description of what you can and can’t do in VirtualCenter. Check it out, it’s definitely worth reading.

In addition to that, it is possible to do most configuration post installation via PowerShell. Check this topic on the VMTN forum by Lance!

Why I dislike agents in my Service Console

Duncan Epping · Aug 27, 2008 ·

I’ve never been a huge fan of agents in the Service Console. Too many times I’ve seen hosts fail because of an agent that had a memory leak or something similar. Now it seems that running the HP IM agents causes your ESX 3.5 U2 host to become unavailable after a certain amount of time.

The symptom is a growing list of zombie (state “Z”) cimservera processes:

0 Z root 8536 3673 0 79 0 - 0 nct> Aug05 ? 00:00:00 cimservera
0 Z root 8537 3673 0 79 0 - 0 nct> Aug05 ? 00:00:00 cimservera
0 Z root 8543 3673 0 78 0 - 0 nct> Aug05 ? 00:00:00 cimservera
0 Z root 32350 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00 cimservera
0 Z root 32351 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00 cimservera
0 Z root 32352 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00 cimservera
0 Z root 32353 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00 cimservera

HStrydom on the VMTN forum posted the following:

I am having the same issue. What happens after 17 days is that there are about 32000 of these processes. ESX has a max value of +- 32000 PID’s. Thus when all have been used up, one cannot SSH into the server, log in from the console or the ESX server disconnects from VC.

Also we have HP servers with the HP agents loaded. Our Dell servers does not have this problem.

Have a look at your cron log, /var/log/cron & cron.1. you might see that some of the job have not run. Also look in your /var/log/messages. There is a lot of login failures.

In other words, if you see the same thing happening, call HP and let’s hope they release a fix soon! And in the meantime start thinking about ESXi. It’s problems like these that make you wonder why you even need a Service Console in the first place.
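If you want to keep an eye on this before the PIDs run out, counting the zombies is easy to script. Below is a minimal sketch that parses process listing text laid out like the output shown above (state in the second column, command from the fifteenth column onwards). The function name is mine, and the column layout is an assumption based on that sample, so check it against your own ps output first.

```python
def count_zombies(ps_output, command="cimservera"):
    """Count lines whose state column is 'Z' and whose command matches."""
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 15:
            continue  # header, blank, or otherwise unexpected line
        state = fields[1]
        cmd = " ".join(fields[14:])
        if state == "Z" and command in cmd:
            count += 1
    return count
```

Feed it the captured output of your process listing and compare the result with the roughly 32000 PIDs mentioned in the forum post to see how much headroom is left.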