u1

Alert: vSphere 5.5 U1 and NFS issue!

Duncan Epping · Apr 19, 2014 ·

Some had already reported on this on twitter and the various blog posts but I had to wait until I received the green light from our KB/GSS team. An issue has been discovered with vSphere 5.5 Update 1 that is related to loss of connection of NFS based datastores. (NFS volumes include VSA datastores.)

*** Patch released, read more about it here ***

This is a serious issue, as it results in an APD of the datastore meaning that the virtual machines will not be able to do any IO to the datastore at the time of the APD. This by itself can result in BSOD’s for Windows guests and filesystems becoming read only for Linux guests.

Witnessed log entries can include:

2014-04-01T14:35:08.074Z: [APDCorrelator] 9413898746us: [vob.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
2014-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
2014-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
2014-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1 12345678-abcdefg0-0000-000000000000 NFS-DS1
2014-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2014-04-01T14:37:28.081Z: [APDCorrelator] 9554275221us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.

If you are hitting these issues than VMware recommends reverting back to vSphere 5.5. Please monitor the following KB closely for more details and hopefully a fix in the near future: http://kb.vmware.com/kb/2076392

Updating LSI firmware through the ESXi commandline

Duncan Epping · Apr 8, 2014 ·

I received an email this week from one of my readers / followers on twitter who had gone through the effort of upgrading his LSI controller firmware. He shared the procedure with me as unfortunately it wasn’t well documented. I hope this will help others in the future, I know it will help me as I was about to look at the exact same for my VSAN environment, thanks for sharing this Tom!

— copy / paste from Tom’s document —

We do quite a bit of virtualization and storage validation and performance testing in the Taneja Group Labs (http://tanejagroup.com/). Recently, we were performing some tests with VMware’s VSAN and due to some performance issues we were having with the AHCI controllers on our servers we needed to revise our environment to add some LSI SAS 2308 controllers and attach our SSD and HDDs to the LSI card. However our new LSI SAS controllers didn’t come with the firmware mandated by the VSAN HCL (they had v14 and the HCL specifies v18) and didn’t recognize the attached drives. So we set about updating LSI 2308 firmware. Updating the LSI firmware is a simple process and can be accomplished from an ESXi 5.5 U1 server but isn’t very well documented. After updating the firmware and rebooting the system the drives were recognized and could be used by VSAN. Below are the steps I took to update my LSI controllers from v14 to v18. [Read more…] about Updating LSI firmware through the ESXi commandline

VSAN for ROBO?

Duncan Epping · Apr 8, 2014 ·

I noticed this new SuperMicro VSAN Ready Node being published last week. The configuration is potentially a nice solution for ROBO deployments, primarily due to the cost of the system.

Supermicro SuperServer SYS-1018D-73MTF latest addition to @VMwareVSAN vSAN Ready line-up – http://t.co/G07r0InPps pic.twitter.com/p9gz3yRF4S

— Supermicro (@Supermicro_SMCI) April 4, 2014

When I did the math it came in around $ 3800,-. This is the configuration:

SuperMicro SuperServer 1018D-73MTF
1 x Intel E3-1270 V3 3.5GHz- Quadcore
32GB Memory
5 x 1TB 7200 RPM NL-SAS HDD
1 x 200GB Intel S3700 SSD
LSI 2308 Disk controller
4 x 1GbE NIC port

It is a nice configuration that will allow for roughly fifteen 1 vCPU Virtual Machines with 3GB of memory and 60GB disk capacity per host. Personally I would use a different CPU and some more memory probably as that gives you a bit more headroom, especially during maintenance. The cost from a software point of view is socket based so you can increase memory and change the type of CPU with relative low cost impact. The SuperMicro server listed however is limited to the E3 CPU family and to 32GB but there are alternatives out there. (For instance the Dell R320 or maybe even the R210 etc)

From a software point of view the cost of this configuration is limited to 3 x VSAN license and 3 x vSphere. As VSAN even works with Essentials Plus and Standard you could leverage that to keep the cost down, but keep in mind that you won’t have DRS if you drop down to Standard or lower. Still sounds like a nice ROBO package to me, especially when you have many sites this could be a great way to create a standardized packaged solution.

ESXi DCUI Shutdown vs vCenter Shutdown of a host

Duncan Epping · Apr 4, 2014 ·

Today on the community forums someone mentioned he had shutdown his host and that he expected vSphere HA to restart his virtual machines. For whatever reason he got in a situation where all of his VMs were still running but he couldn’t do much anymore with them and as such he wanted to kill the host so that HA could safely restart the virtual machines. However when he shutdown his host nothing happened, the VMs remained powered off. Why did this happen?

I had seen this before in the past, but it never really sunk in until I saw the questions from this customer. I figured I would test it just to see what happened and if I could spot a difference in the vSphere HA logs. I powered on a VM on one of my hosts and moved off all other VMs. I then went to the DCUI of the host and gave a “shutdown” using F12. I tailed the FDM log on one of my hosts and spotted the following log message:

2014-04-04T11:41:54.882Z [688C2B70 info 'Invt' opID=SWI-24c018b] [VmStateChange::SavePowerChange] vm /vmfs/volumes/4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 newPwrState=powered off <strong>clnPwrOff=true</strong> hostReporting=host-113

In the above scenario the virtual machine was not restarted even though the host was shutdown. I did the exact same exercise again, but only this time I did the shutdown using the vCenter Web Client. After I witnessed the VM being restarted I also noticed a difference in the FDM log:

2014-04-04T12:12:06.515Z [68040B70 info 'Invt' opID=SWI-1aad525b] [VmStateChange::SavePowerChange] vm /vmfs/volumes/4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 newPwrState=powered on <strong>clnPwrOff=false</strong> hostReporting=host-113

The difference is the power-off state that is reported by vSphere HA. In the first scenario the virtual machine is marked as “clnPwrOff=true” which basically tells vSphere HA that an administrator has powered off the virtual machine, this is what happened when “shutdown” was initiated through the DCUI and hence no restart took place. (It seems that ESXi initiates a shutdown of all running virtual machines.) In the second scenario vSphere HA reported that the VM was not cleanly powered off (“clnPwrOff=false”), and as such it restarted the virtual machine as it assumed something bad had happened to it.

So what did we learn? If you, for whatever reason, want vSphere HA to restart your virtual machines which are currently running on a host that you want to shutdown, make sure that you use the vCenter Web Client instead of the DCUI!

Disclaimer: my tests were conducted using vSphere 5.5 Update 1. I believe that at some point in the past “shutdown” via the DCUI would also allow HA to restart the VMs. I am now investigating why this has changed and when. When I find out I will update this post.

30K for a VSAN host @theregister? I can configure one for 2250 USD!

Duncan Epping · Mar 31, 2014 ·

I’ve been following the posts from the Register on VSAN and was surprised when they posted the cost of the hosts they configured: 30K each. With 3 at a minimum they concluded that for 90K you could buy yourself a nice legacy storage system. I don’t disagree with that to be honest… for 90K you can buy a nice legacy storage system. I guess you need to ask yourself first though what you will do with that 90K storage system by itself? Not much indeed, as you would need compute resources sitting next to it in order to do anything. So if you want to make a comparison, do not compare a full VSAN environment (or any other hyper-converged solution out there) to just a storage system at it just doesn’t make sense.

Now that still doesn’t make these hosts cheap I can hear you think, and again I agree with that. Although I have absolutely no clue where the 30K came from, and judging by the tweets this morning most people don’t know and feel it probably was overkill. Call me crazy, but I can configure a fully supported VSAN configuration for about 2250 USD (just HW) on the Dell website.

Dell T320
Intel Xeon E5-2420 1.90GHz 6 Core
Perc H310 Disk Controller
32GB Memory
1 x 7200RPM 1TB NL-SAS
1 x 100GB Intel S3700 SSD (or dell equal drive)
5 x 1GbE NIC Port

I would like to conclude that VSAN would be a lot cheaper than those legacy solutions, less than 7500 USD for 3 hosts is peanuts right?!? Yes I know, the above configuration wouldn’t fit many use cases (except for maybe a ROBO deployment where only a couple of VMs are needed) and that was the whole point of the exercise showing how pointless these exercises can be. You can twist these numbers anyway you like, and you can configure your VSAN hosts any way you like as long as the components (HDD/SSD/Controller) are on the VSAN HCL and the system is on the vSphere HCL. PS: Dear Register, next time you run through the exercise, you may want to post the configuration you selected… It makes things a bit clearer.