5.5

ESXi DCUI Shutdown vs vCenter Shutdown of a host

Duncan Epping · Apr 4, 2014 ·

Today on the community forums someone mentioned that he had shut down his host and expected vSphere HA to restart his virtual machines. For whatever reason he had ended up in a situation where all of his VMs were still running but he couldn’t do much with them anymore, so he wanted to kill the host so that HA could safely restart the virtual machines. However, when he shut down his host nothing happened; the VMs remained powered off. Why did this happen?

I had seen this before, but it never really sank in until I saw the question from this customer. I figured I would test it just to see what happened and whether I could spot a difference in the vSphere HA logs. I powered on a VM on one of my hosts and moved off all other VMs. I then went to the DCUI of the host and initiated a “shutdown” using F12. I tailed the FDM log on one of my hosts and spotted the following log message:

2014-04-04T11:41:54.882Z [688C2B70 info 'Invt' opID=SWI-24c018b] [VmStateChange::SavePowerChange] vm /vmfs/volumes/4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 newPwrState=powered off clnPwrOff=true hostReporting=host-113

In the above scenario the virtual machine was not restarted even though the host was shut down. I did the exact same exercise again, but this time I did the shutdown using the vCenter Web Client. After I witnessed the VM being restarted I also noticed a difference in the FDM log:

2014-04-04T12:12:06.515Z [68040B70 info 'Invt' opID=SWI-1aad525b] [VmStateChange::SavePowerChange] vm /vmfs/volumes/4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 newPwrState=powered on clnPwrOff=false hostReporting=host-113

The difference is the power-off state that is reported by vSphere HA. In the first scenario the virtual machine is marked as “clnPwrOff=true”, which basically tells vSphere HA that an administrator has powered off the virtual machine. This is what happened when “shutdown” was initiated through the DCUI, and hence no restart took place. (It seems that ESXi initiates a shutdown of all running virtual machines.) In the second scenario vSphere HA reported that the VM was not cleanly powered off (“clnPwrOff=false”), and as such it restarted the virtual machine as it assumed something bad had happened to it.
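If you want to check this in your own FDM log, the two log lines above can be compared with a few lines of Python. This is just a minimal sketch: the function names (parse_power_change, ha_would_restart) are my own, the parsing assumes the key=value layout seen in the two lines above, and the "restart only when clnPwrOff=false" rule is simply what I observed in this test, not a documented contract.

```python
import re

# Matches key=value pairs; a value runs until the next " key=" or the end of
# the line, so "powered off" is captured as a single value.
_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")
_MARKER = "[VmStateChange::SavePowerChange]"

DCUI_LINE = (
    "2014-04-04T11:41:54.882Z [688C2B70 info 'Invt' opID=SWI-24c018b] "
    "[VmStateChange::SavePowerChange] vm /vmfs/volumes/"
    "4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/"
    "New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 "
    "newPwrState=powered off clnPwrOff=true hostReporting=host-113"
)
WEB_CLIENT_LINE = (
    "2014-04-04T12:12:06.515Z [68040B70 info 'Invt' opID=SWI-1aad525b] "
    "[VmStateChange::SavePowerChange] vm /vmfs/volumes/"
    "4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/"
    "New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 "
    "newPwrState=powered on clnPwrOff=false hostReporting=host-113"
)


def parse_power_change(line):
    """Return the key=value fields of a SavePowerChange line, or None."""
    if _MARKER not in line:
        return None
    # Only look at the text after the marker, so opID=... is ignored.
    tail = line.partition(_MARKER)[2]
    return dict(_PAIR.findall(tail))


def ha_would_restart(fields):
    """Assumption based on the test above: HA restarts only unclean power-offs."""
    return fields.get("clnPwrOff") == "false"


for name, line in (("DCUI", DCUI_LINE), ("Web Client", WEB_CLIENT_LINE)):
    fields = parse_power_change(line)
    print(name, fields["clnPwrOff"], ha_would_restart(fields))
```

Feeding it lines from a tail of the FDM log should quickly show which power state changes HA considers clean.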

So what did we learn? If you, for whatever reason, want vSphere HA to restart the virtual machines that are currently running on a host that you want to shut down, make sure that you use the vCenter Web Client instead of the DCUI!

Disclaimer: my tests were conducted using vSphere 5.5 Update 1. I believe that at some point in the past “shutdown” via the DCUI would also allow HA to restart the VMs. I am now investigating why this has changed and when. When I find out I will update this post.

30K for a VSAN host @theregister? I can configure one for 2250 USD!

Duncan Epping · Mar 31, 2014 ·

I’ve been following the posts from the Register on VSAN and was surprised when they posted the cost of the hosts they configured: 30K each. With 3 at a minimum they concluded that for 90K you could buy yourself a nice legacy storage system. I don’t disagree with that to be honest… for 90K you can buy a nice legacy storage system. I guess you need to ask yourself first though what you will do with that 90K storage system by itself? Not much indeed, as you would need compute resources sitting next to it in order to do anything. So if you want to make a comparison, do not compare a full VSAN environment (or any other hyper-converged solution out there) to just a storage system, as it just doesn’t make sense.

Now that still doesn’t make these hosts cheap, I can hear you think, and again I agree with that, although I have absolutely no clue where the 30K came from. Judging by the tweets this morning, most people don’t know either and feel it probably was overkill. Call me crazy, but I can configure a fully supported VSAN configuration for about 2250 USD (just HW) on the Dell website.

Dell T320
Intel Xeon E5-2420 1.90GHz 6 Core
Perc H310 Disk Controller
32GB Memory
1 x 7200RPM 1TB NL-SAS
1 x 100GB Intel S3700 SSD (or equivalent Dell drive)
5 x 1GbE NIC Port

I would like to conclude that VSAN would be a lot cheaper than those legacy solutions; less than 7500 USD for 3 hosts is peanuts, right?!? Yes I know, the above configuration wouldn’t fit many use cases (except for maybe a ROBO deployment where only a couple of VMs are needed), and that was the whole point of the exercise: showing how pointless these exercises can be. You can twist these numbers any way you like, and you can configure your VSAN hosts any way you like as long as the components (HDD/SSD/Controller) are on the VSAN HCL and the system is on the vSphere HCL. PS: Dear Register, next time you run through the exercise, you may want to post the configuration you selected… It makes things a bit clearer.

VSAN – Misconfiguration Detected

Duncan Epping · Mar 31, 2014 ·

Although Cormac Hogan already wrote about this, I figured I would repeat some of his work. It seems like various folks are hitting this issue where an error is thrown while configuring VSAN: Misconfiguration Detected. The misconfiguration in this case refers to how the physical network has been configured. In order for VSAN to be successfully configured, your layer 2 VSAN network will need to be enabled for multicast traffic. (Below is a screenshot of the error, which I borrowed from Cormac… thanks Cormac.)

In order to successfully configure VSAN you can do two things. Let’s be clear that I am not a networking expert, and personally I would always advise discussing the best option with your networking team. Here are your two options:

Enable IGMP Snooping for your VSAN network (VLAN) and define an IGMP Snooping Querier. The default setting on most Cisco switches is IGMP Snooping enabled but without an IGMP Snooping Querier. In this configuration VSAN will not be able to configure correctly!
Disable IGMP Snooping for your VSAN network (VLAN). Please note that you can typically disable IGMP Snooping both globally and per VLAN. In this case, if you want to disable it… disable it on your VLAN only!

Please consult your network vendor documentation on how to do this.
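Once your networking team has made the change, you may want a quick way to see whether multicast actually flows on that VLAN. Below is a minimal sketch using nothing but the Python standard library, to be run on two machines attached to the VSAN VLAN. Note that the group address and port are hypothetical values I picked for the test (they are not the addresses VSAN itself uses), and the function names are my own. Also keep in mind that a single successful datagram does not prove much: with snooping enabled but no querier, group membership typically ages out after a few minutes, so repeat the test after leaving the listener running for a while.

```python
import socket
import struct

# Hypothetical test group and port; pick values that suit your network.
GROUP = "239.1.2.3"
PORT = 50000


def membership_request(group, local_ip="0.0.0.0"):
    """Build the ip_mreq structure: group address followed by interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def listen(group=GROUP, port=PORT, local_ip="0.0.0.0", timeout=10.0):
    """Join the group and wait for one datagram.

    Returns (payload, sender_address), or None if nothing arrives in time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # Joining the group is what makes the host send an IGMP membership report.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group, local_ip))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(1024)
        except socket.timeout:
            return None
    finally:
        sock.close()


def send(message=b"multicast-test", group=GROUP, port=PORT, ttl=1):
    """Send one datagram to the group; TTL 1 keeps it on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(message, (group, port))
    finally:
        sock.close()
```

Call listen() on the first machine and send() on the second. If listen() returns None while plain unicast between the two machines works fine, the switch configuration is the first place I would look.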

Selecting a disk controller for VSAN using the HCL

Duncan Epping · Mar 27, 2014 ·

As this was completely unclear to me as well, and I started a thread on it on our internal social platform, I figured I would share this with you. When you go through the exercise of selecting a disk controller for VSAN using the VMware Compatibility Guide (vmwa.re/vsanhcl) you will see that there are 4 “features” listed. The four features describe how you can use your disk controller to manage the disks in your host. This is important as selecting the wrong disk controller could lead to unwanted side effects.

Let me list the four features and explain what they actually mean:

Virtual SAN – SAS
Virtual SAN – SATA
Virtual SAN Pass-Through
Virtual SAN RAID 0

Virtual SAN – SAS / SATA and Pass-through are essentially the same thing. Well, not entirely, as they are implemented in a different way, but the result is the same. What this does is serve the disks straight up to the hypervisor. This functionality literally passes the disk through to ESXi, and avoids the need to create a RAID set or volume for your disks. This is by far the easiest way to pull your disks into a VSAN datastore if you ask me.

Virtual SAN RAID 0 means that in order to use the disks you will need to create a single-disk RAID 0 set for each disk in your system. The downside of this is that things like hot-swap will be impossible, as your Disk (ID) is bound to the RAID 0 set. However, there is also a positive: many of these disk controllers support things like encryption of data at rest, and if your disks support this you could potentially use it. It should be noted, however, that as far as I know today this functionality has not been tested (extensively) and support could be an issue. Still, I could see why one would want to buy a controller that offers this functionality to be future proof.

Then there is another aspect I have been asked about a couple of times already, and that is the performance capability of the controller. As far as I have seen, the HCL today consists of 3Gbps and 6Gbps controllers. In most cases there is little to no cost difference, so if supported I would always recommend going with the faster controller. But there is another thing here that is often overlooked, and that is the queue depth. Before you pull the trigger and decide to buy controller-A over controller-B you may want to verify what the queue depth is of both of them. In some cases, and especially with the cheaper disk controllers, the queue depth is low (32) where others offer 256 and higher. Especially when you are building an environment where a lot of IO is expected these are things to take into consideration, plus you wouldn’t want to buy a screaming fast SSD and then find out your bottleneck is the queue depth of your disk controller, right?
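To get a feel for why the queue depth matters, here is a back-of-the-envelope sketch based on Little's law (throughput = outstanding requests / time per request). The 0.5 ms service time is purely a hypothetical number I picked for illustration, and iops_ceiling is my own helper name; real controllers and devices will behave differently, so treat the output as a rough upper bound and nothing more.

```python
def iops_ceiling(queue_depth, service_time_ms):
    """Rough upper bound on IOPS when at most queue_depth IOs are in flight.

    Little's law: throughput = concurrency / latency. Latency is given in
    milliseconds, hence the factor 1000 to get to IOs per second.
    """
    return queue_depth * 1000.0 / service_time_ms


# Hypothetical per-IO service time of 0.5 ms, the same for both controllers.
SERVICE_TIME_MS = 0.5

for depth in (32, 256):
    print("queue depth %3d -> at most %d IOPS" % (depth, iops_ceiling(depth, SERVICE_TIME_MS)))
```

With the same assumed latency the ceiling scales linearly with the queue depth, so if your SSD can deliver more IOPS than the ceiling of the controller in front of it, the controller is your bottleneck.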

<update>A very good point made by Tom Fenton: if you select a controller and are at the point of rolling out VSAN, make sure you validate the firmware and the driver used. If you click on the “Model” you will be able to see those details. This also applies to SSDs and HDDs!</update>

I hope that helps,

VSAN Basics – Changing a VM’s storage policy

Duncan Epping · Mar 22, 2014 ·

I have been talking a lot about the architecture of VSAN and have written many articles. It seems that somehow some of the more basic topics have not been fully addressed yet, like changing a VM’s storage policy. One of our field folks had a question from a customer which was based on this video.

The question was how do you change the policy of a single VM? And why would you change the policy for a group of VMs?

Let’s answer the “group of VMs” question first. You can imagine setting a policy for VMs that perform a specific function, for instance web servers. It could be that after a period of monitoring you notice that these VMs are not performing as expected when data needs to come from spindles. By changing the policy, as demonstrated in the video, you can simply increase the stripe width for all virtual machines.

Now the question remains, how do I change the policy of a single VM? It is actually really straightforward:

Create a new policy
- Go to VM Storage Policies
- Click “Create a new storage policy”
- Select the capabilities
Now go to your virtual machines and right-click the VM which needs a new policy
Click on “all vCenter actions”
Click on “VM Storage Policies”
Click on “Manage…”
Select a new policy
Apply to disks
Click “Ok”

Now the new policy will be applied to the VM. Depending on the selected policy this will take a certain amount of time as new components of your objects may need to be created.