ESXi DCUI Shutdown vs vCenter Shutdown of a host

Today on the community forums someone mentioned that he had shut down his host and expected vSphere HA to restart his virtual machines. For whatever reason he had gotten into a situation where all of his VMs were still running but he couldn’t do much with them anymore, and as such he wanted to kill the host so that HA could safely restart the virtual machines. However, when he shut down his host nothing happened: the VMs were simply powered off and never restarted. Why did this happen?

I had seen this before, but it never really sank in until I saw the questions from this customer. I figured I would test it just to see what happened and whether I could spot a difference in the vSphere HA logs. I powered on a VM on one of my hosts and moved all other VMs off. I then went to the DCUI of the host and issued a “shutdown” using F12. I tailed the FDM log on one of my hosts and spotted the following log message:

2014-04-04T11:41:54.882Z [688C2B70 info 'Invt' opID=SWI-24c018b] [VmStateChange::SavePowerChange] vm /vmfs/volumes/4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 newPwrState=powered off clnPwrOff=true hostReporting=host-113

In the above scenario the virtual machine was not restarted even though the host was shut down. I then did the exact same exercise, only this time I initiated the shutdown using the vCenter Web Client. After I witnessed the VM being restarted, I also noticed a difference in the FDM log:

2014-04-04T12:12:06.515Z [68040B70 info 'Invt' opID=SWI-1aad525b] [VmStateChange::SavePowerChange] vm /vmfs/volumes/4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 newPwrState=powered on clnPwrOff=false hostReporting=host-113

The difference is the power-off state reported by vSphere HA. In the first scenario the virtual machine is marked as “clnPwrOff=true”, which basically tells vSphere HA that an administrator powered off the virtual machine; this is what happened when the shutdown was initiated through the DCUI, and hence no restart took place. (It seems that a DCUI-initiated shutdown of the host also initiates a shutdown of all running virtual machines.) In the second scenario vSphere HA reported that the VM was not cleanly powered off (“clnPwrOff=false”), and as such it restarted the virtual machine because it assumed something bad had happened to it.

So what did we learn? If you, for whatever reason, want vSphere HA to restart the virtual machines that are currently running on a host you want to shut down, make sure that you use the vCenter Web Client instead of the DCUI!
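For those who prefer scripting: a shutdown issued through vCenter with PowerCLI should take the same path as the Web Client, although I only verified the Web Client behavior myself, so validate this in your own lab first. A minimal sketch, with placeholder names for the vCenter server and the host:

# Connect to vCenter (not directly to the host) so the shutdown is
# coordinated through vCenter; server and host names are placeholders.
Connect-VIServer -Server "vcenter.lab.local"

# -Force is needed because the host is not in maintenance mode and
# still has running VMs on it.
Get-VMHost -Name "esxi-01.lab.local" | Stop-VMHost -Force -Confirm:$false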

Disclaimer: my tests were conducted using vSphere 5.5 Update 1. I believe that at some point in the past a “shutdown” via the DCUI would also allow HA to restart the VMs. I am investigating why and when this changed, and I will update this post when I find out.

Startup News Flash part 16

Number 16 of the Startup News Flash, here we go:

Nakivo just announced the beta program for version 4.0 of their backup/replication solution. It adds new features such as recovery of Exchange objects directly from compressed and deduplicated VM backups, Exchange log truncation, and automated backup verification. If you are interested in testing it, make sure to sign up here. I haven’t tried it myself, but they seem to be an up-and-coming player in the backup and DR space for SMB.

SanDisk announced a new range of SATA SSDs called “CloudSpeed”. They released four different models with various endurance levels and workload targets, ranging in size from 100GB up to 960GB depending on the endurance level selected. Endurance ranges from 1 up to 10 full drive writes per day. (Just as an FYI, for VSAN we recommend 5 full drive writes per day as a minimum.) Performance numbers range from 15K to 20K write IOPS and 75K to 88K read IOPS. More details can be found in the spec sheet here. What interests me most is the included FlashGuard technology; it is interesting how SanDisk is capable of understanding wear patterns and workloads to a certain extent and placing data in a specific way to prolong the life of your flash device.

CloudPhysics announced the availability of their Storage Analytics card. I gave it a try last week and was impressed. I was planning on doing a write-up on their new offering, but as various bloggers have already covered it I felt there was no point in repeating what they said. I think it makes a lot more sense to just try it out; I am sure you will like it, as it will show you valuable info like “performance” and the impact of “thin disks” vs “thick disks”. Sign up here for a 30-day free trial!

Don’t create a Frankencluster just because you can…

In the last couple of weeks I have had various discussions around creating imbalanced clusters: imbalanced from a CPU, memory, or even a storage point of view. This typically comes up when someone wants to bring larger scale to their cluster by adding hosts with more of any of those resources, or when licensing costs need to be limited and people want to restrict certain VMs to a specific set of hosts. The latter comes up often when people start looking at virtualizing Oracle. (Andrew Mitchell published this excellent article on the topic of Oracle licensing and soft vs hard partitioning which is worth reading!)

Why am I not a fan of imbalanced clusters when it comes to compute or storage resources? Why am I not a fan of purposely crippling your environment to ensure your VMs will only run on a subset of vSphere hosts? The reason is simple: the problems I have seen and experienced, and the inefficiency in certain scenarios. Let’s look at some examples:

Let’s assume I have 4 hosts, each with 128GB of memory. I need more memory in my cluster, so I add a host with 256GB of memory. I just went from 512GB to 768GB, which is a huge increase. However, this is only true when you don’t do any form of admission control or resource management. When you do proper resource management or admission control, you need to make sure that all of your virtual machines can run in the case of a failure, and preferably run with equal performance before and after the failure has occurred. If that 256GB of memory is actually being used and the host containing it goes down, your virtual machines could be impacted: they might not restart, and if they do restart they may not get the same amount of resources they received before the failure. The same applies to CPU if you create an imbalance there; the back-of-the-envelope sketch below illustrates the memory case.
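Purely as an illustration (this is not how HA admission control calculates slots or percentages, just simple arithmetic), compare adding a fifth 128GB host with adding a single 256GB host, assuming you want to keep enough memory free to restart everything after losing the largest host:

# Back-of-the-envelope only: usable memory if you reserve enough
# capacity to restart all VMs after losing the largest host.
$clusters = [ordered]@{
    "Balanced (5 x 128GB)"           = @(128, 128, 128, 128, 128)
    "Imbalanced (4 x 128GB + 256GB)" = @(128, 128, 128, 128, 256)
}
foreach ($entry in $clusters.GetEnumerator()) {
    $total   = ($entry.Value | Measure-Object -Sum).Sum
    $largest = ($entry.Value | Measure-Object -Maximum).Maximum
    $usable  = $total - $largest   # keep the largest host's worth free
    "{0}: {1}GB total, {2}GB usable" -f $entry.Key, $total, $usable
}

In other words, if you size for the failure of the biggest host, the 256GB host gives you more total memory on paper (768GB vs 640GB) but exactly the same usable capacity (512GB) as simply adding another 128GB host.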

Another one I encountered recently was presenting a LUN to a limited set of hosts; in this case a LUN was only presented to 2 of the 20 hosts in that cluster… Guess what: when those two hosts die, so do your VMs. Not optimal when they are running an Oracle database, for instance. On top of that I have seen people pitching a VSAN cluster of 16 nodes with only 3 hosts contributing storage. Yes, you can do that, but again… when things go bad, they will go horribly bad. Just imagine one of those hosts fails: how will you rebuild the components that were impacted? What is the performance impact? It is very difficult to predict how it will affect your workload, so just keep it simple. Sure, there is a cost overhead associated with separating workloads and creating dedicated clusters, but it will be easier to manage and more predictable in failure scenarios.

I guess in summary: if you want predictability in terms of availability and recoverability of your virtual machines, go for a balanced environment and don’t create a Frankencluster!

vSphere HA and VMs per Datastore limit!

I felt I needed to get this out there, as it is not something many seem to be aware of. More and more people are starting to use storage solutions which offer one large shared datastore; examples are Virtual SAN, Tintri, and Nutanix. I have seen various folks saying “unlimited number of VMs per datastore”, but of course there are limits to everything! If you are planning to build a big cluster (HA enabled), keep in mind that per cluster your limit for a datastore is 2048 powered-on virtual machines! Say what? Yes, that is right: per cluster you are limited to 2048 powered-on VMs on a single datastore. This is documented in the Max Config Guide of both vSphere 5.5 and vSphere 5.1. Please note it says datastore, not VMFS or NFS explicitly; this applies to both!

The reason for this today is the vSphere HA poweron list. I described that list in this article; in short, it keeps track of the power state of your virtual machines. If you need more than 2048 VMs in your cluster, you will need to create multiple datastores for now. (More details in the blog post.) Do note that this is a known limitation, and I have been told that the engineering team is researching a solution to this problem. Hopefully it will land in one of the upcoming releases.
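If you want a quick sanity check of how close you are to that limit, something like the following PowerCLI sketch should do; the cluster and datastore names are placeholders, so adjust as needed:

# Count the powered-on VMs a cluster has on a single datastore
# (cluster and datastore names below are placeholders).
$cluster = Get-Cluster   -Name "Cluster-01"
$ds      = Get-Datastore -Name "BigSharedDatastore"
$poweredOn = @(Get-VM -Location $cluster -Datastore $ds |
    Where-Object { $_.PowerState -eq "PoweredOn" })
"{0} powered-on VMs on {1} in cluster {2} (limit: 2048)" -f $poweredOn.Count, $ds.Name, $cluster.Name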

If only the world of applications was as simple as that…

In the last 6-12 months I have heard many people making statements about how the application landscape should change. Whether it is a VC writing on GigaOm about how server CPU utilization is still low and how Linux containers and new application architectures can solve this (there are various reasons CPU utilization is low, by the way, ranging from memory constraints to storage performance constraints, or simply by design), or network administrators simply telling the business that their application will need to cope with a new IP address after a disaster recovery failover. Although I can see where they are coming from, I don’t think the world of applications is as simple as that.

Sometimes it seems that we forget something which is fundamental to what we do. IT as it exists today provides a service to its customers: an infrastructure which “our / your customers” can consume. This infrastructure, to a certain point, should be architected based on the requirements of those customers. Why? What is the point of creating a foundation which doesn’t meet the requirements of what will be put on top of it? That is like building a skyscraper on the foundation of a house; bad things are bound to happen! Although I can understand why we feel our customers should change, I do not think it is realistic to expect this to happen overnight, or even whether it is realistic to ask them to change at all.

Just to give an example, and I am not trying to pick on anyone here, let’s take this quote from the GigaOm article:

Server virtualization was supposed to make utilization rates go up. But utilization is still low and solutions to solve that will change the way the data center operates.

I agree that server virtualization promised to make utilization rates go up, and indeed it did. Overall utilization may still be low, although that depends on who you talk to and what you include in your numbers. Many of the customers I talk to are around 40-50% utilized from a CPU perspective, do not want to go higher than that, and have their reasons for it. Was utilization the only reason for them to start virtualizing? I would strongly argue that it was not; there are many others! Reducing the number of physical servers to manage, availability (HA) of their workloads, transportability of their workloads, automation of deployments, disaster recovery, maintenance… the reasons are almost countless.

Ask yourself what all of these reasons have in common: they are non-disruptive to the application architecture! Yes, there is the odd application that cannot be virtualized, but the majority of all x86 workloads can be, without the need to make any changes to the application. Clearly you would have to talk to the app owner as their app is being migrated to a different platform, but there will be hardly any work for them associated with this migration.

Oh, I agree, everything would be a lot better if the application landscape were completely overhauled and all applications magically used a highly available and scalable distributed application framework. Everything would be a lot better if all applications were magically optimized for the infrastructure they consume, could handle instant IP address changes, and could deal with random physical servers disappearing. Unfortunately, the reality is that this is not the case today, and will not be for many of our customers in the upcoming years. Re-architecting an application, which for most app owners often comes from a third party, is not something that happens overnight. Projects like those take years, if they even finish successfully.

Although I agree with the conclusion drawn by the author of the article, I think there is a twist to it:

It’s a dynamic time in the data center. New hardware and infrastructure software options are coming to market in the next few years which are poised to shake up the existing technology stack. Should be an exciting ride.

The reality is that we deliver a service, a service that caters to the needs of our customers. If our customers are not ready, or even willing, to adapt, this will not just be a hurdle but a showstopper for many of these new technologies. Being a disruptive technology (I’m not a fan of the word) is one thing; causing disruption is another.

Disable “Disk.AutoremoveOnPDL” in a vMSC environment!

Last week I tweeted the recommendation to disable the advanced setting Disk.AutoremoveOnPDL in a vSphere 5.5 vMSC environment.

Based on that tweet I received a whole bunch of questions. Before I explain why, I want to point out that I have contacted the folks in charge of the vMSC program and have asked them to publish a KB article on this subject as soon as possible.

With vSphere 5.5 a new advanced setting was introduced called “Disk.AutoremoveOnPDL”. When you install 5.5, this setting is set to 1, which means it is enabled. What it does is the following:

The host automatically removes the PDL device and all paths to the device if no open connections to the device exist, or after the last connection closes. If the device returns from the PDL condition, the host can discover it, but treats it as a new device. Data consistency for virtual machines on the recovered device is not guaranteed.

(Source: http://pubs.vmware.com/vsphere-55/index.jsp?topic=%2Fcom.vmware.vsphere.storage.doc%2FGUID-45CF28F0-87B1-403B-B012-25E7097E6BDF.html)

In a vMSC environment you can understand that removing devices which are in a PDL (Permanent Device Loss) state is not desired: when the issue that caused the PDL has been resolved (from a networking or array perspective), customers expect the LUNs to automatically appear again. However, as the devices have been removed, a rescan is needed to bring them back, or you will have to wait for the vSphere periodic path evaluation to occur. As you can imagine, in a vSphere Metro Storage Cluster (stretched storage) environment you expect devices to be there instantly on recovery; even when they have been in a PDL or APD state, they should be available as soon as the situation has been resolved.
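For completeness, that rescan can be triggered from the Web Client or, for example, with a PowerCLI one-liner along these lines (the hostname is a placeholder):

# Rescan all HBAs and VMFS volumes on a host so a previously
# auto-removed PDL device is discovered again (placeholder hostname).
Get-VMHost -Name "esxi-01.lab.local" |
    Get-VMHostStorage -RescanAllHba -RescanVmfs | Out-Null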

For now, I recommend setting Disk.AutoremoveOnPDL to 0 instead of the default of 1.
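You can change this per host under the advanced system settings in the Web Client; if you prefer to do it across all hosts at once, a PowerCLI sketch along these lines should work (test it in your own environment first):

# Check Disk.AutoremoveOnPDL on every host and set it to 0 where needed.
foreach ($esx in Get-VMHost) {
    $setting = Get-AdvancedSetting -Entity $esx -Name "Disk.AutoremoveOnPDL"
    if ($setting.Value -ne 0) {
        $setting | Set-AdvancedSetting -Value 0 -Confirm:$false
    }
}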

Hopefully this KB article on the topic of Disk.AutoremoveOnPDL will soon be updated to reflect this.