Number of vSphere HA heartbeat datastores less than 2 error, while having more?

Last week on Twitter someone mentioned that he had received an error stating that he had fewer than two vSphere HA heartbeat datastores configured. I wrote an article about this error a while back, so I asked him whether he had two or more. He did, so the next thing to do was to “reconfigure for HA” in the hope of clearing the message.

The number of vSphere HA heartbeat datastores for this host is 1 which is less than required 2

Unfortunately the error was still there after reconfiguring for HA. The next suggestion was to look at the “heartbeat datastore” section in the HA settings. For whatever reason HA was configured to “Select only from my preferred datastores” and no datastores were selected, just like in the screenshot below. HA does not override this, so when configured like this NO heartbeat datastores are used, which results in this error within vCenter. Luckily the fix is easy: just set it to “Select any of the cluster datastores”.

the number of heartbeat datastores for host is 1
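
To make the behaviour described above concrete, here is a minimal sketch in Python. It is a toy model and not how HA actually selects datastores: the `Policy` enum, the function names and the "take the first two" selection are all assumptions made for illustration. Only the two option labels and the required count of 2 come from the UI and the error message.

```python
from enum import Enum

# From the error message: HA expects two heartbeat datastores per host.
REQUIRED_HEARTBEAT_DATASTORES = 2


class Policy(Enum):
    # Labels as they appear in the HA settings; the enum itself is hypothetical.
    ANY_CLUSTER_DATASTORE = "Select any of the cluster datastores"
    PREFERRED_ONLY = "Select only from my preferred datastores"


def pick_heartbeat_datastores(policy, cluster_datastores, preferred):
    """Return the datastores this toy model would use for heartbeating."""
    if policy is Policy.PREFERRED_ONLY:
        # HA does not override the preference: an empty list stays empty.
        candidates = [ds for ds in cluster_datastores if ds in preferred]
    else:
        candidates = list(cluster_datastores)
    # Assumption: simply take the first N candidates.
    return candidates[:REQUIRED_HEARTBEAT_DATASTORES]


def raises_warning(selected):
    """True when fewer datastores are selected than HA requires."""
    return len(selected) < REQUIRED_HEARTBEAT_DATASTORES


datastores = ["ds01", "ds02", "ds03"]
misconfigured = pick_heartbeat_datastores(Policy.PREFERRED_ONLY, datastores, [])
fixed = pick_heartbeat_datastores(Policy.ANY_CLUSTER_DATASTORE, datastores, [])
print(raises_warning(misconfigured), raises_warning(fixed))
```

With three datastores in the cluster, the "preferred only" policy with an empty preference list still ends up with nothing to heartbeat on, which is exactly the situation that triggered the warning.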

Is flash the saviour of Software Defined Storage?

I have a search column open on Twitter with the term “software defined storage”. One thing that kept popping up in the last couple of days was a tweet from various IBM people about how SDS will change flash. Or let me quote the tweet:

What does software-defined storage mean for the future of #flash?

It is part of a Twitter chat scheduled for today, initiated by IBM. It might be just me misreading the tweets, or the IBM folks look at SDS and flash in a completely different way than I do. Yes, SDS is a nice buzzword these days. I guess that with the billion dollar investment in flash IBM has announced, they are going all-in with regards to marketing. If you ask me they should have flipped it, and the tweet should have stated: “What does flash mean for the future of Software Defined Storage?” Or, to make it sound even more like marketing: is flash the saviour of Software Defined Storage?

Flash is a disruptive technology, and it is changing the way we architect our datacenters. Not only did it already allow many storage vendors to introduce additional tiers of storage, it also allowed them to add an additional layer of caching in their storage devices. Some vendors even created all-flash storage systems offering thousands of IOps (some will claim millions); performance issues are a thing of the past with those devices. On top of that, host-local flash is the enabler of scale-out virtual storage appliances. Without flash those types of solutions would not be possible, or at least not with decent performance.

For a couple of years now host-side flash has also been becoming more common, especially since several companies jumped into the huge gap there was and started offering caching solutions for virtualized infrastructures. These solutions allow companies who cannot move to hybrid or all-flash solutions to increase the performance of their virtual infrastructure without changing their storage platform. Basically what these solutions do is make a distinction between “data at rest” and “data in motion”. Data in motion should reside in cache, if configured properly, and data at rest should reside on your array. These solutions once again will change the way we architect our datacenters. They provide a significant performance increase, removing many of the performance constraints linked to traditional storage systems; your storage system can once again focus on what it is good at… storing data / capacity / resiliency.

I think I have answered the questions, but for those who have difficulties reading between the lines, how does flash change the future of software defined storage? Flash is the enabler of many new storage devices and solutions. Be it a virtual storage appliance in a converged stack, an all-flash array, or host-side IO accelerators. Through flash new opportunities arise, new options for virtualizing existing (I/O intensive) workloads. With it many new storage solutions were developed from the ground up. Storage solutions that run on standard x86 hardware, storage solutions with tight integration with the various platforms, solutions which offer things like end-to-end QoS capabilities and a multitude of data services. These solutions can change your datacenter strategy; be a part of your software defined storage strategy to take that next step forward in optimizing your operational efficiency.

Although flash is not a must for a software defined storage strategy, I would say that it is here to stay and that it is a driving force behind many software defined storage solutions!

vSphere HA – VM Monitoring sensitivity

Last week there was a question on VMTN about VM Monitoring sensitivity. I could have sworn I did an article on that exact topic, but I couldn’t find it. I figured I would do a new one with a table explaining the levels of sensitivity that you can configure VM Monitoring to.

The question was based on a false positive response of VM Monitoring. In this case the virtual machine was frozen due to the consolidation of snapshots, and VM Monitoring responded by restarting the virtual machine. As you can imagine the admin wasn’t too impressed, as it caused downtime for his virtual machine. He wanted to know how to prevent this from happening. The answer was simple: change the sensitivity, as it is set to “high” by default.

As shown in the table below, high sensitivity means that VM Monitoring responds to a missing “VMware Tools heartbeat” within 30 seconds. However, before VM Monitoring restarts the VM it will check whether there was any storage or networking I/O in the last 120 seconds (advanced setting: das.iostatsInterval). If the answer is no to both, the VM will be restarted. So if you feel VM Monitoring is too aggressive, change it accordingly!

Sensitivity   Failure Interval   Max Failures   Max Failures Time Window
Low           120 seconds        3              7 days
Medium        60 seconds         3              24 hours
High          30 seconds         3              1 hour

Do note that you can change the above settings individually as well in the UI, as seen in the screenshot below. For instance, you could manually increase the failure interval to 240 seconds. How you should configure it is something I cannot answer; it should be based on what you feel is an acceptable response time to a failure. Also, what is the sweet spot to avoid a false positive? A lot to think about indeed when introducing VM Monitoring.
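
For those who prefer to read logic rather than tables, here is a minimal Python sketch of the decision described above. It is a simplified model under stated assumptions, not the actual HA implementation: the `Sensitivity` class, the `should_restart` function and its parameters are hypothetical names, and treating "Max Failures" as a cap on restarts within the time window is my reading of the table. The numbers come from the table and from the 120 second das.iostatsInterval mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensitivity:
    failure_interval_s: int  # missing heartbeats tolerated for this long
    max_failures: int        # restarts allowed within the time window
    window_s: int            # length of the time window in seconds


# Values taken from the sensitivity table.
LEVELS = {
    "low": Sensitivity(120, 3, 7 * 24 * 3600),
    "medium": Sensitivity(60, 3, 24 * 3600),
    "high": Sensitivity(30, 3, 3600),
}

IO_STATS_INTERVAL_S = 120  # das.iostatsInterval


def should_restart(level, seconds_since_heartbeat, seconds_since_io,
                   restart_times, now):
    """Decide whether this toy model would restart the VM."""
    s = LEVELS[level]
    if seconds_since_heartbeat < s.failure_interval_s:
        return False  # heartbeat not missing for long enough yet
    if seconds_since_io < IO_STATS_INTERVAL_S:
        return False  # storage or network I/O was seen, VM looks alive
    # Assumption: only restarts inside the time window count as failures.
    recent = [t for t in restart_times if now - t < s.window_s]
    return len(recent) < s.max_failures


# 45 seconds without a heartbeat and no I/O for 5 minutes.
print(should_restart("high", 45, 300, [], now=1000.0))
print(should_restart("low", 45, 300, [], now=1000.0))
```

The same 45 second freeze leads to a restart on “high” but is tolerated on “low”, which is the kind of difference that matters for something like a snapshot consolidation.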

vSphere 5.1 Storage DRS Interoperability

A while back I did an article on Storage DRS Interoperability. I had questions about this last week, so I figured I would write a new article which reflects the current state (vSphere 5.1). I also included some details that are part of the interoperability white paper Frank and I did, so that we have a fairly complete picture. This white paper covers 5.0; it will probably be updated at some point in the future.

The first column describes the feature or functionality, the second column the recommended or supported automation mode and the third and fourth column show which type of balancing is supported.

Capability                        Automation Mode       Space Balancing   I/O Metric Balancing
Array-based Snapshots             Manual                Yes               Yes
Array-based Deduplication         Manual                Yes               Yes
Array-based Thin provisioning     Manual                Yes               Yes
Array-based Auto-Tiering          Manual                Yes               No
Array-based Replication           Manual                Yes               Yes
vSphere Raw Device Mappings       Fully Automated       Yes               Yes
vSphere Replication               Fully Automated       Yes               Yes
vSphere Snapshots                 Fully Automated       Yes               Yes
vSphere Thin provisioned disks    Fully Automated       Yes               Yes
vSphere Linked Clones             Fully Automated (*)   Yes               Yes
vSphere Storage Metro Clustering  Manual                Yes               Yes
vSphere Site Recovery Manager     Not Supported         n/a               n/a
VMware vCloud Director            Fully Automated (*)   Yes               Yes
VMware View (Linked Clones)       Not Supported         n/a               n/a
VMware View (Full Clones)         Fully Automated       Yes               Yes

(*) = Change from 5.0

What is static overhead memory?

We had a discussion internally on static overhead memory. Coincidentally I spoke with Aashish Parikh from the DRS team on this topic a couple of weeks ago when I was in Palo Alto. Aashish is working on improving the overhead memory estimation calculation so that both HA and DRS can be even more efficient when it comes to placing virtual machines. The question was around what determines the static memory and this is the answer that Aashish provided. I found it very useful hence the reason I asked Aashish if it was okay to share it with the world. I added some bits and pieces where I felt additional details were needed though.

First of all, what is static overhead and what is dynamic overhead:

  • When a VM is powered-off, the amount of overhead memory required to power it on is called static overhead memory.
  • Once a VM is powered-on, the amount of overhead memory required to keep it running is called dynamic or runtime overhead memory.

Static overhead memory of a VM depends upon various factors:

  1. Several virtual machine configuration parameters like the number of vCPUs, amount of vRAM, number of devices, etc.
  2. The enabling/disabling of various VMware features (FT, CBRC, etc.)
  3. ESXi Build Number

Note that the static overhead memory estimation is calculated fairly conservatively and takes a worst-case scenario into account. This is the reason why engineering is exploring ways of improving it. One of the areas that can be improved is, for instance, including host configuration parameters. These parameters are things like CPU model, family & stepping, various CPUID bits, etc. As a result, two similar VMs residing on different hosts would have different overhead values.

But what about dynamic? Dynamic overhead seems to be more accurate today, right? Well, there is a good reason for it: with dynamic overhead it is “known” where the VM is running, and the cost of running the VM on that host can easily be calculated. It is not a matter of estimating it any longer, but a matter of doing the math. That is the big difference: Dynamic = VM is running and we know where, versus Static = VM is powered off and we don’t know where it might be powered on!

The same applies, for instance, to vMotion scenarios. Although the platform knows what the target destination will be, it still doesn’t know how the target will treat that virtual machine. As such the vMotion process aims to be conservative and uses static overhead memory instead of dynamic. One of the things, for instance, that changes the amount of overhead memory needed is the “monitor mode” used (BT, HV or HWMMU).

So what is being explored to improve it? First of all, including the additional host-side parameters as mentioned above. Secondly, and equally important, the overhead memory should be calculated based on the VM -> “target host” combination. Or as engineering calls it, calculating the “Static overhead of VM v on Host h”.

Now why is this important? When is static overhead memory used? Static overhead memory is used by both HA and DRS. HA for instance uses it with Admission Control when doing the calculations around how many VMs can be powered on before unreserved resources are depleted. When you power-on a virtual machine the host side “admission control” will validate if it has sufficient unreserved resource available for the “static memory overhead” to be guaranteed… But also DRS and vMotion use the static memory overhead metric, for instance to ensure a virtual machine can be placed on a target host during a vMotion process as the static memory overhead needs to be guaranteed.

As you can see, a fairly lengthy chunk of info on just a single simple metric in vCenter / ESXTOP… but very nice to know!

Limit a VM from an IOps perspective

Over the last couple of weeks I heard people either asking questions about how to limit a VM from an IOps perspective or making comments that Storage IO Control (SIOC) allows you to limit VMs. As I pointed at least three folks to this info, I figured I would share it publicly.

There is an IOps limit setting on the virtual disk as an option… This is what allows you to limit a virtual machine / virtual disk to a specific number of IOps. Now it should be noted that when you set this limit it is handled (vSphere 5.1 and prior) by the local host scheduler, also known as SFQ. One thing to realize though is that when you set a limit on multiple virtual disks of a virtual machine, all of these limits will be added up and that will be your threshold. In other words:

  • Disk01 – 50 IOps limit
  • Disk02 – 200 IOps limit
  • Combined total: 250 IOps limit
  • If Disk01 only uses 5 IOps then Disk02 can use 245 IOps!

There is one caveat though: the “combined total” only goes for disks which are stored on the same datastore. So if you have 4 disks and they are stored across 4 datastores, then each of the individual limits applies to its own disk.
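
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the adding-up behaviour as described here, not scheduler code; the function names and the tuple layout are assumptions, and disks without a limit are deliberately left out of the model.

```python
from collections import defaultdict


def combined_limits(disks):
    """Sum the per-disk IOps limits per datastore.

    disks: list of (disk_name, datastore_name, iops_limit) tuples.
    """
    totals = defaultdict(int)
    for _disk, datastore, limit in disks:
        totals[datastore] += limit
    return dict(totals)


def available_iops(disks, disk_name, current_usage):
    """IOps a disk can use, given what its datastore siblings consume."""
    datastore = next(ds for name, ds, _ in disks if name == disk_name)
    total = combined_limits(disks)[datastore]
    used_by_others = sum(
        current_usage.get(name, 0)
        for name, ds, _ in disks
        if ds == datastore and name != disk_name
    )
    return total - used_by_others


# Both disks on the same datastore: the limits are pooled.
same_datastore = [("Disk01", "ds01", 50), ("Disk02", "ds01", 200)]
print(combined_limits(same_datastore))
print(available_iops(same_datastore, "Disk02", {"Disk01": 5}))

# Disks on different datastores: each limit stands on its own.
spread_out = [("Disk01", "ds01", 50), ("Disk02", "ds02", 200)]
print(available_iops(spread_out, "Disk02", {"Disk01": 5}))
```

In the first case Disk02 can go up to 245 IOps because Disk01 only uses 5 of the pooled 250; in the second case Disk02 is held to its own 200.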

More details can be found in this KB article: http://kb.vmware.com/kb/1038241

Guaranteeing availability through admission control, chip in!

I have been having these discussions with our engineering teams for the last year around guaranteed restarts of virtual machines in a cluster. In the current shape / form we use Admission Control to guarantee virtual machines are restarted. Today Admission Control is all about guaranteeing virtual machine restarts by keeping track of Memory and CPU resource reservations, but you can imagine that in the Software Defined Datacenter this could be expanded with for instance storage or networking reservation.

Now why am I having these discussions, what is the problem with Admission Control today? Well first of all it is the perception that many appear to have of Admission Control. Many believe the Admission Control algorithm uses “used” resources. Reality however is that Admission Control is not that flexible, it uses resource reservations and as you know this is static. So what is the result of using reservations?

By using reservations for “admission control” vSphere HA has a simple way of guaranteeing that a restart is possible at all times: it checks whether sufficient “unreserved resources” are available, and if so it allows the virtual machine to be powered on. If not, it won’t allow the power-on, just to ensure that all virtual machines can be restarted in case of a failure. But what is the problem? Although we guarantee a restart, we do not guarantee any type of performance after the restart! Unless of course you are setting your reservations equal to what you provisioned… but I don’t know anyone doing this, as it eliminates any form of overcommitment and will result in an increase in cost and a decrease in flexibility.
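
A small Python sketch may help show why this is the case. It is a deliberately simplified model and not the actual HA admission control algorithm: the `VM` class, the single memory figure and the fixed `failover_reserve_mb` are all assumptions for illustration. The only thing taken from the description above is that the check looks at reservations and ignores usage.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    reservation_mb: int  # what admission control looks at
    used_mb: int         # what the VM actually consumes; ignored below


def unreserved_mb(capacity_mb, failover_reserve_mb, powered_on):
    """Capacity left after failover headroom and existing reservations."""
    reserved = sum(vm.reservation_mb for vm in powered_on)
    return capacity_mb - failover_reserve_mb - reserved


def admit(vm, capacity_mb, failover_reserve_mb, powered_on):
    """Allow the power-on only if the VM's reservation still fits."""
    available = unreserved_mb(capacity_mb, failover_reserve_mb, powered_on)
    return vm.reservation_mb <= available


running = [VM("a", 0, 60_000), VM("b", 1_000, 30_000)]

# No reservation: admitted, even though usage would exceed the cluster.
print(admit(VM("c", 0, 50_000), 100_000, 25_000, running))

# Large reservation: rejected, even though it uses nothing yet.
print(admit(VM("d", 80_000, 0), 100_000, 25_000, running))
```

VM "c" is allowed to power on because its reservation is zero, even though the memory in use would end up well above the cluster capacity. The restart is guaranteed; the performance afterwards is not.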

So that is the problem. Question is – what should we do about it? We (the engineering teams and I) would like to hear from YOU.

  • What would you like admission control to be?
  • What guarantees do you want HA to provide?
  • After a failure, what criteria should HA apply in deciding which VMs to restart?

One idea we have been discussing is to have Admission Control use something like “used” resources… or for instance an “average of resources used” per virtual machine. What if you could say: I want to ensure that my virtual machines always get at least 80% of what they use on average? If so, what should HA do when there are not enough resources to meet the 80% demand of all VMs? Power on some of the VMs? Power on all with reduced share values?

Also, something we have discussed is having vCenter show how many resources are used on average, taking your high availability N-X setup into account, which should at least provide insight into how your VMs (and applications) will perform after a fail-over. Is that something you see value in?

What do you think? Be open and honest, tell us what you think… don’t be scared, we won’t bite, we are open to all suggestions.