
Yellow Bricks

by Duncan Epping



vSphere 5.0 HA: Application Monitoring intro

Duncan Epping · Aug 11, 2011 ·

I don’t think anyone has blogged about App Monitoring yet, so I figured I would do a “what’s new / intro” to App Monitoring in vSphere 5.0. Prior to vSphere 5, App Monitoring could only be leveraged by partners that had access to the SDK/APIs. A handful of partners did, of which Symantec’s ApplicationHA is probably the best example. The “problem” with that, though, is that you would still need to buy a piece of software, while you might have an in-house development team that could easily bake this into their application… well, with vSphere 5 you can. I grabbed one of the latest code drops and started playing around. Note that I am not going to do an extensive article on this; I am just showing what you have after installing the package. In my case I installed it on a Windows VM.

First of all, after installing the package you will have a new executable. This executable allows you to control the functionality App Monitoring offers without the need to compile a binary yourself. This new command, vmware-appmonitor.exe, takes the following arguments, which are not coincidentally similar to the functions I will show in a second:

  • enable
  • disable
  • markActive
  • isEnabled
  • getAppStatus

When running the command the following output is presented:

C:\VMware-GuestAppMonitorSDK\bin\win32>vmware-appmonitor.exe
Usage: vmware-appmonitor.exe {enable | disable | markActive | isEnabled | getAppStatus}

I guess most parameters speak for themselves. “enable” switches App Monitoring on and “disable” turns it off again. “isEnabled” returns the current status: is it on or off? “getAppStatus” tells you the status of your app: if it is healthy and has been sending heartbeats regularly, the result will be green; if there is a real issue, it will be red. (There is also gray, which means HA has just picked up the VM; its status needs to be cleared and monitoring should start soon.) The most important one is “markActive”. This parameter needs to be called at least every 30 seconds; it is the heartbeat parameter. In other words, “markActive” is what informs HA that the application is still alive!
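
Since “markActive” must be invoked at least every 30 seconds, the natural consumer is a small heartbeat loop wrapped around either the CLI or the API. Below is a minimal Python sketch of that pattern; the vmware-appmonitor.exe invocation is an assumption based on the usage output above, and the loop itself is generic:

```python
import subprocess

def cli_mark_active():
    # Assumed invocation of the SDK's CLI (see the usage output above);
    # adjust the path to wherever vmware-appmonitor.exe is installed.
    subprocess.run(["vmware-appmonitor.exe", "markActive"], check=True)

def heartbeat_loop(send, sleep, interval=10, iterations=None):
    """Call send() every `interval` seconds, comfortably within the
    30-second deadline after which HA considers the application dead.
    `send` and `sleep` are injectable so the loop is easy to test."""
    count = 0
    while iterations is None or count < iterations:
        send()            # e.g. cli_mark_active
        sleep(interval)   # e.g. time.sleep
        count += 1
    return count
```

In a real guest you would run `heartbeat_loop(cli_mark_active, time.sleep)` from your application’s watchdog thread, and stop heartbeating (or call “disable”) when the application shuts down cleanly.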

I am sure that as soon as William Lam gets his hands on the package he will go wild and release a bunch of scripts which will allow you to enhance resiliency for your applications/services. These parameters can also be used by your development team, but in the form of functions. The Application Awareness API allows anyone to talk to it using different languages, like C++ and Java for instance. Currently there are 6 functions defined:

  • VMGuestAppMonitor_Enable()
    Enables Monitoring
  • VMGuestAppMonitor_MarkActive()
    Marks the application as active; it is recommended to call this at least every 30 seconds
  • VMGuestAppMonitor_Disable()
    Disable Monitoring
  • VMGuestAppMonitor_IsEnabled()
    Returns status of Monitoring
  • VMGuestAppMonitor_GetAppStatus()
    Returns the current application status recorded for the application
  • VMGuestAppMonitor_Free()
    Frees the result of the VMGuestAppMonitor_GetAppStatus() call
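
For compiled languages the SDK exposes exactly these calls; from a scripting language you could wrap them in a thin class. The sketch below is illustrative only: it assumes `lib` is some handle exposing the six functions above (for example a ctypes.CDLL around the SDK’s shared library, whose filename I won’t guess here):

```python
class AppMonitor:
    """Context manager over the six VMGuestAppMonitor_* functions listed
    above. `lib` is any object exposing them, e.g. a ctypes.CDLL handle."""

    def __init__(self, lib):
        self.lib = lib

    def __enter__(self):
        self.lib.VMGuestAppMonitor_Enable()      # switch App Monitoring on
        return self

    def beat(self):
        # Heartbeat: call at least every 30 seconds while healthy.
        self.lib.VMGuestAppMonitor_MarkActive()

    def status(self):
        raw = self.lib.VMGuestAppMonitor_GetAppStatus()
        result = str(raw)                        # copy before freeing
        self.lib.VMGuestAppMonitor_Free(raw)     # caller must free the result
        return result

    def __exit__(self, *exc):
        self.lib.VMGuestAppMonitor_Disable()     # switch monitoring off
```

The context-manager shape guarantees “disable” runs even if the application logic raises, so HA isn’t left waiting for heartbeats from a process that exited.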

These functions could be used by your development team to enhance resiliency in a simple way. This is just the start, however. Personally, I would like to see some sort of rolling patch process added on top, and for instance the ability to restart a service or signal a partial VM failure. Or even hint to the hypervisor that there is a partial failure and request a vMotion to a different host to validate if that solves the problem… If you feel there’s something that needs to be added to App Monitoring, let me know and I’ll make sure the PM/Dev team reads this thread.

** disclaimer: some of this info was taken from the vSphere 5.0 Technical Deepdive book **

vSphere 5 Coverage

Duncan Epping · Aug 6, 2011 ·

I just read Eric’s article about all the topics he covered around vSphere 5 over the last couple of weeks, and as I just published the last article I had prepared, I figured it would make sense to post something similar. (Great job by the way, Eric, I always enjoy reading your articles and watching your videos!) Although I hit roughly 10,000 unique views on average per day the first week after the launch, and still 7,000 a day currently, I have the feeling that many were focused on the licensing changes rather than all the new and exciting features that were coming up. Now that the dust has somewhat settled, it makes sense to re-emphasize them. Over the last 6 months I have been working with vSphere 5 and explored these features; my focus for most of those 6 months was to complete the book, but of course I wrote a large number of articles along the way, many of which ended up in the book in some shape or form. This is the list of articles I published. If you feel there is anything I left out that should have been covered, let me know and I will try to dive into it. I can’t make any promises though, as with VMworld coming up my time is limited.

  1. Live Blog: Raising The Bar, Part V
  2. 5 is the magic number
  3. Hot off the press: vSphere 5.0 Clustering Technical Deepdive
  4. vSphere 5.0: Storage DRS introduction
  5. vSphere 5.0: What has changed for VMFS?
  6. vSphere 5.0: Storage vMotion and the Mirror Driver
  7. Punch Zeros
  8. Storage DRS interoperability
  9. vSphere 5.0: UNMAP (vaai feature)
  10. vSphere 5.0: ESXCLI
  11. ESXi 5: Suppressing the local/remote shell warning
  12. Testing VM Monitoring with vSphere 5.0
  13. What’s new?
  14. vSphere 5.0 vMotion Enhancements
  15. vSphere 5.0: vMotion enhancement, tiny but very welcome!
  16. ESXi 5.0 and Scripted Installs
  17. vSphere 5.0: Storage initiatives
  18. Scale Up/Out and impact of vRAM?!? (part 2)
  19. HA Architecture Series – FDM (1/5)
  20. HA Architecture Series – Primary nodes? (2/5)
  21. HA Architecture Series – Datastore Heartbeating (3/5)
  22. HA Architecture Series – Restarting VMs (4/5)
  23. HA Architecture Series – Advanced Settings (5/5)
  24. VMFS-5 LUN Sizing
  25. vSphere 5.0 HA: Changes in admission control
  26. vSphere 5 – Metro vMotion
  27. SDRS and Auto-Tiering solutions – The Injector

Once again, if there is something you feel I should be covering, let me know and I’ll try to dig into it. Preferably something that none of the other blogs have published, of course.

vSphere 5.0 HA: Changes in admission control

Duncan Epping · Aug 3, 2011 ·

I just wanted to point out a couple of changes for HA in vSphere 5.0 with regards to admission control. Although they might seem minor, they are important to keep in mind when redesigning your environment. Let’s discuss each of the admission control policies and list the changes underneath.

  • Host failures cluster tolerates
    Still uses the slot algorithm. The major change here is that you can have a value larger than 4 hosts. The 4-host limit was imposed by the Primary/Secondary node concept. As this constraint has been lifted, it is now possible to select a value up to 31. So in the case of a 16-host cluster you can set the value to 15. (Yes, you could even set it to 31 as the UI doesn’t limit you, but that wouldn’t make sense, would it…) Another change is the default slot size for CPU. The default slot size used to be 256MHz. This has been decreased to 32MHz.
  • Percentage as cluster resources reserved
    This admission control policy has been overhauled and it is now possible to select a percentage for both CPU and Memory separately. In other words you can set CPU to 30% and Memory to 25%. The algorithm hasn’t changed and this is still my preferred admission control policy!
  • Specify Failover host
    Allows you to select multiple hosts instead of just one. So for instance, in an 8-host cluster you can specify two as designated failover hosts. These hosts will not be used during normal operations; keep this in mind!
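
To make the “host failures cluster tolerates” mechanics concrete, here is a small Python sketch of the slot algorithm as described above. It is an illustration of the mechanism, not HA’s actual implementation; the memory floor value is an assumption for the example (the real memory slot size is the largest reservation plus overhead):

```python
def slot_size(vm_reservations, cpu_min_mhz=32, mem_min_mb=32):
    """Slot size = largest reservation across powered-on VMs, floored at
    a minimum (32 MHz for CPU in vSphere 5.0; the memory floor here is
    just a stand-in for reservation + overhead)."""
    cpu = max([c for c, _ in vm_reservations] + [cpu_min_mhz])
    mem = max([m for _, m in vm_reservations] + [mem_min_mb])
    return cpu, mem

def slots_per_host(hosts, slot):
    # A host's slot count is limited by whichever resource runs out first.
    return [min(cpu // slot[0], mem // slot[1]) for cpu, mem in hosts]

def can_tolerate(hosts, vm_reservations, failures):
    """Worst-case check: assume the `failures` hosts holding the MOST
    slots are the ones that fail."""
    per_host = sorted(slots_per_host(hosts, slot_size(vm_reservations)))
    remaining = sum(per_host[:len(per_host) - failures])
    return remaining >= len(vm_reservations)
```

For example, with four hosts of 9 GHz / 32 GB each and twenty VMs reserving 500 MHz / 1 GB each, the cluster can tolerate one host failure but not three.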

For more details on admission control I would like to refer to the HA deepdive (not updated to 5.0 yet) or my book on vSphere 5.0 Clustering which contains many examples of how to correctly set the percentage for instance.

VMworld Session: vSphere Clustering Q&A

Duncan Epping · Aug 1, 2011 ·

We need your help for our VMworld session “VSP1682 – vSphere Clustering Q&A”. In order to ensure we can fill up the full 60 minutes, we want to have a couple of questions ready in case no one in the audience has a question. Although I doubt that will be the case, it is better to be prepared than to stare at each other for 50 minutes. So please help us out and submit some questions about HA, DRS and/or Storage DRS.

Our session is on Monday morning at 08:00, so if you haven’t yet, register today. By the way, Frank has another session, the DRS/Resource Management Deepdive… definitely worth attending. It is VSP3116, on Monday at 11:30 and Thursday at 10:30 (sold out). Make sure to attend one of those; I’ve seen a preview of the slide deck and it will be worth it. Another E P I C session will be VSP1956 on Monday at 13:00. It is the ESXi Quiz, yes… Death to PowerPoint. At this session you will see vExperts taking on VMware employees in a knowledge quiz!

HA Architecture Series – Advanced Settings (5/5)

Duncan Epping · Jul 28, 2011 ·

When doing some research for the vSphere Clustering Technical Deepdive book I stumbled across something which was very surprising and difficult to grasp at first. I figured explaining it in a short article was the best approach. Many of you have read the HA deepdive article or the book and know that das.failuredetectiontime is probably the most commonly used advanced setting when configuring HA. There have been all sorts of recommendations and best practices flying around, many of which were blatantly confusing to be honest. As stated in the previous article, das.failuredetectiontime is no longer needed and has been deprecated. Did anything else change from an advanced settings perspective? Have advanced settings been added or removed? Here is the new list:

  • das.ignoreInsufficientHbDatastore – 5.0 only
    Suppress the host config issue that the number of heartbeat datastores is less than das.heartbeatDsPerHost. Default value is “false”. Can be configured as “true” or “false”.
  • das.heartbeatDsPerHost – 5.0 only
    The number of required heartbeat datastores per host. The default value is 2; value should be between 2 and 5.
  • das.failuredetectiontime – 4.1 and prior
    Number of milliseconds, timeout time, for isolation response action (with a default of 15000 milliseconds). Pre-vSphere 4.0 it was a general best practice to increase the value to 60000 when an active/standby Service Console setup was used. This is no longer needed. For a host with two Service Consoles or a secondary isolation address a failuredetection time of 15000 is recommended.
  • das.isolationaddress[x] – 5.0 and prior
    IP address the ESX host uses to check for isolation when no heartbeats are received, where [x] = 0‐9. VMware HA will use the default gateway as an isolation address and the provided value as an additional checkpoint. I recommend adding an isolation address when a secondary service console is being used for redundancy purposes.
  • das.usedefaultisolationaddress – 5.0 and prior
    Value can be “true” or “false” and needs to be set to false in case the default gateway, which is the default isolation address, should not or cannot be used for this purpose. In other words, if the default gateway is a non-pingable address, set the “das.isolationaddress0” to a pingable address and disable the usage of the default gateway by setting this to “false”.
  • das.isolationShutdownTimeout – 5.0 and prior
    Time in seconds to wait for a VM to become powered off after initiating a guest shutdown, before forcing a power off.
  • das.allowNetwork[x] – 5.0 and prior
    Enables the use of port group names to control the networks used for VMware HA, where [x] = 0 – ?. You can set the value to “Service Console 2” or “Management Network” to use (only) the networks associated with those port group names in the networking configuration.
  • das.bypassNetCompatCheck – 4.1 and prior
    Disable the “compatible network” check for HA that was introduced with ESX 3.5 Update 2. Disabling this check will enable HA to be configured in a cluster which contains hosts in different subnets, so-called incompatible networks. Default value is “false”; setting it to “true” disables the check.
  • das.ignoreRedundantNetWarning – 5.0 and prior
    Remove the error icon/message from your vCenter when you don’t have a redundant Service Console connection. Default value is “false”, setting it to “true” will disable the warning. HA must be reconfigured after setting the option.
  • das.vmMemoryMinMB – 5.0 and prior
    The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with “das.slotMemInMB”.
  • das.slotMemInMB – 5.0 and prior
    Sets the slot size for memory to the specified value. This advanced setting can be used when a virtual machine with a large memory reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.
  • das.vmCpuMinMHz – 5.0 and prior
    The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with “das.slotCpuInMHz”.
  • das.slotCpuInMHz – 5.0 and prior
    Sets the slot size for CPU to the specified value. This advanced setting can be used when a virtual machine with a large CPU reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.
  • das.sensorPollingFreq – 4.1 and prior
    Set the time interval for HA status updates. As of vSphere 4.1, the default value of this setting is 10. It can be configured between 1 and 30, but it is not recommended to decrease this value as it might lead to less scalability due to the overhead of the status updates.
  • das.perHostConcurrentFailoversLimit – 5.0 and prior
    By default, HA will issue up to 32 concurrent VM power-ons per host. This setting controls the maximum number of concurrent restarts on a single host. Setting a larger value will allow more VMs to be restarted concurrently but will also increase the average latency to recover as it adds more stress on the hosts and storage.
  • das.config.log.maxFileNum – 5.0 only
    Desired number of log rotations.
  • das.config.log.maxFileSize – 5.0 only
    Maximum file size in bytes of the log file.
  • das.config.log.directory – 5.0 only
    Full directory path used to store log files.
  • das.maxFtVmsPerHost – 5.0 and prior
    The maximum number of primary and secondary FT virtual machines that can be placed on a single host. The default value is 4.
  • das.iostatsinterval (VM Monitoring) – 5.0 and prior
    The I/O stats interval determines if any disk or network activity has occurred for the virtual machine. The default value is 120 seconds.
  • das.failureInterval (VM Monitoring) – 5.0 and prior
    The polling interval for failures. Default value is 30 seconds.
  • das.minUptime (VM Monitoring) – 5.0 and prior
    The minimum uptime in seconds before VM Monitoring starts polling. The default value is 120 seconds.
  • das.maxFailures (VM Monitoring) – 5.0 and prior
    Maximum number of virtual machine failures within the specified “das.maxFailureWindow”. If this number is reached, VM Monitoring doesn’t restart the virtual machine automatically. Default value is 3.
  • das.maxFailureWindow (VM Monitoring) – 5.0 and prior
    Minimum number of seconds between failures. Default value is 3600 seconds. If a virtual machine fails more than “das.maxFailures” within 3600 seconds, VM Monitoring doesn’t restart the machine.
  • das.vmFailoverEnabled (VM Monitoring) – 5.0 and prior
    If set to “true”, VM Monitoring is enabled. When it is set to “false”, VM Monitoring is disabled.
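
The interplay between “das.maxFailures” and “das.maxFailureWindow” is easiest to see in a few lines of Python. This mirrors the semantics described above (restart unless the VM has failed more than das.maxFailures times within the window); it is an illustration, not VMware’s code:

```python
def should_restart(previous_failures, now, max_failures=3, window=3600):
    """Would VM Monitoring restart a VM that fails at time `now`?
    previous_failures: timestamps (in seconds) of earlier failures.
    Restart unless this failure would push the count of failures
    within `window` seconds past max_failures."""
    recent = [t for t in previous_failures if now - t <= window]
    return len(recent) + 1 <= max_failures
```

Note that once failures age out of the window, automatic restarts resume; the counter is a sliding window, not a lifetime total.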

Please note that this is the full list I am aware of today; over time I will add/remove entries where and when applicable.

