
Yellow Bricks

by Duncan Epping


BC-DR

HA: Did you know?

Duncan Epping · Sep 20, 2009 ·

Did you know that…

  • the best practice of increasing the failure detection time (das.failuredetectiontime) from 15000 to 60000 for an Active/Standby service console setup has been deprecated as of vSphere.
    (In other words, for Active/Standby leave it set to the default of 15000 in vSphere.)
  • the limit of 100 VMs per host is actually “100 powered-on and HA-enabled VMs”. Of course this also goes for the 40 VM limit for clusters with more than 8 hosts.
  • the limit of 100 VMs per host in an HA cluster with fewer than 9 hosts is a soft limit.
  • das.isolationaddress[0-9] is one of the most underrated advanced settings.
    It should be used as an additional safety net to rule out false positives.

Just four little things most people don’t seem to realize or know…
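The value of the das.isolationaddress[0-9] safety net is easy to see in a toy model: a host only declares itself isolated when it fails to reach *every* configured isolation address, so one extra address can rule out a false positive from a flaky gateway. A minimal Python sketch, where `reachable` is a hypothetical stand-in for the real ping the HA agent performs:

```python
# Sketch of HA isolation detection with additional isolation addresses.
# Assumption: reachable() stands in for the actual ICMP ping done by the
# HA agent; it is injected here so the logic can be demonstrated.

def is_isolated(isolation_addresses, reachable):
    """A host declares itself isolated only if every configured
    isolation address is unreachable. Extra das.isolationaddress[0-9]
    entries therefore act as a safety net against false positives."""
    return not any(reachable(addr) for addr in isolation_addresses)

# Default: only the gateway is checked, so an unreachable gateway
# means a (possibly false) isolation verdict.
addresses = ["192.168.1.1"]                      # default gateway only
print(is_isolated(addresses, lambda a: False))   # True: declared isolated

# With one extra das.isolationaddress entry, a single reachable
# address is enough to rule out isolation.
addresses = ["192.168.1.1", "192.168.1.254"]     # gateway + extra address
print(is_isolated(addresses, lambda a: a == "192.168.1.254"))  # False
```

The addresses above are invented; in practice you would point the extra entries at reliable pingable devices on the service console network.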

VMware Availability Solutions and Futures (BC3425 – Banjot Chanana)

Duncan Epping · Sep 16, 2009 ·

I was just replaying Banjot Chanana’s session “VMware Availability Solutions and Futures”. Banjot is the product manager for the availability solutions HA and FT. I met Banjot in Palo Alto the week before VMworld, and we spoke about HA, present and future. Unfortunately I can’t elaborate on anything that was discussed, but I can repeat what Banjot spoke about during his session at VMworld.

The most exciting part of the presentation, for me at least, starts at roughly 35:40, where Banjot begins to elaborate on futures. Especially when the 3D model gets expanded with “Stretched Clusters with FT” and “Stretched HA Clusters”, things get interesting. Some bullet points on future developments:

  • VM Component Protection -> loss of storage / loss of VM network -> fail-over / alert
    Drives higher availability against granular outages
  • Stretched HA Clusters -> Carving up Clusters in “sub-clusters” by tagging VMs -> fail-over to other “sub-cluster” based on affinity
    Drives higher availability against site failures
  • Application Monitoring -> Application awareness / correlation between infrastructure and application events -> SLA awareness also performance by using DRS
    Drives higher availability against application / service failure
  • Host Retirement -> Host health scores would also indicate the “VM readiness” of a host -> VMotion based on host health scores
    Drives higher availability by monitoring host health and taking action when thresholds are exceeded
  • Integrated Availability -> Availability Policies vs per VM settings -> Defining tiers and applying them to sets of VMs -> Based on SLA
    Decreases operational efforts and increases availability by reducing “human errors”
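The “sub-cluster” idea can be illustrated with a toy model: hosts are grouped per site, VMs are tagged with a home site, and on a site failure the affected VMs restart on the other site's hosts. All names and the grouping mechanism below are my own illustration, not actual product settings:

```python
# Toy model of "stretched HA clusters": the cluster is carved up into
# sub-clusters (sites), and a VM fails over to the other sub-cluster
# when its own site goes down. All names here are illustrative.

SUB_CLUSTERS = {
    "site-a": ["esx01", "esx02"],
    "site-b": ["esx03", "esx04"],
}

def failover_targets(vm_site, failed_site):
    """Restart a VM in its home sub-cluster if that is still up,
    otherwise on the hosts of the surviving sub-cluster."""
    if vm_site != failed_site:
        return SUB_CLUSTERS[vm_site]          # home site unaffected
    other = next(s for s in SUB_CLUSTERS if s != failed_site)
    return SUB_CLUSTERS[other]

print(failover_targets("site-a", "site-a"))   # ['esx03', 'esx04']
print(failover_targets("site-a", "site-b"))   # ['esx01', 'esx02']
```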

Although some people were disappointed by the lack of new product announcements, I think there are more than enough exciting features coming up if you know where to find them. Thanks Banjot for these insights!

Future HA developments… (VMworld – BC3197)

Duncan Epping · Sep 15, 2009 ·

I was just listening to “BC3197 – High Availability – Internals and Best Practices” by Marc Sevigny. Marc is one of the HA engineers and is also my primary source of information when it comes to HA. Although most information can be found on the internet it’s always good to verify your understanding with the people who actually wrote it.

During the session Marc explains, as I’ve also written about in this article, that when a dual host failure occurs the global startup order is not taken into account. With the current version the startup order is processed per host. In other words, “Host A” is processed first, taking its startup order into account, and then “Host B”, taking its startup order into account.

During the session, however, Marc revealed that in a future version of HA the global startup settings (cluster-based) will be taken into account for any number of host failures! Great stuff. They are also looking into an option that would enable you to pick your primary hosts, which will be really useful for blade environments. Thanks Marc for the insights!
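The difference between the two behaviors is easy to show with a toy example. With per-host processing, each failed host's VMs restart in priority order independently, so a low-priority VM on the first host can come up before a high-priority VM on the second; a global order honors priorities across all failed hosts. The VM names and priorities below are made up:

```python
# Toy illustration of per-host vs. global restart ordering after a
# dual host failure. Lower number = higher startup priority.
# All VM names and priorities are invented for the example.

failed_hosts = {
    "host-a": [("db01", 1), ("web03", 3)],
    "host-b": [("app02", 2), ("web04", 3)],
}

def per_host_order(hosts):
    """Current behavior: process one failed host at a time, honoring
    the startup order only within that host."""
    order = []
    for host in hosts:
        order += [vm for vm, _ in sorted(hosts[host], key=lambda v: v[1])]
    return order

def global_order(hosts):
    """Future behavior: one cluster-wide ordering across all failed hosts."""
    all_vms = [vm for vms in hosts.values() for vm in vms]
    return [vm for vm, _ in sorted(all_vms, key=lambda v: v[1])]

print(per_host_order(failed_hosts))  # ['db01', 'web03', 'app02', 'web04']
print(global_order(failed_hosts))    # ['db01', 'app02', 'web03', 'web04']
```

Note how web03 (priority 3) restarts before app02 (priority 2) in the per-host case, which is exactly the behavior the future version is meant to fix.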

Site Recovery Manager 1.0 Update 1 Patch 4

Duncan Epping · Sep 14, 2009 ·

One of my colleagues, Michael White, just pointed out that VMware released a patch for Site Recovery Manager:

Site Recovery Manager 1.0 Update 1 Patch 4
File size: 7.9 MB
File type: .msi

Here are the most important fixes:

  • a problem that could cause a recovery plan to fail and log the message
    Panic: Assert Failed: “_pausing” @ d:/build/ob/bora-172907/santorini/src/recovery/secondary/recoveryTaskBase.cpp:328
  • a problem that caused the SRM SOAP API method getFinalStatus() to write all XML output on a single line
  • full session keys are no longer logged (partial keys are used in the log instead)
  • a problem that could cause SRM to crash during a test recovery and log the message
    Exception: Assert Failed: “!IsNull()” @ d:/build/ob/bora-128004/srm101-stage/santorini/public\common/typedMoRef.h:168
  • a problem that could cause a recovery plan test to fail to create test bubble network when recovering virtual machines that had certain types of virtual NICs
  • a problem that could cause incorrect virtual machine start-up order on recovery hosts that enable DRS
  • a problem that could cause the SRM server to crash while testing a recovery plan
  • a problem that could cause SRM to fail and log a “Cannot execute scripts” error when customizing Windows virtual machines on ESX 3.5 U1 hosts.
  • support for customizing Windows 2008 has been added
  • a problem that could prevent network settings from being updated during test recovery for guests other than Windows 2003 Std 32-bit
  • a problem that prevents protected virtual machines from following recommended Distributed Resource Scheduler (DRS) settings when recovering to more than one DRS cluster.
  • a problem observed at sites that support more than seven ESX hosts. If you refresh inventory mappings when connected to such a site, the display becomes unresponsive for up to ten minutes.
  • a problem that could prevent SRM from computing LUN consistency groups correctly when one or more of the LUNs in the consistency group did not host any virtual machines.
  • a problem that could cause the client user interface to become unresponsive when creating protection groups with over 300 members
  • several problems that could cause SRM to log a vim.fault.AlreadyExists error message when recomputing datastore groups
  • a problem that could cause SRM to log an Assert Failed: “ok” @ src/san/consistencyGroupValidator.cpp:64 error when two different datastores match a single replicated device returned by the SRA
  • a problem that could cause SRM to remove static iSCSI targets with non-test LUNs during test recovery
  • several problems that degrade the performance of inventory mapping

VMware Data Recovery 1.0.2

Duncan Epping · Sep 10, 2009 ·

VMware just released a brand new version of VMware Data Recovery.

Version 1.0.2
Build Number 188925
Release Date 2009/09/09

This release fixes a couple of known issues:

  • Various Integrity Check Issues
    Under certain circumstances, integrity checks reported damaged restore points and “cannot load session” errors. For example, such problems might be reported if:

    • A combination of simultaneous overlapping backups and integrity checks are started.
    • A backup is stopped before completion because the backup window closes. In such a case, the deduplication store records transactions, but the closing of the backup window prevents recording the transaction to the catalog.

    When integrity checks failed in such cases, Data Recovery would mark restore points as damaged or report that the backup session could not be found. Data Recovery integrity check now handles these conditions properly, so these problems no longer occur.

  • Connections Using Alternate Ports not Supported
    By default, connections to vCenter Server use port 443. If vCenter Server is configured to use an alternate port, Data Recovery continued to attempt to connect using the default port. This caused the Data Recovery plug-in to report authentication failures when attempting to connect to the Data Recovery appliance. Alternate vCenter Server port configurations are now supported.
  • Multiple VMDKs with the Same Name not Handled Properly
    A virtual machine can have multiple VMDK files with the same name that are stored on different LUNs. In such a case, Data Recovery would only restore one of the disks. Data Recovery now restores all disks.

You can find the full release notes here.
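The duplicate-VMDK fix essentially boils down to identifying disks by their full datastore path rather than by bare filename, so two identically named VMDKs on different LUNs no longer collide. A minimal sketch of the difference (the paths are invented for illustration):

```python
# Sketch of why identically named VMDKs on different LUNs clash when
# keyed by filename only, and how keying by full datastore path avoids
# it. The paths below are invented for illustration.

disks = [
    "[lun01] vm01/vm01.vmdk",
    "[lun02] vm01/vm01.vmdk",   # same filename, different LUN
]

# Keyed by filename: the second entry silently overwrites the first,
# so only one disk would be restored.
by_name = {path.split("] ")[1]: path for path in disks}

# Keyed by full path: both disks are kept and restored.
by_path = {path: path for path in disks}

print(len(by_name))  # 1
print(len(by_path))  # 2
```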

