Last week a customer asked me a question about how to respond to for instance a partial failure in their SAN environment. A while back I had a similar question from one of my other customers so I more or less knew where to look, and I actually already blogged about this over a year ago when I was showing some of the new vSphere features. Although this is fairly obvious I hardly ever see people using this and hence the reason I wanted to document one of the obvious things that you can implement…. Alarms

Alarms can be used to trigger an alert, and that is of course the default behavior of predefined alarms. However you can also create your own alarms and associate an action with it. I am showing the possibilities here and am not saying that this is a best practice, but the following two screenshots show that it is possible to place a host in maintenance mode based on degraded storage redundancy.

First you define the alarm:

And then you define the action:

Again, this is action could have a severe impact when a switch fails and I wouldn’t recommend it, but I wanted to ensure everyone understands the type of combinations that are possible. I would generally recommend to send an SNMP trap or even a notification email… and I would recommend to at least define the following alarms:

  • Degraded Storage Path Redundancy
  • Duplicate IP Detected
  • HA Agent Error
  • Host connection lost
  • Host error
  • Host warning
  • Host WWN changed
  • Host WWN conflict
  • Lost Network Connectivity
  • Lost Network Redundancy
  • Lost Storage Connectivity
  • Lost Storage Path Redundancy

Many of these deal with hardware issues and you might already be monitoring for them, if not make sure you monitor them through vCenter and take appropriate action when needed.