Yellow Bricks

Maffia fight caught on camera at #VMworld…

Duncan Epping · Oct 20, 2011 ·

I was just informed that this Maffia gang fight was caught on camera at VMworld. I heard that VMworld TV and even the Monster VM aka Mr Muscles was involved! Follow the Dutch_vMaffia on Twitter and check this video for some shocking footage. By the way, if you want to protect your organization against any threat out there… contact the Dutch vMaffia about vShield protection!

twitter.com/dutch_vmaffia

Spanish vSphere 5 book out now!

Duncan Epping · Oct 20, 2011 ·

A week ago Jose Maria Gonzalez contacted me and asked if I wanted to write the foreword for his new book. I said yes and asked when it needed to be ready. Jose responded: well… asap, preferably tomorrow, so I started working on it. I finished quickly and Jose translated it into Spanish. So if you are Spanish speaking and looking for a book on vSphere 5, head over to the link below and pick it up. It is a DIY book, in other words self-published. Before I post the link, let me congratulate Jose on his new book… I hope it will be a huge success!

http://www.josemariagonzalez.es/2011/10/20/libro-descubre-domina-vmware-vsphere-5.html

I want to share a bit of the foreword for those who don’t speak Spanish, as I feel strongly about this…

Maybe even more important, consider sharing your experiences with the rest of the world like Jose did, and hopefully will continue to do. I do realize that not everyone enjoys writing books or blogging, but there are other ways of sharing knowledge and expertise. For example, join your local VMware User Group (VMUG) and host a session about your experiences and the challenges you faced. If there is no local VMUG consider starting one! (Bring back the UG in VMUG.)

When you start sharing your experiences you will quickly discover that there are more people facing similar challenges, and you will meet new people who will give you their opinion on how to design, implement and maintain a virtual infrastructure. By giving back to your local, or even the global, community you will gain more knowledge than you could have ever imagined. Sharing your expertise might lead to opportunities you never expected to have, but more importantly it is what will enable you to be the best you can possibly be.

vSphere Product Management Survey… help us out

Duncan Epping · Oct 17, 2011 ·

This survey asks questions about your environments, what tier-1 workloads you are virtualizing and how you’re looking at emerging applications.

Your feedback is a crucial part of our decision-making process, and this is a very direct way to reach vSphere Product Management and influence the direction of vSphere and future products. The link is

http://tiny.cc/vmweu

The survey takes about 10 minutes to complete.

Iomega PX6 / PX4 update that enables Time Machine to work properly

Duncan Epping · Oct 14, 2011 ·

Just wanted to point out that if you flash your Iomega PX6 or PX4 with the latest firmware, 3.1.14.995, it will actually enable you to do Mac OS X (Lion / 10.7) Time Machine backups and restores again… I discovered it by accident, as somehow Iomega did not list it in the release notes and did not update the KB article which explains that it doesn’t work. Anyway, it works again… A simple download and upgrade of your PX6/PX4 firmware will do the trick, at least it did for me.

vSphere 5 HA – Isolation Response which one to pick?

Duncan Epping · Oct 11, 2011 ·

Last week I did an article about Datastore Heartbeating and the prevention of the Isolation Response being triggered. Apparently this was an eye-opener for some and I received a whole bunch of follow-up questions through Twitter and email. I figured it might be good to write up my recommendations around the Isolation Response. Now I would like to stress that these are my recommendations based on my understanding of the product, not based on my understanding of your environment or SLA. When applying these recommendations, always validate them against your requirements and constraints. Another thing I want to point out is that most of these details are part of our book, pick it up… the e-book is cheap.

First of all, I want to explain Isolation Response…

Isolation Response is the action HA triggers, per VM, when a host is network isolated from the rest of your cluster. Now note the “per VM”: a host will trigger the configured isolation response per VM, which could be either “power off” or “shutdown”. However, before it triggers the isolation response, and this is new in 5.0, the host will first validate whether a master owns the datastore on which the VM’s configuration files are stored. If that is not the case, the host will not trigger the isolation response.

Now let’s assume for a second that the host has been network isolated but a master doesn’t own the datastore on which the VM’s config files are stored. What happens? Nothing happens. The isolation response will not be triggered, as the host knows that there is no master which can restart these VMs. In other words, there is no point in powering down a VM when it cannot be powered on again. The host will of course periodically check if the datastore is claimed by a master.

There’s also a scenario where the complete datastore could be unavailable, in the case of a full network isolation and NFS / iSCSI backed storage for instance. In this scenario the host will power off the VM when it has detected that another host has acquired the lock on the VMDK. It will do this to prevent a so-called split brain scenario, as you don’t want to end up with two instances of your VM running in your environment. Keep in mind that in order to detect this lock, the “isolation” on the storage layer needs to be resolved. The host can only detect this when it has access to the datastore.
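To make the order of these checks a bit easier to follow, here is a minimal Python sketch of the per-VM decision as I described it above. This is purely an illustration and not how HA is implemented: the class, field and function names are all made up for this example, and the returned strings are just labels.

```python
from dataclasses import dataclass
from enum import Enum


class IsolationResponse(Enum):
    LEAVE_POWERED_ON = "leave powered on"
    SHUTDOWN = "shutdown"
    POWER_OFF = "power off"


@dataclass
class VmState:
    # Hypothetical fields, named for this illustration only.
    datastore_accessible: bool      # can the isolated host reach the VM's datastore?
    master_owns_datastore: bool     # has a master claimed the datastore with the config files?
    lock_taken_by_other_host: bool  # has another host acquired the lock on the VMDK?


def action_for_vm(configured: IsolationResponse, vm: VmState) -> str:
    """Return what an isolated host would do for one VM in this simplified model."""
    if not vm.datastore_accessible:
        # Nothing can be detected until the storage "isolation" is resolved.
        return "wait"
    if vm.lock_taken_by_other_host:
        # Avoid split brain: another instance of this VM has been started elsewhere.
        return "power off"
    if not vm.master_owns_datastore:
        # No master could restart the VM, so powering it down makes no sense.
        return "none"
    if configured is IsolationResponse.LEAVE_POWERED_ON:
        return "none"
    return configured.value
```

For example, `action_for_vm(IsolationResponse.SHUTDOWN, VmState(True, False, False))` gives `"none"`, because no master owns the datastore and so nobody could restart the VM.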

I guess at least a couple of you are thinking: but what about the scenario where a master is network isolated? Well, in that case the master will drop responsibility for those VMs, and this will allow the newly elected master to claim them and take action if required.

I hope this clarifies things.

Now let’s talk configuration settings. As part of the Isolation Response mechanism there are three ways HA could respond to a network isolation:

Leave Powered On – no response at all, leave the VMs powered on when there’s a network isolation
Shutdown VM – guest initiated shutdown, clean shutdown
Power Off VM – hard stop, equivalent to power cord being pulled out

When to use “Leave Powered On”
This is the default option and more than likely the one that fits your organization best, as it will work in most scenarios. When you have a network isolation event but retain access to your datastores, HA will not respond and your virtual machines will keep running. If both your network and storage environment are isolated, HA will recognize this and power off the VMs once it detects that the locks on their VMDKs have been acquired by another host, to avoid a split brain scenario as explained above. Please note that in order to recognize that the lock has been acquired by another host, the “isolated” host will need to be able to access the device again. (The power-off won’t happen before the storage has returned!)

When to use “Shutdown VM”
It is recommended to use this option if it is likely that a host will retain access to the VM datastores when it becomes isolated and you wish HA to restart a VM when the isolation occurs. In this scenario, using shutdown allows the guest OS to shut down in an orderly manner. Further, since datastore connectivity is likely retained during the isolation, it is unlikely that HA will shut down the VM unless there is a master available to restart it. Note that there is a timeout period of 5 minutes by default. If the VM has not been gracefully shut down after 5 minutes, a “Power Off” will be initiated.
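The timeout behaviour is easy to picture as a small function. Below is a rough Python sketch, assuming the 5 minute default mentioned above. The function name and the constant are hypothetical, and treating a shutdown that finishes exactly at the timeout as "clean" is my own assumption for the example.

```python
from typing import Optional, Tuple

# The 5 minute default mentioned above, expressed in seconds (name is hypothetical).
DEFAULT_SHUTDOWN_TIMEOUT_S = 5 * 60


def shutdown_outcome(
    guest_shutdown_seconds: Optional[float],
    timeout_s: float = DEFAULT_SHUTDOWN_TIMEOUT_S,
) -> Tuple[str, float]:
    """Model "Shutdown VM": return (how the VM stopped, seconds until it was off).

    guest_shutdown_seconds is how long the guest needs to shut down cleanly,
    or None if the guest never completes the shutdown.
    """
    if guest_shutdown_seconds is not None and guest_shutdown_seconds <= timeout_s:
        return ("clean shutdown", guest_shutdown_seconds)
    # The guest did not finish in time, so a hard "Power Off" follows.
    return ("power off", timeout_s)
```

So a guest that needs 40 seconds stops cleanly after 40 seconds, while a guest that hangs is powered off once the timeout expires. That timeout is the extra delay before a restart you accept when choosing “Shutdown VM”.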

When to use “Power Off VM”
It is recommended to use this option if it is likely that a host will lose access to the VM datastores when it becomes isolated and you want HA to immediately restart a VM when this condition occurs. This is a hard stop, in contrast to “Shutdown VM”, which is a guest initiated shutdown and could take up to 5 minutes.

As stated, Leave Powered On is the default and fits most organizations as it prevents unnecessary responses to a Network Isolation but still takes action when the connection to your storage environment is lost at the same time.
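If you prefer to see the three recommendations side by side, they can be condensed into a tiny helper. This is only a sketch of my recommendations in Python with made-up parameter names, not a sizing tool; always validate the outcome against your own requirements and constraints.

```python
def recommend_isolation_response(
    likely_retains_datastore_access: bool,
    want_restart_on_isolation: bool,
) -> str:
    """Map the two questions discussed above to one of the three settings.

    Both parameters are hypothetical names for this illustration.
    """
    if not want_restart_on_isolation:
        # The default: no unnecessary response to a network isolation.
        return "Leave Powered On"
    if likely_retains_datastore_access:
        # Storage is still reachable, so the guest can shut down cleanly.
        return "Shutdown VM"
    # Storage is likely gone as well, so stop hard and restart as fast as possible.
    return "Power Off VM"
```

For instance, an environment with NFS or iSCSI storage on the same network as management, where you want VMs restarted right away, ends up at `recommend_isolation_response(False, True)`, which returns `"Power Off VM"`.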

** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because these are currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **