
INF-BCO2807 – vSphere HA and Datastore Access Outages tech preview

Duncan Epping · Sep 5, 2012 ·

All of you know by now that I have a love for availability related topics… Hence the reason I needed to write something about INF-BCO2807. The session titled “vSphere HA and Datastore Access Outages – Current Capabilities Deep-Dive and Tech Preview”, presented by Keith Farkas and Smriti Desai, discussed possible future HA enhancements that will address component failures. Those of you who read my whitepaper on stretched clusters can immediately see why this would be a nice enhancement!

Once again a big fat disclaimer: VMware gives absolutely no guarantees when or even if this will be released.

This session was all about inaccessible datastores. During our talk Lee Dilworth and I explained the difference between a Permanent Device Loss (PDL) and an All Paths Down (APD) condition. In short, a PDL is signaled through a “SCSI sense code” issued by the storage system (or an iSCSI “login reject” for that matter). This SCSI sense code allows vSphere (both the kernel and HA) to respond and act upon it. In the case of an APD vSphere cannot respond… the LUN is gone on that host and we don’t know why, so what do we do? Well, with 5.1 and prior we do nothing. This results in zombied virtual machines, and that is not the state you want your virtual machines to be in, right?

So how is VMware planning to solve this? It is planning to enhance HA with what was referred to as “Component Protection”. Component Protection allows responses per virtual machine when an APD or PDL has been detected. This is not based on guest I/Os failing, but on the vSphere platform declaring that the device is in a PDL or APD condition.

When an APD scenario is detected HA will be smart enough to understand which hosts can restart virtual machines, as in some cases multiple hosts might be impacted. Of course it will also only kill your virtual machine and restart it when it knows capacity is available for it.
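To make the two conditions above concrete, here is a minimal Python sketch of that decision. It is purely illustrative and not how HA is (or will be) implemented: the `Host` class, the `pick_restart_host` function, the host and datastore names and the memory-only capacity check are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    # Hypothetical model of a cluster member, not a vSphere API object.
    name: str
    accessible_datastores: set[str]
    free_memory_mb: int


def pick_restart_host(datastore: str, vm_memory_mb: int,
                      hosts: list[Host]) -> Optional[str]:
    """Return a host that still sees the datastore and has room, else None."""
    for host in hosts:
        if datastore not in host.accessible_datastores:
            continue  # this host is impacted by the outage as well
        if host.free_memory_mb < vm_memory_mb:
            continue  # no capacity here (assumption: memory is the only constraint)
        return host.name
    return None  # nowhere to restart, so do not kill the virtual machine


hosts = [
    Host("esx01", set(), 16384),    # the host that hit the APD
    Host("esx02", set(), 16384),    # a second host that lost the same LUN
    Host("esx03", {"ds1"}, 8192),   # still has access to the datastore
]

target = pick_restart_host("ds1", 4096, hosts)
no_target = pick_restart_host("ds1", 12288, hosts)
print(target)     # esx03
print(no_target)  # None
```

The point of the `None` case is the one made above: when no host can both reach the datastore and fit the virtual machine, killing it would only make things worse.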

I don’t know about you, but I would rather see this implemented today than tomorrow!? APD is not common, but also not rare… and when disaster strikes, it strikes hard!

I don’t think this session is scheduled for VMworld Europe, so make sure to watch the recording as soon as it is available, as it is well worth your time. Keith and Smriti gave an excellent deepdive on the current vSphere HA and a nice look into the future!

VMware vSphere 5.1 Clustering Deepdive available on Amazon now!

Duncan Epping · Aug 27, 2012 ·

Frank and I published the book this morning and Amazon was extremely fast with getting it up on the website. It is available now:

VMware vSphere 5.1 Clustering Deepdive available at VMworld!

Duncan Epping · Aug 27, 2012 ·

Frank and I had been talking about this for a couple of months, but without mentioning what it was we were working on. The last couple of months we’ve spent our spare time on updating the 5.0 Clustering Deepdive to 5.1.

Although this is “just” an update to 5.1, we’ve added a section about stretched clustering to the book and the Storage DRS section has been completely overhauled. Several new paragraphs were added to the vSphere HA section and we had to do some minor tweaks to the vSphere DRS section. On top of that we added a great foreword by Raghu Raghuram!

In the upcoming week the book will be available on Amazon (paper – kindle) and in the Apple iBooks store. As we needed to be careful with publishing it at a certain time/date, in some cases it might take a couple of days before it shows up in your “local” online bookstore. If you really can’t wait, it is available now on Createspace.

Again, we have kept the prices low… The e-book will sell for only $7.49 (note a surcharge might be added based on location) and the paper copy sells for $24.95. It is a bargain if I say so myself. Note that even the paper copy will be available directly from European Amazon stores and so will the e-book.

For those at VMworld, there are copies available at the VMworld store on Tuesday, or maybe even Monday afternoon. Note that there is a limited number available… if you want a copy I would recommend picking it up soon! If you see Frank or myself walking around and would love to have your book signed, don’t hesitate, it is our pleasure! We had the honor of presenting the book to Carl Eschenbach yesterday; I can tell you Carl was thrilled and so are we… Pick it up!

VMware Availability Survey

Duncan Epping · Aug 14, 2012 ·

I just received the following… If you have some spare time on your hands please fill out this survey, it would be much appreciated.

We are hard at work building our future products to better meet your needs. As part of this process we are developing a 3-year strategy for the VMware Business Continuity offerings, and are seeking your input to best align our strategy with your business objectives.

Please bring your voice to the table – if you have a few minutes today, would you please click on the link below and share your insights on the VMware Business Continuity road map. Answer as few or many of the questions as you’d like.

https://vmware.allegiancetech.com/cgi-bin/qwebcorporate.dll?idx=6KGA9Q

I set restart priorities but still my VMs seem to be powered on in a different order!

Duncan Epping · Aug 13, 2012 ·

On the VMware Community someone asked this question about restart priorities. At the same time I received a question on a similar topic via email. This particular question was as follows:

I have restart priorities defined on my cluster. However even if I place my virtual machines for which this order applies on one host and test a failure they seem to come online in the wrong order…

In vSphere HA you can define the restart priority for each individual virtual machine. Now this restart priority applies to the power-on attempt that is initiated by HA when a host has failed. Did you note that I emphasized power-on attempt? Well there is a reason for that… it is the prioritization of the attempt itself. HA doesn’t wait for a virtual machine to finish powering on before it starts the next… it just does the power-on attempt and when it completes the next round will be attempted. This also means that if you use 3 different priorities it could happen that a “low priority” virtual machine is restarted literally seconds after a “high priority” virtual machine is. In the case of the person who asked the question, he had a large database machine defined as “high priority” and an app as “low priority”. Unfortunately the database machine took minutes to power on and report up, whereas the application took less than a minute.
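The effect described above can be shown with a small Python simulation. This is a simplified sketch, not HA’s actual scheduling logic: the `VM` class, the `simulate_restart` function, the boot times and the 5 seconds between priority rounds are all made-up assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical model: a lower number means the power-on attempt is issued earlier.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class VM:
    name: str
    priority: str
    boot_seconds: int  # assumed time from power-on until the guest reports up


def simulate_restart(vms, seconds_between_rounds=5):
    """Return (power_on_order, ready_order) for a simplified HA restart."""
    power_on_order = []
    ready_times = {}
    clock = 0
    for level in sorted(PRIORITY_ORDER, key=PRIORITY_ORDER.get):
        batch = [vm for vm in vms if vm.priority == level]
        if not batch:
            continue
        for vm in batch:
            # Only the power-on attempt is prioritized; nothing waits for the boot.
            power_on_order.append(vm.name)
            ready_times[vm.name] = clock + vm.boot_seconds
        clock += seconds_between_rounds
    ready_order = sorted(ready_times, key=ready_times.get)
    return power_on_order, ready_order


vms = [VM("app01", "low", 45), VM("db01", "high", 300)]
power_on_order, ready_order = simulate_restart(vms)
print(power_on_order)  # ['db01', 'app01']
print(ready_order)     # ['app01', 'db01']
```

The database is powered on first, exactly as configured, yet the application is up and looking for its database minutes before the database is ready.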

Keep that in mind when defining the restart priorities for your virtual machines. Yes it will help, but only for prioritizing which virtual machine needs to be restarted first. This is not a guarantee your virtual machines will be completely booted up first,