<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Yellow Bricks &#187; BC-DR</title>
	<atom:link href="http://www.yellow-bricks.com/category/bcdr/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.yellow-bricks.com</link>
	<description>Building blocks for virtualization...</description>
	<lastBuildDate>Fri, 10 Feb 2012 11:12:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Re: when to disable HA? /cc @hashmibilal</title>
		<link>http://www.yellow-bricks.com/2012/01/25/re-when-to-disable-ha-cc-hashmibilal/</link>
		<comments>http://www.yellow-bricks.com/2012/01/25/re-when-to-disable-ha-cc-hashmibilal/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 08:28:56 +0000</pubDate>
		<dc:creator>Duncan Epping</dc:creator>
				<category><![CDATA[BC-DR]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[5.0]]></category>
		<category><![CDATA[ha]]></category>
		<category><![CDATA[vSphere]]></category>

		<guid isPermaLink="false">http://www.yellow-bricks.com/?p=9619</guid>
		<description><![CDATA[<p>Bilal Hashmi wrote a nice article about HA today and in this article he asked a couple of questions. As I think the info is useful for everyone I decided to respond through a blog article instead of by commenting. Let me start by saying that in general HA should never be disabled. The later versions of vSphere have a [...]</p><p><div style="border: 1px solid gray; background-color:#CCCCCC;margin: 0px 0pt 0px 0px; padding: 5px;">

"<a href="http://www.yellow-bricks.com/2012/01/25/re-when-to-disable-ha-cc-hashmibilal/">Re: when to disable HA? /cc @hashmibilal</a>" originally appeared on <a href="http://www.yellow-bricks.com">Yellow-Bricks.com</a>. Follow us on <a href="http://www.twitter.com/DuncanYB">Twitter</a> and <a href="http://www.facebook.com/pages/Yellow-Bricks-virtualization-blog/132292893499196">Facebook</a>.<br>
Available now: vSphere 5 Clustering Deepdive. (<a href="http://www.amazon.com/dp/1463658133/ref=as_li_qf_sp_asin_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=1463658133&adid=07SG91DX7FQT2HS66PMM"><strong>paper</strong></a> | <a href="https://www.amazon.com/dp/B005C1SARM/ref=as_li_tf_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=B005C1SARM&adid=16Q69JRGDTX1DHPRKTQM&"><strong>e-book</strong></a>)</div><br><br></p>]]></description>
			<content:encoded><![CDATA[<p>Bilal Hashmi wrote a <a href="http://www.cloud-buddy.com/?p=996">nice article about HA</a> today in which he asked a couple of questions. As I think the answers are useful for everyone, I decided to respond with a blog article instead of a comment.</p>
<p>Let me start by saying that in general HA should <span style="text-decoration: underline;">never</span> be disabled. The later versions of vSphere have a neat option called &#8220;Enable Host Monitoring&#8221;, and for scheduled network maintenance you should disable this option rather than HA itself. The difference is that disabling host monitoring does not trigger a full reconfiguration of HA (see the screenshot below) or a new election process. Only the &#8220;host monitoring&#8221; functionality is disabled, which is exactly what you want in this scenario.</p>
<p><img class="colorbox-9619"  src="http://farm8.staticflickr.com/7156/6759193477_98dfa8265d.jpg" alt="" /></p>
<p>Bilal asked several questions and made several statements in his article. I will respond to two of them specifically, to explain how HA handles failures and isolation:</p>
<blockquote><p>In this case within 30 sec of the management network outage, each host would have declared itself isolated and wont attempt to restart any VMs like the primaries would in vSphere 5.</p></blockquote>
<p>So why is this? As soon as a master is isolated, it drops &#8220;ownership&#8221; of the datastores on which the VMs that are part of its cluster are running. Before the other hosts trigger the isolation response for a given VM, they validate whether the datastore on which that VM is stored is &#8220;owned&#8221; by a master. In the case of a cluster-wide isolation due to a network outage or maintenance, ownership is dropped and, as a result, HA does not trigger the isolation response. This is a major change compared to vSphere 4.x and earlier!</p>
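<p>To make the ownership check concrete, here is a small Python model of the decision described above. It is a simplified sketch, not HA&#8217;s actual implementation: the <code>Master</code> class and the <code>drop_ownership()</code> and <code>isolation_response_allowed()</code> helpers are hypothetical, and datastores are reduced to plain strings.</p>
<pre style="padding-left: 30px;">
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical, simplified stand-in for an HA master host."""
    name: str
    owned_datastores: set = field(default_factory=set)

    def drop_ownership(self):
        # An isolated master gives up ownership of the datastores it held.
        self.owned_datastores.clear()


def isolation_response_allowed(vm_datastore, masters):
    """Trigger the isolation response only if some master owns the VM's datastore.

    If nobody owns it (for example after a cluster-wide isolation), the
    isolation response is skipped.
    """
    return any(vm_datastore in m.owned_datastores for m in masters)


if __name__ == "__main__":
    master = Master("esx01", {"datastore1"})
    print(isolation_response_allowed("datastore1", [master]))  # True
    master.drop_ownership()  # cluster-wide isolation: the master is isolated too
    print(isolation_response_allowed("datastore1", [master]))  # False
</pre>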
<blockquote><p>Now what happens when the network outage is over and the hosts are in a position to talk to each other? I have not been able to find documentation on whether an isolated host will enter an election (vSphere 4 or 5) ones the communication channel is open and bring the cluster back to life.</p></blockquote>
<p>Let&#8217;s focus on vSphere 5.0, as that seems most relevant. A host remains isolated until it either observes HA network traffic, such as election messages, <span style="text-decoration: underline;">OR</span> starts getting a response from an isolation address. This means that as long as the host is in the &#8220;isolated&#8221; state, it keeps validating its isolation by pinging the isolation address. As soon as the isolation address responds, the host initiates an election or joins an election already in progress, and the cluster returns to a normal state.</p>
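<p>The two exit conditions can be modelled as a tiny state function. This is a simplified sketch, not HA&#8217;s actual implementation: the state names and the <code>observe()</code> and <code>rounds_until_rejoin()</code> helpers are hypothetical, and each round stands for one look at HA network traffic plus one ping of the isolation address.</p>
<pre style="padding-left: 30px;">
ISOLATED = "isolated"
ELECTION = "election"


def observe(state, saw_ha_traffic, isolation_address_replied):
    """Return the host's next state after one round of checks.

    An isolated host stays isolated until it either sees HA network traffic
    (e.g. election messages) or gets a reply from the isolation address; at
    that point it starts or joins an election.
    """
    if state == ISOLATED and (saw_ha_traffic or isolation_address_replied):
        return ELECTION
    return state


def rounds_until_rejoin(observations):
    """Return the 1-based round in which the host leaves isolation, or None.

    observations is a sequence of (saw_ha_traffic, isolation_address_replied).
    """
    state = ISOLATED
    for round_no, (traffic, reply) in enumerate(observations, start=1):
        state = observe(state, traffic, reply)
        if state == ELECTION:
            return round_no
    return None


if __name__ == "__main__":
    # Network outage for two rounds, then the isolation address answers again.
    print(rounds_until_rejoin([(False, False), (False, False), (False, True)]))  # 3
</pre>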
<p>There&#8217;s absolutely no need to manually intervene. HA takes care of all of this for you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yellow-bricks.com/2012/01/25/re-when-to-disable-ha-cc-hashmibilal/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Fiddling around with SRM&#8217;s Storage Replication Adapter &#8211; Part II</title>
		<link>http://www.yellow-bricks.com/2012/01/12/fiddling-around-with-srms-storage-replication-adapter-part-ii/</link>
		<comments>http://www.yellow-bricks.com/2012/01/12/fiddling-around-with-srms-storage-replication-adapter-part-ii/#comments</comments>
		<pubDate>Thu, 12 Jan 2012 14:13:05 +0000</pubDate>
		<dc:creator>Duncan Epping</dc:creator>
				<category><![CDATA[BC-DR]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[srm]]></category>
		<category><![CDATA[VMware]]></category>
		<category><![CDATA[vSphere]]></category>

		<guid isPermaLink="false">http://www.yellow-bricks.com/?p=9577</guid>
		<description><![CDATA[<p>** Disclaimer: This is for educational purposes, please don’t implement this in your production environment as it is not supported! ** After my article this week about (ab)using the SRA provided through Site Recovery Manager to fail-over any LUN, I expected some people to reach out to me with additional questions. One of the questions which came in more than once was [...]</p>]]></description>
			<content:encoded><![CDATA[<p>** Disclaimer: This is for educational purposes, please don’t implement this in your production environment as it is <strong>not</strong> supported! **</p>
<p>After my <a href="http://www.yellow-bricks.com/2012/01/10/hacking-site-recovery-manager-srm-a-storage-array-adapter/">article</a> this week about (ab)using the SRA provided through Site Recovery Manager to fail-over any LUN, I expected some people to reach out to me with additional questions. One of the questions which came in more than once was &#8220;is it possible to do a test-failover of a LUN which is not managed by the SRM infrastructure?&#8221; The short answer is yes. The long answer is: it depends on your definition of a &#8220;test-failover&#8221;. Of course, booting up a physical machine from the SAN while it keeps the same IP address and so on would cause conflicts, and I am not going to show you how to re-IP your physical machines, as I expect you to know how to do that. So, from an SRM perspective, how exciting is this?</p>
<p>To be honest, not really; the same concept applies. For a test-failover, SRM calls the SRA through a script called &#8220;command.pl&#8221; and feeds it XML. The following lines of XML are relevant for this exercise, the critical element being &#8220;TestFailoverStartParameters&#8221;:</p>
<p style="padding-left: 30px;"><code>--&gt; &lt;TestFailoverStartParameters&gt;<br />
--&gt; &lt;ArrayId&gt;BB005056AE32820000-server_2&lt;/ArrayId&gt;<br />
--&gt; &lt;AccessGroups&gt;<br />
--&gt; &lt;AccessGroup id="domain-c7"&gt;<br />
--&gt; &lt;Initiator type="iSCSI" id="iqn.1998-01.com.vmware:localhost-11616041"/&gt;<br />
--&gt; &lt;Initiator type="iSCSI" id="iqn.1998-01.com.vmware:localhost-4a15366e"/&gt;<br />
--&gt; &lt;Initiator type="NFS" id="10.21.68.106"/&gt;<br />
--&gt; &lt;Initiator type="NFS" id="10.21.68.105"/&gt;<br />
--&gt; &lt;/AccessGroup&gt;<br />
--&gt; &lt;/AccessGroups&gt;<br />
--&gt; &lt;TargetDevices&gt;<br />
--&gt; &lt;TargetDevice key="fs14_T1_LUN1_BB005056AE32800000_fs10_T1_LUN1_BB005056AE32820000"&gt;<br />
--&gt; &lt;AccessGroups&gt;<br />
--&gt; &lt;AccessGroup id="domain-c7"/&gt;<br />
--&gt; &lt;/AccessGroups&gt;<br />
--&gt; &lt;/TargetDevice&gt;<br />
--&gt; &lt;/TargetDevices&gt;<br />
--&gt; &lt;/TestFailoverStartParameters&gt;<br />
--&gt; &lt;/Command&gt;</code></p>
<p>In our case we want to fail-over an arbitrary, non-vSphere LUN. We need the &#8220;initiator&#8221; details of the server(s) that need to be able to see this LUN, and we need the LUN identifier. The LUN identifiers can be found in the SRM log files and the initiator details on the physical server. If you call command.pl and feed it the XML file, the SRA will request the array to create a snapshot and give the host access to that snapshot. From there, it is up to you to take the next steps!</p>
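<p>As a rough illustration, the snippet below builds the TestFailoverStartParameters fragment shown above with Python&#8217;s standard <code>xml.etree.ElementTree</code> module, swapping in your own initiators and LUN key. Treat it as a sketch under assumptions: <code>build_test_failover_params()</code> is a hypothetical helper, the initiator IQN and target key in the example are placeholders, the ArrayId and AccessGroup id are simply copied from the excerpt above, and the enclosing Command element (which the excerpt does not show) still has to come from your own SRM logs.</p>
<pre style="padding-left: 30px;">
import xml.etree.ElementTree as ET


def build_test_failover_params(array_id, group_id, initiators, target_key):
    """Build a TestFailoverStartParameters element shaped like the log excerpt.

    initiators is a list of (type, id) tuples describing the server(s) that
    should see the snapshot. Only this fragment is built; the enclosing
    Command element from the log is not reproduced.
    """
    root = ET.Element("TestFailoverStartParameters")
    ET.SubElement(root, "ArrayId").text = array_id

    groups = ET.SubElement(root, "AccessGroups")
    group = ET.SubElement(groups, "AccessGroup", id=group_id)
    for init_type, init_id in initiators:
        ET.SubElement(group, "Initiator", type=init_type, id=init_id)

    devices = ET.SubElement(root, "TargetDevices")
    device = ET.SubElement(devices, "TargetDevice", key=target_key)
    device_groups = ET.SubElement(device, "AccessGroups")
    ET.SubElement(device_groups, "AccessGroup", id=group_id)
    return root


if __name__ == "__main__":
    params = build_test_failover_params(
        array_id="BB005056AE32820000-server_2",   # from the excerpt above
        group_id="domain-c7",                     # from the excerpt above
        initiators=[("iSCSI", "iqn.1991-05.com.example:physical-host")],  # placeholder
        target_key="SOURCE_KEY_TARGET_KEY",       # placeholder, take it from the SRM logs
    )
    print(ET.tostring(params, encoding="unicode"))
</pre>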
<p>It is not rocket science: anything SRM does with the SRA, you can do from the command line using command.pl and a custom XML file. As mentioned in the comments on my previous article, I know people are interested in using this for physical hosts&#8230; I will discuss this internally, but for now stay away from it: it is not supported!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yellow-bricks.com/2012/01/12/fiddling-around-with-srms-storage-replication-adapter-part-ii/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>&#8220;Hacking&#8221; Site Recovery Manager (SRM) / a Storage Array Adapter</title>
		<link>http://www.yellow-bricks.com/2012/01/10/hacking-site-recovery-manager-srm-a-storage-array-adapter/</link>
		<comments>http://www.yellow-bricks.com/2012/01/10/hacking-site-recovery-manager-srm-a-storage-array-adapter/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 15:17:43 +0000</pubDate>
		<dc:creator>Duncan Epping</dc:creator>
				<category><![CDATA[BC-DR]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[5]]></category>
		<category><![CDATA[5.0]]></category>
		<category><![CDATA[srm]]></category>
		<category><![CDATA[VMware]]></category>
		<category><![CDATA[vSphere]]></category>

		<guid isPermaLink="false">http://www.yellow-bricks.com/?p=9552</guid>
		<description><![CDATA[<p>** Disclaimer: This is for educational purposes, please don’t implement this in your production environment as it is not supported! ** Last week I received a question and I figured I would dive into it this week. The question was whether it is possible to fail-over LUNs using VMware Site Recovery Manager (SRM) which are not part of the Cluster which [...]</p>]]></description>
			<content:encoded><![CDATA[<p>** Disclaimer: This is for educational purposes, please don’t implement this in your production environment as it is <strong>not</strong> supported! **</p>
<p>Last week I received a question and figured I would dive into it this week. The question was whether it is possible to use VMware Site Recovery Manager (SRM) to fail-over LUNs which are not part of the cluster that SRM &#8220;manages&#8221;. In other words, can I fail-over a LUN which is attached to a physical Windows server or to a completely separate VMware cluster? Before we continue: I did not hack SRM itself, nor did I make any changes to the SRA.</p>
<p>Let&#8217;s briefly explain what SRM normally does when you go through the process of creating a DR plan. This is slimmed down to focus only on what is relevant for this question:</p>
<ul>
<li>First it will discover the devices using the Storage Replication Adapter (SRA)</li>
<li>It then discovers all LUNs using the SRA</li>
<li>It shows the replicated LUNs containing VMs to the admin</li>
<li>The admin can use these in a plan and &#8220;protect&#8221; the VMs appropriately</li>
</ul>
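<p>The filtering in the last two steps can be sketched as follows. This is a simplified model under assumptions, not SRM code: <code>protectable_datastores()</code> is a hypothetical helper, and the SRA&#8217;s device list and the VM placement are plain Python lists and dicts. It mirrors the behaviour observed later in this article, where only replicated LUNs backing a datastore that hosts at least one VM are offered to the admin.</p>
<pre style="padding-left: 30px;">
def protectable_datastores(replicated_luns, datastore_of_lun, vms_per_datastore):
    """Return the datastores SRM would offer for protection (simplified model).

    replicated_luns:    LUN names reported by the SRA as replicated
    datastore_of_lun:   LUN name mapped to datastore name, for LUNs vCenter sees
    vms_per_datastore:  datastore name mapped to the number of VMs hosted on it
    """
    offered = []
    for lun in replicated_luns:
        datastore = datastore_of_lun.get(lun)
        if datastore is None:
            continue  # LUN is not presented to this vCenter at all
        if vms_per_datastore.get(datastore, 0):
            offered.append(datastore)  # at least one VM lives here
    return offered


if __name__ == "__main__":
    luns = ["fs14_T1_LUN1", "fs16_T1_LUN2"]
    print(protectable_datastores(luns, {"fs14_T1_LUN1": "srm01"}, {"srm01": 2}))  # ['srm01']
</pre>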
<p>I decided to install SRM in a nested environment using the <a href="http://nickapedia.com/2010/09/12/ubertastic-celerra-uber-vsa-v3-unisphere/">Celerra Uber VSA</a>. I installed and configured the VNX SRA and went through some of the log files to find evidence that my plan was even possible. By the way, on Windows 2008 you can find the SRM log files in this location:</p>
<pre style="padding-left: 30px;">%ALLUSERSPROFILE%\VMware\VMware vCenter Site Recovery Manager\Logs\</pre>
<p>Other locations are documented in this <a href="http://kb.vmware.com/kb/1021802">KB</a>. When I built the environment, I created multiple LUNs of different sizes to make them easy to recognize. The LUN which is replicated but not exposed to our vCenter/SRM environment is 25GB, and the LUN which is exposed is 30GB. This is what the log files showed me when I did a quick find on the size:</p>
<pre style="padding-left: 30px;">(Production) fsid=14 size=30000MB alloc=0MB dense  read-write
path=/srm01/fs14_T1_LUN1_BB005056AE32800000/fs14_T1_LUN1_BB005056AE32800000 (snapped)</pre>
<pre style="padding-left: 30px;">(Production) fsid=16 size=25000MB alloc=0MB dense read-write
path=/vc01/fs16_T1_LUN2_BB005056AE32800000/fs16_T1_LUN2_BB005056AE32800000 (snapped)</pre>
<p>As you can see, both my 25GB and my 30GB LUN are listed. I also gave them names, &#8220;srm01&#8221; and &#8220;vc01&#8221;, which allow me to identify them quickly; &#8220;vc01&#8221; is the one which is not managed by SRM.</p>
<p>So how does SRM get this information? It is actually pretty straightforward: SRM calls a script which is part of the SRA and feeds it XML containing the required commands and details. I <a href="http://www.yellow-bricks.com/2009/01/20/sra-discoverluns/">wrote</a> about this a long time ago when I was troubleshooting SRM, and it still applies:</p>
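<p>The same &#8220;quick find on the size&#8221; can be scripted with a regular expression over the log text. This is a sketch that assumes every entry looks exactly like the two-line excerpts above (an fsid/size line followed by a path= line); <code>srm_log_dir()</code> and <code>find_luns()</code> are hypothetical helpers, and the C:\ProgramData fallback is an assumption used only when %ALLUSERSPROFILE% is not set.</p>
<pre style="padding-left: 30px;">
import os
import re

# One entry: "fsid=N size=NMB ..." followed on the next line by "path=/...".
ENTRY = re.compile(r"fsid=(\d+)\s+size=(\d+)MB[^\n]*\n\s*path=(\S+)")


def srm_log_dir():
    """SRM log directory on Windows 2008, built from %ALLUSERSPROFILE%."""
    # Assumed fallback if the variable is not set (e.g. when testing elsewhere).
    base = os.environ.get("ALLUSERSPROFILE", r"C:\ProgramData")
    return os.path.join(base, "VMware", "VMware vCenter Site Recovery Manager", "Logs")


def find_luns(log_text, size_mb=None):
    """Return (fsid, size in MB, path) for each entry, optionally filtered by size."""
    hits = [(int(f), int(s), p) for f, s, p in ENTRY.findall(log_text)]
    if size_mb is None:
        return hits
    return [hit for hit in hits if hit[1] == size_mb]
</pre>
<p>Run against the two entries above, <code>find_luns(text, size_mb=25000)</code> returns only the fsid=16 entry with the /vc01/ path, which is the LUN SRM does not manage.</p>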
<pre style="padding-left: 30px;">perl command.pl &lt; file.xml</pre>
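<p>If you prefer to drive this from a script, the same stdin redirection can be done with Python&#8217;s <code>subprocess</code> module. A minimal sketch, assuming perl is on the PATH and that <code>cwd</code> points at the SRA directory containing command.pl; <code>run_sra_command()</code> is a hypothetical helper that simply returns whatever the SRA prints.</p>
<pre style="padding-left: 30px;">
import subprocess


def run_sra_command(xml_payload, script="command.pl", perl="perl", cwd=None):
    """Run `perl command.pl` with xml_payload on stdin, like file.xml redirected in.

    Returns (exit code, stdout, stderr); stdout should hold the SRA's XML
    response and stderr any error output.
    """
    result = subprocess.run(
        [perl, script],
        input=xml_payload,   # the XML that SRM would normally feed the script
        capture_output=True,
        text=True,
        cwd=cwd,             # assumption: the directory that contains command.pl
        check=False,         # let the caller decide what a non-zero exit means
    )
    return result.returncode, result.stdout, result.stderr
</pre>
<p>Calling it with the contents of file.xml should be equivalent to the command line above, with the SRA&#8217;s response ending up in the returned stdout.</p>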
<p>The XML file is of course key here&#8230; How does it need to be structured, and can we use, or should I say abuse, it to fail-over a LUN which is not &#8220;managed&#8221; by SRM/vCenter? I started digging and it turns out to be fairly straightforward. Keep in mind the disclaimer at the top though: this is not what the SRAs were intended for&#8230; this is purely for educational purposes and far from supported. Again, the log files exposed a lot of details here, but I stripped them down to make them readable. This is the response from the SRA when SRM asked which devices are available:</p>
<pre style="padding-left: 30px;">2012-01-09T12:14:53.583-08:00 [05388 verbose 'SraCommand' opID=7D6C5634-00000023] discoverDevices responded with:
--&gt; &lt;?xml version="1.0" encoding="UTF-8" standalone="yes"?&gt;
--&gt; &lt;SourceDevice state="read-write" id="1-1"&gt;
--&gt; &lt;Name&gt;fs14_T1_LUN1_BB005056AE32800000&lt;/Name&gt;
--&gt; &lt;Identity&gt;
--&gt; &lt;Wwn&gt;60:06:04:8c:ab:b2:88:c0:59:40:72:24:1b:5f:77:72&lt;/Wwn&gt;
--&gt; &lt;/Identity&gt;
--&gt; &lt;TargetDevice key="fs14_T1_LUN1_BB005056AE32800000_fs10_T1_LUN1_BB005056AE32820000"/&gt;
--&gt; &lt;/SourceDevice&gt;
--&gt; &lt;SourceDevice state="read-write" id="1-2"&gt;
--&gt; &lt;Name&gt;fs16_T1_LUN2_BB005056AE32800000&lt;/Name&gt;
--&gt; &lt;Identity&gt;
--&gt; &lt;Wwn&gt;60:06:04:8c:b8:50:22:96:0c:0b:bf:d8:59:0b:a1:75&lt;/Wwn&gt;
--&gt; &lt;/Identity&gt;
--&gt; &lt;TargetDevice key="fs16_T1_LUN2_BB005056AE32800000_fs12_T1_LUN3_BB005056AE32820000"/&gt;
--&gt; &lt;/SourceDevice&gt;
--&gt; &lt;/SourceDevices&gt;</pre>
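<p>To pull the name, WWN and TargetDevice key out of a discoverDevices response like the one above, the standard library&#8217;s ElementTree parser is sufficient. This sketch assumes the log&#8217;s &#8220;--&gt; &#8221; continuation markers are stripped first and that you feed it a complete response: the excerpt above was trimmed (its opening SourceDevices tag is missing), so it will not parse as-is. <code>strip_log_markers()</code> and <code>source_devices()</code> are hypothetical helpers.</p>
<pre style="padding-left: 30px;">
import xml.etree.ElementTree as ET

LOG_MARKER = "--> "


def strip_log_markers(log_lines):
    """Keep only the logged XML lines and drop the "--> " prefix from each."""
    return "\n".join(line[len(LOG_MARKER):] for line in log_lines
                     if line.startswith(LOG_MARKER))


def source_devices(xml_text):
    """Return (name, wwn, target_key) for every SourceDevice in the response."""
    root = ET.fromstring(xml_text)
    devices = []
    for dev in root.iter("SourceDevice"):
        target = dev.find("TargetDevice")
        devices.append((
            dev.findtext("Name"),
            dev.findtext("Identity/Wwn"),
            target.get("key") if target is not None else None,
        ))
    return devices
</pre>
<p>On a full response this should list both the SRM-managed fs14 LUN and the unmanaged fs16 LUN together with their TargetDevice keys, which is the information needed later for a custom XML file.</p>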
<p>If you now look at SRM and try to create a Protection Group, you will quickly discover that only datastores which host a VM can be added. This is shown in the screenshot below.</p>
<p><img class="colorbox-9552"  src="http://farm8.staticflickr.com/7167/6671927693_3007904133.jpg" alt="" /></p>
<p>As mentioned, SRM filters out the &#8220;irrelevant LUNs&#8221;; to me, however, this LUN wasn&#8217;t irrelevant. So what&#8217;s next? I decided to initiate a fail-over and look at the log files. When the fail-over is initiated, SRM issues the following (again, I stripped some details to make it more readable):</p>
<pre style="padding-left: 30px;">--&gt; &lt;FailoverParameters&gt;
--&gt; &lt;ArrayId&gt;BB005056AE32820000-server_2&lt;/ArrayId&gt;
--&gt; &lt;AccessGroups&gt;
--&gt; &lt;AccessGroup id="domain-c7"&gt;
--&gt; &lt;Initiator id="iqn.1998-01.com.vmware:localhost-11616041" type="iSCSI"/&gt;
--&gt; &lt;Initiator id="iqn.1998-01.com.vmware:localhost-4a15366e" type="iSCSI"/&gt;
--&gt; &lt;Initiator id="10.21.68.106" type="NFS"/&gt;
--&gt; &lt;Initiator id="10.21.68.105" type="NFS"/&gt;
--&gt; &lt;/AccessGroup&gt;
--&gt; &lt;/AccessGroups&gt;
--&gt; &lt;TargetDevices&gt;
--&gt; &lt;TargetDevice key="fs14_T1_LUN1_BB005056AE32800000_fs10_T1_LUN1_BB005056AE32820000"&gt;
--&gt; &lt;AccessGroups&gt;
--&gt; &lt;AccessGroup id="domain-c7"/&gt;
--&gt; &lt;/AccessGroups&gt;
--&gt; &lt;/TargetDevice&gt;
--&gt; &lt;/TargetDevices&gt;
--&gt; &lt;/FailoverParameters&gt;</pre>
<p>I guess we should be able to work with this! By combining the &#8220;discoverDevices&#8221; information with the &#8220;Failover&#8221; information, I should be able to construct my own custom XML file and use it to fail-over any LUN which is part of the selected device&#8230; So what is my plan? I am going to change the following:</p>
<ul>
<li>Initiator id</li>
<li>TargetDevice key</li>
</ul>
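<p>The two changes listed above can also be scripted. The sketch below assumes the FailoverParameters fragment has already been extracted from the log with the &#8220;--&gt; &#8221; markers removed; it replaces the initiators in the top-level AccessGroup and the key of each TargetDevice using ElementTree, and leaves the AccessGroup id untouched. <code>retarget_failover()</code> is a hypothetical helper.</p>
<pre style="padding-left: 30px;">
import xml.etree.ElementTree as ET


def retarget_failover(xml_text, new_initiators, new_target_key):
    """Swap the initiators and the TargetDevice key in a FailoverParameters fragment.

    new_initiators is a list of (type, id) tuples for the host(s) that should
    see the failed-over LUN; new_target_key is that LUN's TargetDevice key as
    reported by discoverDevices. The AccessGroup itself is left unchanged.
    """
    root = ET.fromstring(xml_text)
    group = root.find("./AccessGroups/AccessGroup")
    for old in group.findall("Initiator"):
        group.remove(old)
    for init_type, init_id in new_initiators:
        ET.SubElement(group, "Initiator", id=init_id, type=init_type)
    for device in root.iter("TargetDevice"):
        device.set("key", new_target_key)
    return ET.tostring(root, encoding="unicode")
</pre>
<p>For this experiment, <code>new_target_key</code> would presumably be the vc01 LUN&#8217;s key from the discoverDevices output above, fs16_T1_LUN2_BB005056AE32800000_fs12_T1_LUN3_BB005056AE32820000, and <code>new_initiators</code> the initiators of the unmanaged host(s).</p>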
<p>I wasn&#8217;t sure whether I needed to change the AccessGroup, so I figured I would just test it as is. I called the script as follows:</p>
<pre style="padding-left: 30px;">&lt;path to perl&gt;\bin\perl.exe command.pl &lt; file.xml</pre>
<p>I watched a whole bunch of messages pass by, and when the fail-over command had completed I looked at the Celerra and noticed the following:</p>
<p><img class="colorbox-9552"  src="http://farm8.staticflickr.com/7143/6672491135_39d9bfe217.jpg" alt="" /></p>
<p>And of course within the &#8220;unmanaged&#8221; vCenter you can see it:</p>
<p><img class="colorbox-9552"  src="http://farm8.staticflickr.com/7011/6673389961_dbed269156_z.jpg" alt="" /></p>
<p>Successful fail-over of a LUN which wasn&#8217;t part of an SRM Protection Group! And yes, when you replace the Initiator ID, even the masking is configured correctly. The only thing left to do is either resignature the volume or mount it; which of the two depends on the OS owning the volume and the desired end result. All in all, a nice little experiment&#8230; Once again, don&#8217;t try this in your own environment, it is far from supported!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yellow-bricks.com/2012/01/10/hacking-site-recovery-manager-srm-a-storage-array-adapter/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>What happens to powered off VMs when a host fails?</title>
		<link>http://www.yellow-bricks.com/2011/11/11/what-happens-to-powered-off-vms-a-host-fails/</link>
		<comments>http://www.yellow-bricks.com/2011/11/11/what-happens-to-powered-off-vms-a-host-fails/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 13:35:36 +0000</pubDate>
		<dc:creator>Duncan Epping</dc:creator>
				<category><![CDATA[BC-DR]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Various]]></category>
		<category><![CDATA[5]]></category>
		<category><![CDATA[5.0]]></category>
		<category><![CDATA[ha]]></category>
		<category><![CDATA[vSphere]]></category>

		<guid isPermaLink="false">http://www.yellow-bricks.com/?p=9375</guid>
		<description><![CDATA[<p>Today I was asked what happens to a powered-off VM when the host it is registered against fails. This customer always has multiple powered-off VMs and was afraid they would show up as orphaned. I was pretty confident that the VMs would be re-registered against one of the remaining hosts in the cluster, but I validated [...]</p>]]></description>
			<content:encoded><![CDATA[<p>Today I was asked what happens to a powered-off VM when the host it is registered against fails. This customer always has multiple powered-off VMs and was afraid they would show up as orphaned. I was pretty confident that the VMs would be re-registered against one of the remaining hosts in the cluster, but I validated it just in case, and this is what the events section of the VM shows:</p>
<p style="padding-left: 30px;"><code>Relocating from cs-tkmt-h08, emc-vnx-fcoe to cs-tkmt-h05, emc-vnx-fcoe</code></p>
<p>In other words, the VM is relocated from my ESXi host cs-tkmt-h08 to cs-tkmt-h05. No need to worry about orphaned VMs and manually registering them against the remaining hosts&#8230; vSphere does it for you.</p>
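<p>If you want to check for these re-registrations in bulk rather than clicking through the events of each VM, the message above is easy to parse. A hedged sketch: it assumes every event has exactly the form &#8220;Relocating from HOST, X to HOST, X&#8221; seen in the single example above, where the second field is presumably the datastore and contains no spaces; <code>parse_relocation()</code> is a hypothetical helper, and exporting the event messages from vCenter is left out.</p>
<pre style="padding-left: 30px;">
import re

# Assumed format, taken from the single event shown above.
RELOCATION = re.compile(r"Relocating from ([^,]+), (\S+) to ([^,]+), (\S+)")


def parse_relocation(message):
    """Return (source_host, source_ds, target_host, target_ds), or None if no match."""
    match = RELOCATION.search(message)
    return match.groups() if match else None


if __name__ == "__main__":
    event = "Relocating from cs-tkmt-h08, emc-vnx-fcoe to cs-tkmt-h05, emc-vnx-fcoe"
    print(parse_relocation(event))
    # ('cs-tkmt-h08', 'emc-vnx-fcoe', 'cs-tkmt-h05', 'emc-vnx-fcoe')
</pre>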
]]></content:encoded>
			<wfw:commentRss>http://www.yellow-bricks.com/2011/11/11/what-happens-to-powered-off-vms-a-host-fails/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>All host failed, how does HA respond?</title>
		<link>http://www.yellow-bricks.com/2011/11/01/all-host-failed-how-does-ha-respond/</link>
		<comments>http://www.yellow-bricks.com/2011/11/01/all-host-failed-how-does-ha-respond/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 17:09:21 +0000</pubDate>
		<dc:creator>Duncan Epping</dc:creator>
				<category><![CDATA[BC-DR]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[5]]></category>
		<category><![CDATA[5.0]]></category>
		<category><![CDATA[ha]]></category>
		<category><![CDATA[vSphere]]></category>

		<guid isPermaLink="false">http://www.yellow-bricks.com/?p=9324</guid>
		<description><![CDATA[<p>A while ago I wrote an article about the scenario where all hosts fail, for instance due to a power outage, and how HA responds to it. Today someone asked me whether this is still valid for vSphere 5.0. I figured it wouldn&#8217;t hurt to describe the steps that vSphere 5.0 takes. Power Outage, all hosts down Power on hosts Election process [...]</p>]]></description>
			<content:encoded><![CDATA[<p>A while ago I wrote an <a href="http://www.yellow-bricks.com/2010/10/22/did-you-know-all-hosts-failed/">article</a> about the scenario where all hosts fail, for instance due to a power outage, and how HA responds to it. Today someone asked me whether this is still valid for vSphere 5.0, so I figured it wouldn&#8217;t hurt to describe the steps that vSphere 5.0 takes.</p>
<ol>
<li>Power outage, all hosts down</li>
<li>Power on hosts</li>
<li>The election process is kicked off and a master is elected</li>
<li>The master reads the protected list</li>
<li>The master initiates restarts for those VMs which were listed as protected but not running</li>
</ol>
<p>One thing I want to point out is that vSphere 5.0 also tracks whether a VM was cleanly powered off (as in initiated by the admin) or powered off due to a failure or isolation. VMs that were cleanly powered off will not be restarted; in this scenario, of course, they were not cleanly powered off, so the VMs will be powered on. The great thing about vSphere 5.0 is that you no longer need to know which hosts were your primary nodes so that you can power those on first to ensure quick recovery&#8230; You can power on any host and HA will sort it out for you.</p>
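<p>Steps 4 and 5, together with the clean power-off check, boil down to a set difference. The sketch below is a simplified model, not HA&#8217;s actual code; <code>vms_to_restart()</code> and the representation of the protected list as a plain set of VM names are assumptions.</p>
<pre style="padding-left: 30px;">
def vms_to_restart(protected, running, cleanly_powered_off):
    """VMs the new master should restart after all hosts come back (sorted).

    protected:            VMs on the protected list the master reads
    running:              VMs that are already powered on
    cleanly_powered_off:  VMs the admin powered off (these stay off)
    """
    return sorted(set(protected) - set(running) - set(cleanly_powered_off))


if __name__ == "__main__":
    # After a power outage nothing is running and nothing was cleanly powered off.
    print(vms_to_restart({"vm01", "vm02", "vm03"}, set(), set()))  # ['vm01', 'vm02', 'vm03']
    # A VM the admin shut down beforehand is not restarted.
    print(vms_to_restart({"vm01", "vm02"}, set(), {"vm02"}))  # ['vm01']
</pre>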
]]></content:encoded>
			<wfw:commentRss>http://www.yellow-bricks.com/2011/11/01/all-host-failed-how-does-ha-respond/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Managing resources with HA Admission Control?</title>
		<link>http://www.yellow-bricks.com/2011/10/26/managing-resources-with-ha-admission-control/</link>
		<comments>http://www.yellow-bricks.com/2011/10/26/managing-resources-with-ha-admission-control/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 12:02:11 +0000</pubDate>
		<dc:creator>Duncan Epping</dc:creator>
				<category><![CDATA[BC-DR]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[5]]></category>
		<category><![CDATA[5.0]]></category>
		<category><![CDATA[drs]]></category>
		<category><![CDATA[ha]]></category>
		<category><![CDATA[vSphere]]></category>

		<guid isPermaLink="false">http://www.yellow-bricks.com/?p=9321</guid>
		<description><![CDATA[<p>Last week at VMworld and on the VMTN community I had a couple of questions around resource management and HA Admission Control. It appears people were using HA Admission Control for managing resources within their environment. In other words, the number of VMs that HA would allow them to restart was used as the guideline for managing resources. But is that what [...]</p><p><div style="border: 1px solid gray; background-color:#CCCCCC;margin: 0px 0pt 0px 0px; padding: 5px;">

"<a href="http://www.yellow-bricks.com/2011/10/26/managing-resources-with-ha-admission-control/">Managing resources with HA Admission Control?</a>" originally appeared on <a href="http://www.yellow-bricks.com">Yellow-Bricks.com</a>. Follow us on <a href="http://www.twitter.com/DuncanYB">Twitter</a> and <a href="http://www.facebook.com/pages/Yellow-Bricks-virtualization-blog/132292893499196">Facebook</a>.<br>
Available now: vSphere 5 Clustering Deepdive. (<a href="http://www.amazon.com/dp/1463658133/ref=as_li_qf_sp_asin_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=1463658133&adid=07SG91DX7FQT2HS66PMM"><strong>paper</strong></a> | <a href="https://www.amazon.com/dp/B005C1SARM/ref=as_li_tf_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=B005C1SARM&adid=16Q69JRGDTX1DHPRKTQM&"><strong>e-book</strong></a>)</div><br><br></p>]]></description>
			<content:encoded><![CDATA[<p>Last week at VMworld and on the VMTN community I had a couple of <a href="http://communities.vmware.com/message/1851681#1851681">questions</a> around resource management and HA Admission Control. It appears people were using HA Admission Control for managing resources within their environment. In other words, the number of VMs that HA would allow them to restart was used as the guideline for managing resources. But is that what you should do?</p>
<p>If you look at how HA works and what it is intended to do, the short answer is <span style="text-decoration: underline;">No</span>. The reason is that HA is all about getting your virtual machines up and running again. If you look at HA Admission Control in vSphere 5.0, you will quickly notice that the default CPU value, used when no CPU reservation is specified, has been decreased from 256MHz to 32MHz. In many scenarios virtual machines will consume and demand far more than that. Similarly, if no memory reservation is specified, only the memory overhead of the VM is used. These values are more than likely much lower than what your virtual machine currently consumes or demands. The thing to keep in mind is that these CPU and memory values only represent what HA needs in order to power on your virtual machines.</p>
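<p>To see why these values say little about actual demand, here is a simplified Python sketch of how a slot size could be derived. It assumes the &#8220;Host failures the cluster tolerates&#8221; admission control policy, a CPU slot equal to the largest CPU reservation with the 32MHz default as a floor, and a memory slot equal to the largest memory reservation plus overhead. The VM and host figures are hypothetical.</p>

```python
from dataclasses import dataclass
from typing import List, Tuple

# Default CPU value in vSphere 5.0 when no CPU reservation is set (see above).
DEFAULT_CPU_SLOT_MHZ = 32


@dataclass
class Vm:
    cpu_reservation_mhz: int
    mem_reservation_mb: int
    mem_overhead_mb: int


def slot_size(vms: List[Vm]) -> Tuple[int, int]:
    """Return (cpu_mhz, mem_mb) for the largest slot among powered-on VMs."""
    # CPU: largest reservation, but never below the 32MHz default.
    cpu = max([DEFAULT_CPU_SLOT_MHZ] + [vm.cpu_reservation_mhz for vm in vms])
    # Memory: largest reservation plus overhead; without reservations only overhead counts.
    mem = max(vm.mem_reservation_mb + vm.mem_overhead_mb for vm in vms)
    return cpu, mem


def slots_per_host(host_cpu_mhz: int, host_mem_mb: int, slot: Tuple[int, int]) -> int:
    """A host offers as many slots as its scarcest resource allows."""
    cpu_slot, mem_slot = slot
    return min(host_cpu_mhz // cpu_slot, host_mem_mb // mem_slot)


if __name__ == "__main__":
    # Three hypothetical VMs without reservations: only memory overhead counts.
    vms = [Vm(0, 0, 120), Vm(0, 0, 200), Vm(0, 0, 150)]
    slot = slot_size(vms)
    print(slot)                                # (32, 200)
    print(slots_per_host(20000, 65536, slot))  # 327
```

<p>With no reservations set, this hypothetical 20GHz / 64GB host offers 327 slots, a number that says nothing about how many of these VMs it can actually run with acceptable performance.</p>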
<p>If you want to manage resources, avoid severe overcommitment or guarantee a certain level of service, you should look at the DRS statistics and explore tools like VC Ops and Cap IQ. Don&#8217;t (ab)use vSphere HA for this; it is not designed to solve this problem. One thing to consider, though, is increasing the minimum slot size values to avoid scenarios where environments become completely overloaded. If you have a consolidation ratio in mind, it should be fairly simple to figure out which value to use:</p>
<p style="padding-left: 30px;">available memory resource per host / consolidation ratio = das.vmMemoryMinMB<br />
or<br />
available CPU resource per host / consolidation ratio = das.vmCpuMinMHz</p>
<p>I am not saying that you should do this, but I think it might not be a bad practice in environments where multiple people have access to vCenter and can deploy VMs. At least people will be alerted when you are running out of &#8220;slots&#8221; to start VMs.</p>
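<p>The two formulas above translate directly into a small calculation. A minimal Python sketch, assuming you already know the resources available per host and your target consolidation ratio; the host figures in the example are hypothetical, and the results are the values you would enter for the das.vmMemoryMinMB and das.vmCpuMinMHz advanced settings.</p>

```python
def min_slot_settings(host_mem_mb: int, host_cpu_mhz: int, consolidation_ratio: int) -> dict:
    """Derive minimum slot-size values from a target consolidation ratio.

    Results are rounded down so that the ratio still fits on a single host.
    """
    if consolidation_ratio <= 0:
        raise ValueError("consolidation ratio must be positive")
    return {
        "das.vmMemoryMinMB": host_mem_mb // consolidation_ratio,
        "das.vmCpuMinMHz": host_cpu_mhz // consolidation_ratio,
    }


if __name__ == "__main__":
    # Hypothetical host: 96GB of memory and 12 cores at 2.6GHz, target of 30 VMs per host.
    print(min_slot_settings(98304, 12 * 2600, 30))
    # {'das.vmMemoryMinMB': 3276, 'das.vmCpuMinMHz': 1040}
```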
]]></content:encoded>
			<wfw:commentRss>http://www.yellow-bricks.com/2011/10/26/managing-resources-with-ha-admission-control/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>vSphere 5 HA &#8211; Isolation Response which one to pick?</title>
		<link>http://www.yellow-bricks.com/2011/10/11/vsphere-5-ha-isolation-response-which-one-to-pick/</link>
		<comments>http://www.yellow-bricks.com/2011/10/11/vsphere-5-ha-isolation-response-which-one-to-pick/#comments</comments>
		<pubDate>Tue, 11 Oct 2011 13:53:17 +0000</pubDate>
		<dc:creator>Duncan Epping</dc:creator>
				<category><![CDATA[BC-DR]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[5]]></category>
		<category><![CDATA[5.0]]></category>
		<category><![CDATA[ha]]></category>
		<category><![CDATA[isolation]]></category>
		<category><![CDATA[vSphere]]></category>

		<guid isPermaLink="false">http://www.yellow-bricks.com/?p=9265</guid>
		<description><![CDATA[<p>Last week I wrote an article about Datastore Heartbeating and how it prevents the Isolation Response from being triggered. Apparently this was an eye-opener for some, and I received a whole bunch of follow-up questions through Twitter and email. I figured it might be good to write up my recommendations around the Isolation Response. I would like to stress that [...]</p><p><div style="border: 1px solid gray; background-color:#CCCCCC;margin: 0px 0pt 0px 0px; padding: 5px;">

"<a href="http://www.yellow-bricks.com/2011/10/11/vsphere-5-ha-isolation-response-which-one-to-pick/">vSphere 5 HA &#8211; Isolation Response which one to pick?</a>" originally appeared on <a href="http://www.yellow-bricks.com">Yellow-Bricks.com</a>. Follow us on <a href="http://www.twitter.com/DuncanYB">Twitter</a> and <a href="http://www.facebook.com/pages/Yellow-Bricks-virtualization-blog/132292893499196">Facebook</a>.<br>
Available now: vSphere 5 Clustering Deepdive. (<a href="http://www.amazon.com/dp/1463658133/ref=as_li_qf_sp_asin_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=1463658133&adid=07SG91DX7FQT2HS66PMM"><strong>paper</strong></a> | <a href="https://www.amazon.com/dp/B005C1SARM/ref=as_li_tf_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=B005C1SARM&adid=16Q69JRGDTX1DHPRKTQM&"><strong>e-book</strong></a>)</div><br><br></p>]]></description>
			<content:encoded><![CDATA[<p>Last week I wrote an article about <a href="http://www.yellow-bricks.com/2011/10/03/datastore-heartbeating-and-preventing-isolation-events/">Datastore Heartbeating</a> and how it prevents the Isolation Response from being triggered. Apparently this was an eye-opener for some, and I received a whole bunch of follow-up questions through Twitter and email. I figured it might be good to write up my recommendations around the Isolation Response. I would like to stress that these are my recommendations based on my understanding of the product, not on an understanding of your environment or SLA. When applying these recommendations, always validate them against your requirements and constraints. I also want to point out that most of these details are covered in our book, <a href="https://www.amazon.com/dp/B005C1SARM/ref=as_li_tf_til?tag=yellowbricks-20&amp;camp=0&amp;creative=0&amp;linkCode=as1&amp;creativeASIN=B005C1SARM&amp;adid=16Q69JRGDTX1DHPRKTQM&amp;">pick it up</a>&#8230; the e-book is cheap.</p>
<p>First of all, I want to explain Isolation Response&#8230;</p>
<p>Isolation Response is the action HA triggers, per VM, when a host is network isolated from the rest of your cluster. Note the &#8220;per VM&#8221;: the host triggers the configured isolation response for each VM individually, and that response can be either &#8220;power off&#8221; or &#8220;shutdown&#8221;. However, before it triggers the isolation response (and this is new in 5.0), the host first validates whether a master owns the datastore on which the VM&#8217;s configuration files are stored. If that is not the case, the host will not trigger the isolation response.</p>
<p>Now let&#8217;s assume for a second that the host has been network isolated but no master owns the datastore on which the VM&#8217;s config files are stored. What happens? Nothing. The isolation response is not triggered because the host knows there is no master that can restart these VMs; there is no point in powering down a VM when nobody can power it on again. The host will, of course, periodically check whether the datastore has been claimed by a master.</p>
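<p>The per-VM decision described in the last two paragraphs can be sketched as follows. This is a simplified Python model of the behaviour as described here, not HA&#8217;s implementation; the enum, the function name and the <code>master_owns_datastore</code> flag are hypothetical.</p>

```python
from enum import Enum


class IsolationResponse(Enum):
    LEAVE_POWERED_ON = "leave powered on"
    SHUTDOWN = "shutdown"
    POWER_OFF = "power off"


def action_for_vm(response: IsolationResponse, master_owns_datastore: bool) -> str:
    """Decide what an isolated host does with a single VM."""
    if response is IsolationResponse.LEAVE_POWERED_ON:
        return "none"
    if not master_owns_datastore:
        # No master can restart the VM, so stopping it would only cause downtime.
        # The host keeps checking periodically whether a master claims the datastore.
        return "none, recheck later"
    return response.value


if __name__ == "__main__":
    print(action_for_vm(IsolationResponse.SHUTDOWN, master_owns_datastore=False))
    # none, recheck later
    print(action_for_vm(IsolationResponse.POWER_OFF, master_owns_datastore=True))
    # power off
```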
<p>There&#8217;s also a scenario where the complete datastore could be unavailable, for instance in the case of a full network isolation combined with NFS or iSCSI backed storage. In this scenario the host will power off the VM once it detects that another host has acquired the lock on the VMDK. It does this to prevent a so-called split-brain scenario, as you don&#8217;t want to end up with two instances of your VM running in your environment. Keep in mind that the host can only detect this lock once the &#8220;isolation&#8221; on the storage layer has been resolved and it has access to the datastore again.</p>
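<p>The split-brain protection boils down to comparing lock ownership once storage access returns. A minimal Python sketch under stated assumptions: <code>local_host</code> and <code>lock_owner</code> are hypothetical stand-ins for the host identity and the current owner of the VMDK lock, and <code>lock_owner</code> is <code>None</code> while the datastore is still unreachable.</p>

```python
from typing import Optional


def should_power_off_local_instance(local_host: str, lock_owner: Optional[str]) -> bool:
    """Return True when another host has taken over the VM's disk lock.

    lock_owner is None while the datastore is unreachable: the host cannot
    see the lock yet, so it leaves the VM alone until storage returns.
    """
    if lock_owner is None:
        return False
    return lock_owner != local_host


if __name__ == "__main__":
    print(should_power_off_local_instance("esx01", None))     # False: storage still down
    print(should_power_off_local_instance("esx01", "esx01"))  # False: still the lock owner
    print(should_power_off_local_instance("esx01", "esx07"))  # True: VM restarted elsewhere
```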
<p>I suspect at least a couple of you are thinking: what about the scenario where a master is network isolated? In that case the master drops responsibility for its VMs, which allows the newly elected master to claim them and take action if required.</p>
<p>I hope this clarifies things.</p>
<p>Now let&#8217;s talk about configuration settings. As part of the Isolation Response mechanism there are three ways HA can respond to a network isolation:</p>
<ol>
<li>Leave Powered On &#8211; no response at all, leave the VMs powered on when there&#8217;s a network isolation</li>
<li>Shutdown VM &#8211; guest-initiated, clean shutdown</li>
<li>Power Off VM &#8211; hard stop, equivalent to power cord being pulled out</li>
</ol>
<p><strong>When to use &#8220;Leave Powered On&#8221;</strong><br />
This is the default option and more than likely the one that fits your organization best, as it will work in most scenarios. When you have a network isolation event but retain access to your datastores, HA will not respond and your virtual machines will keep running. If both your network and storage environment are isolated, HA will recognize this and power off the VMs once it detects that the locks on their VMDKs have been acquired by another host, to avoid a split-brain scenario as explained above. Please note that in order to detect that the lock has been acquired by another host, the &#8220;isolated&#8221; host needs to be able to access the device again. (The power-off won&#8217;t happen before the storage has returned!)</p>
<p><strong>When to use &#8220;Shutdown VM&#8221;</strong><br />
It is recommended to use this option if it is likely that a host will retain access to the VM datastores when it becomes isolated and you want HA to restart a VM when the isolation occurs. In this scenario, using shutdown allows the guest OS to shut down in an orderly manner. Further, since datastore connectivity is likely retained during the isolation, it is unlikely that HA will shut down the VM unless there is a master available to restart it. Note that there is a timeout period of 5 minutes by default: if the VM has not been gracefully shut down after 5 minutes, a &#8220;Power Off&#8221; will be initiated.</p>
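<p>The &#8220;Shutdown VM&#8221; behaviour, a guest shutdown with a hard power off as fallback, follows a common timeout pattern. A minimal Python sketch, assuming hypothetical <code>request_guest_shutdown</code>, <code>is_powered_off</code> and <code>power_off</code> callables; the 300-second default mirrors the 5 minutes mentioned above, which, as far as I know, vSphere exposes as the das.isolationShutdownTimeout advanced option.</p>

```python
import time
from typing import Callable


def shutdown_with_fallback(
    request_guest_shutdown: Callable[[], None],
    is_powered_off: Callable[[], bool],
    power_off: Callable[[], None],
    timeout_s: float = 300.0,  # 5 minutes, the default mentioned above
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Request a clean guest shutdown; hard power off if it does not finish in time."""
    request_guest_shutdown()
    deadline = clock() + timeout_s
    while deadline > clock():
        if is_powered_off():
            return "clean shutdown"
        sleep(poll_s)
    # Timeout expired: fall back to the equivalent of pulling the power cord.
    if not is_powered_off():
        power_off()
        return "powered off after timeout"
    return "clean shutdown"


if __name__ == "__main__":
    # Simulated guest that never finishes shutting down; the fake clock and
    # sleep make the 5-minute wait complete instantly.
    now = [0.0]
    result = shutdown_with_fallback(
        request_guest_shutdown=lambda: None,
        is_powered_off=lambda: False,
        power_off=lambda: print("hard power off"),
        clock=lambda: now[0],
        sleep=lambda s: now.__setitem__(0, now[0] + s),
    )
    print(result, now[0])  # powered off after timeout 300.0
```

<p>Injecting the clock and sleep functions keeps the sketch testable without actually waiting five minutes.</p>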
<p><strong>When to use &#8220;Power Off VM&#8221;</strong><br />
It is recommended to use this option if it is likely that a host will lose access to the VM datastores when it becomes isolated and you want HA to immediately restart a VM when this condition occurs. This is a hard stop, in contrast to &#8220;Shutdown VM&#8221;, which is a guest-initiated shutdown and could take up to 5 minutes.</p>
<p>As stated, Leave Powered On is the default and fits most organizations as it prevents unnecessary responses to a Network Isolation but still takes action when the connection to your storage environment is lost at the same time.</p>
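<p>The recommendations above can be condensed into a small decision helper. This is only a sketch of the reasoning in this article, not an official rule; the parameter names are hypothetical, and the outcome should still be validated against your own requirements and constraints.</p>

```python
def recommended_isolation_response(likely_keeps_datastore_access: bool,
                                   want_restart_on_isolation: bool) -> str:
    """Condense this article's recommendations into a single decision."""
    if not want_restart_on_isolation:
        # The default: a network-only isolation causes no response, while
        # split-brain protection still powers off VMs whose locks were taken over.
        return "Leave Powered On"
    if likely_keeps_datastore_access:
        # Storage remains reachable, so give the guest OS an orderly shutdown.
        return "Shutdown VM"
    # Storage is likely lost too: a hard stop lets HA restart the VM immediately.
    return "Power Off VM"


if __name__ == "__main__":
    for keeps_access in (True, False):
        for want_restart in (True, False):
            print(keeps_access, want_restart,
                  recommended_isolation_response(keeps_access, want_restart))
```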
]]></content:encoded>
			<wfw:commentRss>http://www.yellow-bricks.com/2011/10/11/vsphere-5-ha-isolation-response-which-one-to-pick/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>vSphere Metro Storage Cluster solutions, what is supported and what not?</title>
		<link>http://www.yellow-bricks.com/2011/10/07/vsphere-metro-storage-cluster-solutions-what-is-supported-and-what-not/</link>
		<comments>http://www.yellow-bricks.com/2011/10/07/vsphere-metro-storage-cluster-solutions-what-is-supported-and-what-not/#comments</comments>
		<pubDate>Fri, 07 Oct 2011 10:57:44 +0000</pubDate>
		<dc:creator>Duncan Epping</dc:creator>
				<category><![CDATA[BC-DR]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[4.1]]></category>
		<category><![CDATA[5]]></category>
		<category><![CDATA[5.0]]></category>
		<category><![CDATA[metro]]></category>
		<category><![CDATA[vmsc]]></category>
		<category><![CDATA[vSphere]]></category>

		<guid isPermaLink="false">http://www.yellow-bricks.com/?p=9252</guid>
		<description><![CDATA[<p>I started digging into this yesterday after receiving a comment on my Metro Cluster article. I found it very challenging to get through the vSphere Metro Storage Cluster HCL details and decided to write an article about it, which might help you as well when designing or implementing a solution like this. First things first, here are the [...]</p><p><div style="border: 1px solid gray; background-color:#CCCCCC;margin: 0px 0pt 0px 0px; padding: 5px;">

"<a href="http://www.yellow-bricks.com/2011/10/07/vsphere-metro-storage-cluster-solutions-what-is-supported-and-what-not/">vSphere Metro Storage Cluster solutions, what is supported and what not?</a>" originally appeared on <a href="http://www.yellow-bricks.com">Yellow-Bricks.com</a>. Follow us on <a href="http://www.twitter.com/DuncanYB">Twitter</a> and <a href="http://www.facebook.com/pages/Yellow-Bricks-virtualization-blog/132292893499196">Facebook</a>.<br>
Available now: vSphere 5 Clustering Deepdive. (<a href="http://www.amazon.com/dp/1463658133/ref=as_li_qf_sp_asin_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=1463658133&adid=07SG91DX7FQT2HS66PMM"><strong>paper</strong></a> | <a href="https://www.amazon.com/dp/B005C1SARM/ref=as_li_tf_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=B005C1SARM&adid=16Q69JRGDTX1DHPRKTQM&"><strong>e-book</strong></a>)</div><br><br></p>]]></description>
			<content:encoded><![CDATA[<p>I started digging into this yesterday after receiving a comment on my <a href="http://www.yellow-bricks.com/2011/10/05/vsphere-5-0-ha-and-metro-stretched-cluster-solutions/">Metro Cluster</a> article. I found it very challenging to get through the vSphere Metro Storage Cluster HCL details and decided to write an article about it, which might help you as well when designing or implementing a solution like this.</p>
<p>First things first, here are the basic rules for a supported environment:<br />
(Note that the list below is taken from the &#8220;important support information&#8221;, which you can find via call out 3 in the screenshot.)</p>
<ul>
<li>Only array-based synchronous replication is supported; asynchronous replication is not supported.</li>
<li>Storage Array types FC, iSCSI, SVD, and FCoE are supported.</li>
<li>NAS devices are not supported with vMSC configurations at the time of writing.</li>
<li>The maximum supported latency between the ESXi Ethernet networks of both sites is 10 milliseconds RTT.</li>
<ul>
<li>Note that 10ms of latency for vMotion is only supported with Enterprise Plus licenses (<a href="http://www.yellow-bricks.com/2011/08/03/vsphere-5-metro-vmotion/">Metro vMotion</a>).</li>
</ul>
<li>The maximum supported latency for synchronous storage replication is 5 milliseconds RTT.</li>
</ul>
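<p>When evaluating a design, the basic rules above can be captured as a simple checklist. A minimal Python sketch: the <code>MetroDesign</code> fields are hypothetical, the limits are the ones listed above, and the 5 ms threshold for vMotion without Enterprise Plus is an assumption based on the standard (non-Metro) vMotion limit. Passing this check is no substitute for finding the specific array on the HCL.</p>

```python
from dataclasses import dataclass
from typing import List

SUPPORTED_STORAGE_TYPES = {"FC", "iSCSI", "SVD", "FCoE"}
MAX_NETWORK_RTT_MS = 10.0   # between the ESXi Ethernet networks of both sites
MAX_STORAGE_RTT_MS = 5.0    # synchronous storage replication
STANDARD_VMOTION_RTT_MS = 5.0  # assumption: above this, Metro vMotion is needed


@dataclass
class MetroDesign:
    replication: str          # "synchronous" or "asynchronous"
    array_based: bool
    storage_type: str         # e.g. "FC", "NFS"
    network_rtt_ms: float
    storage_rtt_ms: float
    enterprise_plus: bool


def vmsc_issues(d: MetroDesign) -> List[str]:
    """Return the rules a design violates; an empty list means none were found."""
    issues = []
    if d.replication != "synchronous" or not d.array_based:
        issues.append("only array-based synchronous replication is supported")
    if d.storage_type not in SUPPORTED_STORAGE_TYPES:
        issues.append(f"storage type {d.storage_type} is not supported (FC, iSCSI, SVD or FCoE only)")
    if d.network_rtt_ms > MAX_NETWORK_RTT_MS:
        issues.append("network RTT exceeds 10 ms")
    if d.storage_rtt_ms > MAX_STORAGE_RTT_MS:
        issues.append("storage replication RTT exceeds 5 ms")
    if d.network_rtt_ms > STANDARD_VMOTION_RTT_MS and not d.enterprise_plus:
        issues.append("vMotion beyond 5 ms RTT requires Enterprise Plus (Metro vMotion)")
    return issues


if __name__ == "__main__":
    design = MetroDesign("synchronous", True, "FC", 8.0, 3.0, enterprise_plus=False)
    print(vmsc_issues(design))
    # ['vMotion beyond 5 ms RTT requires Enterprise Plus (Metro vMotion)']
```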
<p>You might ask yourself how to find out whether the array or solution you are looking at is supported, and what its constraints and limitations are. This is the path to follow:</p>
<ul>
<li>Go to: <a href="http://www.vmware.com/resources/compatibility/search.php?deviceCategory=san">http://www.vmware.com/resources/compatibility/search.php?deviceCategory=san</a> (See screenshot, call out 1)</li>
<li>In the &#8220;Array Test Configuration&#8221; section, select the appropriate configuration type, for instance &#8220;FC Metro Cluster Storage&#8221; (See screenshot, call out 2)<br />
(note that there&#8217;s no other category at the time of writing)</li>
<li>Hit the &#8220;Update and View Results&#8221; button</li>
<li>This will result in a list of supported configurations for FC-based metro cluster solutions; currently only EMC VPLEX is supported</li>
<li>Click the name of the model (in this case VPLEX) and note all the details listed</li>
<li>Unfold the &#8220;FC Metro Cluster Storage&#8221; solution to see the footnotes, as they provide additional information on what is and is not supported.</li>
<li>In the case of our example, VPLEX, it says &#8220;Only Non-uniform host access configuration is supported&#8221; but what does this mean?</li>
<ul>
<li>Go back to the Search Results and click the &#8220;Click here to Read Important Support Information&#8221; link (See screenshot, call out 3)</li>
<li>Halfway down, it provides details for &#8220;vSphere Metro Cluster Storage (vMSC) in vSphere 5.0&#8221;</li>
<li>It states that in a &#8220;non-uniform&#8221; configuration, ESXi hosts are only connected to the storage node(s) in the same site, and paths presented to ESXi hosts from storage nodes are limited to the local site.</li>
</ul>
<li>Note that in this case not only is &#8220;non-uniform&#8221; a requirement, you will also need to adhere to the latency and replication type requirements as listed above.</li>
</ul>
<p>Yes, I realize this is not an ideal way of navigating through the HCL, and I have already reached out to the people responsible for it.</p>
<p><a href="http://farm7.static.flickr.com/6050/6219765536_91b952b197_b.jpg"><img class="alignnone colorbox-9252" title="vMSC HCL" src="http://farm7.static.flickr.com/6050/6219765536_91b952b197.jpg" alt="" width="500" height="273" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.yellow-bricks.com/2011/10/07/vsphere-metro-storage-cluster-solutions-what-is-supported-and-what-not/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>vSphere 5.0 HA and metro / stretched cluster solutions</title>
		<link>http://www.yellow-bricks.com/2011/10/05/vsphere-5-0-ha-and-metro-stretched-cluster-solutions/</link>
		<comments>http://www.yellow-bricks.com/2011/10/05/vsphere-5-0-ha-and-metro-stretched-cluster-solutions/#comments</comments>
		<pubDate>Wed, 05 Oct 2011 12:20:55 +0000</pubDate>
		<dc:creator>Duncan Epping</dc:creator>
				<category><![CDATA[BC-DR]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[5]]></category>
		<category><![CDATA[5.0]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[ha]]></category>
		<category><![CDATA[metro]]></category>
		<category><![CDATA[vSphere]]></category>

		<guid isPermaLink="false">http://www.yellow-bricks.com/?p=9186</guid>
		<description><![CDATA[<p>I had a discussion via email about metro clusters and HA last week and it made me realize that HA’s new architecture (as part of vSphere 5.0) might be confusing to some. I started re-reading the article I wrote a while back about HA and metro / stretched cluster configurations, and most of it still applies. Before you read this [...]</p><p><div style="border: 1px solid gray; background-color:#CCCCCC;margin: 0px 0pt 0px 0px; padding: 5px;">

"<a href="http://www.yellow-bricks.com/2011/10/05/vsphere-5-0-ha-and-metro-stretched-cluster-solutions/">vSphere 5.0 HA and metro / stretched cluster solutions</a>" originally appeared on <a href="http://www.yellow-bricks.com">Yellow-Bricks.com</a>. Follow us on <a href="http://www.twitter.com/DuncanYB">Twitter</a> and <a href="http://www.facebook.com/pages/Yellow-Bricks-virtualization-blog/132292893499196">Facebook</a>.<br>
Available now: vSphere 5 Clustering Deepdive. (<a href="http://www.amazon.com/dp/1463658133/ref=as_li_qf_sp_asin_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=1463658133&adid=07SG91DX7FQT2HS66PMM"><strong>paper</strong></a> | <a href="https://www.amazon.com/dp/B005C1SARM/ref=as_li_tf_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=B005C1SARM&adid=16Q69JRGDTX1DHPRKTQM&"><strong>e-book</strong></a>)</div><br><br></p>]]></description>
			<content:encoded><![CDATA[<p>I had a discussion via email about metro clusters and HA last week and it made me realize that HA’s new architecture (as part of vSphere 5.0) might be confusing to some. I started re-reading the article I wrote a while back about HA and metro / stretched cluster configurations, and most of it still applies. Before you read this article I suggest reading the <a href="http://www.yellow-bricks.com/vmware-high-availability-deepdiv/">HA Deepdive Page</a>, as I am going to assume you understand some of the basics. I also want to point you to <a href="http://www.vmware.com/resources/compatibility/search.php?deviceCategory=san&amp;productid=12738&amp;releaseid=76&amp;deviceCategory=san&amp;partner=30&amp;keyword=VPLEX&amp;isSVA=1&amp;page=1&amp;display_interval=10&amp;sortColumn=Partner&amp;sortOrder=Asc">this section</a> of our HCL, which lists the certified configurations for vMSC (vSphere Metro Storage Cluster). You can select the type of storage, such as &#8220;FC Metro Cluster Storage&#8221;, in the &#8220;Array Test Configuration&#8221; section.</p>
<p>In this article I will take a single scenario and explain the different types of failures and how HA handles them under the hood. Perhaps the most important part of these scenarios is why HA did or did not respond to a failure.</p>
<p>Before I explain the scenario, I want to briefly explain the concept of a metro / stretched cluster, which can be carved up into two different types of solutions. In the first solution, a synchronous copy of your datastore is available on the other site, and this mirror copy is read-only. In other words, there is a read-write copy in Datacenter-A and a read-only copy in Datacenter-B. This means that your VMs in Datacenter-B located on this datastore will do their I/O on Datacenter-A, since the read-write copy of the datastore is in Datacenter-A. The second solution is what EMC calls &#8220;write anywhere&#8221;. In this case VMs always write locally. The key point here is that each of the LUNs / datastores has a “preferred site” defined, which is sometimes also referred to as &#8220;site bias&#8221;. In other words, if anything happens to the link in between, the storage system on the preferred site for a given datastore will be the only one left that can read-write access it. This is, of course, to avoid any data corruption in a failure scenario.</p>
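<p>The effect of the preferred site on a &#8220;write anywhere&#8221; datastore during a link failure can be modelled in a few lines. A simplified Python sketch, assuming each datastore has exactly one preferred site and both storage systems are otherwise healthy; the datastore names are hypothetical.</p>

```python
from typing import Dict, Set, Tuple

SITES: Tuple[str, str] = ("Datacenter-A", "Datacenter-B")


def writable_sites(preferred_site: Dict[str, str], link_up: bool) -> Dict[str, Set[str]]:
    """Return, per datastore, the sites allowed read-write access (write anywhere model)."""
    result: Dict[str, Set[str]] = {}
    for datastore, preferred in preferred_site.items():
        if link_up:
            # Link intact: write anywhere lets VMs on both sites write locally.
            result[datastore] = set(SITES)
        else:
            # Link down: only the preferred site keeps read-write access,
            # so the two copies cannot diverge and corrupt data.
            result[datastore] = {preferred}
    return result


if __name__ == "__main__":
    # Hypothetical datastores with a site bias towards either datacenter.
    bias = {"ds-a01": "Datacenter-A", "ds-b01": "Datacenter-B"}
    print(writable_sites(bias, link_up=False))
    # {'ds-a01': {'Datacenter-A'}, 'ds-b01': {'Datacenter-B'}}
```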
<p>In this article we will use the following scenario:</p>
<ul>
<li>2 sites</li>
<li>18 hosts</li>
<li>50KM in between sites</li>
<li>Preferred (aka “Should”) VM-Host affinity rule to create &#8220;Datacenter Affinity &#8211; I/O Locality&#8221;</li>
<li>Designated heartbeat datastores<br />
(read <a href="http://www.yellow-bricks.com/2011/07/26/ha-architecture-series-datastore-heartbeating-35/" target="_blank">this article</a> for more details on heartbeat datastores)</li>
<li>Synchronous mirrored datastores<br />
(Please note that I have not depicted the &#8220;mirror&#8221; copy, to simplify the diagram)</li>
</ul>
<p>This is what it will look like:</p>
<p><a href="http://farm7.static.flickr.com/6194/6210963932_069dae4cde_b.jpg"><img class="alignnone colorbox-9186" title="stretched cluster scenario" src="http://farm7.static.flickr.com/6194/6210963932_069dae4cde.jpg" alt="" width="500" height="272" /></a></p>
<p>What do you see in this diagram? Each site has 9 hosts. The HA master is located in Datacenter-A. Each host uses the designated &#8220;heartbeat datastore&#8221; in each of the datacenters; note that I only drew the lines for the lower-left ESXi host to simplify the diagram.</p>
<p>Many failures can occur, and HA will be unaware of many of them. I will not discuss these, as they are explained in depth in the storage vendor’s documentation. I will, however, discuss the following &#8220;common&#8221; failures:</p>
<ul>
<li>Host failure in Datacenter-A</li>
<li>Storage Failure in Datacenter-A</li>
<li>Loss of Datacenter-A</li>
<li>Datacenter Partition</li>
<li>Storage Partition</li>
</ul>
<p><strong>Host failure in Datacenter-A</strong></p>
<p>When a host fails in Datacenter-A, this is detected by the HA master node because network heartbeats from it are no longer received. Once the master has detected that network heartbeats are missing, it will start monitoring for datastore heartbeats. As the host has failed, no datastore heartbeats will be issued. During this time a third liveness check is done: pinging the management address of the failed host. If all of these liveness checks are unsuccessful, the master will declare the host dead and will attempt to restart all the protected virtual machines that were running on the host before the master lost contact with it. The rules defined on a cluster level are “preferred rules”, and as such the virtual machines can be restarted on the other site. If DRS is enabled, it will attempt to correct any sub-optimal placements HA made during the restart of your virtual machines. This same scenario also applies to a situation where all hosts fail in one site without storage being affected.</p>
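<p>The three liveness checks the master performs before declaring a host dead can be summarised as follows. This is a simplified Python model of the sequence described above, not the HA agent&#8217;s actual logic; the function name and return strings are hypothetical, and the distinction between an isolated and a partitioned host is deliberately left out.</p>

```python
def classify_host(network_heartbeat: bool, datastore_heartbeat: bool,
                  responds_to_ping: bool) -> str:
    """Classify a host from the master's point of view (simplified)."""
    if network_heartbeat:
        return "alive"
    # Network heartbeats are missing: fall back to the other two liveness checks.
    if datastore_heartbeat or responds_to_ping:
        # Still alive but not reachable via network heartbeats (for example
        # isolated or partitioned); in this model it is not declared dead.
        return "alive but unreachable"
    # All three liveness checks failed: declare the host dead so that its
    # protected VMs are restarted elsewhere, possibly in the other datacenter.
    return "dead"


if __name__ == "__main__":
    # Host failure in Datacenter-A: no heartbeats of any kind, no ping reply.
    print(classify_host(False, False, False))  # dead
    # Storage failure in Datacenter-A: network heartbeats still arrive.
    print(classify_host(True, False, True))    # alive
```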
<p><strong>Storage Failure in Datacenter-A</strong></p>
<p>In this scenario only the storage system fails in Datacenter-A. This failure does not result in downtime for your VMs. What will happen? Simply put, the mirror (read-only) copy of your datastore will become read-write and be presented with the same identifier, and as such the hosts will be able to write to these volumes without the need to resignature. (This sounds very simple of course, but note that I am describing it at an extremely high level, and in most solutions manual intervention is required to indicate that a failure has occurred.) In most cases, however, this happens seamlessly from a VM’s perspective. It should be noted that all I/O will now go across your link to the other site. Note that HA is not aware of this failure. Although the storage heartbeat might be lost for a second, HA will not take action, as an HA master agent only checks for the storage heartbeat when the network heartbeat has not been received for three seconds.</p>
<p><strong>Loss of Datacenter-A</strong></p>
<p>This is basically a combination of the first and the second failure we’ve described. In this scenario, the hosts in Datacenter-B will lose contact with the master and elect a new one. The new master will access the per-datastore files HA uses to record the set of protected VMs, and so determine the set of HA-protected VMs. The master will then attempt to restart any VMs that are not already running on its host and the other hosts in Datacenter-B. At the same time, the master will do the liveness checks noted above and, after 50 seconds, report the hosts of Datacenter-A as dead. From a VM’s perspective the storage fail-over could occur seamlessly: the mirror copy of the datastore is promoted to read-write, and the hosts in Datacenter-B will be able to access the datastores that were local to Datacenter-A.</p>
<p><strong>Datacenter Partition</strong></p>
<p>This is where most people feel things become tricky. What happens if the link between the two datacenters fails? Yes, I realize that the chances of this happening are slim, as you would typically have redundancy on this layer, but I do think it is an interesting one to explain. The main thing to realize here is that with these types of failures VM-Host affinity, or should I call it VM-Datacenter affinity, suddenly becomes very important, but we will come back to that in a second. Let’s explain the scenario first.</p>
<p>In this case a distinction can be made between the two types of metro cluster briefly explained before. There is a solution, which EMC likes to call “write anywhere”, that basically presents a virtual datastore across datacenters and allows writes on both sites. On the other hand there is the traditional stretched cluster solution, where only one site actively handles I/O and a “passive” site is used in case a fail-over needs to occur. In the case of “write anywhere” a so-called site bias or preference is defined per datastore. Both scenarios are, however, similar in that, in the case of a failure, a given datastore will only be accessible on one site.</p>
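<p>The consequence of site bias during a link failure can be sketched in a few lines. This is a simplified, hypothetical model (the <code>VM</code> class, VM names and site labels are made up): it assumes that while the link is down each datastore is accessible only at its bias site, and lists the VMs that lose their storage because they run on the other site.</p>

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    running_site: str    # site of the host currently running the VM
    datastore_site: str  # site bias (preference) of the datastore holding its files


def storage_lost_on_partition(vms: list[VM]) -> list[str]:
    """During an inter-site link failure each datastore stays active only at its
    bias site, so any VM running on the other site loses its storage."""
    return [vm.name for vm in vms if vm.running_site != vm.datastore_site]


vms = [
    VM("web01", running_site="B", datastore_site="A"),  # runs in B, files biased to A
    VM("db01", running_site="A", datastore_site="A"),
    VM("app01", running_site="B", datastore_site="B"),
]
print(storage_lost_on_partition(vms))  # ['web01']
```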
<p>If the link between the sites fails, each datastore will become active on just one of the sites. What would happen to a VM that is running in Datacenter-B but has its files stored on a datastore configured with site bias for Datacenter-A?</p>
<p>In that case the VM running in Datacenter-B would have its storage “yanked” out from underneath it. The VM would more than likely keep running and keep retrying the I/O. However, as the link between the sites has been broken, HA in Datacenter-A will try to restart the workload. Why is that?</p>
<ul>
<li>The network heartbeat is missing because the link dropped</li>
<li>The datastore heartbeat is missing because the link dropped and the datastore becomes inaccessible from Datacenter-B</li>
<li>A ping to the management address of the host fails because the link is down</li>
<li>The master for Datacenter-A knows the VM was powered on before the failure, and since it can’t communicate with the VM’s host in Datacenter-B after the failure, it will attempt to restart the VM.</li>
</ul>
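<p>The reasoning in the list above can be expressed as a small sketch of the three liveness checks, as seen by the master in Datacenter-A. It is a hypothetical model, not HA's implementation: it assumes that hosts on the same site always reach each other, that during a link failure a datastore is accessible only at its bias site, and that the master consults datastore heartbeats only once network heartbeats have stopped.</p>

```python
def observe(master_site: str, host_site: str, ds_site: str,
            link_up: bool, host_alive: bool = True) -> tuple[bool, bool, bool]:
    """Return (network_hb, datastore_hb, ping_ok) as seen by the master.

    Assumption: during a link failure a datastore is accessible only at its
    bias site (ds_site), and hosts on the same site always reach each other.
    """
    reachable = host_alive and (link_up or host_site == master_site)
    # The host must be able to write its heartbeat and the master to read it.
    host_can_write = host_alive and (link_up or host_site == ds_site)
    master_can_read = link_up or master_site == ds_site
    return reachable, host_can_write and master_can_read, reachable


def declared_dead(network_hb: bool, datastore_hb: bool, ping_ok: bool) -> bool:
    """The master declares a host dead only when all three liveness checks fail."""
    if network_hb:
        return False  # datastore heartbeats are only consulted once network heartbeats stop
    return not datastore_hb and not ping_ok


# Master in Datacenter-A evaluating a host in Datacenter-B after the link drops,
# with the heartbeat datastore biased to Datacenter-A.
print(declared_dead(*observe("A", "B", "A", link_up=False)))  # True
# Storage-only failure: network heartbeats still arrive, so HA takes no action.
print(declared_dead(network_hb=True, datastore_hb=False, ping_ok=True))  # False
```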
<p>What happens in Datacenter-B? In Datacenter-B a master is elected. This master will determine the VMs that need to be protected. Next it will attempt to restart all those that it knows are not already running. Any VMs biased to this site that are not already running will be powered on. However, any VMs biased to the other site won’t be, as their datastore is inaccessible. HA will report a restart failure for the latter, since it does not know that the VMs are (still) running in the other site.</p>
<p>Now you might wonder what will happen when the link returns. This is the classic “VM split-brain scenario”. For a short period of time you will have two active copies of the VM on your network, both with the same MAC address. However, only one copy will have access to the VM’s files, and HA will recognize this. As soon as this is detected, the VM copy that has no access to the VMDK will be powered off.</p>
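<p>The restart decision of the new master in Datacenter-B can be sketched as follows. This is a simplified, hypothetical model: a plain dictionary mapping VM name to datastore stands in for the per-datastore files HA uses to record protected VMs, and the VM and datastore names are made up. VMs already known to be running are skipped; the rest are restarted if their datastore is accessible, otherwise they end up as restart failures.</p>

```python
from __future__ import annotations


def plan_restarts(protected: dict[str, str], running: set[str],
                  accessible_datastores: set[str]) -> tuple[list[str], list[str]]:
    """Split protected VMs into (restart, restart_failure) lists for a new master.

    protected maps VM name -> datastore holding its files (a hypothetical
    stand-in for the per-datastore lists HA keeps); running holds the VMs
    known to be running on hosts this master can communicate with.
    """
    restart, failed = [], []
    for vm, datastore in sorted(protected.items()):
        if vm in running:
            continue  # already running in this partition
        if datastore in accessible_datastores:
            restart.append(vm)
        else:
            failed.append(vm)  # HA reports a restart failure
    return restart, failed


protected = {"web01": "ds-A", "db01": "ds-A", "app01": "ds-B", "app02": "ds-B"}
print(plan_restarts(protected, running={"app01"}, accessible_datastores={"ds-B"}))
# (['app02'], ['db01', 'web01'])
```

<p>Note that <code>db01</code> and <code>web01</code> are reported as failures even though, in reality, they may still be running in Datacenter-A; the master in Datacenter-B simply cannot know that.</p>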
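<p>The split-brain resolution rule described above boils down to “keep the copy that can access the files”. A minimal sketch, assuming each copy can be tagged with whether it still has access to the VM’s files; the <code>VMCopy</code> class and host names are hypothetical, and the case where no copy has access is deliberately left alone.</p>

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VMCopy:
    host: str
    has_file_access: bool  # does this copy still have access to the VM's files?


def copies_to_power_off(copies: list[VMCopy]) -> list[str]:
    """Hosts whose copy should be powered off once the link returns.

    Only acts when at least one copy still has access; otherwise there is no
    surviving copy to keep (an assumption of this sketch).
    """
    if not any(c.has_file_access for c in copies):
        return []
    return [c.host for c in copies if not c.has_file_access]


copies = [VMCopy("esx-b1", has_file_access=False),  # original, storage yanked away
          VMCopy("esx-a1", has_file_access=True)]   # restarted by HA in Datacenter-A
print(copies_to_power_off(copies))  # ['esx-b1']
```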
<p>I hope it is clear why it is important to know the preferred site for each datastore, as it can and probably will impact your uptime. Also note that although we defined VM-Host affinity rules, these are preferential / “should” rules and can be violated by both HA and DRS.</p>
<p><strong>Storage Partition</strong></p>
<p>This is the final scenario. In this scenario only the storage connection between Datacenter-A and Datacenter-B fails. What happens in this case? This scenario is very straightforward: as the HA master is still receiving network heartbeats, it will, unfortunately, currently not take any action. This is very important to realize. Also note that HA is not aware of the VM-Host affinity rules. HA will restart virtual machines wherever it feels it should; DRS, however, should move the virtual machines to the correct location based on these rules. That is, if and when they are correctly defined of course!</p>
<p><strong>Summarizing</strong></p>
<p>Stretched clusters are, in my opinion, great solutions to increase resiliency in your environment. There is, however, always a lot of confusion around failure scenarios and the different types of responses from both the vSphere layer and the storage layer. In this article I have tried to explain how vSphere HA responds to certain failures in a stretched / metro cluster environment. I hope this will help everyone get a better understanding of vSphere HA. I also fully realize that things can be improved by tighter integration between HA and your storage systems; for now, all I can say is that this is being worked on. I want to finish with a quick summary of some vSphere 5.0 HA design considerations for stretched cluster environments:</p>
<ul>
<li>das.isolationaddress per Datacenter!<br />
Having multiple isolation addresses will help your hosts determine whether they are isolated or whether the master has been isolated in the case of a network failure.</li>
<li>Designated heartbeat datastores per Datacenter!<br />
Each site will need a designated heartbeat datastore to ensure that each site can, at a minimum, update the heartbeat region of its site-local storage.
<ul>
<li>If there are multiple storage systems on each site it is recommended to increase the number of heartbeat datastores to four, two for each site.</li>
</ul>
</li>
<li>Define VM-Host affinity rules, as they can lower the impact of a failure and help keep I/O local</li>
</ul>
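<p>As an illustration of the first two recommendations, here is a minimal sketch that builds the corresponding HA advanced options as key/value pairs. The helper is hypothetical and the addresses are placeholders from the documentation ranges; <code>das.isolationaddressN</code> and <code>das.heartbeatDsPerHost</code> are the vSphere 5.0 HA advanced options, and the accepted range of 2 to 5 for the latter is an assumption you should verify against the documentation for your version.</p>

```python
from __future__ import annotations

import ipaddress


def stretched_cluster_ha_options(isolation_addresses: dict[str, str],
                                 heartbeat_ds_per_host: int = 4) -> dict[str, str]:
    """Build HA advanced options for a stretched cluster (hypothetical helper).

    isolation_addresses maps a datacenter name to an address that hosts in
    that datacenter can ping; each one becomes das.isolationaddressN.
    """
    if not isolation_addresses:
        raise ValueError("expected one isolation address per datacenter")
    # Assumption: das.heartbeatDsPerHost accepts values from 2 to 5.
    if heartbeat_ds_per_host not in range(2, 6):
        raise ValueError("das.heartbeatDsPerHost must be between 2 and 5")
    options: dict[str, str] = {}
    for index, (_site, address) in enumerate(sorted(isolation_addresses.items())):
        ipaddress.ip_address(address)  # raises ValueError for a malformed address
        options[f"das.isolationaddress{index}"] = address
    # Four heartbeat datastores: two per site, as recommended above.
    options["das.heartbeatDsPerHost"] = str(heartbeat_ds_per_host)
    return options


# Placeholder (documentation-range) addresses, one per datacenter.
opts = stretched_cluster_ha_options({"Datacenter-A": "192.0.2.1",
                                     "Datacenter-B": "198.51.100.1"})
for key, value in opts.items():
    print(f"{key} = {value}")
```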
<p>Thanks for taking the time to read this far, and don&#8217;t hesitate to leave a comment if you have any questions or feedback / remarks.</p>
<p>&lt;edit&gt; completely coincidentally Chad Sakac <a href="http://virtualgeek.typepad.com/virtual_geek/2011/10/new-vmware-hcl-category-vsphere-metro-stretched-cluster.html">posted an article</a> about Stretched Clusters and the new HCL Category. Read it!&lt;/edit&gt;</p>
<p><div style="border: 1px solid gray; background-color:#CCCCCC;margin: 0px 0pt 0px 0px; padding: 5px;">

"<a href="http://www.yellow-bricks.com/2011/10/05/vsphere-5-0-ha-and-metro-stretched-cluster-solutions/">vSphere 5.0 HA and metro / stretched cluster solutions</a>" originally appeared on <a href="http://www.yellow-bricks.com">Yellow-Bricks.com</a>. Follow us on <a href="http://www.twitter.com/DuncanYB">Twitter</a> and <a href="http://www.facebook.com/pages/Yellow-Bricks-virtualization-blog/132292893499196">Facebook</a>.<br>
Available now: vSphere 5 Clustering Deepdive. (<a href="http://www.amazon.com/dp/1463658133/ref=as_li_qf_sp_asin_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=1463658133&adid=07SG91DX7FQT2HS66PMM"><strong>paper</strong></a> | <a href="https://www.amazon.com/dp/B005C1SARM/ref=as_li_tf_til?tag=yellowbricks-20&camp=0&creative=0&linkCode=as1&creativeASIN=B005C1SARM&adid=16Q69JRGDTX1DHPRKTQM&"><strong>e-book</strong></a>)</div><br><br></p>]]></content:encoded>
			<wfw:commentRss>http://www.yellow-bricks.com/2011/10/05/vsphere-5-0-ha-and-metro-stretched-cluster-solutions/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Datastore Heartbeating and preventing Isolation Events?</title>
		<link>http://www.yellow-bricks.com/2011/10/03/datastore-heartbeating-and-preventing-isolation-events/</link>
		<comments>http://www.yellow-bricks.com/2011/10/03/datastore-heartbeating-and-preventing-isolation-events/#comments</comments>
		<pubDate>Mon, 03 Oct 2011 13:57:41 +0000</pubDate>
		<dc:creator>Duncan Epping</dc:creator>
				<category><![CDATA[BC-DR]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[5]]></category>
		<category><![CDATA[5.0]]></category>
		<category><![CDATA[ha]]></category>
		<category><![CDATA[vSphere]]></category>

		<guid isPermaLink="false">http://www.yellow-bricks.com/?p=9189</guid>
		<description><![CDATA[<p>I was just listening to some of the VMworld sessions and one was about HA. The presenter had a section about Datastore Heartbeats and mentioned that Datastore Heartbeats were added to prevent &#8220;Isolation Events&#8221;. I&#8217;ve heard multiple people make this statement over the last couple of months and I want to make it absolutely clear that this is NOT true. [...]</p><p><div style="border: 1px solid gray; background-color:#CCCCCC;margin: 0px 0pt 0px 0px; padding: 5px;">

"<a href="http://www.yellow-bricks.com/2011/10/03/datastore-heartbeating-and-preventing-isolation-events/">Datastore Heartbeating and preventing Isolation Events?</a>" originally appeared on <a href="http://www.yellow-bricks.com">Yellow-Bricks.com</a>. Follow us on <a href="http://www.twitter.com/DuncanYB">Twitter</a> and <a href="http://www.facebook.com/pages/Yellow-Bricks-virtualization-blog/132292893499196">Facebook</a>.<br>
</div><br><br></p>]]></description>
			<content:encoded><![CDATA[<p>I was just listening to some of the VMworld sessions and one was about HA. The presenter had a section about Datastore Heartbeats and mentioned that Datastore Heartbeats were added to prevent &#8220;Isolation Events&#8221;. I&#8217;ve heard multiple people make this statement over the last couple of months and I want to make it absolutely clear that this is NOT true. Let me repeat this: Datastore Heartbeats do not prevent an isolation event from occurring.</p>
<p>Let&#8217;s explain this a bit more in depth. What happens when a host is cut off from the network because the NIC that carries its management traffic has just failed?</p>
<ol>
<li>T0 – Isolation of the host (slave)</li>
<li>T10s – Slave enters “election state”</li>
<li>T25s – Slave elects itself as master</li>
<li>T25s – Slave pings “isolation addresses”</li>
<li>T30s – Slave declares itself isolated and “triggers” isolation response</li>
</ol>
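<p>The timeline above, seen from the isolated host itself, can be sketched as a tiny state function. This is a hypothetical model using the timings from the list; it assumes that if an isolation address does answer the ping, the host does not declare itself isolated. Note what is missing from the inputs: there is no datastore-heartbeat parameter at all.</p>

```python
# Timings from the vSphere 5.0 isolation timeline above (T0, T10s, T25s, T30s).
ELECTION_AT = 10
SELF_ELECTED_AT = 25
ISOLATED_AT = 30


def host_state(seconds_since_isolation: int, isolation_address_reachable: bool) -> str:
    """Hypothetical model of an isolated slave's own view of events.

    Note the inputs: there is no datastore-heartbeat parameter, because the
    host does not use datastore heartbeats to decide whether it is isolated.
    """
    t = seconds_since_isolation
    if t >= ISOLATED_AT and not isolation_address_reachable:
        return "isolated"   # isolation response is triggered
    if t >= SELF_ELECTED_AT:
        return "master"     # elected itself and pings the isolation addresses
    if t >= ELECTION_AT:
        return "election"
    return "slave"


for t in (0, 10, 25, 30):
    print(t, host_state(t, isolation_address_reachable=False))
```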
<p>Now, as you can see, the Datastore Heartbeat mechanism plays no role whatsoever in the process of declaring a host isolated, or does it? No, from the perspective of the isolated host it does not. The Datastore Heartbeat mechanism is used by the master to determine the state of the unresponsive host: it allows the master to determine whether the host which stopped sending network heartbeats is isolated or has failed completely. Depending on the determined state, the master will take appropriate action.</p>
<p>To summarize, the datastore heartbeat mechanism has been introduced to allow the master to identify the state of hosts and is not used by the &#8220;isolated host&#8221; to prevent isolation.</p>
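<p>The master&#8217;s side of the story can be sketched just as briefly. This is a simplified, hypothetical model (function and host names are made up): among hosts that stopped sending network heartbeats, one that is still updating its datastore heartbeat is alive but isolated, while one that is not is treated as failed. The management-address ping the master also performs is left out to keep the sketch focused on the datastore heartbeat.</p>

```python
from __future__ import annotations


def classify_unresponsive(hosts: set[str], datastore_heartbeating: set[str]) -> dict[str, str]:
    """Master-side view of hosts whose network heartbeats have stopped.

    A host still updating its datastore heartbeat is alive but isolated; one
    that is not is treated as failed. (The management-address ping the master
    also performs is left out of this sketch.)
    """
    return {h: ("isolated" if h in datastore_heartbeating else "failed")
            for h in sorted(hosts)}


print(classify_unresponsive({"esx01", "esx02"}, datastore_heartbeating={"esx01"}))
# {'esx01': 'isolated', 'esx02': 'failed'}
```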
<p><div style="border: 1px solid gray; background-color:#CCCCCC;margin: 0px 0pt 0px 0px; padding: 5px;">

"<a href="http://www.yellow-bricks.com/2011/10/03/datastore-heartbeating-and-preventing-isolation-events/">Datastore Heartbeating and preventing Isolation Events?</a>" originally appeared on <a href="http://www.yellow-bricks.com">Yellow-Bricks.com</a>. Follow us on <a href="http://www.twitter.com/DuncanYB">Twitter</a> and <a href="http://www.facebook.com/pages/Yellow-Bricks-virtualization-blog/132292893499196">Facebook</a>.<br>
</div><br><br></p>]]></content:encoded>
			<wfw:commentRss>http://www.yellow-bricks.com/2011/10/03/datastore-heartbeating-and-preventing-isolation-events/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

