• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Yellow Bricks

by Duncan Epping

  • Home
  • ESXTOP
  • Stickers/Shirts
  • Privacy Policy
  • About
  • Show Search
Hide Search

stretched

HA/DRS configuration with Virtual SAN Stretched Cluster environment

Duncan Epping · Sep 9, 2015 ·

This question is going to come sooner or later, how do I configure HA/DRS when I am running a Virtual SAN Stretched cluster configuration. I described some of the basics of Virtual SAN stretched clustering in a what’s new for 6.1 post already, if you haven’t read it then I urge you to do so first. There are a couple of key things to know, first of all the latency between data sites that can be tolerated is 5ms and to the witness location ~100ms.

If you look at the picture you below you can imagine that when a VM sits in Fault Domain A and is reading from Fault Domain B that it could incur a latency of 5ms for each read IO. From a performance perspective we would like to avoid this 5ms latency, so for stretched clusters we introduce the concept of read locality. We don’t have this in a non-stretched environment, as there the latency is microseconds and not miliseconds. Now this “read locality” is something we need to take in to consideration when we configure HA and DRS.

[Read more…] about HA/DRS configuration with Virtual SAN Stretched Cluster environment

VMworld Virtual SAN slidedecks up on slideshare

Duncan Epping · Sep 9, 2015 ·

I just posted the slidedecks that I presented at VMworld on Virtual SAN up on slideshare. The recording and the slides will probably at some point also show up on vmware.com but as I had many requests from people to share the material I figured I would do that straight after the event. If you have any questions don’t hesitate to ask.

Building a Stretched Cluster with Virtual SAN 6.1

Five common customer use cases for Virtual SAN

HA restarts in a DR/DA event

Duncan Epping · May 3, 2014 ·

I received a couple of questions last week about HA restarts in the scenario where a full site failure has occurred or a part of the storage system has failed and needs to be taken over by another datacenter. Yes indeed this is related to stretched clusters and HA restarts in a DR/DA event.

The questions were straight forward, how does the restart time-out work and what happens after the last retry? I wrote about HA restarts and the sequence last year, so lets just copy and paste that here:

  • Initial restart attempt
  • If the initial attempt failed, a restart will be retried after 2 minutes of the previous attempt
  • If the previous attempt failed, a restart will be retried after 4 minutes of the previous attempt
  • If the previous attempt failed, a restart will be retried after 8 minutes of the previous attempt
  • If the previous attempt failed, a restart will be retried after 16 minutes of the previous attempt

You can extend the restart retry by increasing the value “das.maxvmrestartcount”. And then after every 15/16 minutes a new restart will be attempted. The question this triggered though is why would it even take 4 retries? The answer I got was: we don’t know if we will be able to fail over the storage within 30 minutes and if we will have sufficient compute resources…

Here comes the sweet part about vSphere HA, it actually is a pretty smart solution, it will know if VMs can be restarted or not. In this case as the datastore is not available there is absolutely no point in even trying and HA as such will not even bother. As soon as the storage becomes available though the restart attempts will start. Same applies to compute resource, if for whatever reason there is insufficient unreserved compute resources to restart your VMs then HA will wait for them to become available… nice right!?! Do note I emphasized the word “unreserved” as that is what HA cares about and not actually about used resources.

Stretched vCloud Director infrastructure

Duncan Epping · Jan 15, 2013 ·

A while back I wrote about design considerations when designing or building a stretched vCloud Director infrastructure. Since then I have been working on a document in collaboration with Lee Dilworth, and this document should be out soon hopefully. As various people have asked for the document I decided to throw it in to this blog post so that the details are already out there.

** Disclaimer: this article has not been reviewed by the technical marketing team yet, this is a preview of what will possibly be published. When the official document is published I will add a link to this article **

Introduction

VMware vCloud® Director™ 5.1 (vCloud Director) gives enterprise organizations the ability to build secure private clouds that dramatically increase datacenter efficiency and business agility. Coupled with VMware vSphere® (vSphere), vCloud Director delivers cloud computing for existing datacenters by pooling vSphere virtual resources and delivering them to users as catalog-based services. vCloud Director helps you build agile infrastructure-as-a-service (IaaS) cloud environments that greatly accelerate the time-to-market for applications and responsiveness of IT organizations.

Resiliency is a key aspect of any infrastructure but is even more important in “Infrastructure as a Service” (IaaS) solutions. This solution overview was developed to provide additional insight and information in how to architect and implement a vCloud Director based solution on a vSphere Metro Storage Cluster infrastructure.

Architecture Introduction

This architecture consists of two major components. The first component is the geographically separated vSphere infrastructure based on stretched storage solution, here after referred to as the vSphere Metro Storage Cluster (vMSC) infrastructure. The second component is vCloud Director.

Note –  Before we dive in to the details of the solution we would like to call out the fact that vCloud Director is not site aware. If incorrectly configured availability could be negatively impacted in certain failure scenarios.

[Read more…] about Stretched vCloud Director infrastructure

Some questions about Stretched Clusters with regards to power outages

Duncan Epping · Oct 9, 2012 ·

Today I received an email about the vSphere Metro Storage Cluster paper I wrote, or better said about stretched clusters in general. I figured I would answer the questions in a blog post so that everyone can chip in / read etc. So lets show the environment first so that the questions are clear. Below is an image of the scenario.

Below are the questions I received:

If a power outage occurs at Frimley the 2 hosts get a message by the UPS that there is a power outage. After 5 minutes (or any other configured value) the next action should start. But what will be the next action? If a scripted migration to a host at Bluefin starts, will DRS move some VMs back to Frimley? Or could the VMs get a mark to stick at Bluefin? Should the hosts at Frimley placed into Maintenance mode so the migration will be done automatically? And what happens if there is a total power outage both at Frimley and Bluefin? How a controlled shutdown across hosts could be arranged?

Lets start breaking it down and answer where possible. The main question is how do we handle power outages. As in any datacenter this is fairly complex. Well the powering-off part is easy, powering everything on in the right order isn’t. So where do we start? First of all:

  1. If you have a stretched cluster environment and, in this case, Frimley data center has a power outage, it is recommended to place the hosts in maintenance mode. This way all VMs will be migrated to the Bluefin data center without disruption. Also, when power returns it allows you to do check on the host before introducing them to the cluster again.
  2. If maintenance mode is not used and a scripted migration is done virtual machines will be migrated back probably by DRS. DRS is triggered every 5 minutes (at a minimum). Avoid this, use maintenance mode!
  3. If there is an expected power outage and the environment is brought down it will need to be manually powered on in the right order. You can also script this, but a stretched cluster solution doesn’t cater for this type of failure unfortunately.
  4. If there is an unexpected power outage and the environment is not brought down then vSphere HA will start restarting virtual machines when the hosts come back up again. This will be done using the “restart priority” that you can set with vSphere HA. It should be noted that the “restart priority” is only about the completion of the power-on task, not about the full boot of the virtual machine itself.

I hope that clarifies things.

  • « Go to Previous Page
  • Go to page 1
  • Go to page 2
  • Go to page 3
  • Go to page 4
  • Go to page 5
  • Go to Next Page »

Primary Sidebar

About the author

Duncan Epping is a Chief Technologist in the Office of CTO of the Cloud Platform BU at VMware. He is a VCDX (# 007), the author of the "vSAN Deep Dive", the “vSphere Clustering Technical Deep Dive” series, and the host of the "Unexplored Territory" podcast.

Upcoming Events

29-08-2022 – VMware Explore US
07-11-2022 – VMware Explore EMEA
….

Recommended Reads

Sponsors

Want to support Yellow-Bricks? Buy an advert!

Advertisements

Copyright Yellow-Bricks.com © 2022 · Log in