Yellow Bricks

by Duncan Epping


vSphere HA configuration for HCI Mesh!

Duncan Epping · Oct 29, 2020 · 7 Comments

I wrote a vSAN HCI Mesh Considerations blog post a few weeks ago. Based on that post I received some questions, one of which was about the vSphere HA configuration. Interestingly, I also had some internal discussions about how vSAN HCI Mesh and HA are integrated, and based on those discussions I did some testing to validate my understanding of the implementation.

When it comes to vSphere HA and vSAN, the majority of you will be following the vSAN Design Guide and understand that having HA enabled is crucial for vSAN, as is configuring the Isolation Response and setting the correct Isolation Address. However, so far there has been one HA feature which you did not have to configure for vSAN and HA to function correctly, and that feature is VM Component Protection, aka the APD / PDL responses.

Now, this changes with HCI Mesh. Specifically for HCI Mesh, the HA and vSAN teams have worked together to detect APD (all paths down) scenarios! When would this happen? Well, if you look at the below diagram, you can see that we have “Client Clusters” and a “Server Cluster”. The “Client Cluster” consumes storage from the “Server Cluster”. If, for whatever reason, a host in the “Client Cluster” loses access to the “Server Cluster”, the VMs on that host which consume storage on the “Server Cluster” lose access to their datastore. This is essentially an APD (all paths down) scenario.

Now, to ensure the VMs are protected by HA in this situation, you only need to enable the APD response. This is very straightforward. You simply go to the HA cluster settings and set the “Datastore with APD” setting to either “Power off and restart VMs – Conservative” or “Power off and restart VMs – Aggressive”. The difference between conservative and aggressive is that with conservative HA will only kill the VMs when it knows for sure the VMs can be restarted, whereas with aggressive it will also kill the VMs on a host impacted by an APD even when it isn’t sure it can restart the VMs. Most customers will use the “Conservative Restart Policy”, by the way.
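
For those who would rather script this than click through the UI, below is a minimal pyVmomi sketch that applies the settings discussed above (isolation address, isolation response, and the conservative APD response). To be clear, this is my own illustration rather than an official example: the function name and the isolation address are placeholders, and it assumes vSphere HA is already enabled on the cluster.

from pyVmomi import vim

def configure_ha_for_hci_mesh(cluster, isolation_ip="192.168.1.1"):
    # 'cluster' is a vim.ClusterComputeResource; the IP is a placeholder.
    das = vim.cluster.DasConfigInfo()
    # The address HA pings to decide whether a host is isolated.
    das.option = [vim.option.OptionValue(key="das.isolationaddress0", value=isolation_ip)]
    das.vmComponentProtecting = "enabled"  # enable VM Component Protection
    vm_settings = vim.cluster.DasVmSettings()
    vm_settings.isolationResponse = "powerOff"  # "Power off and restart VMs"
    vm_settings.vmComponentProtectionSettings = vim.cluster.VmComponentProtectionSettings(
        vmStorageProtectionForAPD="restartConservative")
    das.defaultVmSettings = vm_settings
    # modify=True merges these settings into the existing cluster configuration.
    return cluster.ReconfigureComputeResource_Task(vim.cluster.ConfigSpecEx(dasConfig=das), True)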

As I also mentioned in the HCI Mesh Considerations blog, one thing I would like to call out is the timing of the APD scenario: the APD is declared after 60 seconds, after which the APD response (the restart) is triggered automatically after another 180 seconds. In other words, the restart kicks in roughly 4 minutes after connectivity is lost. Note that this is different from an APD with traditional storage, where it takes 140 seconds before the APD is declared. You can, of course, see in the log file that an APD is detected and declared, and that VMs are killed as a result. Note that the “fdm.log” is quite verbose, so I copied only the relevant lines from my tests.

APD detected for remote vSAN Datastore /vmfs/volumes/vsan:52eba6db0ade8dd9-c04b1d8866d14ce5
Go to terminate state for VM /vmfs/volumes/vsan:52eba6db0ade8dd9-c04b1d8866d14ce5/a57d9a5f-a222-786a-19c8-0c42a162f9d0/YellowBricks.vmx due to APD timeout (CheckCapacity:false)
Failover operation in progress on 1 Vms: 1 VMs being restarted, 0 VMs waiting for a retry, 0 VMs waiting for resources, 0 inaccessible vSAN VMs.

Now, for those wondering if it actually works: of course, I tested it a few times and recorded a demo, which can be watched on YouTube (easier to follow in full screen), or click play below. (Make sure to subscribe to the channel for the latest videos!)

I hope this helps!

VMware vSphere Cluster Services (vCLS) considerations, questions and answers.

Duncan Epping · Oct 9, 2020 · 34 Comments

In the vSphere 7.0 Update 1 release, VMware introduced a new service called vSphere Cluster Services (vCLS). vCLS provides a mechanism that allows VMware to decouple both vSphere DRS and vSphere HA from vCenter Server. Niels Hagoort wrote a lengthy article on this topic here. You may wonder why VMware introduced this; well, as Niels states, by decoupling the clustering services (DRS and HA) from vCenter Server via vCLS we ensure the availability of critical services even when vCenter Server is impacted by a failure.

vCLS is a collection of multiple VMs which, over time, will be the backbone for all clustering services. In the 7.0 U1 release a subset of DRS functionality is enabled through vCLS. Over the past week(s) I have seen many questions coming in and I wanted to create a blog with answers to these questions. When new questions or considerations come up, I will add these to the list below.
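
As a side note, if you quickly want to list the vCLS VMs in your environment, something like the pyVmomi sketch below should do the trick. This is just my own example; it simply matches on the “vCLS” name prefix that vCenter uses for the agent VMs.

from pyVmomi import vim

def list_vcls_vms(content):
    # 'content' is the result of ServiceInstance.RetrieveContent().
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    try:
        return [vm.name for vm in view.view if vm.name.startswith("vCLS")]
    finally:
        view.DestroyView()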

[Read more…] about VMware vSphere Cluster Services (vCLS) considerations, questions and answers.

Site locality in a vSAN Stretched Cluster?

Duncan Epping · May 28, 2019 · 7 Comments

On the community forums, a question was asked about the use of site locality in a vSAN Stretched Cluster. When you create a stretched cluster in vSAN, you can define within a policy how the data needs to be protected. Do you want to replicate across datacenters? Do you want to protect the “site local data” with RAID-1 or RAID-5/6? All of these options are available within the UI.

What if you decide not to stretch your object across locations: is it mandatory to specify which datacenter the object should reside in?

The answer is simple: no, it is not. The real question, of course, is: should you define the location? Most definitely! If you wonder how to do this, simply specify it within the policy you define for these objects as follows:

The above screenshot is taken from the H5 client, if you are still using the Web Client it probably looks slightly different (Thanks Seamus for the screenshot):

Why would you do this? Well, that is easy to explain. When the objects of a VM get provisioned, the decision where to place them is made per object. If you have multiple disks and you haven’t specified the location, you could find yourself in the situation where the disks of a single non-stretched VM are located in different datacenters. This is, first of all, terrible for performance, but maybe more importantly, it would also impact availability when anything happens to the network between the datacenters. So when you use site locality for non-stretched VMs, make sure to also configure the location so that your VM and its objects align, as demonstrated in the below diagram.
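
For reference, the sketch below summarizes the policy rules that matter for a non-stretched VM in a stretched cluster. Note that this is illustrative only; the capability ids follow the vSAN SPBM namespace, and the exact names and values can differ per release.

# Illustrative only: the vSAN policy rules for a non-stretched VM.
non_stretched_vm_policy = {
    "VSAN.hostFailuresToTolerate": 0,           # do not stretch the object across sites
    "VSAN.subFailuresToTolerate": 1,            # protect the data within the site (RAID-1)
    "VSAN.locality": "Preferred Fault Domain",  # pin the object to a specific datacenter
}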

 

Impact of adding Persistent Memory / Optane Memory devices to your VM

Duncan Epping · May 22, 2019

I received some questions about this over the past month, so I figured I would share some details. As persistent memory (Intel Optane Memory devices, for instance) is getting more affordable and readily available, more and more customers are looking to use it. Some are already using it for very specific use cases, usually in situations where the OS and the app actually understand the type of device being presented. What does that mean? At VMworld 2018 there was a great session on this topic, and I captured the session in a post. Let me copy/paste the important bit for you, which discusses the different modes in which a persistent memory device can be presented to a VM.

  • vPMEMDisk = exposed to guest as a regular SCSI/NVMe device, VMDKs are stored on PMEM Datastore
  • vPMEM = Exposes the NVDIMM device in a “passthrough” manner, the guest can use it as a block device or a byte-addressable direct access device (DAX); this is the fastest mode and most modern OSes support it
  • vPMEM-aware = This is similar to the mode above, but the difference is that the application understands how to take advantage of vPMEM
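
To make the vPMEM mode a bit more concrete, below is a minimal pyVmomi sketch that adds an NVDIMM device to a VM. Again, this is my own illustration rather than an official example: the negative device keys are temporary placeholders that vCenter resolves, and it assumes the VM is powered off and sits on a host that actually has PMEM capacity.

from pyVmomi import vim

def add_vpmem_device(vm, size_mb=4096):
    # Add an NVDIMM controller plus an NVDIMM device (vPMEM mode) to the VM.
    controller = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        device=vim.vm.device.VirtualNVDIMMController(key=-100))
    nvdimm = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        device=vim.vm.device.VirtualNVDIMM(
            key=-101,
            controllerKey=-100,
            capacityInMB=size_mb,
            backing=vim.vm.device.VirtualNVDIMM.BackingInfo()))
    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[controller, nvdimm]))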

But what is the problem with this? What is the impact? Well, when you expose a persistent memory device to a VM, it is currently not protected by vSphere HA, even though HA may be enabled on your cluster. Say what? Yes indeed, the VM which has the PMEM device presented to it will be disabled for vSphere HA! I had to dig deep to find this documented anywhere, and it is documented in this paper. (Page 47, at the bottom.) So what works and what doesn’t? Well, if I understand it correctly:

  • vSphere HA >> Not supported on vPMEM enabled VMs, regardless of the mode
  • vSphere DRS >> Does not consider vPMEM enabled VMs, regardless of the mode
  • Migration of a VM with vPMEM / vPMEM-aware >> Only possible when migrating to a host which has PMEM
  • Migration of a VM with vPMEMDisk >> Possible to a host without PMEM

Also note that, as the data is not replicated/mirrored, a failure could potentially lead to loss of data. Although persistent memory is a great mechanism to increase performance, this is something that should be taken into consideration when you are thinking about introducing it into your environment.
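
If you want to audit which of your VMs fall into this category, a small pyVmomi sketch like the one below can help. Keep in mind this is a hypothetical example of mine, and it only catches the vPMEM / vPMEM-aware modes; vPMEMDisk would require checking the type of the datastore backing each VMDK instead.

from pyVmomi import vim

def vms_unprotected_by_ha(vms):
    # Return the names of VMs carrying an NVDIMM (vPMEM) device, which,
    # as described above, vSphere HA will not protect.
    return [vm.name for vm in vms
            if any(isinstance(dev, vim.vm.device.VirtualNVDIMM)
                   for dev in vm.config.hardware.device)]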

Oh, if you are wondering why people are taking these risks in terms of availability, Niels Hagoort just posted a blog with a pointer to a new PMEM Perf paper which is worth reading.

 

 

DRS Advanced Setting IsClusterManaged

Duncan Epping · May 7, 2019

On Reddit, someone asked what the DRS advanced setting IsClusterManaged does and whether it is even legit. I can confirm it is legit: it is a setting which was introduced to prevent customers from disabling DRS while the cluster is managed by, for instance, vCloud Director. Disabling DRS would lead to the deletion of resource pools, which would be a very bad situation to find yourself in when you run vCloud Director, as it leans heavily on DRS resource pools. So if you see the advanced setting IsClusterManaged for DRS in your environment, just leave it alone; it is there for a reason. (Most likely because you are using something like vCloud Director…)
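
If you are curious whether the setting is present on one of your clusters, a quick check along these lines (a pyVmomi sketch of mine, where cluster is a vim.ClusterComputeResource) will show it:

def get_drs_option(cluster, key="IsClusterManaged"):
    # DRS advanced options are stored in the cluster's DRS configuration.
    return next((opt.value for opt in cluster.configurationEx.drsConfig.option
                 if opt.key == key), None)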

