I have mentioned Runecast a fair amount on my blog, dating back to 2017, but somehow I forgot to blog about our episode with Stan on the topic of Runecast on the Unexplored Territory Podcast. I just noticed, so I figured I would share the episode with you folks. I have been a fan of their solution from day 1, and I would encourage people to look at what they have to offer, and of course to listen to the episode. Listen via Spotify (bit.ly/3Nr16nz), Apple (bit.ly/43AlZlB), or the embedded player below!
Performance Management Object reduced availability on stretched cluster
I created a new lab environment not too long ago, and I ran into a situation where the Performance Management Object showed up as "Reduced Availability with no Rebuild" in vSAN Skyline Health. In my case this happened because I created a Stretched Cluster configuration after I had already formed a cluster, which means that the performance management object was randomly placed across hosts without taking those "failure domains" into account. I completely forgot about it until someone on VMTN reminded me. I had two options: fix the existing performance database, or simply disable and re-enable the performance service so that it is recreated.
As I had no data stored in the database, I figured disable/enable was the easiest route. I looked for the option in vSphere 8.0 U1 but could not find it; it seems the UI option no longer exists, for whatever reason. So how do I disable and re-enable the service now? Ruby vSphere Console (RVC) to the rescue!
When you log in to RVC, you can simply run the following commands against the cluster object you want to disable/enable the performance service for. Fairly straightforward, and it fixed the issue within a minute or so:
vsan.perf.stats_object_delete <cluster>
vsan.perf.stats_object_create <cluster>
I also documented this in the vSAN 8.0 ESA Deep Dive book, by the way; you can buy a paper copy or ebook on Amazon.
vSAN Stretched Cluster failure matrix
Over the last couple of weeks, I was involved in an internal discussion about the different vSAN stretched cluster failure scenarios. I wrote a lengthy email about how vSAN and HA would respond in certain scenarios. I have documented many of these on my blog over the years, but never published them as a whole.
In some of the scenarios below, I discuss a "partition". A partition is a scenario in which one of the locations has lost both its L3 connection to the witness and its inter-site / inter-switch link to the other site. So in the diagram above, for instance, if I say that Site B is partitioned, it means that Site A can still communicate with the witness, but Site B can communicate with neither the witness nor Site A.
For all of the scenarios below, the following applies: Site A is the preferred location and Site B is the secondary location. In the table, the first two columns refer to the policy settings for the VM, as shown in the screenshot below. The third column refers to the location where the VM runs from a compute perspective, the fourth describes the type of failure, and the fifth and sixth columns describe the resulting vSAN and HA behavior.
Time to list the various scenarios. No, the list doesn't include every failure that could occur, but it should cover most of the scenarios that are important for a stretched cluster configuration. Do note that the behavior described below will only be observed when the best practices, as documented here and here, are followed. Also note that the table is long: close to 30 scenarios are described! If you have any questions, or if you feel a failure scenario is missing, please leave a comment.
Site Disaster Tolerance | Failures to Tolerate | VM Location | Failure | vSAN behavior | HA behavior |
---|---|---|---|---|---|
None Preferred | No data redundancy | Site A or B | Host failure Site A | Objects are inaccessible if failed host contained one or more components of objects | VM cannot be restarted as object is inaccessible |
None Preferred | RAID-1/5/6 | Site A or B | Host failure Site A | Objects are accessible as there's site local resiliency | VM does not need to be restarted, unless VM was running on failed host |
None Preferred | No data redundancy / RAID-1/5/6 | Site A | Full failure Site A | Objects are inaccessible as full site failed | VM cannot be restarted in Site B, as all objects reside in Site A |
None Preferred | No data redundancy / RAID-1/5/6 | Site B | Full failure Site B | Objects are accessible, as only Site A contains objects | VM can be restarted in Site A, as that is where all objects reside |
None Preferred | No data redundancy / RAID-1/5/6 | Site A | Partition Site A | Objects are accessible as all objects reside in Site A | VM does not need to be restarted |
None Preferred | No data redundancy / RAID-1/5/6 | Site B | Partition Site B | Objects are accessible in Site A, objects are not accessible in Site B as network is down | VM is restarted in Site A, and killed by vSAN in Site B |
None Secondary | No data redundancy / RAID-1/5/6 | Site B | Partition Site B | Objects are accessible in Site B | VM resides in Site B, does not need to be restarted |
None Preferred | No data redundancy / RAID-1/5/6 | Site A | Witness Host Failure | No impact, witness host is not used as data is not replicated | No impact |
None Secondary | No data redundancy / RAID-1/5/6 | Site B | Witness Host Failure | No impact, witness host is not used as data is not replicated | No impact |
Site Mirroring | No data redundancy | Site A or B | Host failure Site A or B | Components on failed hosts inaccessible, read and write IO across ISL as no redundancy locally, rebuild across ISL | VM does not need to be restarted, unless VM was running on failed host |
Site Mirroring | RAID-1/5/6 | Site A or B | Host failure Site A or B | Components on failed hosts inaccessible, read IO locally due to RAID, rebuild locally | VM does not need to be restarted, unless VM was running on failed host |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Full failure Site A | Objects are inaccessible in Site A as full site failed | VM restarted in Site B |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Partition Site A | Objects are inaccessible in Site A as full site is partitioned and quorum is lost | VM restarted in Site B |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Witness Host Failure | Witness object inaccessible, VM remains accessible | VM does not need to be restarted |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site B | Full failure Site A | Objects are inaccessible in Site A as full site failed | VM does not need to be restarted as it resides in Site B |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site B | Partition Site A | Objects are inaccessible in Site A as full site is partitioned and quorum is lost | VM does not need to be restarted as it resides in Site B |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site B | Witness Host Failure | Witness object inaccessible, VM remains accessible | VM does not need to be restarted |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Network failure between Site A and B (ISL down) | Site A binds with witness, objects in Site B become inaccessible | VM does not need to be restarted |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site B | Network failure between Site A and B (ISL down) | Site A binds with witness, objects in Site B become inaccessible | VM restarted in Site A |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site A or Site B | Network failure between Witness and Site A/B | Witness object inaccessible, VM remains accessible | VM does not need to be restarted |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Full failure Site A, and simultaneous Witness Host Failure | Objects are inaccessible in Site A and Site B due to quorum being lost | VM cannot be restarted |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Full failure Site A, followed by Witness Host Failure a few minutes later | Pre vSAN 7.0 U3: Objects are inaccessible in Site A and Site B due to quorum being lost | VM cannot be restarted |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Full failure Site A, followed by Witness Host Failure a few minutes later | Post vSAN 7.0 U3: Objects are inaccessible in Site A, but accessible in Site B as votes have been recounted | VM restarted in Site B |
Site Mirroring | No data redundancy / RAID-1/5/6 | Site B | Full failure Site B, followed by Witness Host Failure a few minutes later | Post vSAN 7.0 U3: Objects are inaccessible in Site B, but accessible in Site A as votes have been recounted | VM restarted in Site A |
Site Mirroring | No data redundancy | Site A | Full failure Site A, and simultaneous host failure in Site B | Objects are inaccessible in Site A, if components reside on failed host then object is inaccessible in Site B | VM cannot be restarted |
Site Mirroring | No data redundancy | Site A | Full failure Site A, and simultaneous host failure in Site B | Objects are inaccessible in Site A, if components do not reside on failed host then object is accessible in Site B | VM restarted in Site B |
Site Mirroring | RAID-1/5/6 | Site A | Full failure Site A, and simultaneous host failure in Site B | Objects are inaccessible in Site A, accessible in Site B as there's site local resiliency | VM restarted in Site B |
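The partition, witness-failure and vote-recount rows in the table all hinge on vSAN's majority (quorum) rule. Below is a minimal Python sketch of that rule under a deliberately simplified assumption: a site-mirrored object with one data replica in each site plus a witness component, each holding one vote. The class `MirroredObject` and its methods are hypothetical illustrations, not a vSAN API. Real vSAN assigns votes per component (and RAID-1/5/6 within a site adds more of them), so treat this as a mental model rather than how vSAN actually implements it.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


# Simplified model (assumption): one data replica in Site A ("A"), one in
# Site B ("B"), and a witness component ("W"), each carrying one vote.
@dataclass
class MirroredObject:
    votes: Dict[str, int] = field(
        default_factory=lambda: {"A": 1, "B": 1, "W": 1}
    )

    def accessible_from(self, reachable: Iterable[str]) -> bool:
        """True if the members that can still talk to each other hold a strict
        majority of all votes and include at least one data site."""
        members = set(reachable)
        total = sum(self.votes.values())
        held = sum(self.votes[m] for m in members)
        has_data = bool(members & {"A", "B"})
        return has_data and held * 2 > total

    def recount_after_site_failure(self, failed_site: str) -> None:
        """Hypothetical stand-in for the post-7.0 U3 vote recount: the failed
        site loses its votes and the surviving data site outweighs the witness."""
        survivor = "B" if failed_site == "A" else "A"
        self.votes = {failed_site: 0, survivor: 2, "W": 1}


obj = MirroredObject()
print(obj.accessible_from({"B", "W"}))  # Site A partitioned, Site B side: True
print(obj.accessible_from({"A"}))       # Site A partitioned, Site A side: False
print(obj.accessible_from({"B"}))       # Site A and witness fail together: False

obj.recount_after_site_failure("A")     # Site A fails, votes are recounted
print(obj.accessible_from({"B"}))       # witness fails minutes later: True
```

The sketch only decides whether a given group of surviving members has quorum; which side the witness sides with when the ISL goes down (the preferred site, Site A) is a separate rule taken from the table, which in this model corresponds to calling `accessible_from({"A", "W"})`.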
New book: VMware vSAN 8.0 U1 Express Storage Architecture Deep Dive!
We already gave some hints on Twitter and during an episode of the Unexplored Territory podcast, but here it finally is: the new book, the VMware vSAN 8.0 U1 Express Storage Architecture Deep Dive! It has been a year since we released the vSAN 7.0 U3 Deep Dive book, and with a brand-new vSAN architecture introduced in vSAN 8.0, we figured it was time for a full overhaul of the book as well. Mind you, this new book deals purely with the Express Storage Architecture, aka vSAN ESA. This also means that features which are not supported by ESA are not discussed in this book; for those, you will need the vSAN 7.0 U3 Deep Dive book, which covers OSA. Another big change is that we brought in a third author: we asked our good friend Pete Koehler to contribute to the book. Pete had reviewed previous books, and considering the amount of material he produced for VMware Tech Marketing on vSAN (and ESA specifically), it made a lot of sense to bring him in!
VMware's vSAN has rapidly proven itself in environments ranging from hospitals to oil rigs to e-commerce platforms and is the market leader in the hyperconverged space. Along the way, the world of IT has rapidly changed, not just from a software point of view, but also from a hardware perspective. With vSAN 8.0, VMware brought a new architecture to market called vSAN Express Storage Architecture (ESA). This architecture is highly optimized for today's datacenter resources, be it CPU, memory, networking, or NVMe-based flash storage.
The authors of the vSAN Deep Dive have thoroughly updated their definitive guide to this transformative technology. Writing for vSphere administrators, architects, and consultants, Cormac Hogan, Duncan Epping, and Pete Koehler explain what vSAN ESA is, why the architecture has changed, what it now offers, and how to gain maximum value from it. The book offers expert insight into preparation, installation, configuration, policies, provisioning, clusters, architecture, and more. You'll also find practical guidance for using all data services, stretched clusters, two-node configurations, and cloud-native storage services.
Although we pressed publish on Tuesday, it sometimes takes a while before the book is available in all Amazon stores; it should trickle down over the next 24-48 hours. The book is priced at 9.99 USD for the ebook and 29.99 USD for a paper copy, and is sold exclusively through Amazon. Get it while it is hot! We would appreciate it if you would use our referral links and leave a review when you finish it. Thanks for the support, and we hope you enjoy it!
Of course, we also have the links to other major Amazon stores:
- United Kingdom – ebook – paper
- Germany – ebook – paper
- Netherlands – ebook – paper
- Canada – ebook – paper
- France – ebook – paper
- Spain – ebook – paper
- India – ebook
- Japan – ebook – paper
- Italy – ebook – paper
- Mexico – ebook
- Australia – ebook – paper
- Brazil – ebook
- Or just do a search in your local Amazon store!
VMUG Advantage Homelab Group Buy Discount offer 2023! (Also for renewals!)
It is that time of the year again for many: time to renew your VMUG subscription. The minimum discount you will get is 12%, and this can go up to 15% when the number of participants reaches 300, which drops the price down to 170 USD. What do you get when you sign up and buy a 12-month subscription?
- 365-day Evaluation Licenses
- Including vSphere 8, vSAN 8, Workstation 17 Pro, Fusion 13 Pro, NSX, vRealize, Horizon, and more!
- 20-35% discount on training and certification
- Access to "test drive"
- Advantage members receive a 100 USD VMware Explore discount (not stackable)
The VMUG Advantage Program comes at a cost of 200 USD. Last year the discount was 15%, which means the price ended up being 170 USD for a full year. If you have just one training course planned per year, the VMUG Advantage Program will already have paid for itself (a 20% discount on a 3000+ USD training course saves 600+ USD). Yes, I have been talking about USD so far, but of course this offer is available to all our community members globally (Europe, APJ, Africa, Middle East, etc.). Again, the discount percentage you get will depend on the number of people signing up for this year's promotion, but even if only one person signs up (you), you will immediately get a 12% discount. The ranges look as follows:
Participants | Discount | Cost |
---|---|---|
1-199 | 12% | $176 |
200-299 | 14% | $172 |
300+ | 15% | $170 |
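The tier arithmetic is easy to check. Here is a small Python sketch that derives the prices from the 200 USD list price and the discount tiers in the table, plus the "pays for itself" comparison for a single training course at the 20% training discount. The function names and the rounding to whole dollars are illustrative assumptions, not anything VMUG publishes.

```python
# Values from the post: 200 USD list price and (minimum participants, rate)
# tiers, checked from the highest tier down.
LIST_PRICE_USD = 200
TIERS = [(300, 0.15), (200, 0.14), (1, 0.12)]


def discount_for(participants: int) -> float:
    """Group-buy discount rate for a given number of sign-ups (hypothetical helper)."""
    for minimum, rate in TIERS:
        if participants >= minimum:
            return rate
    raise ValueError("at least one participant is required")


def price_for(participants: int) -> int:
    """Discounted 12-month subscription price, rounded to whole USD (assumption)."""
    return round(LIST_PRICE_USD * (1 - discount_for(participants)))


def pays_for_itself(course_price: float, training_discount: float = 0.20,
                    participants: int = 1) -> bool:
    """True if the training discount on one course covers the subscription."""
    return course_price * training_discount >= price_for(participants)


for n in (1, 200, 300):
    print(n, discount_for(n), price_for(n))
print(pays_for_itself(3000))
```

Even at the lowest tier (176 USD), the 600 USD saved on a 3000 USD course covers the subscription more than three times over.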
If more than 1000 people sign up, VMUG HQ will also do a raffle and give away some cool VMUG Advantage "swag". Can't wait? Sign up for the discount code here, and join the program! Note that the survey is open for 2 weeks, from the 19th of April 2023 until the 3rd of May. After the survey closes, the discount code will be distributed to everyone who signed up.
UPDATE: The goal has been reached, and you can get a 15% discount when using the code: ADV15OFF. Note: This promotion is only available until May 3rd, 2023!