Software Defined

Runecast Analyzer 3.0!

Duncan Epping · Aug 21, 2019 ·

This week I had a brief conversation with the folks from Runecast. I have been following them since day 1 and they have made a big impression on me from the start. During the conversation the Runecast folks shared with me that Runecast Analyzer 3.0 was going to be announced today, and they gave a quick overview and demo of what would be announced and included in 3.0. They also quickly went over the functionality that was added over the past year. Some things that were really well adopted by customers were the HIPAA and DISA-STIG compliance features, as well as Horizon support and the security auto-remediation capabilities. Another thing that customers really appreciated was the upgradability simulation (beta feature), where Runecast validates your environment against the HCL.

Stan (Runecast CEO) also mentioned that this year Runecast signed up a customer with over 10k hosts. As you can imagine, a lot of the work in the past 12 months was focused on scalability and performance at that level of scale. But that is not what today’s announcement is about: today Runecast is announcing 3.0. In 3.0 there are some great enhancements to the platform again. First of all, production-ready HCL Analysis for vSphere and vSAN. On top of that, the ESXi Upgrade Simulation is now GA, and the log analysis has been improved. Runecast is also introducing a new H5 Client plug-in with new widgets and a dark theme! Just look at it below, you have got to love the dark theme!

But as I mentioned, there’s more to it than just the H5 Client plug-in. The HCL Analysis and the Upgrade Simulation are the two key features if you ask me. During the demo, Stan showed me the below screen, and I think that by itself makes it worth testing out Runecast. It simply shows you in one overview whether your environment is compliant with the HCL or not, and if it is not compliant, which combination of firmware and driver you should be using to make it compliant. In this example, the driver should be upgraded to 2.0.42. A very useful feature if you ask me. Note that this works for both vSphere and vSAN, and for all components needed to run either of these.

Just as useful is the Upgrade Simulation, by the way. Are you considering upgrading? Make sure to run this first so you know whether you will end up in a supported state or not! Some of you may say that VMware has similar capabilities in its products, but the Runecast appliance doesn’t need to be connected to the internet at all times. You can regularly update the dataset and then run these compliance and upgrade checks (or any of the other checks) offline. Especially for customers where internet access is challenging (dark sites) this is very helpful.

All in all, some very useful updates to an already very useful solution.

vSAN Deep Dive book available in Traditional Chinese

Duncan Epping · Aug 14, 2019 ·

It took a while, but it is my pleasure to announce that the publisher DrMaster just published the Traditional Chinese version of “VMware vSAN 6.7 U1 Deep Dive”. For those who would like to get a copy of the book in Traditional Chinese, there are a couple of ways to pick it up:

Tenlong Computer Books (Taiwan-based bookstore):
https://www.tenlong.com.tw/products/9789864344086
Books.Com.TW (one of the largest e-commerce websites in Asia):
https://www.books.com.tw/products/0010829405
DrMaster Press’s website:
http://www.drmaster.com.tw/bookinfo.asp?BookID=MP11902

I would like to thank the folks at DrMaster for taking on the effort of translating and publishing it!

Does a vSAN IO Limit impact resync traffic?

Duncan Epping · Jun 12, 2019 ·

A question just came in, and I figured other people may have the same question so I would share it. The question was whether a vSAN IO limit would impact resync traffic or, for instance, SvMotion. In this case the customer defines limits within each policy to ensure VMs do not interfere with other VMs or use IO resources excessively. Especially in cloud environments this can be useful, or when running production and test/dev on the same cluster. The concern, of course, was whether this limit would impact, for instance, recovery times after a failure. You can imagine that a limit of 50 IOPS would be devastating when a VM (or multiple VMs) needs to have objects resynced.

The answer is simple: no, the IO limit specified within a policy does not impact resync traffic (or SvMotion for that matter). It only applies to Guest IO to a VMDK, namespace or swap object. Which means that it is safe to set limits when it comes to recovery times.

Major vSAN Milestone: 20K customers – Celebrating by dropping the price of our book by 50%!

Duncan Epping · Jun 2, 2019 ·

I haven’t done one of these in a while, and as it is a question that comes up regularly during customer conversations I figured I would share a nice quote from the VMware earnings call. But before I do, I want to thank every VMware employee, partner and customer who helped us reach this major milestone. Sometimes customers ask how invested VMware is in storage. Well, very invested. We are determined to remain the number 1 player in the hyperconverged and hybrid cloud world, and the below numbers show why!

vSAN license bookings grew over 50% year-over-year in Q1 with a total customer count growing to over 20,000. (seekingalpha.com)

Yes, that is 20,000 customers indeed. Actually, more than 20k customers. Which, again, is a great success and would not have been possible without the help from you guys. So to thank all of you, Cormac and I have decided to lower the price of our book temporarily. For 1 week, from today until Friday the 7th, we have lowered the price of the book by ~50%. This means that on the Amazon US store the book will be 20 USD for the paper version, and only 5 USD for the ebook. So pick it up! (It may take a day for the price change to reach some of the Amazon stores…) Please note, as an Amazon Associate I earn from the qualifying purchases below.

Paper – https://amzn.to/2SFsKxF
Ebook – https://amzn.to/2L67DCl

It seems the price has been pushed down to all “local” Amazon websites. So go to your local website and pick up the book for 50% of the previous price.

vSAN Stretched Cluster failure scenarios and component votes

Duncan Epping · Apr 3, 2019 ·

I was at a customer last week and had an interesting question about the vSAN voting mechanism. This customer had a stretched cluster and used RAID-5 within each location to protect the data on top of replicating across locations. During certain failure scenarios the data unexpectedly remained available. Of course, it is great that you have higher availability than expected, but why did this happen? What this customer tested was powering off the Witness (which is deemed a site failure) and next powering off 2 hosts in 1 location, which exceeds the “failures to tolerate” in a single location. You would expect, based on all documentation so far, that the data would be unavailable. Well, for some VMs this was the case, but for others it was not. Why is this? It is all about the vote count in this case. Look at the below diagram and the number of votes for each component first.

In the above scenario, if the Witness (W) fails we lose 4 votes. Out of a total of 13 that is not a problem. If two additional hosts fail, this is most likely still not a problem, even though you are exceeding the provided “failures to tolerate”. However, if by any chance Host1 is one of those failed hosts then you would lose quorum, because Host1 has a component with 2 votes. So if Host1 has failed, the Witness has failed, and for instance Host2 has failed as well, you have now lost 7 out of 13 votes. Only 6 votes remain, which is less than a majority, so quorum is lost. Please note that which component gets the 2 votes is random. For a different VM/object it could be that the component placed on Host6 or Host7 has 2 votes.

Another thing to point out: if Host5 through Host8 all fail, the data is still available. However, if Host3 and Host4 then fail as well, the object becomes unavailable. Even though you would still have quorum across locations, you have now also exceeded the specified “failures to tolerate” within the location. This is also something that is taken into account.
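To make the vote arithmetic from the scenarios above a bit more concrete, here is a minimal Python sketch. It is a simplified model and not how vSAN implements this: it assigns votes per host instead of per component, and the names (`VOTES`, `SITES`, `LOCAL_FTT`, `has_quorum`, `is_available`) are made up for illustration. The vote layout (4 for the witness, 2 for Host1, 1 for each other host, 13 in total) follows the numbers used in this post. The split of Host1-4 and Host5-8 across the two locations is an assumption based on the examples.

```python
# Hypothetical vote layout matching the post: 13 votes in total,
# 4 on the witness, 2 on host1, and 1 on each of the other seven hosts.
VOTES = {"witness": 4, "host1": 2}
VOTES.update({f"host{i}": 1 for i in range(2, 9)})

# Assumed site layout: host1-4 in one location, host5-8 in the other.
SITES = {
    "site_a": {"host1", "host2", "host3", "host4"},
    "site_b": {"host5", "host6", "host7", "host8"},
}
LOCAL_FTT = 1  # RAID-5 within a location tolerates one host failure


def has_quorum(failed):
    """Return True when more than half of all votes are still reachable."""
    total = sum(VOTES.values())
    alive = sum(votes for name, votes in VOTES.items() if name not in failed)
    return alive * 2 > total


def is_available(failed):
    """Quorum, plus at least one location within its local failures to tolerate."""
    site_ok = any(len(hosts & failed) <= LOCAL_FTT for hosts in SITES.values())
    return has_quorum(failed) and site_ok


# Witness plus two "1 vote" hosts: 7 of 13 votes remain, still available.
print(is_available({"witness", "host2", "host3"}))
# Witness plus host1 (2 votes) and host2: only 6 of 13 votes remain.
print(is_available({"witness", "host1", "host2"}))
# Full site failure plus two hosts in the other site: quorum, but not available.
site_b_plus_two = SITES["site_b"] | {"host3", "host4"}
print(has_quorum(site_b_plus_two), is_available(site_b_plus_two))
```

In this model the first case stays available because 7 of 13 votes remain and the other location is untouched, the second case loses quorum with only 6 votes left, and the last case keeps quorum but exceeds the local “failures to tolerate” in the only location that is still up.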

I hope that helps.