edge

Witness resiliency feature with a 2-node cluster

Duncan Epping · Oct 9, 2023

A few weeks ago I had a conversation with a customer about a large vSAN ESA 2-node deployment they were planning. One of their questions was whether a 2-node configuration with nested fault domains would be able to tolerate a witness failure after one of the nodes had gone down. I had tested this for a stretched cluster, but I hadn’t tested it with a 2-node configuration. Will we actually see the votes being recalculated after a host failure, and will the VM remain up and running when the witness fails after the votes have been recalculated?

Let’s just test it, and use RVC to look at what happens in each case. Let’s look at the healthy output first, then at a host failure, followed by the witness failure:

Healthy

    DOM Object: 71c32365-667e-0195-1521-0200ab157625 
      RAID_1
        Concatenation
          Component: 71c32365-b063-df99-2b04-0200ab157625 
            votes: 2, usage: 0.0 GB, proxy component: true
          RAID_0
            Component: 71c32365-f49e-e599-06aa-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: true
            Component: 71c32365-681e-e799-168d-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: true
            Component: 71c32365-06d3-e899-b3b2-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: true
        Concatenation
          Component: 71c32365-e0cb-ea99-9c44-0200ab157625 
            votes: 1, usage: 0.0 GB, proxy component: false
          RAID_0
            Component: 71c32365-6ac2-ee99-1f6d-0200ab157625
              votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-e03f-f099-eb12-0200ab157625
              votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-6ad0-f199-a021-0200ab157625
              votes: 1, usage: 0.0 GB, proxy component: false
      Witness: 71c32365-8c61-f399-48c9-0200ab157625 
        votes: 4, usage: 0.0 GB, proxy component: false
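
To make the vote distribution above easier to read, here is a small Python sketch that tallies the votes per replica and for the witness from RVC-style output. The function names (`tally_votes`, `has_quorum`) are made up for this illustration and are not part of RVC. The majority rule used (more than half of all votes must be available) is an assumption stated for the example, and the sketch ignores the separate requirement that a full data replica must be available.

```python
import re

# Healthy-state listing from above, reduced to the lines that matter for votes.
HEALTHY = """\
RAID_1
  Concatenation
    Component: 71c32365-b063-df99-2b04-0200ab157625
      votes: 2, usage: 0.0 GB, proxy component: true
    RAID_0
      Component: 71c32365-f49e-e599-06aa-0200ab157625
        votes: 1, usage: 0.0 GB, proxy component: true
      Component: 71c32365-681e-e799-168d-0200ab157625
        votes: 1, usage: 0.0 GB, proxy component: true
      Component: 71c32365-06d3-e899-b3b2-0200ab157625
        votes: 1, usage: 0.0 GB, proxy component: true
  Concatenation
    Component: 71c32365-e0cb-ea99-9c44-0200ab157625
      votes: 1, usage: 0.0 GB, proxy component: false
    RAID_0
      Component: 71c32365-6ac2-ee99-1f6d-0200ab157625
        votes: 1, usage: 0.0 GB, proxy component: false
      Component: 71c32365-e03f-f099-eb12-0200ab157625
        votes: 1, usage: 0.0 GB, proxy component: false
      Component: 71c32365-6ad0-f199-a021-0200ab157625
        votes: 1, usage: 0.0 GB, proxy component: false
  Witness: 71c32365-8c61-f399-48c9-0200ab157625
    votes: 4, usage: 0.0 GB, proxy component: false
"""


def tally_votes(rvc_text):
    """Sum the votes under each Concatenation (one per host) and the Witness."""
    groups = []  # list of [label, votes], in order of appearance
    for line in rvc_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Concatenation"):
            groups.append([f"replica-{len(groups) + 1}", 0])
        elif stripped.startswith("Witness"):
            groups.append(["witness", 0])
        match = re.search(r"votes:\s*(\d+)", stripped)
        if match and groups:
            groups[-1][1] += int(match.group(1))
    return {label: votes for label, votes in groups}


def has_quorum(available_votes, total_votes):
    """Assumed rule: strictly more than half of all votes must be available."""
    return 2 * available_votes > total_votes


votes = tally_votes(HEALTHY)
total = sum(votes.values())
print(votes)  # {'replica-1': 5, 'replica-2': 4, 'witness': 4}
print(total)  # 13
for lost, lost_votes in votes.items():
    remaining = total - lost_votes
    print(f"{lost} down: {remaining} of {total}, quorum: {has_quorum(remaining, total)}")
```

In the healthy state, any single failure (either host or the witness) still leaves at least 8 of the 13 votes available.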

With 1 host down, you can see that the votes were recalculated: the witness went from 4 votes to 1. Of course, the state of the components on the failed host also changed from “active” to “absent”.

    DOM Object: 71c32365-667e-0195-1521-0200ab157625 
      RAID_1
        Concatenation (state: ABSENT (6))
          Component: 71c32365-b063-df99-2b04-0200ab157625 
            votes: 1, proxy component: false
          RAID_0
            Component: 71c32365-f49e-e599-06aa-0200ab157625 
              votes: 1, proxy component: false
            Component: 71c32365-681e-e799-168d-0200ab157625 
              votes: 1, proxy component: false
            Component: 71c32365-06d3-e899-b3b2-0200ab157625 
              votes: 1, proxy component: false
        Concatenation
          Component: 71c32365-e0cb-ea99-9c44-0200ab157625 
            votes: 2, usage: 0.0 GB, proxy component: false
          RAID_0
            Component: 71c32365-6ac2-ee99-1f6d-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-e03f-f099-eb12-0200ab157625
              votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-6ad0-f199-a021-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: false
      Witness: 71c32365-8c61-f399-48c9-0200ab157625 
        votes: 1, usage: 0.0 GB, proxy component: false

After I failed the witness, I of course had to check whether the VM was still running or whether it showed up as inaccessible in the UI. The VM kept running and was not reported as inaccessible. vSAN and the Witness Resilience feature worked as I expected. (Yes, I double-checked it through RVC as well, and the VM was “active”.)

Unexplored Territory Podcast #032 – IT giving McLaren Racing the edge! Featuring Edward Green (McLaren Racing)

Duncan Epping · Dec 1, 2022

In this special episode, we talk to Edward Green (Head of Commercial Technology, McLaren Racing) and Joe Baguley (CTO EMEA, VMware) about VMware’s partnership with McLaren Racing. Joe talks about his passion for motorsports, and how VMware got involved with McLaren. Edward dives deep into what IT means for McLaren Racing. What does their data center look like at the track? What kind of datasets are collected? Can the size of the dataset impact the race results? How does McLaren Racing provide fantastic experiences through its Paddock Club? All of that, and much more, is discussed by Edward Green in this episode of the Unexplored Territory Podcast! Listen via Spotify (https://spoti.fi/3Xx0orT) or Apple (https://apple.co/3UdrJfC)!

Joined GigaOm’s David S. Linthicum on a podcast about cloud, HCI and Edge.

Duncan Epping · Oct 14, 2019

A while ago I had the pleasure of joining David S. Linthicum from GigaOm on their Voices in Cloud Podcast. It is a 22-minute podcast in which we discuss various VMware efforts in the cloud space, edge computing, and of course HCI. You can find the episode here, where they also have the full transcript for those who prefer to read instead of listening to a guy with a Dutch accent. It was a fun experience for sure, as I always enjoy joining podcasts and talking tech… So if you run a podcast and are looking for a guest, don’t hesitate to reach out!

Of course you can also find Voices in Cloud on iTunes, Google Play, Spotify, Stitcher, and other platforms.

Project nanoEDGE aka tiny vSphere/vSAN supported configurations!

Duncan Epping · Oct 10, 2019

A few weeks ago VMware announced Project nanoEDGE on their Virtual Blocks blog. In the days that followed I received a whole bunch of questions from customers and partners interested in understanding what it is and what it does. I personally prefer to call Project nanoEDGE “a recipe”. The recipe states which configuration would be supported for both vSAN and vSphere. Let’s be clear, this is not a tiny version of VxRail or VMware Cloud Foundation. This is a hardware recipe that should help customers deploy tiny supported configurations to thousands of locations around the world.

Project nanoEDGE is a project by VMware principal system engineer Simon Richardson. The funny thing is that right around the time Simon started discussing this with customers to see if there would be interest in something like this, I had similar discussions within the vSAN organization. When Simon mentioned he was going to work on this project with support from the VMware OCTO organization I was thrilled. I personally believe there’s a huge market for this. I have had dozens of conversations over the years with customers who have 1000s of locations and are currently running single-node solutions. Many of those customers need to deliver new IT services to these locations and the requirements for those services have changed as well in terms of availability, which makes it a perfect play for vSAN and vSphere (with HA).

So first of all, what would nanoEDGE look like?

These are tiny “desktop-like” boxes. The boxes are the Supermicro E300-9D, and they come in various flavors. The recipe currently describes the solution as 2 full vSAN servers and 1 host which is used for the vSAN Witness for the 2-node configuration. Of course, you could also run the witness remotely, or even throw in a switch and go with a 3-node configuration. The important part here is that all components used are on both the vSphere and the vSAN compatibility guide! The benefit of the 2-node approach is that you can use cross-over cables between the vSAN hosts and, as a result, avoid the cost of a 10GbE switch! So what is in the box? The bill of materials is currently as follows:

3x Supermicro E300-9D-8CN8TP
- The box comes with 4x 1GbE NIC ports and 2x 10GbE NIC ports
- 10GbE can be used for direct connect
- It has an Intel® Xeon® processor D-2146NT – 8 cores
6x 64GB RAM
3x PCIe Riser Card (RSC-RR1U-E8)
3x PCIe M.2 NVMe Add-on Card (AOC-SLG3-2M2)
3x Capacity Tier – Intel M.2 NVMe P4511 1TB
3x Cache Tier – Intel M.2 NVMe P4801 375GB
3x Supermicro SATADOM 64GB
1x Managed 1GbE Switch

From a software point of view, the paper lists that they tested with 6.7 U2, but of course, if the hardware is on the VCG for 6.7 U3, then running that configuration will also be supported. The team also did some performance tests, and they showed some pretty compelling numbers (40,000+ read IOPS and close to 20,000 write IOPS), especially when you consider that these types of configurations would usually run 15-20 VMs in total. One thing I do want to add: the bill of materials lists M.2 form factor flash devices. This allows nanoEDGE to avoid the use of the internal, unsupported AHCI disk controller, which is key in the hardware configuration! Do note that in order to fit two M.2 devices in this tiny box, you will also need to order the listed PCIe Riser Card and the M.2 NVMe add-on card. William Lam has a nice article on this subject, by the way.
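
To put those performance numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes, purely for illustration, that the measured IOPS are spread evenly across all VMs, which real workloads will not do. The inputs are the rounded figures quoted above (40,000 read IOPS, 20,000 write IOPS, 15 to 20 VMs), and `iops_per_vm` is a hypothetical helper name.

```python
def iops_per_vm(total_iops, vm_count):
    """Average IOPS available per VM, assuming a perfectly even spread."""
    return total_iops / vm_count


READ_IOPS = 40_000   # "40,000+ read IOPS", rounded down
WRITE_IOPS = 20_000  # "close to 20,000 write IOPS", rounded up

for vms in (15, 20):
    reads = round(iops_per_vm(READ_IOPS, vms))
    writes = round(iops_per_vm(WRITE_IOPS, vms))
    print(f"{vms} VMs: ~{reads} read IOPS and ~{writes} write IOPS per VM")
# 15 VMs: ~2667 read IOPS and ~1333 write IOPS per VM
# 20 VMs: ~2000 read IOPS and ~1000 write IOPS per VM
```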

There are many other options on the vSAN HCL for both caching as well as capacity, so if you prefer to use a different device, make sure it is listed here.

I would recommend reading the paper, and if you have an interest in this solution please reach out to your local VMware representative for more detail/help.

Project Dimension – VMware’s Edge Computing effort

Duncan Epping · Nov 20, 2018

Internally some of my focus has been shifting. Going forward I will spend more time on edge computing besides vSAN. Edge (and IoT for that matter) has had my interest for a while, and when VMware announced an edge project I was instantly intrigued. The edge computing efforts were announced at VMworld US, and the name for the effort is Project Dimension. There were several sessions at VMworld, and I would recommend watching those if you are looking for more info than provided below. The session from which I took most of the info below was IOT2539BE, titled “Project Dimension: the easy button for edge computing” by Esteban Torres and Guru Shashikumar. Expect more content on Project Dimension in the future as I start getting more involved.

What is Project Dimension? What was discussed at VMworld was the following:

A new VMware Cloud service; starting at edge locations
Enable enterprises to consume compute, storage, and networking at the edge like they consume public cloud
VMware will work with OEM partners to deliver and manage hyperconverged appliances in edge locations
- All appliances will be managed by VMware via VMware Cloud

So what does it include? Well, as mentioned, it includes hardware. The exact type of hardware hasn’t been mentioned, but it was said that Dell and Lenovo are the first two OEMs to support Project Dimension. This hyperconverged solution will include:

vSphere
vSAN
Velocloud

This solution will be managed by a “hybrid cloud control plane” as it is referred to, all by VMware. Architecturally this is what the service will look like:

Now what I found very interesting is that during the session someone asked about the potential for Dimension in on-prem datacenters, and the answer was: “Edge is where we are beginning, but the long-term plan is to offer the same model for data centers as well”. Some may notice that NSX is missing in the above list and diagram. As mentioned during the session, this is being planned for, but preferably it will be a “lighter” flavor. What also stands out is that the HCI solution includes not only compute but also networking (switches and SD-WAN appliance).

Now, what is most interesting is the management aspect: VMware and the OEM partner will do the full maintenance/lifecycle management for you. This means that if something breaks, the OEM will fix it. You as a customer, however, always contact VMware, as the single point of contact for everything. If there’s an upgrade, then VMware will go through that motion for you. Every edge cluster, for instance, also has a vCenter Server instance, but you as an administrator/service owner will not be managing that vCenter Server instance; you will be managing the workloads that run in that environment. This makes sense to me: when you scale out and potentially have hundreds or thousands of locations, you don’t want to spend most of your time managing the infra for that, you want to focus on where the company’s revenue is.

Now getting back to the maintenance/upgrades. How does this work, and how do you know you have sufficient capacity to allow for an upgrade to happen? VMware will ensure this is possible by doing some form of admission control, which prevents you from claiming 100% of the physical resources. Another interesting thing mentioned is that Dimension will allow you to choose when upgrades or patches are applied. In most environments maintenance will have an impact on workloads in some shape or form, so by providing blackout dates a peak season/time can be avoided.
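
The session did not explain how this admission control works. Purely to illustrate the idea, here is a minimal Python sketch under one assumed policy: keep one node’s worth of capacity free so that a rolling upgrade can evacuate one node at a time. The function names, the “capacity units”, and the policy itself are all hypothetical and not taken from Project Dimension.

```python
def claimable_capacity(node_count, per_node_capacity, nodes_reserved=1):
    """Capacity that may be claimed while keeping `nodes_reserved` nodes free.

    Assumed policy for illustration only: reserve whole nodes for maintenance.
    """
    if node_count <= nodes_reserved:
        raise ValueError("cluster needs more nodes than are reserved")
    return (node_count - nodes_reserved) * per_node_capacity


def admit(requested, already_claimed, node_count, per_node_capacity):
    """Return True if the request fits without eating into the reserve."""
    limit = claimable_capacity(node_count, per_node_capacity)
    return already_claimed + requested <= limit


# A 3-node cluster with 32 capacity units per node: 64 of 96 units claimable.
print(claimable_capacity(3, 32))   # 64
print(admit(10, 50, 3, 32))        # True  (60 <= 64)
print(admit(20, 50, 3, 32))        # False (70 > 64)
```

With this assumed policy, a 3-node cluster could be filled to at most two thirds of its physical resources.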

From a hardware and procurement perspective, this service is also different than you are used to. The service will be offered on a subscription basis: 1-year or 3-year reserved edge clusters, or more of course. From a hardware perspective, it kind of aligns with what you typically see in the cloud: Small, Medium, or Large instances, which refers to the amount of resources you get per node. You start with 3 nodes and, of course, have the ability to scale up, and potentially the option to start smaller than 3 nodes in the future. The process in terms of sign-up/procurement is displayed in the diagram below. Delivery would be within 1-2 weeks, which seems extremely fast to me.

What I also found interesting was the mention of a “try and buy” option: you pay for 3 months, and if you like it you keep it, and your 3-month contract will automatically go to 1 year (or so).

At this point you may be asking: why is VMware doing this? Well, it is pretty simple: demand and industry changes. We are starting to see a clear trend, more and more workloads are shifting closer to the consumer. This allows our customers to process data faster and more importantly respond faster to the outcome, and of course, take action through machine learning. But the biggest challenge customers have is consistently managing these locations at a global scale, and this is what Project Dimension should solve. This is not just a challenge at the edge, but across edge, on-prem and public cloud if you ask me. There are so many moving parts, various different tools, and interfaces, which just makes things overly complex.

So what is VMware planning on delivering with Project Dimension? Consistent, reliable, and secure hyperconverged infrastructure which is managed through a Cloud Control Plane (single pane of glass management for edge environments) and edge-to-cloud connectivity through Velocloud SD-WAN. (Management traffic for now, but “edge to edge” and “edge to on-prem” soon!) There’s a lot of innovation happening at the back-end when it comes to managing and maintaining 1000s of edge locations, but you as a customer are buying simplicity, reliability, and consistency.

Please note, Project Dimension is in beta, and the team is still looking for beta customers. You need to have a valid use case, as I can see some of you thinking “nice for a home lab for a couple of weeks”, but that, of course, is not what the team is looking for. For those who have a good use case, please go to the product page and leave your details behind: http://vmwa.re/dimension