
Yellow Bricks

by Duncan Epping


futures

VMworld Session Summary: #HCI1207BU HCI Management: Current and Future

Duncan Epping · Aug 28, 2019 ·

I wasn’t able to sit in this session in person, but fortunately the recording was out within a day. This session has been one of my favorite vSAN-related sessions for the past couple of years, and it is definitely one of my favorites this year as well. It was again presented by Christian Dickmann and JunChi Zhang.

In this session, JunChi and Christian go over some of VMware’s ideas around what HCI Management should look like in the future (HCI1207BU). Christian started by providing an overview of what HCI/vSAN is about today. He explained that the world of vSAN was relatively simple in the early years, but that the infrastructures where vSAN is being deployed, and the use cases, are getting more complex. More and more customers are deploying in various locations and want to manage these solutions in a consistent and efficient way.

The topics covered in the session are Lifecycle Management, Intelligent Operations, and Cloud Native Apps. JunChi started with Lifecycle Management and talked about what the vSAN team developed to make life easier when it comes to deploying vSphere. For instance, it is possible to specify in the vCenter Installer that vCenter needs to run on a new vSAN cluster, and the installer will then create a single-node vSAN cluster to make this possible. The Cluster Quickstart workflow was also added, a great way to create and configure a vSphere cluster end to end.

HA Futures: Per VM Admission Control – Part 4 of 4 – (Please comment!)

Duncan Epping · Nov 2, 2018 ·

As admission control hasn’t evolved in the past years, we figured we would include another potential Admission Control change. Right now, when you define admission control, you do this cluster-wide. You can define that you want to tolerate one failure, for instance, but some VMs may simply be more important than other VMs. What do you do in that case?

Well, if that is the case, then with today’s implementation you are stuck. This became very clear when customers started using vSAN policies and defined different “failures to tolerate” values for different workloads; it just makes sense. But as mentioned, HA does not allow you to do this. So our proposal is the following: Per-VM FTT Admission Control.

In this case you would be able to define Host Failures To Tolerate on a per VM basis. This would provide a couple of benefits in my opinion:

  • You can set a higher Host Failures To Tolerate for critical workloads, increasing the chances of being able to restart them when a failure has occurred
  • You can align the HA Host Failures To Tolerate with the vSAN Host Failures To Tolerate, resulting in similar availability from a compute and storage point of view
  • You lower resource fragmentation by applying Admission Control on a per-VM basis, even when using the “slot-based algorithm”
  • Of course, you can combine this with the new admission control types mentioned in my earlier post.
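To make the proposal concrete, here is a minimal sketch of how per-VM FTT admission control math could work. This is purely my illustration of the idea, not how vSphere HA actually implements (or would implement) it; the function name, the CPU-only capacity model, and the assumption that VMs with a lower FTT are excluded at higher failure counts are all my own simplifications.

```python
# Hypothetical sketch: per-VM "Host Failures To Tolerate" admission control.
# Assumes identical hosts and a single capacity dimension (GHz of CPU).

def admission_check(host_capacity_ghz, num_hosts, vms):
    """vms: list of (reserved_ghz, ftt) tuples.

    For each FTT level present, verify that the VMs demanding that level
    (or a higher one) still fit on the hosts surviving that many failures.
    VMs with a lower FTT carry no guarantee at that failure count, so this
    simplified model leaves them out of the demand calculation.
    """
    for ftt in sorted({ftt for _, ftt in vms}):
        surviving_capacity = (num_hosts - ftt) * host_capacity_ghz
        guaranteed_demand = sum(r for r, f in vms if f >= ftt)
        if guaranteed_demand > surviving_capacity:
            return False  # this FTT level cannot be honored
    return True
```

With per-VM values, a critical VM could ask for FTT=2 while test/dev VMs use FTT=0, instead of one cluster-wide setting sizing everything for the worst case.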

Hopefully that is clear, and hopefully it is a proposal you appreciate. Please leave a comment, whether you find this useful or not. Please help shape the future of HA!

HA Futures: VMCP for Networking – Part 3 of 4 – (Please comment!)

Duncan Epping · Oct 30, 2018 ·

VMCP, or VM Component Protection, has been around for a while. Many of you are probably using it to mitigate storage issues. But what if the VM network fails? Well, that is a problem right now: if the VM network fails, there’s no response from HA. Many customers consider this a problem. So what would we like to propose? VM Component Protection for Networking!

How would this work? The plan would be to allow you to enable VM Component Protection for Networking for any network on your host. This could be the vMotion network, different VM networks, etc. On this network, HA would of course need an IP address it could check “liveness” against, very similar to how it uses the default gateway to verify “host isolation”.

On top of validating liveness through an IP address, HA should of course also monitor the physical NIC. If either of the two checks fails, HA should take action immediately. What this action will be depends on the type of failure that has occurred. We are considering the following two types of responses to a failure:

  1. If vMotion still works, migrate the VM from impacted host to a healthy host
  2. If vMotion doesn’t work, restart the impacted VM on a healthy host
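The two responses above boil down to a simple decision. The sketch below is just my illustration of that decision logic, under the assumption that HA feeds it three health signals; the function name and the boolean inputs are hypothetical, not a real vSphere interface.

```python
# Hypothetical sketch of the proposed VMCP-for-networking response logic.
# ping_ok, nic_link_up, and vmotion_ok stand in for HA's actual health
# checks (liveness ping, physical NIC state, vMotion network state).

def vmcp_network_response(ping_ok, nic_link_up, vmotion_ok):
    """Return the action HA might take for a VM on an impacted network."""
    if ping_ok and nic_link_up:
        return "none"      # network is healthy, nothing to do
    if vmotion_ok:
        return "migrate"   # live-migrate the VM to a healthy host
    return "restart"       # vMotion unusable: restart on a healthy host
```

Note that either signal failing (ping or NIC link) triggers a response, and vMotion health only decides *which* response, matching the two options listed above.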

In addition to monitoring the health of the physical NIC, HA could also use in-guest/VM monitoring techniques to monitor the network route from within the VM to a certain address/gateway. Would this technique be useful?

What do you think? Please provide feedback/comments below, even if it is just a “yes, please!” Please help shape the future of HA!

HA Futures: Admission Control – Part 2 of 4 – (Please comment, feedback needed!)

Duncan Epping · Oct 23, 2018 ·

Admission Control is always a difficult topic when I talk to customers. It seems that many people still don’t fully grasp the concept, or simply misunderstand how it works. To be honest, I can’t blame them; it doesn’t always make sense when you think things through. Most recently, we introduced a mechanism for Admission Control in which you can specify what the “tolerated performance loss” should be for any given VM. Unfortunately, this isn’t really admission control, as it doesn’t stop you from powering on new VMs; it does, however, warn you if you reach the threshold where a host failure would lead to the specified performance degradation.

After various discussions with the HA team over the past couple of years, we are now exploring what we can change about Admission Control to give you, as a user, more options to ensure VMs are not only restarted but also receive the resources you expect them to receive. As such, the HA team is proposing three different ways of doing Admission Control, and we would like your feedback on this potential change:

  • Admission Control based on reserved resources and VM overheads
    This is what you have today, nothing changes here. We use the static reservations and ensure that all VMs can be powered on!
  • Admission Control based on consumed resources
    This is similar to the “performance degradation tolerated” option. We will look at the average consumed CPU and memory resources (let’s say over the past 24 hours) and base our admission control calculations on that. This will allow you to guarantee that workload performance will be similar after a failure.
  • Admission Control based on configured resources
    This is a static way of doing admission control, similar to the first. The only difference is that here Admission Control will do the calculations based on the resources configured. So if you configured a VM with 24GB of memory, then we will do the math with 24GB of memory for that VM. The big advantage, of course, is that the VMs will always be able to claim the resources they have been assigned.
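The difference between the three options is really just which number Admission Control feeds into its math for each VM. Here is a minimal sketch contrasting them; the dictionary fields, the 24-hour averaging window, and the function names are my own illustrative assumptions, not anything from the actual HA implementation.

```python
# Hypothetical sketch contrasting the three proposed Admission Control
# inputs, using memory (GB) as the example resource.

def vm_demand_gb(vm, mode):
    """vm: dict with 'reservation', 'avg_consumed' (e.g. a trailing
    24-hour average), and 'configured' memory in GB."""
    if mode == "reserved":      # today's behavior: static reservations
        return vm["reservation"]
    if mode == "consumed":      # average consumed over a trailing window
        return vm["avg_consumed"]
    if mode == "configured":    # full configured size, e.g. 24 GB
        return vm["configured"]
    raise ValueError(f"unknown admission control mode: {mode}")

def cluster_demand_gb(vms, mode):
    """Total demand Admission Control would reserve capacity for."""
    return sum(vm_demand_gb(vm, mode) for vm in vms)
```

For a VM with no reservation, 8GB average consumption, and 24GB configured, the three modes would reserve failover capacity for 0GB, 8GB, and 24GB respectively, which is exactly the trade-off between density and post-failure guarantees described above.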

In our opinion, adding these options should help to ensure that VMs will receive the resources you (or your customers) would expect them to get. Please help us by leaving a comment/providing feedback. If you agree that this would be helpful then let us know, if you have serious concerns then we would also like to know. Please help shape the future of HA!

HA Futures: Orchestrated Restart and Restart Priority – Part 1 of 4 – (Please comment!)

Duncan Epping · Oct 15, 2018 ·

Last week I visited Palo Alto and had a long conversation with my friends from the vSphere HA team. In the past I have done a series of articles asking for feedback/comments on future features/functions of HA that the team is looking to implement. The first feature I would like to discuss is Orchestrated Restart and Restart Priority. Funny enough, this is one I discussed in the previous series as well.

For those who don’t know: today you can specify in the vSphere Client what the dependency is between VMs. If a host fails, or multiple hosts fail, and the VMs in a dependency chain are impacted, then HA ensures that these VMs are powered on in a particular order. vSphere HA also has the ability to specify what the restart priority of VMs should be. I described both of these features here. The difference between the two is fairly straightforward: restart priority is considered a “soft rule” and restart orchestration a “hard rule”. In other words: if one VM can’t be restarted when restart orchestration is used, then the next batch will not be started.
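The hard/soft distinction described above can be sketched in a few lines. This is my own simplified model of the batching behavior, not HA’s actual restart logic; `restart_vm` stands in for whatever HA does to attempt a restart.

```python
# Hypothetical sketch of "hard" (orchestration) vs "soft" (priority)
# restart semantics across ordered batches of VMs.

def restart_batches(batches, hard, restart_vm):
    """batches: list of lists of VM names, in dependency/priority order.
    restart_vm(name) -> bool, True if the restart attempt succeeded.

    With a hard rule, any failed restart stops all later batches;
    with a soft rule, HA moves on to the next batch regardless.
    """
    started = []
    for batch in batches:
        results = [restart_vm(vm) for vm in batch]  # attempt whole batch
        started.extend(vm for vm, ok in zip(batch, results) if ok)
        if hard and not all(results):
            break  # hard rule: a failed VM blocks every later batch
    return started
```

So if a database VM in the first batch fails to restart, the hard rule leaves the app and web batches down, while the soft rule brings them up anyway; that is exactly the behavioral gap the questions below are probing.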

The UI, to be honest, is confusing, and having two similar concepts that more or less do the same thing is also confusing. We have discussed various changes we would like your opinion on. Please leave your feedback in the comments using a valid email address, so that we can follow up when needed.

  • Would you like to have the ability to restart a full chain of VMs when Orchestration or Priority is enabled and only a few VMs in the chain are impacted by a failure? In other words, would you like an option that also restarts the still-running VMs that are part of a chain impacted by a failure?
  • Would you like to have Orchestrated Restarts / Restart Priority for APD impacted VMs and VM & App Monitoring as well?
  • Would you like to have Orchestrated Restarts and Restart Priority combined in a single feature?
    • Potentially have an option for multi-level Orchestration, like Restart Priority has
    • Define if it is a “hard” or “soft” rule

And of course, if you feel anything else needs to change about this feature, please also leave that in a comment. The HA team will be reading this and is happy to take all the feedback it can get. Please help shape the future of HA!


About the Author

Duncan Epping is a Chief Technologist in the Office of the CTO in the Cloud Infrastructure Business Group (CIBG) at VMware. Besides writing on Yellow-Bricks, Duncan co-authors the vSAN Deep Dive book series and the vSphere Clustering Deep Dive book series. Duncan also co-hosts the Unexplored Territory Podcast.
