
HA Admission control: How can I check how much reserved resources are used?

Duncan Epping · Sep 14, 2018 ·

I have had this question twice in the past three months, so apparently this isn’t clear to everyone. With HA Admission Control you set aside capacity for fail-over scenarios; this capacity is reserved. But of course VMs also use reservations, so how can you see the total combined reserved capacity in use, including Admission Control?

Well, that is actually pretty simple, it appears. If you open the Web Client or the H5 client and go to your cluster, you will find a “Monitor” tab. Under the Monitor tab there’s an option called “Resource Reservation”, which has graphs for both CPU and memory. These graphs include admission control. To verify this I took a screenshot in the H5 client before enabling admission control and another after enabling it. As you can see, there is a big increase in “reserved resources”, which indicates that admission control is taken into consideration.

Before:

After:

This is also covered in our Deep Dive book, by the way, if you want to know more, pick it up.
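
To make the before/after difference concrete, here is a minimal sketch of the arithmetic. It assumes the percentage-based admission control policy and a simplified model in which the reserved total is the sum of the VM reservations plus the fail-over capacity set aside. The function name and all numbers are made up for illustration, and this is not necessarily how vCenter computes the graph internally.

```python
def total_reserved_mhz(cluster_capacity_mhz, vm_reservations_mhz, failover_pct):
    """Return combined reserved CPU: VM reservations plus HA fail-over capacity.

    Simplified model (an assumption, not the vCenter implementation):
    admission control sets aside failover_pct percent of cluster capacity.
    """
    if not 0 <= failover_pct <= 100:
        raise ValueError("failover_pct must be between 0 and 100")
    failover_reserved = cluster_capacity_mhz * failover_pct / 100
    return sum(vm_reservations_mhz) + failover_reserved


# Hypothetical cluster: 40 GHz of CPU capacity, three VMs with reservations.
capacity = 40_000
vm_reservations = [2_000, 1_000, 500]

before = total_reserved_mhz(capacity, vm_reservations, 0)   # admission control off
after = total_reserved_mhz(capacity, vm_reservations, 25)   # 25% set aside

print(f"before: {before:.0f} MHz, after: {after:.0f} MHz")
```

With these example numbers the VM reservations stay the same, and the whole jump between the two values comes from the capacity set aside for fail-over.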

My top VMworld session picks

Duncan Epping · Aug 7, 2018 ·

Every year I post a list of my favorite VMworld sessions, my top picks. There are way too many sessions to see, but these are definitely the sessions I would like to attend personally. That could be because of the speaker or the content, and preferably both. Yes, I know, this list will have some great sessions missing, not because I did not like the abstract or speaker, but simply because I forced myself to limit this list to 10. Before we get started, here are the two sessions I have scheduled. Make sure to sign up for those while you still can, as both seem to be at 80%+ capacity right now:

The Power of Storage Policy-Based Management [HCI1270BU] – Cormac Hogan & Duncan Epping
Tuesday, Aug 28, 12:30 p.m. – 1:30 p.m.
The world of software-defined storage moves at a rapid pace, and VMware is one of the biggest enablers. In this session, Cormac and Duncan will guide you through the world of software-defined storage initiatives at VMware and provide a primer to VMware vSAN, VMware Virtual Volumes (VVol), persistent cloud-native storage options (Project Hatchway), the VMware vSphere APIs for I/O filtering, and the binding factor in these cases: storage policy-based management. Be warned: We will bring demos!
vSphere Clustering Deep Dive, Part 1: vSphere HA and DRS [VIN1249BU] – Frank Denneman & Duncan Epping
Monday, Aug 27, 12:30 p.m. – 1:30 p.m.
In this session, Duncan and Frank will take you through the trenches of VMware vSphere Distributed Resource Scheduler (DRS) and vSphere High Availability (HA). Find out about options to optimize your DRS settings for your specific requirements and goals, such as if you should be load balancing on active or consumed memory, as well as what has recently changed in the DRS algorithm and if it will impact DRS behavior. And for vSphere HA, you will learn about when it restarts virtual machines (VMs), what kind of restart times to expect, and where you can find evidence that a VM (or multiple) have been restarted. You will find out about all of these items and more. Prepare to dive deep, as the basics will not be covered.

Here are my top picks. Note that although I picked Ravi’s session from the Extreme Performance Series, all of them are worth attending!

Extreme Performance Series: vCenter Performance Deep Dive [VIN1759BU] Ravi Soundararajan
Tuesday, Aug 28, 5:00 p.m. – 6:00 p.m.
In this talk, you will get a brief description of the internals of VMware vCenter before going into basic performance troubleshooting and monitoring techniques. Find out about various tools for analyzing resource usage, important metrics like sessions and API calls, and database performance (primarily for the vCenter Server Appliance, but also for vCenter Server for Windows). You will get to understand the differences between vCenter and Platform Services Controller, and consider the impact of linked mode and plug-ins/extensions. By the end of the talk, you’ll understand how your vCenter works, when you may need multiple vCenters, and how Platform Services Controller factors into performance. xPerfSeries
Tech Preview: The Road to a Declarative Compute Control Plane [VIN2256BU] Maarten Wiggers & Frank Denneman
Tuesday, Aug 28, 12:30 p.m. – 1:30 p.m.
Declarative control planes are becoming increasingly popular in the industry. Instead of explicitly defining configurations, declarative control planes tell the architecture what the desired state should be. The desired state could be high priority, or keep particular VMs or containers separate. Within the software-defined data center (SDDC), VMware vSphere offers two declarative control planes: one for networking and one for storage. However, there is no declarative control plane for compute yet.
Compute policy provides a framework to allow our customers the flexibility and control of VM placement and resourcing decisions based on the user’s encompassing application needs. In this session, you will learn about the capabilities introduced in the VMware Cloud SDDC as a path to achieve that goal.
Clustering Deep Dive 2: Quality Control with DRS and Network I/O Control [VIN1735BU] Niels Hagoort & Sahan Gamage
Tuesday, Aug 28, 2:00 p.m. – 3:00 p.m.
In this session, you will go through the trenches of network-aware VMware vSphere DRS and vSphere Network I/O Control. You may ask yourself what these two have to do with each other as, unfortunately, not many people know about the enhancements added to the DRS algorithm around network-aware load balancing. If you want to understand how this can help prevent problems from occurring with network-intensive workloads like NFV, then this is a session you cannot miss!
Project Fractal – The Easy Button for Edge Computing [IOT2593BU] – Dennis Lu & Sridevi Ravuri
Tuesday, Aug 28, 4:00 p.m. – 5:00 p.m.
Come and learn about how VMware can accelerate your adoption of Edge Computing by dealing with the additional complexity and cost of infrastructure management at the Edge, helping you quickly achieve the cost savings and revenue growth benefits of Edge Computing. This is also a great opportunity to shape the direction of VMware’s edge services to help fit customer needs.
vSAN Deployment Topology and Availability Deep Dive: What You Need to Know [HCI2040BU] Paudie O’Riordan & Mansi Shah
Wednesday, Aug 29, 8:00 a.m. – 9:00 a.m.
Today, VMware vSAN can be deployed in many different form factors; for example, vSAN 2-Node ROBO, vSAN Fault domains, Stretch Cluster with and without local protection, and more. These deployment models make vSAN quite flexible and unique. This session will help you understand the different trade-offs and focus on the benefits and overheads of the choice you’ve made in your vSAN proposed design. Join Mansi and Paudie as they discuss these topologies in depth from both an engineering perspective and a practical real-world implementation. Paudie and Mansi will take a no-nonsense review of how to approach designing a fault-tolerant vSAN deployment and give real-world examples of how to achieve the best design from both an availability and performance perspective.
Top 10 Automation Requests and How You Can Save Time [VIN2527BU] Alan Renouf & William Lam
Monday, Aug 27, 2:00 p.m. – 3:00 p.m.
After working firstly as customers and secondly at VMware, Alan and William have encountered hundreds of ways to save time through automation. In this session, they will take you through the top automation requests and how they were completed, teaching you not only how to reproduce them yourself, but also giving you a framework to enable you to automate your top 10 requests.
This session will include a number of techniques and languages, such as PowerShell, PowerCLI, Python, Java, .NET, and simple web applications with JavaScript.
Data Lifecycle Management in Hybrid Clouds [HCI1705BU] Christos Karamanolis & Ilya Languev
Tuesday, Aug 28, 2:00 p.m. – 3:00 p.m.
The focus of IT and DevOps organizations is shifting from storage toward data management independent of infrastructure and locations. This trend is partly driven by a new generation of applications that extract business value from data (big data, analytics, machine learning). Customers need cost-effective data storage but also data mobility, copy management, and on-demand access as business requirements and IT investments evolve. Join Christos Karamanolis (CTO, Storage and Availability) and Ilya Languev (Principal Engineer) as they outline the VMware vision around data lifecycle management that spans private data centers and public clouds. They will discuss VMware’s R&D investments in this space and use real-world examples and demos to highlight the benefits for our customers, both for traditional and cloud-native applications.
VMware CTO Panel: What’s Over the Horizon? [CTO3496PU] Ray O’Farrell, Christos Karamanolis, Chris Wolf, Shawn Bass, Pere Monclus
Tuesday, Aug 28, 5:30 p.m. – 6:30 p.m.
VMware CTOs spend significant time assessing emerging technology trends, taking a practical look at their potential impacts and opportunities for VMware. This session explores emerging areas, inclusive of edge, the Internet of things, artificial intelligence (AI)/machine learning (ML), SD-WAN and network service mesh, distributed data management, and more. There will also be ample time for you to have your most pressing questions answered.
Smart Placement of Workloads in Tomorrow’s Distributed Cloud [CTO2161BU] Daniel Beveridge
Tuesday, Aug 28, 1:00 p.m. – 2:00 p.m.
This session will offer a look at the evolution of cloud as we move from a mega-cloud-focused experience into a more distributed cloud experience where compute evolves toward a mesh of resources. Find out about a technology project sponsored by VMware’s Office of the CTO that has developed a novel approach to the placement of workloads in a vast marketplace of providers, resulting in a seamless cloud burst experience across a range of providers. You will learn about some cutting-edge cloud technology that points toward a new way of consuming cloud services with an emphasis on reducing cost, improving user experience, and offering increased flexibility and agility in workload management.
Optimizing vSAN for Performance [HCI1246BU] Cormac Hogan & Paudie O’Riordan
Tuesday, Aug 28, 3:30 p.m. – 4:30 p.m.
The VMware vSAN team gets many questions on performance. For example, does adding a second disk group improve performance? Does adding a stripe width to an object make things faster? Does increasing the MTU size matter? Does mixing SAS and SATA make a difference? Join this session for answers to these sorts of questions. Paudie and Cormac will discuss the results of various performance tests they initiated in their labs to reach these conclusions. You will learn about the benchmark tool of choice, HCIBench, as well as all the different nuances that can make a difference to your benchmarking results.

Also note that there’s a long list of “deep dive” sessions at VMworld this year. Do a search and register before it is too late!

Now Available: vSphere 6.7 Clustering Deep Dive book!

Duncan Epping · Jul 30, 2018 ·

Over the past couple of months Frank, Niels and I have worked ferociously to update the vSphere Clustering Deep Dive. Some of the material was already brought up to date to vSphere 6.0 U2, but the majority was never updated after vSphere 5.1. As you can imagine, this was a tremendous undertaking. Not only did we need to validate every sentence, but all diagrams needed to be updated and, with the introduction of the HTML-5 client, all screenshots had to be retaken.

Now, just a couple of weeks before VMworld, we are finally at the point where we can press “publish”.

What can you expect? Well, as we have said with previous books, this is not a beginner’s guide! This is a deep dive, and we aimed to take you into the trenches of vSphere Clustering technologies. We cover a multitude of different features, and for those who haven’t read the previous books, expect the following features to be covered:

vSphere HA
vSphere DRS
vSphere Storage DRS
vSphere Storage I/O Control
vSphere Network I/O Control

We also have a chapter on stretched clusters. In this chapter we describe how to design and implement a vSphere Metro Storage Cluster, leveraging all of the knowledge gained in the previous chapters.

For your convenience, I copied/pasted some of the Amazon info below.

—

Paperback: 566 pages
Publisher: CreateSpace Independent Publishing Platform; 1 edition (July 29, 2018)
Language: English
ISBN-10: 1722625325
ISBN-13: 978-1722625320
Product Dimensions: 5.5 x 1.3 x 8.5 inches
Shipping Weight: 1.8 pounds

—

I hope all of you will enjoy the book as much as we enjoyed writing it. And before I forget, I want to thank my co-authors for the late night discussions, the hard work, insights and fun/laughter at times.

Get it while it is hot! (Look on the right side column for the links to the book!)

Instant Clone in vSphere 6.7 rocks!

Duncan Epping · May 1, 2018 ·

I wrote a blog post a while back about VMFork, which then afterwards got rebranded to Instant Clone. In the vSphere 6.7 release there has been a major change to the architecture of VMFork aka Instant Clone, so I figured I would do an update on the post. As an update doesn’t stand out from the rest of the content I am sharing it as a new post.

Instant Clone was designed and developed to provide a mechanism that allows you to instantaneously create VMs. In the early days it was mainly used by folks who wanted to deploy desktops; in the desktop community this was often referred to as “just in time” desktops. These desktops would literally be created when the user tried to log in; it is indeed that fast. How did this work? Well, a good way to describe it is that it is essentially a “vMotion” of a VM on the same host with a linked clone disk. This leads to a situation which looks as follows:

On a host you had a parent VM and a child VM associated with it. You would have a shared base disk, shared memory, and then of course unique memory pages and a delta disk for (potential) changes written to disk. The reason customers primarily used this only with VDI at first was that there was no public API for it. Of course folks like Alan Renouf and William Lam fought hard for public APIs internally, and they managed to get things like the PowerCLI cmdlets and the Python vSphere SDK pushed through. That was great, but unfortunately it was not 100% fully supported. On top of that there were some architectural challenges with the 1.0 release of Instant Clone, mainly caused by the fact that VMs were pinned to a host (next to their parent VM), and as such things like HA, DRS and vMotion wouldn’t work. Now with version 2.0 this all changes. William already wrote an extensive blog post about it here. I just went over all of the changes and watched some internal training, and I am going to write some of my findings/learnings down as well, just so that it sticks… First let’s list the things that stood out to me:

Targeted use cases
- VDI
- Container hosts
- Big data / Hadoop workers
- DevTest
- DevOps
There are two workflows for instant clone
- Instant clone a running VM, source and generated VMs continue running
- Instant clone a frozen VM, source is frozen using guestRPC at a point in time defined by the customer
No UI yet, but “simple API” available
Integration with vSphere Features
- Now supported: HA, DRS, vMotion (Storage / XvMotion etc)
Even when TPS is disabled (default) VMFork still leverages the P-Share technology to collapse the memory pages for efficiency
There is no explicit parent-child relationship any longer

Let’s look at the use cases first. I think the DevTest / DevOps one is interesting. You could, for instance, do an Instant Clone (live) of a VM and then test an upgrade of the application running within the VM. For this you would use the first workflow that I mentioned above: instant clone a running VM. What happens here in the workflow is fairly straightforward. I am using William’s screenshots of the diagrams the developers created to explain it. Thanks William, and dev team 🙂

Now note that above, when the first clone is created, the source gets a delta disk as well. This is to ensure that the shared disk doesn’t change, as that would cause problems for the target. When a 2nd and a 3rd VM are created, the source VM gets additional deltas. This, as you can imagine, isn’t optimal and will over time potentially even slow down the source VM. One other thing to point out is that although the MAC address changes for the generated VM, you as the admin still need to make sure the Guest OS picks this up. As mentioned above, there’s no UI in vSphere 6.7 for this functionality, so you need to use the API. If you look at the MOB you can actually find the InstantClone_Task method and simply call that; for a demo, scroll down. But as said, be careful, as you don’t want to end up with the same VM with the same IP on the same network multiple times. You can get around the MAC/IP conflict issue rather easily, and William has explained how in his post here. You can even change the port group for the NIC, for instance to switch over to an isolated network only used for testing these upgrade scenarios.
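
The delta disk growth on a running source can be illustrated with a toy model. This is only a sketch of the behavior described in this post, not of the vSphere API: the `VM` class, the `instant_clone_running` function and the disk names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """Toy VM record; only tracks the names of its delta disks."""
    name: str
    delta_disks: list = field(default_factory=list)


def instant_clone_running(source, clone_name):
    """Model cloning a *running* source VM.

    Per the description above, the source gets an extra delta disk on every
    clone so the shared disk stays unchanged; the clone gets its own delta.
    """
    next_index = len(source.delta_disks) + 1
    source.delta_disks.append(f"{source.name}-delta-{next_index}")
    return VM(clone_name, [f"{clone_name}-delta-1"])


source = VM("app01")
clones = [instant_clone_running(source, f"app01-test{i}") for i in range(1, 4)]

# After three clones the running source carries three delta disks.
print(source.delta_disks)
```

The point of the model is simply that the number of deltas on the source grows with every clone taken from a running VM, which is why this workflow is better suited to a handful of test clones than to large scale-out scenarios.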

That second workflow would be used for the following use cases: VDI, Container Hosts, Hadoop workers… all more or less the same type of use case: scale out identical VMs fast! For this case, let’s look at the diagram first:

In the above scenario the source VM is what they call “frozen”. You can freeze a VM by running vmware-rpctool with “instantclone.freeze”. This needs to happen from within the guest, and note that you need to have VMware Tools installed to have vmware-rpctool available. When this is executed, the VM goes into a frozen state, meaning that no CPU instructions are executed. Now that you have frozen the VM, you can go through the same instant clone workflow. Instant Clone will know that the VM is frozen. After the instant clone is created you will notice that there’s a single delta disk for the source VM, and each generated VM will have its own delta disk, as shown above. The big benefit is that the source VM won’t have many delta disks. Plus, you know for sure that every single VM you create from this frozen VM is 100% identical, as they all resume from the exact same point in time. Of course, when the instant clone is created the new VM is “unfrozen / resumed”, while the source remains frozen. Note that if for whatever reason the source is restarted / power cycled, the “frozen state” is lost. Another added benefit of the “frozen source VM” workflow is that you can automate the “identity / IP / MAC” issue. How do you do this? Well, you disable the network, freeze the VM, instant clone it (the clone unfreezes automatically), make the network changes and enable the network. William just did a whole blog post on how to do various required Guest changes, and I would highly recommend reading that one as well!
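
A rough sketch of that guest-side ordering could look as follows. The only thing taken from this post is the vmware-rpctool call with “instantclone.freeze”; the function names are hypothetical, the network and identity steps are left as placeholders because they are guest OS specific, and the sketch assumes that the script simply continues inside each clone once the freeze call returns.

```python
import subprocess

# Command from the text: run inside the guest, requires VMware Tools.
FREEZE_COMMAND = ["vmware-rpctool", "instantclone.freeze"]


def run_freeze():
    """Freeze this guest (assumption: returns once a clone has been resumed)."""
    subprocess.run(FREEZE_COMMAND, check=True)


def prepare_and_freeze(disable_network, freeze, reconfigure_identity, enable_network):
    """Run the four guest-side steps in the order described above."""
    disable_network()        # avoid duplicate IP/MAC on the network
    freeze()                 # source stops here; each clone resumes after this
    reconfigure_identity()   # e.g. new IP, hostname, pick up the new MAC
    enable_network()


# Dry run with stand-in steps that only record the order they ran in.
steps = []
prepare_and_freeze(
    disable_network=lambda: steps.append("disable network"),
    freeze=lambda: steps.append("freeze"),
    reconfigure_identity=lambda: steps.append("change identity"),
    enable_network=lambda: steps.append("enable network"),
)
print(steps)
```

In a real guest you would pass `run_freeze` as the `freeze` step and supply your own OS-specific functions for the other three; the dry run above only demonstrates the ordering.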

So before you start using Instant Clone, first think about which of the two workflows you prefer and why. So what else did I learn?

As mentioned, and this is something I never realized, even when TPS is disabled Instant Clone will still share memory pages through the P-Share mechanism. P-Share is the same mechanism that TPS leverages to collapse memory pages. I always figured that you needed to re-enable TPS (with or without salting), but that is not the case. You can’t even disable the use of P-Share at this point in time… Personally I don’t think that is a security concern, but you may think differently about it. Either way, of course I tested this; below you see the screenshots of the memory info before and after an instant clone. And yes, TPS was disabled. (Look at the shares / saving values…)

Before:

After:
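
For those who want a feel for why the shared / savings values go up, here is a toy illustration of collapsing identical memory pages. It is not the ESXi P-Share implementation: it simply assumes 4 KB pages and treats pages with the same SHA-256 hash as identical, and the function name and memory contents are made up.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for this illustration


def shared_page_savings(vm_memories):
    """Return (total_pages, unique_pages) if identical pages are stored once."""
    total = 0
    unique = set()
    for memory in vm_memories:
        for offset in range(0, len(memory), PAGE_SIZE):
            page = memory[offset:offset + PAGE_SIZE]
            unique.add(hashlib.sha256(page).digest())
            total += 1
    return total, len(unique)


# Source VM: four distinct pages. Clone: same first three pages, one changed.
source_memory = b"".join(bytes([i]) * PAGE_SIZE for i in range(4))
clone_memory = source_memory[:3 * PAGE_SIZE] + bytes([9]) * PAGE_SIZE

total, unique = shared_page_savings([source_memory, clone_memory])
print(f"{total} pages in use, {unique} stored, {total - unique} saved by sharing")
```

A freshly created instant clone starts out with almost all of its pages identical to the source, so in this model nearly every page of the clone counts as a saving until the clone starts writing its own data.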

Last but not least, the explicit parent-child relationship caused several problems from a functionality standpoint (like HA, DRS, vMotion etc. not being supported). As of vSphere 6.7 this is no longer the case. There is no strict relationship, and as such all the features you love in vSphere can be fully leveraged even for your Instant Clone VMs. This is why they call this new version of Instant Clone “parentless”.

If you are wondering how you can simply test it without diving too deep into the API and scripting… You can use the Managed Object Browser (MOB) to invoke the method as mentioned earlier. I recorded this quick demo that shows this, which is based on a demo from one of our Instant Clone engineers. I recommend watching it in full screen as it is much easier to follow that way (or watch it on YouTube in a larger window…). Pay attention, as it is a quick demo: instant clone is extremely fast and the workflow is extremely simple.

And that’s it for now. Hope that helps those interested in Instant Clone / VMFork, and maybe some of you will come up with some interesting use cases that we haven’t thought about. If you have use cases, it would be good to share those in the comment section below. Thanks,

Startup update: Runecast

Duncan Epping · Feb 16, 2018 ·

A while ago I introduced Runecast on my blog. I have known these guys for a while, and this week I had the pleasure of being briefed on their new release: Runecast 1.7. The big-ticket item in this release is for sure the vSAN support. You may ask yourself why you would need Runecast when you have things like the health check and the “online” health check. Well, it seems that Runecast’s implementation covers more detail. Anyway, what is Runecast? As a company they refer to themselves as the knowledge automation experts, and I think that is a fair statement.

Runecast has developed an appliance which can be connected to one or multiple vCenter Server instances. After linking these you can “scan” the environment and Runecast will tell you about the risks. Not just from a security perspective, but it will also assess logs, configuration and even best practices. Your whole environment will be assessed and a report will be provided in a simple HTML-5 interface, or even in the Web Client or the vSphere H5 client. I said “simple”, but the information provided and the detail are far from simple… When I say simple I refer to their user interface. It is slick, and very easy to use.

Since I last discussed Runecast they have added some additional features, for instance a vRO plugin, a full REST API, improved log search, and Web Client and H5 client plugins, but more importantly for many government agencies: DISA STIG compliance checks. Yes, Runecast can check your environment against DISA STIG and report on any potential issues. Nice, right?

This new release, version 1.7, now brings vSAN support. It also includes a new dashboard widget, which provides faster insights into how your environment is behaving. For vSAN in particular they didn’t only include KB article checks, but also implemented all best practices from the Design and Sizing guide, the Network Design guide and the Stretched Cluster white paper. And they even hinted at adding best practices which are listed in the Essential vSAN book Cormac and I wrote, how cool is that? What is also nice is that their appliance is supported with vSAN 5.x and 6.x, and requires no direct access to the internet. You can simply download the appliance and install it, and then update it with the latest dataset by downloading an ISO.

Oh and before I forget, of course they also provide all the guidance and info needed around Spectre/Meltdown. Where normally their trial is limited, they actually do provide ALL info needed for Spectre/Meltdown as they realized that this is very valuable to customers and felt they could not hold this back.

For the Runecast blog on the 1.7 release go here.