Various

VMworld Video: vSphere 6.7 Clustering Deep Dive

Duncan Epping · Sep 3, 2018 ·

As all videos are posted for VMworld (and nicely listed by William), I figured I would share the session Frank Denneman and I presented. It ended up in the Top 10 Sessions on Monday, which is always a great honor. We had a lot of positive feedback and comments, thanks for that! Most importantly, it was a lot of fun again to be up on stage at VMworld talking about this content after 6 years of absence or so. For those who missed it, watch it here:

I also very much enjoyed the book signing session at the Rubrik booth with Niels and Frank. I believe Rubrik gave away around 1000 copies of the book. Hoping we can repeat this huge success in EMEA. But more on that later. If you haven’t picked up the book yet and won’t be at VMworld Europe, consider picking it up through Amazon; the e-book is only 14.95 USD.

UI Confusion: VM Dependency Restart Condition Timeout

Duncan Epping · Sep 3, 2018 ·

Various people have asked me about this. I have written about it before in several articles, but always as part of a longer article, which makes it difficult to find. When specifying the restart priority or restart dependency you can specify when the next batch of VMs should be powered on. Is that when the VMs are scheduled to be powered on, when they are powered on, when VMware Tools reports them as running, or when the application heartbeat reports in?

In most cases, customers appear to go for either “powered on” or “VMware Tools” heartbeat. But what happens when one of the VMs in the batch is not successfully restarted? Well HA waits… For how long? Well that depends:

In the UI you can specify how long HA needs to wait by using the option called “VM Dependency Restart Condition Timeout”. This is the time-out in seconds used when one (or multiple) VMs can’t be restarted. So we initiate the restart of the group, and we will start the next batch when the first is successfully restarted or when the time-out has been exceeded. By default, the time-out is 600 seconds, and you can override this in the UI.

What is confusing about this setting is the name: it states “VM Dependency Restart Condition Timeout”. So does this time-out apply to “Restart Priority”, to “Restart Dependency”, or maybe to both? The answer is simple: it only applies to “Restart Priority”. Restart Dependency is a rule, a hard rule, a must rule, which means there’s no time-out. We wait until all VMs are restarted when you use restart dependency. Yes, the UI is confusing as the option mentions “dependency” where it should really talk about “priority”. I have reported this to engineering and PM, and hopefully it will be fixed in one of the upcoming releases.
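To make the difference between the two behaviors a bit more tangible, here is a small Python sketch that models it. This is not how HA is implemented, just a toy model of the rules described above. The function name `batch_start_times`, the `hard_rule` flag, and the idea of describing each VM by "seconds until the restart condition is met" (or `None` when it never is) are all made up for this illustration. It also assumes the timer starts when the restart of a batch is initiated and ignores any additional delay you may have configured.

```python
from typing import List, Optional

DEFAULT_TIMEOUT_S = 600.0  # default time-out mentioned in the post


def batch_start_times(
    batches: List[List[Optional[float]]],
    timeout_s: float = DEFAULT_TIMEOUT_S,
    hard_rule: bool = False,
) -> List[Optional[float]]:
    """Return the second at which each batch restart is initiated.

    batches   -- one list per batch; every entry is the number of seconds a VM
                 needs to meet the restart condition, or None if it never does.
    hard_rule -- False models restart priority (time-out applies),
                 True models restart dependency (no time-out, wait forever).
    A start time of None means the batch is never started.
    """
    starts: List[Optional[float]] = []
    now = 0.0
    blocked = False
    for ready_times in batches:
        if blocked:
            # A previous batch never completed and there is no time-out.
            starts.append(None)
            continue
        starts.append(now)
        if any(t is None for t in ready_times):
            if hard_rule:
                blocked = True  # hard rule: keep waiting, nothing else starts
                continue
            wait = timeout_s  # priority: give up on the batch after the time-out
        else:
            slowest = max(ready_times, default=0.0)
            wait = slowest if hard_rule else min(slowest, timeout_s)
        now += wait
    return starts


if __name__ == "__main__":
    example = [[30, 45], [20, None], [10]]  # second batch has a VM that never restarts
    print("priority  :", batch_start_times(example))
    print("dependency:", batch_start_times(example, hard_rule=True))
```

In the example, the third batch is started after the time-out expires when modeled as restart priority, while in the dependency model it is never started because one VM in the second batch never comes up.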

My top VMworld session picks

Duncan Epping · Aug 7, 2018 ·

Every year I post a list of my favorite VMworld sessions, my top picks. There are way too many sessions to see, but these are definitely the sessions I would like to attend personally. That could be because of the speaker, or the content, and preferably both. Yes I know, this list will have some great sessions missing, not because I did not like the abstract or speaker, but simply because I forced myself to limit this list to 10. Before we get started, here are the two sessions I have scheduled. Make sure to sign up for those while you still can, as both seem to be at 80+% capacity right now.

The Power of Storage Policy-Based Management [HCI1270BU] – Cormac Hogan & Duncan Epping
Tuesday, Aug 28, 12:30 p.m. – 1:30 p.m.
The world of software-defined storage moves at a rapid pace, and VMware is one of the biggest enablers. In this session, Cormac and Duncan will guide you through the world of software-defined storage initiatives at VMware and provide a primer to VMware vSAN, VMware Virtual Volumes (VVol), persistent cloud-native storage options (Project Hatchway), the VMware vSphere APIs for I/O filtering, and the binding factor in these cases: storage policy-based management. Be warned: We will bring demos!
vSphere Clustering Deep Dive, Part 1: vSphere HA and DRS [VIN1249BU] – Frank Denneman & Duncan Epping
Monday, Aug 27, 12:30 p.m. – 1:30 p.m.
In this session, Duncan and Frank will take you through the trenches of VMware vSphere Distributed Resource Scheduler (DRS) and vSphere High Availability (HA). Find out about options to optimize your DRS settings for your specific requirements and goals, such as if you should be load balancing on active or consumed memory, as well as what has recently changed in the DRS algorithm and if it will impact DRS behavior. And for vSphere HA, you will learn about when it restarts virtual machines (VMs), what kind of restart times to expect, and where you can find evidence that a VM (or multiple) have been restarted. You will find out about all of these items and more. Prepare to dive deep, as the basics will not be covered.

Here are my top picks, note that although I picked Ravi’s session from the Extreme Performance Series, all of them are worth attending!

Extreme Performance Series: vCenter Performance Deep Dive [VIN1759BU] Ravi Soundararajan
Tuesday, Aug 28, 5:00 p.m. – 6:00 p.m.
In this talk, you will get a brief description of the internals of VMware vCenter before going into basic performance troubleshooting and monitoring techniques. Find out about various tools for analyzing resource usage, important metrics like sessions and API calls, and database performance (primarily for the vCenter Server Appliance, but also for vCenter Server for Windows). You will get to understand the differences between vCenter and Platform Services Controller, and consider the impact of linked mode and plug-ins/extensions. By the end of the talk, you’ll understand how your vCenter works, when you may need multiple vCenters, and how Platform Services Controller factors into performance. xPerfSeries
Tech Preview: The Road to a Declarative Compute Control Plane [VIN2256BU] Maarten Wiggers & Frank Denneman
Tuesday, Aug 28, 12:30 p.m. – 1:30 p.m.
Declarative control planes are becoming increasingly popular in the industry. Instead of explicitly defining configurations, declarative control planes tell the architecture what the desired state should be. The desired state could be high priority, or keep particular VMs or containers separate. Within the software-defined data center (SDDC), VMware vSphere offers two declarative control planes: one for networking and one for storage. However, there is no declarative control plane for compute yet.
Compute policy provides a framework to allow our customers the flexibility and control of VM placement and resourcing decisions based on the user’s encompassing application needs. In this session, you will learn about the capabilities introduced in the VMware Cloud SDDC as a path to achieve that goal.
Clustering Deep Dive 2: Quality Control with DRS and Network I/O Control [VIN1735BU] Niels Hagoort & Sahan Gamage
Tuesday, Aug 28, 2:00 p.m. – 3:00 p.m.
In this session, you will go through the trenches of network-aware VMware vSphere DRS and vSphere Network I/O Control. You may ask yourself what these two have to do with each other as, unfortunately, not many people know about the enhancements added to the DRS algorithm around network-aware load balancing. If you want to understand how this can help prevent problems from occurring with network-intensive workloads like NFV, then this is a session you cannot miss!
Project Fractal – The Easy Button for Edge Computing [IOT2593BU] – Dennis Lu & Sridevi Ravuri
Tuesday, Aug 28, 4:00 p.m. – 5:00 p.m.
Come and learn about how VMware can accelerate your adoption of Edge Computing by dealing with the additional complexity and cost of infrastructure management at the Edge, helping you quickly achieve the cost savings and revenue growth benefits of Edge Computing. This is also a great opportunity to shape the direction of VMware’s edge services to help fit customer needs.
vSAN Deployment Topology and Availability Deep Dive: What You Need to Know [HCI2040BU] Paudie O’Riordan & Mansi Shah
Wednesday, Aug 29, 8:00 a.m. – 9:00 a.m.
Today, VMware vSAN can be deployed in many different form factors; for example, vSAN 2-Node ROBO, vSAN Fault domains, Stretch Cluster with and without local protection, and more. These deployment models make vSAN quite flexible and unique. This session will help you understand the different trade-offs and focus on the benefits and overheads of the choice you’ve made in your vSAN proposed design. Join Mansi and Paudie as they discuss these topologies in depth from both an engineering perspective and a practical real-world implementation. Paudie and Mansi will take a no-nonsense review of how to approach designing a fault-tolerant vSAN deployment and give real-world examples of how to achieve the best design from both an availability and performance perspective.
Top 10 Automation Requests and How You Can Save Time [VIN2527BU] Alan Renouf & William Lam
Monday, Aug 27, 2:00 p.m. – 3:00 p.m.
After working firstly as customers and secondly at VMware, Alan and William have encountered hundreds of ways to save time through automation. In this session, they will take you through the top automation requests and how they were completed, teaching you not only how to reproduce them yourself, but also giving you a framework to enable you to automate your top 10 requests.
This session will include a number of techniques and languages, such as PowerShell, PowerCLI, Python, Java, .NET, and simple web applications with JavaScript.
Data Lifecycle Management in Hybrid Clouds [HCI1705BU] Christos Karamanolis & Ilya Languev
Tuesday, Aug 28, 2:00 p.m. – 3:00 p.m.
The focus of IT and DevOps organizations is shifting from storage toward data management independent of infrastructure and locations. This trend is partly driven by a new generation of applications that extract business value from data (big data, analytics, machine learning). Customers need cost-effective data storage but also data mobility, copy management, and on-demand access as business requirements and IT investments evolve. Join Christos Karamanolis (CTO, Storage and Availability) and Ilya Languev (Principal Engineer) as they outline the VMware vision around data lifecycle management that spans private data centers and public clouds. They will discuss VMware’s R&D investments in this space and use real-world examples and demos to highlight the benefits for our customers, both for traditional and cloud-native applications.
VMware CTO Panel: What’s Over the Horizon? [CTO3496PU] Ray O’Farrell, Christos Karamanolis, Chris Wolf, Shawn Bass, Pere Monclus
Tuesday, Aug 28, 5:30 p.m. – 6:30 p.m.
VMware CTOs spend significant time assessing emerging technology trends, taking a practical look at their potential impacts and opportunities for VMware. This session explores emerging areas, inclusive of edge, the Internet of things, artificial intelligence (AI)/machine learning (ML), SD-WAN and network service mesh, distributed data management, and more. There will also be ample time for you to have your most pressing questions answered.
Smart Placement of Workloads in Tomorrow’s Distributed Cloud [CTO2161BU] Daniel Beveridge
Tuesday, Aug 28, 1:00 p.m. – 2:00 p.m.
This session will offer a look at the evolution of cloud as we move from a mega-cloud-focused experience into a more distributed cloud experience where compute evolves toward a mesh of resources. Find out about a technology project sponsored by VMware’s Office of the CTO that has developed a novel approach to the placement of workloads in a vast marketplace of providers, resulting in a seamless cloud burst experience across a range of providers. You will learn about some cutting-edge cloud technology that points toward a new way of consuming cloud services with an emphasis on reducing cost, improving user experience, and offering increased flexibility and agility in workload management.
Optimizing vSAN for Performance [HCI1246BU] Cormac Hogan & Paudie O’Riordan
Tuesday, Aug 28, 3:30 p.m. – 4:30 p.m.
The VMware vSAN team gets many questions on performance. For example, does adding a second disk group improve performance? Does adding a stripe width to an object make things faster? Does increasing the MTU size matter? Does mixing SAS and SATA make a difference? Join this session for answers to these sorts of questions. Paudie and Cormac will discuss the results of various performance tests they initiated in their labs to reach these conclusions. You will learn about the benchmark tool of choice, HCIBench, as well as all the different nuances that can make a difference to your benchmarking results.

Also note, there’s a long list of “deep dive” sessions at VMworld this year, do a search and register before it is too late!

Now Available: vSphere 6.7 Clustering Deep Dive book!

Duncan Epping · Jul 30, 2018 ·

Over the past couple of months Frank, Niels and I have worked ferociously to update the vSphere Clustering Deep Dive. Some of the material was already brought up to date to vSphere 6.0 U2, but the majority was never updated after vSphere 5.1. As you can imagine, this was a tremendous undertaking. Not only did we need to validate every sentence, all diagrams needed to be updated, and with the introduction of the HTML-5 Client also all screenshots had to be retaken.

Now, just a couple of weeks before VMworld, we are finally at the point where we can press “publish”.

What can you expect? Well, as we have said with previous books, this is not a beginner’s guide! This is a deep dive, and we aimed to take you into the trenches of vSphere Clustering technologies. We cover a multitude of different features, and for those who haven’t read the previous books, expect the following features to be covered:

vSphere HA
vSphere DRS
vSphere Storage DRS
vSphere Storage I/O Control
vSphere Network I/O Control

We also have a chapter on stretched clusters. In this chapter we describe how to design and implement a vSphere Metro Storage Cluster, leveraging all of the knowledge gained in the previous chapters.

For your convenience, I copied/pasted some of the Amazon info below.

—

Paperback: 566 pages
Publisher: CreateSpace Independent Publishing Platform; 1 edition (July 29, 2018)
Language: English
ISBN-10: 1722625325
ISBN-13: 978-1722625320
Product Dimensions: 5.5 x 1.3 x 8.5 inches
Shipping Weight: 1.8 pounds

—

I hope all of you will enjoy the book as much as we enjoyed writing it. And before I forget, I want to thank my co-authors for the late night discussions, the hard work, insights and fun/laughter at times.

Get it while it is hot! (Look on the right side column for the links to the book!)

vSphere 6.5 and limits, do they still apply to SvMotion?

Duncan Epping · Jul 11, 2018 ·

I had a question this week and I thought I wrote about this before but apparently, I did not. Hopefully by now most of you know that the I/O scheduler has changed over the past couple of years. Within ESXi, we moved to a new scheduler, which is often referred to as mClock. This new scheduler, and a new version of SIOC (Storage I/O Control), also resulted in some behavioral changes. Some of these may be obvious, others are not. I want to explicitly point out two things which I have discussed in the past and which changed with vSphere 6.5 (and 6.7 as such). Both are around the use of limits.

First of all: Limits and SvMotion. In the past when a limit was applied to a VM, this would also artificially limit SvMotion. As of vSphere 6.5 (this may apply to 6.0 as well, but I have not verified this) this is no longer the case. Starting with vSphere 6.0 the I/O scheduler creates a queue for every file on the file system (VMFS), to avoid, for instance, a VM stalling other types of I/O (such as metadata I/O). The queues are called SchedQ and are briefly described by Cormac Hogan here. Of course, there’s a lot more to it than Cormac or I discuss here, but I am not sure how much I can share so I am not going to go there. Either way, if you were used to limits being applied to SvMotion as well, you are warned… this is no longer the case.

Secondly, the normalization of I/Os changed with limits. In the past, when a limit was applied, I/Os were normalized at 32KB, meaning that a 64KB I/O would count as 2 I/Os and a 4KB I/O would count as 1. This was confusing for a lot of people and as of vSphere 6.5 this is no longer the case. When you place a limit of 100 IOPS the VMDKs will be limited at 100 IOPS, regardless of the I/O size. This, by the way, was documented here on storagehub, though I am not sure how many people realized this.
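For those who prefer to see the arithmetic, below is a minimal Python sketch of the old and new accounting. It is only an illustration of the numbers mentioned above, not the scheduler's actual code. The function names are made up, and the round-up (ceiling) behavior for sizes in between, such as 48KB, is an assumption that simply matches the two examples given (64KB counts as 2, 4KB counts as 1).

```python
import math

NORMALIZATION_SIZE_KB = 32  # normalization size used before vSphere 6.5, per the post


def normalized_io_count(io_size_kb: float, normalize: bool = True) -> int:
    """How many I/Os a single I/O of the given size counts as against a limit.

    normalize=True  models the old behavior (normalized at 32KB, rounding up
                    is an assumption of this sketch).
    normalize=False models vSphere 6.5 and later: every I/O counts as 1.
    """
    if not normalize:
        return 1
    return max(1, math.ceil(io_size_kb / NORMALIZATION_SIZE_KB))


def effective_io_rate(limit_iops: float, io_size_kb: float, normalize: bool) -> float:
    """Actual I/Os per second a workload of uniform I/O size can issue under a limit."""
    return limit_iops / normalized_io_count(io_size_kb, normalize)


if __name__ == "__main__":
    for size_kb in (4, 32, 64):
        old = effective_io_rate(100, size_kb, normalize=True)
        new = effective_io_rate(100, size_kb, normalize=False)
        print(f"{size_kb:>3}KB I/O, limit 100: old={old:g} IOPS, new={new:g} IOPS")
```

Under this model, a workload doing only 64KB I/Os against a limit of 100 would have been held to 50 actual I/Os per second in the old situation, where now it gets the full 100.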