Server

My top VMworld session picks

Duncan Epping · Aug 7, 2018 ·

Every year I post a list of my favorite VMworld sessions, my top picks. There are way too many sessions to see, but these are definitely the sessions I would like to attend personally. That could be because of the speaker or the content, and preferably both. Yes, I know, this list will have some great sessions missing, not because I did not like the abstract or speaker, but simply because I forced myself to limit this list to 10. Before we get started, here are the two sessions I have scheduled. Make sure to sign up for those while you still can, as both seem to be at 80+% capacity right now:

The Power of Storage Policy-Based Management [HCI1270BU] – Cormac Hogan & Duncan Epping
Tuesday, Aug 28, 12:30 p.m. – 1:30 p.m.
The world of software-defined storage moves at a rapid pace, and VMware is one of the biggest enablers. In this session, Cormac and Duncan will guide you through the world of software-defined storage initiatives at VMware and provide a primer to VMware vSAN, VMware Virtual Volumes (VVol), persistent cloud-native storage options (Project Hatchway), the VMware vSphere APIs for I/O filtering, and the binding factor in these cases: storage policy-based management. Be warned: We will bring demos!
vSphere Clustering Deep Dive, Part 1: vSphere HA and DRS [VIN1249BU] – Frank Denneman & Duncan Epping
Monday, Aug 27, 12:30 p.m. – 1:30 p.m.
In this session, Duncan and Frank will take you through the trenches of VMware vSphere Distributed Resource Scheduler (DRS) and vSphere High Availability (HA). Find out about options to optimize your DRS settings for your specific requirements and goals, such as if you should be load balancing on active or consumed memory, as well as what has recently changed in the DRS algorithm and if it will impact DRS behavior. And for vSphere HA, you will learn about when it restarts virtual machines (VMs), what kind of restart times to expect, and where you can find evidence that a VM (or multiple) have been restarted. You will find out about all of these items and more. Prepare to dive deep, as the basics will not be covered.

Here are my top picks. Note that although I picked Ravi’s session from the Extreme Performance Series, all of them are worth attending!

Extreme Performance Series: vCenter Performance Deep Dive [VIN1759BU] Ravi Soundararajan
Tuesday, Aug 28, 5:00 p.m. – 6:00 p.m.
In this talk, you will get a brief description of the internals of VMware vCenter before going into basic performance troubleshooting and monitoring techniques. Find out about various tools for analyzing resource usage, important metrics like sessions and API calls, and database performance (primarily for the vCenter Server Appliance, but also for vCenter Server for Windows). You will get to understand the differences between vCenter and Platform Services Controller, and consider the impact of linked mode and plug-ins/extensions. By the end of the talk, you’ll understand how your vCenter works, when you may need multiple vCenters, and how Platform Services Controller factors into performance. xPerfSeries
Tech Preview: The Road to a Declarative Compute Control Plane [VIN2256BU] Maarten Wiggers & Frank Denneman
Tuesday, Aug 28, 12:30 p.m. – 1:30 p.m.
Declarative control planes are becoming increasingly popular in the industry. Instead of explicitly defining configurations, declarative control planes tell the architecture what the desired state should be. The desired state could be high priority, or keep particular VMs or containers separate. Within the software-defined data center (SDDC), VMware vSphere offers two declarative control planes: one for networking and one for storage. However, there is no declarative control plane for compute yet.
Compute policy provides a framework to allow our customers the flexibility and control of VM placement and resourcing decisions based on the user’s encompassing application needs. In this session, you will learn about the capabilities introduced in the VMware Cloud SDDC as a path to achieve that goal.
Clustering Deep Dive 2: Quality Control with DRS and Network I/O Control [VIN1735BU] Niels Hagoort & Sahan Gamage
Tuesday, Aug 28, 2:00 p.m. – 3:00 p.m.
In this session, you will go through the trenches of network-aware VMware vSphere DRS and vSphere Network I/O Control. You may ask yourself what these two have to do with each other as, unfortunately, not many people know about the enhancements added to the DRS algorithm around network-aware load balancing. If you want to understand how this can help prevent problems from occurring with network-intensive workloads like NFV, then this is a session you cannot miss!
Project Fractal – The Easy Button for Edge Computing [IOT2593BU] – Dennis Lu & Sridevi Ravuri
Tuesday, Aug 28, 4:00 p.m. – 5:00 p.m.
Come and learn about how VMware can accelerate your adoption of Edge Computing by dealing with the additional complexity and cost of infrastructure management at the Edge, helping you quickly achieve the cost savings and revenue growth benefits of Edge Computing. This is also a great opportunity to shape the direction of VMware’s edge services to help fit customer needs.
vSAN Deployment Topology and Availability Deep Dive: What You Need to Know [HCI2040BU] Paudie O’Riordan & Mansi Shah
Wednesday, Aug 29, 8:00 a.m. – 9:00 a.m.
Today, VMware vSAN can be deployed in many different form factors; for example, vSAN 2-Node ROBO, vSAN Fault domains, Stretch Cluster with and without local protection, and more. These deployment models make vSAN quite flexible and unique. This session will help you understand the different trade-offs and focus on the benefits and overheads of the choice you’ve made in your vSAN proposed design. Join Mansi and Paudie as they discuss these topologies in depth from both an engineering perspective and a practical real-world implementation. Paudie and Mansi will take a no-nonsense review of how to approach designing a fault-tolerant vSAN deployment and give real-world examples of how to achieve the best design from both an availability and performance perspective.
Top 10 Automation Requests and How You Can Save Time [VIN2527BU] Alan Renouf & William Lam
Monday, Aug 27, 2:00 p.m. – 3:00 p.m.
After working firstly as customers and secondly at VMware, Alan and William have encountered hundreds of ways to save time through automation. In this session, they will take you through the top automation requests and how they were completed, teaching you not only how to reproduce them yourself, but also giving you a framework to enable you to automate your top 10 requests.
This session will include a number of techniques and languages, such as PowerShell, PowerCLI, Python, Java, .NET, and simple web applications with JavaScript.
Data Lifecycle Management in Hybrid Clouds [HCI1705BU] Christos Karamanolis & Ilya Languev
Tuesday, Aug 28, 2:00 p.m. – 3:00 p.m.
The focus of IT and DevOps organizations is shifting from storage toward data management independent of infrastructure and locations. This trend is partly driven by a new generation of applications that extract business value from data (big data, analytics, machine learning). Customers need cost-effective data storage but also data mobility, copy management, and on-demand access as business requirements and IT investments evolve. Join Christos Karamanolis (CTO, Storage and Availability) and Ilya Languev (Principal Engineer) as they outline the VMware vision around data lifecycle management that spans private data centers and public clouds. They will discuss VMware’s R&D investments in this space and use real-world examples and demos to highlight the benefits for our customers, both for traditional and cloud-native applications.
VMware CTO Panel: What’s Over the Horizon? [CTO3496PU] Ray O’Farrell, Christos Karamanolis, Chris Wolf, Shawn Bass, Pere Monclus
Tuesday, Aug 28, 5:30 p.m. – 6:30 p.m.
VMware CTOs spend significant time assessing emerging technology trends, taking a practical look at their potential impacts and opportunities for VMware. This session explores emerging areas, inclusive of edge, the Internet of things, artificial intelligence (AI)/machine learning (ML), SD-WAN and network service mesh, distributed data management, and more. There will also be ample time for you to have your most pressing questions answered.
Smart Placement of Workloads in Tomorrow’s Distributed Cloud [CTO2161BU] Daniel Beveridge
Tuesday, Aug 28, 1:00 p.m. – 2:00 p.m.
This session will offer a look at the evolution of cloud as we move from a mega-cloud-focused experience into a more distributed cloud experience where compute evolves toward a mesh of resources. Find out about a technology project sponsored by VMware’s Office of the CTO that has developed a novel approach to the placement of workloads in a vast marketplace of providers, resulting in a seamless cloud burst experience across a range of providers. You will learn about some cutting-edge cloud technology that points toward a new way of consuming cloud services with an emphasis on reducing cost, improving user experience, and offering increased flexibility and agility in workload management.
Optimizing vSAN for Performance [HCI1246BU] Cormac Hogan & Paudie O’Riordan
Tuesday, Aug 28, 3:30 p.m. – 4:30 p.m.
The VMware vSAN team gets many questions on performance. For example, does adding a second disk group improve performance? Does adding a stripe width to an object make things faster? Does increasing the MTU size matter? Does mixing SAS and SATA make a difference? Join this session for answers to these sorts of questions. Paudie and Cormac will discuss the results of various performance tests they initiated in their labs to reach these conclusions. You will learn about the benchmark tool of choice, HCIBench, as well as all the different nuances that can make a difference to your benchmarking results.

Also note, there’s a long list of “deep dive” sessions at VMworld this year. Do a search and register before it is too late!

Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster

Duncan Epping · Jul 12, 2018 ·

I was talking to the HA team this week, specifically about the upcoming HA book. One thing they mentioned is that people still seem to struggle with the concept of admission control. This is the reason I wrote a whole chapter on it years ago, yet there still seems to be a lot of confusion. One thing that is not clear to people is the percentage calculations. We have some customers with VMs with extremely large reservations. In that case, instead of using the “slot policy”, they typically switch to the “percentage based policy”, simply because the percentage based policy is a lot more flexible.

However, recently we have had some customers that hit the following error message:

Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster

In these particular situations (yes, there was a bug as well, read this article on that), the error message appeared because the percentage had been set lower than what would equal a full host. In other words, in a 4 host environment, a single host would equal 25%. In some cases, customers would set the percentage to a value lower than 25%. I am personally not sure why anyone would do this as it contradicts the whole essence of admission control. Nevertheless, it happens.

This message indicates that you may not have sufficient resources, in the case of a host failure, to restart all the VMs. This of course is the result of the percentage being set lower than the value that would equal a single host. Note though, this does not stop you from powering on new VMs. You will only be stopped from powering on new VMs when you exceed the available unreserved resources.

So if you are seeing this error message, please verify the configured percentage if you set it manually. Ensure that at a minimum it equals the largest host in the cluster.
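To make the percentage check concrete, here is a minimal sketch. It assumes a single resource (for example memory) and made-up host capacities, and the function names are hypothetical. It is not how vSphere HA implements admission control; it only illustrates the arithmetic of "the percentage should at least equal the largest host".

```python
def min_failover_percentage(host_capacities):
    """Smallest reserved percentage that still covers losing the largest host.

    host_capacities: capacity per host for one resource (hypothetical values).
    """
    total = sum(host_capacities)
    if total <= 0:
        raise ValueError("cluster has no capacity")
    return 100.0 * max(host_capacities) / total


def covers_single_host_failure(configured_pct, host_capacities):
    """True if the configured percentage is at least the largest host's share."""
    return configured_pct >= min_failover_percentage(host_capacities)


# Four identical hosts: a single host equals 25% of the cluster.
equal_hosts = [256, 256, 256, 256]  # hypothetical memory capacity in GB
print(min_failover_percentage(equal_hosts))         # 25.0
print(covers_single_host_failure(20, equal_hosts))  # False: lower than one host
print(covers_single_host_failure(25, equal_hosts))  # True
```

With hosts of different sizes the largest host determines the minimum, so a cluster with one 512 GB host and three 256 GB hosts would need at least 40% in this sketch.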

** back to finalizing the book **

vSphere 6.5 and limits, do they still apply to SvMotion?

Duncan Epping · Jul 11, 2018 ·

I had a question this week and I thought I wrote about this before, but apparently I did not. Hopefully by now most of you know the I/O scheduler has changed over the past couple of years. Within ESXi, we moved to a new scheduler, which is often referred to as mClock. This new scheduler, and a new version of SIOC (Storage I/O Control), also resulted in some behavioral changes, some of which may be obvious and others which are not. I want to explicitly point out two things which I have discussed in the past which changed with vSphere 6.5 (and 6.7 as such), and both are around the use of limits.

First of all: Limits and SvMotion. In the past when a limit was applied to a VM, this would also artificially limit SvMotion. As of vSphere 6.5 (may apply to 6.0 as well but I have not verified this) this is no longer the case. Starting with vSphere 6.0 the I/O scheduler creates a queue for every file on the file system (VMFS), to avoid, for instance, a VM stalling other types of I/O (metadata). The queues are called SchedQ and are briefly described by Cormac Hogan here. Of course, there’s a lot more to it than Cormac or I discuss here, but I am not sure how much I can share so I am not going to go there. Either way, if you were used to limits being applied to SvMotion as well you are warned… this is no longer the case.

Secondly, the normalization of I/Os changed with limits. In the past when a limit was applied, I/Os were normalized at 32KB, meaning that a 64KB I/O would count as 2 I/Os and a 4KB I/O would count as 1. This was confusing for a lot of people and as of vSphere 6.5 this is no longer the case. When you place a limit of 100 IOPS the VMDKs will be limited at 100 IOPS, regardless of the I/O size. This, by the way, was documented here on storagehub, although I am not sure how many people realized this.
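The difference between the old and new accounting can be sketched as follows. This is only an illustration of the arithmetic described above, assuming that I/Os were rounded up to whole 32KB units with a minimum of 1. The function names are hypothetical and this is not the scheduler's actual code.

```python
import math

NORMALIZED_IO_SIZE_KB = 32  # normalization size used before vSphere 6.5


def io_cost_before_65(io_size_kb):
    """How many I/Os a single I/O counted as against the limit (32KB units, min 1)."""
    return max(1, math.ceil(io_size_kb / NORMALIZED_IO_SIZE_KB))


def io_cost_since_65(io_size_kb):
    """As of vSphere 6.5 every I/O counts as one, regardless of its size."""
    return 1


def achievable_iops(limit, io_size_kb, cost_fn):
    """Real I/Os per second a workload could issue under a given IOPS limit."""
    return limit / cost_fn(io_size_kb)


print(io_cost_before_65(64))                        # 2
print(io_cost_before_65(4))                         # 1
print(achievable_iops(100, 64, io_cost_before_65))  # 50.0
print(achievable_iops(100, 64, io_cost_since_65))   # 100.0
```

In other words, under the old behavior a limit of 100 IOPS effectively meant 50 I/Os per second for a workload issuing 64KB I/Os, whereas now it means 100 regardless of the I/O size.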

Opvizor Performance Analyzer for vSAN

Duncan Epping · Jul 10, 2018 ·

At a VMUG a couple of months ago I bumped into my old friend Dennis Zimmer. Dennis told me that he was working on something cool for vSAN but couldn’t reveal what it was just yet. Last week I had a call with Dennis about what that thing was. Dennis is the CEO of Opvizor, and some of you may recall the different tooling that Opvizor has produced over the years, of which the Health Analyzer was probably the most famous one back then. I’ve used it in the past on various occasions and I had various customers using it. During the briefing, Dennis explained to me that Opvizor started focussing on performance monitoring and analytics a while ago, as the health analyzer market was overly crowded and had the issue that it was a one-off business (checks once in a while instead of daily use). On top of that, many products now come with some form of health analysis included. (See vSAN for instance.) I have to agree with Dennis, so this pivot towards performance monitoring makes a lot of sense to me.

Dennis explained to me how they are seeing more and more customer demand for vSAN performance monitoring, especially combined with VMware ESXi, VM and App data. Although vCenter has various metrics, and there’s vROps, he told me that Opvizor has many customers who need more than vCenter or vROps Standard has to offer today and don’t own vROps Advanced. This is where Opvizor Performance Analyzer comes into play and that is why today Opvizor announced they are including vSAN specific dashboards. Now, this isn’t just for vSAN of course. Opvizor Performance Analyzer includes not just vSAN but also vSphere and various other parts of the stack. When talking with Dennis one thing became clear: Opvizor is taking a different approach than most other solutions. Where most focus on simplifying, hiding, and aggregating, the focus for Opvizor is on providing as much relevant detail as possible to fulfill the needs of both beginners and professionals.

So how does it work? Opvizor provides a virtual appliance. You simply deploy it in your environment and connect it to vCenter and you are ready to go. The appliance collects data every 5 minutes (at 20-second intervals within those 5 minutes) and has a retention of up to 5 years. As I said, the focus is on infrastructure statistics and performance analytics and as such Opvizor delivers all the data you will ever need.

It doesn’t just provide you with all the info you will ever need. It will also allow you to overlay different metrics, which makes performance troubleshooting a lot easier, and will allow you to correlate and pinpoint particular problems. Opvizor comes with dashboards for various aspects, here are the ones included in the upcoming release for vSAN:

Capacity and Balance
Storage Diskgroup Stats
VM View
Physical disk latency breakdown
Cache Diskgroup stats
vSAN Monitor

Now I said this is the expert’s troubleshooting tool, but Opvizor Performance Analyzer also provides in-depth information about what each metric is and means, and provides starter dashboards for beginners. You can simply click on the “i” in the top left corner of the widget and you get all the info about that particular widget.

When you do know what you are looking for you can click, hover, and zoom when needed. Hover over the specific section in the graph and the point-in-time values of the metrics will pop up. In the case below I was drilling down on a VM in the vSAN cluster and looking at write latency specifically. As you can see we have 3 objects, in particular 2 disks and a “vm name space”.

And this is just a random example, there are many metrics to look at and many different widgets and overviews. Just to give you an idea, here are some of the metrics you can find in the UI:

Latency (for all different components of the stack)
IOPS (for all different components of the stack)
Bandwidth (for all different components of the stack)
Congestion (for all different components of the stack)
Outstanding I/O (for all different components of the stack)
Read Cache Hit rate (for all different components of the stack)
ESXi vSAN host disk usage
ESXi vSAN host cpu usage
Number of Components
Disk Usage
Cache Usage

And there’s much more, too many to list in this blog. And again, it is not just vSAN; there are many dashboards to choose from. If you don’t have a performance monitoring solution yet and you are evaluating solutions like SolarWinds, Turbonomic and others, make sure to add Opvizor to that list. One thing I have to say: I spotted a couple of things that I would have liked to see changed, and I think within 24 hours the Opvizor guys managed to incorporate the feedback. That was a crazy fast turnaround, good to see how receptive they are.

Oh, one more thing I found in the interface: there are dashboards that deal with things like NUMA, but also things like the Top 10 VMs in terms of IOPS. Both are very useful, especially when doing deep performance troubleshooting and optimizing.

I hope that gives you a sense of what they can do. There’s a fully functional 30-day trial, check it out if you want to find out more about Performance Analyzer or simply just want to play around with it. Opvizor announced this brand new version on their own blog here, make sure to give that a read as well!

Adding a fifth (virtual) ESXi host to vCenter Foundation

Duncan Epping · Jul 6, 2018 ·

When running a 4 node stretched cluster environment it should be possible to use “cheaper” vCenter Server licenses, namely vCenter Foundation. One of the limitations of vCenter Foundation is that you can only manage 4 hosts with it. This is where some customers who wanted to manage a stretched cluster hit some issues. The issue occurs at the point where you want to add the Witness VM to the inventory. Deploying the VM, of course, works fine, but it becomes problematic when you add the virtual ESXi host (Witness Appliance) to the vCenter Foundation instance as vCenter simply will not allow you to add a 5th host. Yes, this 5th host would be a witness, and will not be running any VMs, and even has a special license. Yet, the “add host” wizard does not differentiate between a regular host and a virtual witness appliance.

Fortunately, there’s a workaround. It is fairly straightforward, and it has to do with the order in which you add hosts to vCenter Foundation. If you add the witness VM before the physical hosts then the appliance is not counted against the license. The license count (and allocation) apparently happens after the host has been added, but somehow vCenter does validate beforehand. I guess we do this to avoid abuse.

So if you have vCenter Foundation, and want to build a stretched cluster leveraging a 2+2+1 configuration, meaning 4 physical hosts and 1 witness VM, then simply add the Witness VM to the inventory as a host first and then add the rest. For those wondering, yes this is documented in the release notes of vSphere 6.5 Update, all the way at the bottom.
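If you script your deployments, the ordering rule can be captured in a small planning step before any hosts are actually added. The sketch below is purely illustrative: the `Host` class, the `add_order` function and the host names are hypothetical, and it does not call any vCenter API. It only decides the order in which hosts should be added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str                 # hypothetical host name
    is_witness: bool = False  # True for the virtual witness appliance


def add_order(hosts):
    """Return hosts with the witness first, keeping the others in their given order.

    sorted() is stable and False sorts before True, so witness hosts
    (key False) move to the front and the rest keep their relative order.
    """
    return sorted(hosts, key=lambda h: not h.is_witness)


# 2+2+1 stretched cluster: 4 physical hosts and 1 witness VM.
hosts = [
    Host("esxi-site-a-01"),
    Host("esxi-site-a-02"),
    Host("esxi-site-b-01"),
    Host("esxi-site-b-02"),
    Host("witness-01", is_witness=True),
]

for host in add_order(hosts):
    print(host.name)
```

The first name printed is the witness, followed by the four physical hosts, which matches the order described above.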