
Yellow Bricks

by Duncan Epping


ESX

ESXi Management Network Resiliency

Duncan Epping · Mar 22, 2011 ·

When we wrote the HA/DRS book, both Frank and I were still very much in an “ESX Classic” mindset. Over the last few weeks I received questions about resilient network configurations for ESXi. I referred people back to the book, but the feedback I got was that the examples were very much ESX Classic rather than ESXi. In my opinion the configuration looks very much the same, except that “Service Console” needs to be replaced with “Management Network”. Still, I figured I might as well document my preference for a resilient ESXi Management Network, as I needed to do it anyway as part of updating the book for a future version of vSphere.

In our book we give two examples: a simple version with a single “Service Console Network” and one with a dual “Service Console Network” configuration. I figured I could update both, but I would rather do just one and explain why I prefer it. The one I have picked is the single “Management Network” setup. The main reason is the reduced complexity it brings. On top of that, multiple Management Networks make sense in an environment with many NICs and switches, but with all these converged architectures flying around it doesn’t make much sense anymore to have four virtual links when you only have two physical ones. Yes, I understand that something can happen to a subnet as well, but if that is the case you have far bigger problems than your HA heartbeat network failing. Another thing to keep in mind is that you can mitigate some of the risk of a false positive by selecting a different “Isolation Response”; typically we see this set to “Leave Powered On”.

The below is an excerpt from the book.

Although there are many configurations possible and supported we recommend a simple but highly resilient configuration. We have included the vMotion (VMkernel) network in our example as combining the Management Network and the vMotion network on a single vSwitch is the most commonly used configuration and an industry accepted best practice.

Requirements:

  • 2 physical NICs
  • VLAN trunking

Recommended:

  • 2 physical switches

The vSwitch should be configured as follows:

  • vSwitch0: 2 Physical NICs (vmnic0 and vmnic1)
    • When multiple physical PCI devices are available, use a port of each to increase resiliency
  • 2 Portgroups (Management Network and vMotion VMkernel)
  • Management Network active on vmnic0 and standby on vmnic1
  • vMotion VMkernel active on vmnic1 and standby on vmnic0
  • Failback set to No

Each portgroup has a VLAN ID assigned and runs dedicated on its own physical NIC; only in the case of a failure is it switched over to the standby NIC. We highly recommend setting failback to “No” to avoid the chance of a false positive, which can occur when a physical switch routes no traffic during boot but the ports are reported as “up”. (NIC Teaming Tab)
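For those who prefer to script this, the same setup can be sketched in PowerCLI roughly as follows. Take this as a hedged example rather than a definitive procedure: the host name, IP address and VLAN ID are placeholders, and on a default ESXi install vSwitch0 and the “Management Network” portgroup already exist, so only the vMotion VMkernel port and the teaming policies need to be added or changed.

 # Rough PowerCLI sketch; host name, VLAN ID and IP details are placeholders
 $vmhost = Get-VMHost -Name "esxi01.example.local"
 $vs     = Get-VirtualSwitch -VMHost $vmhost -Name "vSwitch0"

 # Create the vMotion VMkernel port on vSwitch0 and tag its portgroup with a VLAN
 New-VMHostNetworkAdapter -VMHost $vmhost -VirtualSwitch $vs -PortGroup "vMotion" `
     -IP 192.168.50.11 -SubnetMask 255.255.255.0 -VMotionEnabled:$true
 Get-VirtualPortGroup -VirtualSwitch $vs -Name "vMotion" | Set-VirtualPortGroup -VLanId 20

 # Management Network: vmnic0 active, vmnic1 standby, failback disabled
 Get-VirtualPortGroup -VirtualSwitch $vs -Name "Management Network" | Get-NicTeamingPolicy |
     Set-NicTeamingPolicy -MakeNicActive vmnic0 -MakeNicStandby vmnic1 -FailbackEnabled:$false

 # vMotion VMkernel: vmnic1 active, vmnic0 standby, failback disabled
 Get-VirtualPortGroup -VirtualSwitch $vs -Name "vMotion" | Get-NicTeamingPolicy |
     Set-NicTeamingPolicy -MakeNicActive vmnic1 -MakeNicStandby vmnic0 -FailbackEnabled:$false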

Pros: Only 2 NICs in total are needed for the Management Network and vMotion VMkernel, especially useful in Blade environments. This setup is also less complex.

Cons: Just a single active path for heartbeats.

The following diagram depicts the active/standby scenario:

To increase resiliency we also recommend implementing the following advanced settings, where the IP address for “das.isolationaddress” should be a “pingable” device reachable by the ESXi hosts, preferably on the same subnet and with as few hops as possible:

 das.isolationaddress = <ip-address>
 das.failuredetectiontime = 20000
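These settings are normally entered under the HA advanced options of the cluster. If you prefer PowerCLI and are on a version that includes New-AdvancedSetting, the following sketch should achieve the same; the cluster name and IP address are placeholders.

 $cluster = Get-Cluster -Name "Cluster01"
 # das.isolationaddress: an additional pingable address used for the isolation check
 New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name "das.isolationaddress" -Value "192.168.1.1" -Confirm:$false
 # das.failuredetectiontime: raise the failure detection window to 20 seconds (20000 ms)
 New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name "das.failuredetectiontime" -Value "20000" -Confirm:$false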

Thin provisioned disks and VMFS fragmentation, do I really need to worry?

Duncan Epping · Mar 8, 2011 ·

I’ve seen this myth floating around from time to time, and as I never publicly wrote about it I figured it was time to write an article to debunk it. The question that is often posed is whether thin disks will hurt performance due to fragmentation of the blocks allocated on the VMFS volume. I guess we need to rehash some basics first around thin disks and VMFS volumes (do a search on VMFS for more info)…

When you format a VMFS volume you can select the blocksize (1MB, 2MB, 4MB or 8MB). This blocksize is used when the hypervisor allocates storage for VMDKs. So when you create a VMDK on an 8MB formatted VMFS volume it will create that VMDK out of 8MB blocks, and indeed in the case of a 1MB formatted VMFS volume it will use 1MB blocks. Now this blocksize also happens to be the size of the extent that is used for thin disks. In other words, every time your thin disk needs to expand it will grow in extents of, in this example, 1MB. (Related to that, with a lazy-zeroed thick disk the zero-out also uses the blocksize. So when something needs to be written to an untouched part of the VMDK it will zero out using the blocksize of the VMFS volume.)
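To give a feel for how granular that growth is, here is a quick back-of-the-envelope calculation in PowerShell (the 40GB disk size is purely an example): a fully grown thin VMDK simply consists of more, smaller allocations on a 1MB volume than on an 8MB volume.

 # How many blocksize-sized allocations make up a fully grown 40GB thin VMDK?
 $vmdkGB = 40
 foreach ($blockMB in 1, 2, 4, 8) {
     $extents = ($vmdkGB * 1024) / $blockMB
     "{0}MB blocksize : {1} allocations" -f $blockMB, $extents
 }
 # Output: 1MB blocksize : 40960 allocations ... 8MB blocksize : 5120 allocations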

So does using a thin disk in combination with a small blocksize cause more fragmentation? Yes, it quite possibly would. However, the real question is whether it will hurt your performance. The answer to that is: no, it won’t. The reason is that the VMFS blocksize is totally irrelevant when it comes to Guest OS I/O. Let’s assume you have a regular Windows VM issuing 8KB writes and reads to a 1MB blocksize formatted volume: the hypervisor won’t fetch 1MB, as that could cause substantial overhead… no, it requests from the array exactly what was requested by the OS, and the array serves up whatever it is configured to serve. I guess what people are worried about most is sequential I/O, but think about that for a second or two. How sequential is your I/O when you look at it from the array’s perspective? You have multiple hosts running dozens of VMs accessing who knows how many volumes and subsequently who knows how many spindles. That sequential I/O isn’t so sequential anymore all of a sudden, is it?!

<edit> As pointed out, many arrays recognize sequential I/O and prefetch, which is correct. This doesn’t mean that contiguous blocks are automatically faster, as fragmented blocks also mean more spindles, etc. </edit>

I guess the main takeaway here is: stop worrying about VMFS; it is rock solid and it will get the job done.

How many pages can be shared if Large Pages are broken up?

Duncan Epping · Nov 7, 2010 ·

I have written multiple articles (1, 2, 3, 4) on this topic, so hopefully by now everyone knows that Large Pages are not shared by TPS. However, when there is contention the large pages will be broken up into small pages, and those will be shared based on the outcome of the TPS algorithm. Something I have always wondered about, and discussed with the developers a while back, is whether it would be possible to have an indication of how many pages could be shared if Large Pages were broken down. (Please note that we are talking about Small Pages backed by Large Pages here.) Unfortunately there was no option to reveal this back then.

While watching the VMworld esxtop session “Troubleshooting using ESXTOP for Advanced Users, TA6720” I noticed something really cool, which is shown in the quote and the screenshot below, both taken from the session. Other new metrics revealed in this session and shown in the screenshot are around Memory Compression. I guess the screenshot speaks for itself.

  • COWH: Copy On Write Pages Hints – the amount of memory in MB that is potentially shareable but not currently shared, for instance when large pages are used. This is a good hint!

There was more cool stuff in this session that I will be posting about this week, or at least adding to my esxtop page for completeness.

RT: VMware Event – London – 8th Oct – Not to be missed!

Duncan Epping · Oct 4, 2010 ·

I was just talking to Mr Alan Renouf and it appears that there are a couple of free slots left at the API/PowerCLI event that VMware has organized in coöperation with Alan on the 8th of October in London.

If you are in London on the 8th of October 2010 then you could be in for a treat: VMware are arranging a fantastic event, well worth the visit, and best of all it’s free!

The event is called: Managing vSphere in large environments using APIs and PowerCLI

There are limited spaces available, so act now or you will miss out; some of the most fantastic minds of VMware will be gracing London with their presence before heading out to VMworld Copenhagen.

Think of this as a taster of the kind of things you can expect from Technology Exchange; the contents are listed below. I would recommend this to any VMware admins who are managing large implementations of vCenter, as there will be some great detail in these sessions.

If you would like to attend, please send an email to PowerCLIEvent@virtu-al.net with your name and company. This will strictly be on a first come, first served basis as there are limited numbers.

Exploring VMware APIs

Speaker: Preetham Gopalaswamy

vSphere APIs for Performance Monitoring

Speaker: Balaji Parimi, Ravi Soundararajan

Of course Alan really focussed on the API part of the event, but that is not all there is. If you thought my esxtop page was useful, make sure to attend this event as this is the best part of the day in my opinion:

Advanced performance troubleshooting using esxtop

Level: Advanced

Length: 60 minutes

This talk will teach you how to spot tricky performance issues using the various counters in esxtop.

Speaker: Krishna Raj Raja, Staff Engineer, Performance Team

If you are in the UK and can’t make it to VMworld, this is your chance to catch some of the top experts and get to know the API and esxtop inside out!

vCD – Allocation Models

Duncan Epping · Sep 22, 2010 ·

Over the last two weeks I received multiple questions about the VMware vCloud Director (vCD) Allocation Models. vCD has three different types of allocation models, which are used to allocate resources to a tenant. We will first reiterate some of the basic concepts of VMware vCloud Director before we dive into these allocation models.

vCD is an abstraction layer: it sits on top of vCenter and abstracts all the resources vCenter manages. All these resources are combined into large pools for cloud tenants to consume. (Read Frank’s excellent article on these resource groups.) Within VMware vSphere, clusters and resource pools are used as resource containers. As a cloud tenant is unaware of the technology used underneath, vCD has introduced the concept of Virtual Data Centers (vDCs).

There are two different types of vDCs:

  • Provider vDC
  • Organization vDC (Org vDC)

A Provider vDC is a pool of CPU and memory resources combined with the storage resources you selected. A provider vDC can be a VMware vSphere Cluster or a Resource Pool and a provider vDC is the source for Org vDCs. Org vDCs are used within vCD to partition Provider vDCs and to allocate resources to an organization. vCD uses regular vSphere Resource Pools to partition these resources. So each Org vDC corresponds with a resource pool on the vSphere layer. So what is this Allocation Model that we will be discussing? Where does the Allocation Model come into play?

An Allocation Model is defined on an Org vDC level. The diagram below depicts a scenario where four Org vDCs have been created in a single Provider vDC. The Provider vDC has a 1:1 relationship with the cluster in this case.

[Diagram: vCloud Director Provider vDC with multiple Org vDCs]

Allocation Models

Allocation Models within vCD are used to determine how resources are allocated to the Organization vDC and more than likely how your customer will be charged for these resources. Currently vCD has three different types of allocation models. These allocation models are listed below, with the original description provided by vCD.

  • Allocation Pool
    Only a percentage of the resources you allocate are committed to the organization vDC. You can specify the percentage, which allows you to overcommit resources.
  • Pay-As-You-Go
    Allocated resources are only committed when users create vApps in the organization vDC. You can specify the maximum amount of CPU and memory resources to commit to the organization vDC.
  • Reservation Pool
    All of the resources you allocate are committed to the organization vDC.

Each allocation model has its own characteristics. These characteristics can be placed in two categories:

  • Virtual Machine Level
  • Resource Pool Level

The distinction between the Allocation Models is how resources are allocated or, to use vSphere terminology, how resources are reserved or limited. In order to guarantee resources to a VM or to an entire pool, vCD will set a reservation. To avoid extreme contention, vCD will also cap your resources by setting a limit. These reservations and limits will be set on a VM or Resource Pool level, depending on the chosen allocation model.

For each allocation model shown in the screenshot below, the characteristics will be broken down in the following sections.

[Screenshot: vCloud Director allocation model selection]

Allocation Pool

This is the default model, the Allocation Pool. This model allows you to specify the amount of resources for your Org vDC and also specify how much of these resources are guaranteed. Guaranteed means that a reservation will be set and that the amount of guaranteed resources is carved out of your Provider vDC. The total amount of resources specified is your upper boundary, also known as your resource pool limit. I think a diagram will explain it a bit better in this case:

[Diagram: Allocation Pool model]

Characteristics

As stated before, each model has very specific characteristics. These are listed below:

  • Pool of resources of which a percentage will be guaranteed
    • A reservation will be set to guarantee resources on a resource pool level
    • By default the resource pool reservation on CPU is 0% and on memory 100%
    • Tenant has a guaranteed set of resources and has the ability to burst to the upper limit
  • VM Level characteristics
    • No reservations or limits set on a per VM level for CPU
    • Reservations set on a per VM level for memory. This reservation is based on the percentage of guaranteed resources

These characteristics are explained in more detail in the following example.

Allocation Pool example

In the example below the tenant requested 10GHz of CPU resources with a guarantee of 25% and 10GB of memory with a guarantee of 100%.

[Screenshot: Allocation Pool Org vDC configuration]

Note that the default for CPU is a 0% guarantee, which has been changed to 25% as requested by the tenant. Also note that you can specify the maximum number of VMs that can be deployed in this Org vDC. This will also help you restrict the level of overcommitment and will be discussed later.

These settings on a vCD level result in a resource pool with a reservation of 2500MHz and a limit of 10,000MHz. As vCD describes it, 2500MHz is guaranteed and the customer has the ability to burst up to 10,000MHz. The resource pool that is created and its properties are shown in the screenshot below.

[Screenshot: Resource pool created for the Allocation Pool Org vDC]
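Purely as a hedged illustration, the same numbers can be read back from the vSphere layer with PowerCLI; the resource pool name below is a placeholder for whatever name vCD generated for this Org vDC.

 # Inspect the resource pool that backs the Org vDC (name is a placeholder)
 Get-ResourcePool -Name "OrgvDC-AllocationPool" |
     Select-Object Name, CpuReservationMhz, CpuLimitMhz, MemReservationMB, MemLimitMB
 # Expected for 10GHz CPU with a 25% guarantee and 10GB memory with a 100% guarantee:
 #   CpuReservationMhz : 2500    CpuLimitMhz : 10000
 #   MemReservationMB  : 10240   MemLimitMB  : 10240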

As described in the characteristics section, no CPU reservation or limit is set on a VM level, which is shown in the following screenshot.

[Screenshot: VM-level CPU settings for the Allocation Pool model]

This is different for memory, however. For memory both a reservation and a limit have been defined. The limit always equals the provisioned memory, and the reservation equals the guaranteed memory as defined as part of the Allocation Model. In our example we set the percentage of guaranteed memory to 100%, which results in both a reservation and a limit of 512MB.

[Screenshot: VM-level memory settings for the Allocation Pool model]

For the sake of completeness we have changed the guaranteed resources for memory to 50%. Of course the resource pool has changed accordingly. As shown in the screenshot below, the reservation on memory has changed to 5120MB, which “happens” to be 50% of 10240MB.

[Screenshot: Resource pool with the memory guarantee set to 50%]

This in turn also sets the per-VM memory reservation to 50% of the provisioned memory. In the screenshot below the VM was provisioned with 512MB, of which 256MB has been reserved.

[Screenshot: VM-level memory reservation with a 50% guarantee]

Summary

As shown in the example, the percentage of guaranteed resources has its impact on the implementation of the resource pool and the limits and reservations associated with it. It is recommended to align the maximum number of VMs with the specified amount of GHz and GB. By limiting the number of VMs, an extreme level of overcommitment within the Org vDC can be avoided.

Pay-As-You-Go

This is the model many of you are familiar with and many have also adopted in their company: pay-as-you-go. This model allows you to specify the amount of guaranteed resources per VM. When the percentage of guaranteed resources is set to 100%, a reservation is set to 100% of what has been allocated to that particular VM. Where this model differentiates itself from the other models is that it allows you to limit the vCPU speed. By default the vCPU speed is set to 0.26GHz; the impact of this vCPU speed will be shown in the example below, but note that this vCPU speed is set on the VM as a hard limit.

Characteristics

  • Percentage of resources guaranteed on a per VM level
    • A reservation will be set on a VM level
    • By default the VM reservation on CPU is 0% and memory 100%
    • By default the vCPU speed is set to 0.26GHz, which means your vCPU will be limited to 0.26GHz
  • The Org vDC resource pool is just an accumulation of all reservations set on a per VM level

These characteristics are explained in more detail in the following example.

Pay-As-You-Go example

In the example below the customer requests an Org vDC based on the Pay-As-You-Go allocation model with a vCPU speed of 0.26GHz, of which 25% is guaranteed. Memory is left at the default setting, which is a 100% guarantee.

[Screenshot: Pay-As-You-Go Org vDC configuration]

Note that the default for CPU is a 0% guarantee, which I changed to 25% to see the impact. Also note that the vCPU speed is set by default to 0.26GHz. This is something that you will want to increase! When the Org vDC is created, it results in a resource pool with the following characteristics.

[Screenshot: Resource pool created for the Pay-As-You-Go Org vDC]

As described in the characteristics, when vApps are created the associated reservations are accumulated into the resource pool. When a vApp is created with two VMs (each having a single vCPU and 512MB of memory) this results in the following changes; note that both the reservation on memory and the reservation on CPU have changed.

[Screenshot: Resource pool after deploying a vApp with two VMs]

As can be seen, on a Resource Pool level a reservation of 130MHz is set, which is 25% of 2 x 0.26GHz. A guarantee of 100% was set on memory, which translates to 1221MB in total. (Note that a resource pool reservation includes the memory overhead of virtualization; see the Resource Management Guide for more details.)

As stated, the resource pool is just an accumulation of all VMs. The following is what has been set on a per-VM level; the first screenshot shows the CPU details and the second shows the memory details:

[Screenshot: VM-level CPU settings for the Pay-As-You-Go model]

[Screenshot: VM-level memory settings for the Pay-As-You-Go model]

As can be seen, a reservation of 65MHz and a limit of 260MHz have been set on CPU. For memory a 512MB reservation and limit have been set. If the “guarantee” on memory were set to 50%, then the memory reservation on this VM would be 256MB.
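Again as a hedged sketch, these per-VM values can be verified with PowerCLI; the VM name is a placeholder and the expected output assumes the 0.26GHz vCPU speed with a 25% CPU and 100% memory guarantee used in this example.

 # Check what has been set on the VM itself (VM name is a placeholder)
 Get-VM -Name "paygo-vm01" | Get-VMResourceConfiguration |
     Select-Object VM, CpuReservationMhz, CpuLimitMhz, MemReservationMB, MemLimitMB
 # Expected:
 #   CpuReservationMhz : 65     CpuLimitMhz : 260
 #   MemReservationMB  : 512    MemLimitMB  : 512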

Summary

The tenant will have guaranteed resources per VM. As shown, the vCPU speed defined is the limiting factor for a VM. For the sake of user experience it is recommended to select at least 1GHz as the vCPU speed. The resource pool created as part of the Org vDC only accumulates reserved resources and does not limit the VMs. Limits are placed on a per-VM level. Please note that if the “guarantee” is set to 100%, a reservation equal to the limit is set. In the case of a 1GHz reservation this can lead to very conservative consolidation ratios.

Reservation Pool

The third and final Allocation Model is the Reservation Pool. In this model all resources are guaranteed by default. It could be compared to an Allocation Pool with all guarantees set to 100% and is fairly straightforward.

Characteristics

  • Fully guaranteed pool of resources
    • A reservation will be set to guarantee resources on a resource pool level
  • No reservations or limits set on a per VM level for CPU

These characteristics are explained in more detail in the following example.

Reservation Pool example

In the example below the customer requested 10GHz of CPU resources. As stated, the default guarantee is 100%. As can be seen in the screenshot, there is no way to dictate the number of MHz per vCPU, or a guarantee for that matter.

[Screenshot: Reservation Pool Org vDC configuration]

As expected, that results in a resource pool with a limit equal to the reservation, as shown in the screenshot below.

[Screenshot: Resource pool created for the Reservation Pool Org vDC]
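As a quick hedged check from the vSphere side, the reservation and limit on this resource pool should come back identical; the resource pool name is again a placeholder.

 # For a Reservation Pool Org vDC the reservation should equal the limit
 Get-ResourcePool -Name "OrgvDC-ReservationPool" |
     Select-Object Name, CpuReservationMhz, CpuLimitMhz, MemReservationMB, MemLimitMB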

On a per VM level no reservations or limits are set. This is shown in the following two screenshots.

[Screenshot: VM-level CPU settings for the Reservation Pool model]

[Screenshot: VM-level memory settings for the Reservation Pool model]

Summary

As shown, the Reservation Pool allocation model is fairly straightforward. A limit equal to the reservation is set on a resource pool level, which gives the tenant a guaranteed pool of resources. There are no limits or reservations set on a per-VM level, which gives the tenant a lot of flexibility. However, this flexibility could lead to severe overcommitment within the Org vDC. As such it is recommended to limit the number of VMs to avoid a degraded user experience.

Conclusion

Each of the three allocation models has its own unique characteristics to fit your customer’s needs. Is your customer looking for a pool of resources of which a chunk is guaranteed, so that they can burst and only pay for the burst space when they are using it, while having a fixed price per month? Allocation Pool would be the right answer! Do they have very transient workloads and do they only want to pay for these when using them? Pay-As-You-Go would be the best fit. Is your customer just looking to expand their datacenter with a “dedicated” chunk of resources? Reservation Pool does just that.

However, think before you decide which Allocation Model you will use: think about the amount of resources guaranteed and make sure it adds up with the maximum number of VMs that can be deployed in the Org vDC. One of the most important things when you are selling a service is user experience!

After I wrote all of this, one of my colleagues (thanks Andy!) was so kind as to transform it into a table. This says it all in just two simple tables: the first table describes the characteristics on a Resource Pool level and the second on a Virtual Machine level. Enjoy:

[Table: Allocation model characteristics at the Resource Pool level]

[Table: Allocation model characteristics at the Virtual Machine level]
