
Yellow Bricks

by Duncan Epping



Management Cluster / vShield Resiliency?

Duncan Epping · Feb 14, 2011 ·

I was reading Scott’s article about using dedicated clusters for management applications, which was quickly followed by a bunch of quotes turned into an article by Beth P. from TechTarget. Scott mentions that he had originally asked on Twitter whether people were running dedicated management clusters and, if so, why.

As he mentioned, only a few responded, and the reason for that is simple: hardly anyone is running dedicated management clusters these days. The few environments I have seen doing it were large enterprise environments or service providers where this was part of an internal policy. Basically, in those cases a policy would state that “management applications cannot be hosted on the platform they are managing”, and some even went a step further, not allowing these management applications to be hosted in the same physical datacenter. Scott’s article was quickly turned into an “availability concerns” article by TechTarget, to which I want to respond. I am by no means a vShield expert, but I do know a thing or two about the product and the platform it is hosted on.

I’ll use vShield Edge and vShield Manager as an example, as Scott’s article mentions vCloud Director, which leverages vShield Edge. This means that vShield Manager needs to be deployed in order to manage the Edge devices. I was part of the team responsible for the vCloud Reference Architecture and also part of the team that designed and deployed the first vCloud environment in EMEA. Our customer had worries about the resiliency of vShield Manager and vShield Edge as well, but as they are virtual machines they can easily be “protected” by leveraging vSphere features. One thing I want to point out though: if vShield Manager is down, vShield Edge will continue to function, so no need to worry there. I created the following table to show how vShield Manager and vShield Edge can be “protected”.

Product         | vShield Manager | VMware HA | VM Monitoring | VMware FT
vShield Manager | Yes (*)         | Yes       | Yes           | Yes
vShield Edge    | Yes (*)         | Yes       | Yes           | Yes

Not only can you leverage these standard vSphere technologies, there is more that can be used:

  • Scheduled live clone of vShield Manager through vCenter
  • Scheduled configuration backup of vShield Manager (*)
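
To make the table above a little more concrete, here is a minimal pyVmomi sketch that enables VMware HA and VM Monitoring on the cluster hosting vShield Manager and vShield Edge. The vCenter address, credentials, and cluster name are placeholders, and the API calls are used as I understand them; treat it as an illustration rather than a drop-in script. FT would additionally be enabled per VM.

```python
# Hedged sketch: enable VMware HA and VM Monitoring on the (hypothetical)
# management cluster. Connection details and cluster name are placeholders;
# error handling and SSL settings are omitted for brevity.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator", pwd="password")
content = si.RetrieveContent()

# Look up the cluster by name (container view not cleaned up, for brevity).
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Management-Cluster")

# Enable HA and VM Monitoring, which restarts a VM when its VMware Tools
# heartbeats stop -- two of the options listed in the table above.
das_config = vim.cluster.DasConfigInfo(enabled=True, vmMonitoring="vmMonitoringOnly")
spec = vim.cluster.ConfigSpecEx(dasConfig=das_config)
cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

Disconnect(si)
```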

Please don’t get me wrong here: there are always ways to get locked out, but as Edward Haletky stated, “In fact, the way vShield Manager locks down the infrastructure upon failure is in keeping with longstanding security best practices” (quote from Beth P.’s article). I also would not want my door to open automatically when there is something wrong with my lock. The trick, though, is to prevent a “broken lock” situation from occurring and to utilize vSphere capabilities in such a way that the last known state can be safely recovered if it does.

As always, an architect/consultant will need to work with all the requirements and constraints and, based on the capabilities of a product, come up with a solution that offers maximum resiliency. With the options mentioned above, you can’t tell me that VMware doesn’t provide them.

Using the vSphere Plan & Design Kit

Duncan Epping · Feb 2, 2011 ·

As part of my role I very often review design documents that other consultants/architects have written, not only those of VMware employees but also from external people. On top of that, I of course also see a lot of VCDX application packages pass by. Something struck me the other day when I was doing the third review in just a couple of hours: thinking back over the designs I had reviewed so far, I noticed a common theme.

Before I get started I want to make sure everyone understands that I believe there is very strong value in using standardized templates/frameworks. So don’t misinterpret this article.

I know that many of you are consultants/architects and leverage the Plan & Design kit that VMware PSO created, or have an internally developed template that may or may not be based on this P&D Kit. (If you are a VMware Partner and wonder what this kit is, log in to the partner portal and look around!) The Plan & Design kit is basically a template, although the hot word these days is framework, that lays out the foundation for a vSphere 4.x design. I guess “framework” or “template” already reveals how it should be used, but lately I have been noticing, yes, even in VCDX submissions, that people are trying to cut corners and skip sections or use the defaults. I guess by now most of you are thinking “well, that doesn’t apply to me”, but let’s be honest: when you use the same template for years you start to get lazy. I know I do.

While there is absolutely nothing wrong with using the template and adopting the best practices mentioned in it, this only holds when they are used in the right context. The framework that VMware provides, for instance, contains many examples of how you could implement something, and the ones provided are usually the best practice. That doesn’t necessarily mean, though, that this best practice meets your customer’s requirements or can be used given the constraints this environment/customer has. Just to give an example of something that I see in 90% of the designs I review:

  • Maximum number of VMs per datastore: 15
  • Datastore size: 500 GB
  • Justification: to reduce SCSI reservations

This used to be a best practice and probably a very valid design decision in most cases. However, over the last three versions the locking mechanism has been significantly improved. On top of that, more recently VAAI was introduced, which reduced the risks even further. Along the way the number 15 got bumped up to 20-25, depending on the workload and the RTO. Based on those technology changes your best practice and template should have been updated, or at a minimum explain what the “new” reason is for sticking with these values.
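
To make the arithmetic behind such a sizing decision explicit, here is a small sketch; the average VM footprint and free-space headroom below are purely illustrative assumptions, not numbers from the P&D kit or any best practice.

```python
# Illustrative only: back-of-the-envelope datastore sizing. The average VM
# footprint (VMDK + swap + snapshot headroom) and the free-space margin are
# assumptions for the sake of the example.
vms_per_datastore = 20        # updated guidance: 20-25 instead of the old 15
avg_vm_footprint_gb = 30      # assumed average space consumed per VM
free_space_margin = 0.20      # assumed 20% of the datastore kept free

required_gb = vms_per_datastore * avg_vm_footprint_gb / (1 - free_space_margin)
print(f"Datastore size needed: ~{required_gb:.0f} GB")  # ~750 GB in this example
```

The point is not the specific numbers, but that the datastore size and VM count should be re-derived from the current workload and RTO every time, rather than inheriting the 15 VM / 500 GB defaults from the template.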

Every single time you write a new design, challenge your decisions: go over these best practices and make sure they still apply. Every time a new version of the product is released, validate the best practices and standardized design decisions and update them accordingly so you benefit from the new features.

Storage IO Control Best Practices

Duncan Epping · Oct 19, 2010 ·

After attending Irfan Ahmad’s session on Storage IO Control at VMworld I had the pleasure of sitting down with Irfan to discuss SIOC. Irfan was kind enough to review my SIOC articles (1, 2) and we discussed a couple of other things as well. The discussion and the Storage IO Control session contained some real gems, and before my brain resets itself I wanted to have them documented.

Storage IO Control Best Practices:

  • Enable Storage IO Control on all datastores (a configuration sketch follows this list)
  • Avoid external access for SIOC enabled datastores
    • When external (non-vSphere) workloads access a SIOC-enabled datastore, SIOC will stop throttling to avoid any interference, more info here.
  • When multiple datastores share the same set of spindles, ensure they all have SIOC enabled with comparable settings.
  • Change latency threshold based on used storage media type:
    • For FC storage the recommended latency threshold is 20-30 ms
    • For SAS storage the recommended latency threshold is 20-30 ms
    • For SATA storage the recommended latency threshold is 30-50 ms
    • For SSD storage the recommended latency threshold is 15-20 ms
  • Define a per-VM IOPS limit to avoid a single VM flooding the array
    • For instance, limit the number of IOPS per VM to 1000
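
As a rough illustration of the items above, the sketch below enables SIOC on a datastore with a 20 ms congestion threshold and caps one VM’s disk at 1000 IOPS using pyVmomi. The datastore and VM names are placeholders, and the API types are used as I understand them (StorageResourceManager’s ConfigureDatastoreIORM_Task and the per-disk storageIOAllocation setting); consider it a sketch, not a hardened script.

```python
# Hedged sketch: enable Storage IO Control with a custom latency threshold and
# limit one VM's disk to 1000 IOPS. Lookups, connection details, and error
# handling are simplified; names are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator", pwd="password")
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    """Simplified inventory lookup (the container view is not cleaned up)."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    return next(obj for obj in view.view if obj.name == name)

# 1) Enable SIOC on the datastore and set the congestion threshold to 20 ms
#    (e.g. for FC/SAS storage, per the list above).
datastore = find_by_name(vim.Datastore, "Datastore01")
iorm_spec = vim.StorageResourceManager.IORMConfigSpec(enabled=True, congestionThreshold=20)
content.storageResourceManager.ConfigureDatastoreIORM_Task(datastore=datastore, spec=iorm_spec)

# 2) Limit a VM's first virtual disk to 1000 IOPS so a single VM cannot flood the array.
vm = find_by_name(vim.VirtualMachine, "VM001")
disk = next(d for d in vm.config.hardware.device
            if isinstance(d, vim.vm.device.VirtualDisk))
disk.storageIOAllocation = vim.StorageResourceManager.IOAllocationInfo(limit=1000)
disk_change = vim.vm.device.VirtualDeviceSpec(
    device=disk, operation=vim.vm.device.VirtualDeviceSpec.Operation.edit)
vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[disk_change]))

Disconnect(si)
```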

SIOC, tying up some loose ends

Duncan Epping · Oct 8, 2010 ·

After my initial post about Storage IO Control I received a whole bunch of questions. Instead of replying via the commenting system I decided to add them to a blog post, as it would be useful for everyone to read. Now, I figured this stuff out by reading the PARDA whitepaper six times and by going through the log files and CLI of my ESXi host, so this is not cast in stone. If anyone has any additional questions, don’t hesitate to ask them and I’ll be happy to add them and try to answer them!

Here are the questions, each followed by my answer:

  1. Q: Why is SIOC not enabled by default?
    A: As datastores can be shared between clusters, and those clusters could be licensed differently, SIOC is not enabled by default.
  2. Q: If vCenter is only needed when enabling the feature, who will keep track of latencies when a datastore is shared between multiple hosts?
    A: Latency values are actually stored on the datastore itself. From the PARDA academic paper, I figured two methods could be used for this: either network communication or, as stated, using the datastore. Notice the file “iormstat.sf” in green in the screenshot below; I guess that answers the question… the datastore itself is used to communicate the latency of a datastore. I also confirmed with Irfan that my assessment was true. (A toy sketch of this idea follows the Q&A list.)
  3. Q: Where does the datastore-wide disk scheduler run?
    A: The datastore-wide disk scheduler is essentially SIOC, also known as the “PARDA control algorithm”, and runs on each host sharing the datastore. PARDA consists of two key components: “latency estimation” and “window size computation”. Latency estimation is used to detect whether SIOC needs to throttle queues to ensure each VM gets its fair share. Window size computation is used to calculate what the queue depth should be for your host.
  4. Q: Is PARDA also responsible for throttling the VM?
    A: No. PARDA itself, or rather the two major processes that form PARDA (latency estimation and window size computation), does not control “host local” fairness; the local scheduler (SFQ) is responsible for that.
  5. Q: Can we in any way control I/O contention in a vCD VM environment (say, one VM running high I/O impacting another VM on the same host/datastore)?
    A: I would highly recommend enabling this in vCloud environments to prevent storage-based DoS attacks (or just noisy neighbors) and to ensure I/O fairness is preserved. This is one of the reasons VMware developed this mechanism.
  6. Q: I can’t enable SIOC with an Enterprise licence – “License not available to perform the operation”. Is it Enterprise Plus only?
    A: SIOC requires Enterprise Plus
  7. Q: Can I verify what the latency is?
    A: Yes you can: go to the host’s Performance tab and select “Datastore”, “Real Time”, select the datastore and select “Storage I/O Control normalized latency”. Please note that the unit of measurement is microseconds!
  8. Q: This doesn’t appear to work on NFS?
    A: SIOC can currently only be enabled on VMFS volumes.
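
Question 2 above describes latency being shared through a file on the datastore rather than through vCenter. The toy sketch below illustrates that idea only: each host publishes the latency it observes, and every host derives the same datastore-wide average. It is emphatically not the real format of iormstat.sf.

```python
# Toy illustration of the decentralised aggregation described in Q2: each host
# writes its observed latency to a shared location (stand-in for the datastore
# file), and every host can compute the same datastore-wide average from it.
import json, os, tempfile

shared_file = os.path.join(tempfile.gettempdir(), "iormstat-demo.json")

def publish_latency(host, latency_ms):
    """A host writes/updates the device latency (in ms) it observes."""
    data = {}
    if os.path.exists(shared_file):
        with open(shared_file) as f:
            data = json.load(f)
    data[host] = latency_ms
    with open(shared_file, "w") as f:
        json.dump(data, f)

def datastore_wide_latency():
    """Any host reads the shared file and derives the same average."""
    with open(shared_file) as f:
        data = json.load(f)
    return sum(data.values()) / len(data)

publish_latency("esx01", 36.0)
publish_latency("esx02", 28.0)
print(datastore_wide_latency())  # 32.0 ms -> above the 30 ms default, so SIOC would throttle
```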

If you happen to be at VMworld next week, make sure to attend this session: TA8233 Prioritizing Storage Resource Allocation in ESX Based Virtual Environments Using Storage I/O Control!

Storage I/O Fairness

Duncan Epping · Sep 29, 2010 ·

I was preparing a post on Storage I/O Control (SIOC) when I noticed this article by Alex Bakman. Alex managed to capture the essence of SIOC in just two sentences.

Without setting the shares you can simply enable Storage I/O controls on each datastore. This will prevent any one VM from monopolizing the datastore by leveling out all requests for I/O that the datastore receives.

This is exactly why I would recommend anyone who has a large environment, and even more specifically a cloud environment, to enable SIOC. Especially in very large environments where compute, storage and network resources are designed to accommodate the highest common factor, it is important to ensure that all entities can claim their fair share of resources, and SIOC will do just that.

Now the question is: how does this actually work? I already wrote a short article on it a while back, but I guess it can’t hurt to reiterate things and expand a bit.

First a bunch of facts I wanted to make sure were documented:

  • SIOC is disabled by default
  • SIOC needs to be enabled on a per Datastore level
  • SIOC only engages when a specific level of latency has been reached
  • SIOC has a default latency threshold of 30 ms
  • SIOC uses an average latency across hosts
  • SIOC uses disk shares to assign I/O queue slots
  • SIOC does not use vCenter, except for enabling the feature

When SIOC is enabled, disk shares are used to give each VM its fair share of resources in times of contention. Contention in this case is measured in latency. As stated above, when latency is equal to or higher than 30 ms (the statistics around this are computed every 4 seconds), the “datastore-wide disk scheduler” will determine which action to take to reduce the overall/average latency and increase fairness. I guess the best way to explain what happens is by using an example.

As stated earlier, I want to keep this post fairly simple, so I am using the example of an environment where every VM has the same number of shares. I have also limited the number of VMs and hosts in the diagrams. Those of you who attended VMworld session TA8233 (Ajay and Chethan) will recognize these diagrams; I recreated and slightly modified them.

The first diagram shows three virtual machines. VM001 and VM002 are hosted on ESX01 and VM003 is hosted on ESX02. Each VM has its disk shares set to a value of 1000. As Storage I/O Control is disabled, there is no mechanism to regulate the I/O on a datastore level. As shown at the bottom by the Storage Array Queue, in this case VM003 ends up getting more resources than VM001 and VM002, while from a shares perspective all of them were entitled to the exact same amount of resources. Please note that both Device Queue Depths are 32, which is the key to Storage I/O Control, but I will explain that after the next diagram.

As stated, without SIOC there is nothing that regulates the I/O on a datastore level. The next diagram shows the same scenario but with SIOC enabled.

After SIOC has been enabled it will start monitoring the datastore. If the specified latency threshold is reached (default: an average I/O latency of 30 ms) for the datastore, SIOC will be triggered to take action and resolve this possible imbalance. SIOC will then limit the number of I/Os a host can issue. It does this by throttling the host device queue, which is shown in the diagram and labeled “Device Queue Depth”. As can be seen, the queue depth of ESX02 is decreased to 16. Note that SIOC will not go below a device queue depth of 4.

Before it can limit the host, it of course needs to know what to limit it to. The “datastore-wide disk scheduler” will sum up the disk shares for each of the VMDKs per host. In the case of ESX01 that is 2000 and in the case of ESX02 it is 1000. Next, the “datastore-wide disk scheduler” will calculate the I/O slot entitlement based on the host-level shares and it will throttle the queue. Now I can hear you think: what about the VM, will it be throttled at all? Well, the VM is controlled by the host local scheduler (also sometimes referred to as SFQ), and resources at the VM level are divided by the host local scheduler based on the VM-level shares.
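
To make the entitlement calculation in this example a little more tangible, here is a simplified sketch that divides an assumed aggregate array window across hosts in proportion to their summed disk shares, respecting SIOC’s floor of 4 and the device maximum of 32. The aggregate window value is an assumption for illustration; the real datastore-wide disk scheduler uses the PARDA control algorithm mentioned earlier.

```python
# Simplified illustration of the example above: hand out an assumed aggregate
# "array window" to hosts in proportion to the sum of their VMs' disk shares,
# never going below a device queue depth of 4 or above the maximum of 32.
host_shares = {"ESX01": 2000, "ESX02": 1000}  # VM001+VM002 on ESX01, VM003 on ESX02
aggregate_window = 48                         # assumed total I/O slots available when throttling
total_shares = sum(host_shares.values())

for host, shares in host_shares.items():
    entitlement = aggregate_window * shares / total_shares
    queue_depth = max(4, min(32, round(entitlement)))  # floor of 4, device max of 32
    print(host, queue_depth)
# ESX01 -> 32, ESX02 -> 16, matching the 2:1 share ratio shown in the diagrams
```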

I guess to conclude, all there is left to say is: enable SIOC and benefit from its fairness mechanism. You can’t afford a single VM flooding your array. SIOC is the foundation of your (virtual) storage architecture, use it!

References:
  • PARDA whitepaper
  • Storage I/O Control whitepaper
  • VMworld Storage DRS session
  • VMworld Storage I/O Control session

