
Yellow Bricks

by Duncan Epping


vSphere

Requirements Driven Data Center

Duncan Epping · Apr 22, 2015 ·

I’ve been thinking about the term Software Defined Data Center for a while now. “Software defined” is a great term, but many would agree that things have been defined by software for a long time already. When talking about SDDC with customers, it is typically described as the ability to abstract, pool and automate all aspects of an infrastructure. These are very important factors, but at least to me not the most important ones, as they don’t necessarily speak to the agility and flexibility a solution like this should bring. So what is an even more important aspect?

I’ve had some time to think about this lately, and to me what is truly important is the ability to define requirements for a service and have the infrastructure cater to those needs. I know this sounds really fluffy, but ultimately the service doesn’t care what is running underneath, and typically the business owner and application owners don’t either as long as all requirements are met. Key is delivering a service with consistency and predictability. Even more important, consistency and repeatability increase availability and predictability, and nothing is more important for the user experience.

When it comes to providing a positive user experience, it is of course key to first figure out what you want and what you need. Typically this information comes from your business partner and/or application owner. Once you know what those requirements are, they can be translated into technical specifications and ultimately drive where workloads end up. A good example of what this looks like is VMware Virtual Volumes. VVols is essentially requirements driven placement of workloads. Not just placement, but of course also all other aspects that determine user experience, such as QoS, availability, recoverability and whatever else is desired for your workload.

With Virtual Volumes, placement of a VM (or VMDK) is based on how the policy is constructed and what is defined in it. The Storage Policy Based Management engine gives you the flexibility to define policies any way you like; it is of course limited to what your storage system is capable of delivering, but from the vSphere platform point of view you can create many different variations. If you specify that the object needs to be thin provisioned, has a specific IO profile, or needs to be deduplicated, then those requirements are passed down to the storage system, which makes its placement decisions based on them and ensures the demands can be met. As stated earlier, requirements like QoS and availability are also passed down. These could be things like latency, IOPS and how many copies of an object are needed (the number of 9s of resiliency). On top of that, when requirements change or when for whatever reason an SLA is breached, a requirements driven environment will assess the situation and remediate to ensure requirements are met.
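To make the idea of requirements driven placement a bit more concrete, below is a minimal Python sketch. It is not SPBM or any VMware API, just an illustration with hypothetical names: a policy lists requirements (thin provisioning, IOPS, number of copies) and only storage containers that can satisfy every requirement are valid placement targets.

```python
from dataclasses import dataclass

@dataclass
class StoragePolicy:
    """Requirements defined by the admin, e.g. through a storage policy."""
    thin_provisioned: bool = False
    deduplication: bool = False
    min_iops: int = 0
    copies: int = 1  # number of copies of the object that must be kept

@dataclass
class StorageContainer:
    """Capabilities a (VVol) storage container advertises to the platform."""
    name: str = ""
    thin_provisioned: bool = False
    deduplication: bool = False
    max_iops: int = 0
    max_copies: int = 1

def compliant(container: StorageContainer, policy: StoragePolicy) -> bool:
    """A container is a valid placement target only if it can satisfy
    every requirement in the policy."""
    return ((not policy.thin_provisioned or container.thin_provisioned)
            and (not policy.deduplication or container.deduplication)
            and container.max_iops >= policy.min_iops
            and container.max_copies >= policy.copies)

def place(containers, policy):
    """Return the containers that satisfy the policy; the storage system
    would pick one of these for the new object (VM or VMDK)."""
    return [c for c in containers if compliant(c, policy)]

if __name__ == "__main__":
    containers = [
        StorageContainer("gold", thin_provisioned=True, deduplication=True,
                         max_iops=50000, max_copies=3),
        StorageContainer("bronze", thin_provisioned=True, deduplication=False,
                         max_iops=5000, max_copies=1),
    ]
    policy = StoragePolicy(thin_provisioned=True, min_iops=10000, copies=2)
    print([c.name for c in place(containers, policy)])  # ['gold']
```

The point of the sketch is simply that the requirements travel with the workload; the infrastructure, not the admin, figures out where the workload can land.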

That is what a requirements driven solution should provide: agility, availability, consistency and predictability. Ultimately your full data center should be controlled through policies and defined by requirements. If you look at what VMware offers today, then it is fair to say that we are closing in on reaching this ideal fast.

vMSC for 6.0, any new recommendations?

Duncan Epping · Apr 15, 2015 ·

I am currently updating the vSphere Metro Storage Cluster best practices white paper. Over the last two weeks I received various questions about whether there are any new recommendations for vMSC with 6.0. I have summarized the recommendations below for your convenience; the white paper is being reviewed and I am updating the screenshots, so hopefully it will be done soon.

  • In order to allow vSphere HA to respond to both an APD and a PDL condition, vSphere HA needs to be configured in a specific way. VMware recommends enabling VM Component Protection (VMCP); after the creation of the cluster, VMCP needs to be enabled.
  • The configuration for PDL is basic. In the “Failure conditions and VM response” section you can configure what the response should be after a PDL condition is detected. VMware recommends setting this to “Power off and restart VMs”. When this condition is detected, a VM will be restarted instantly on a healthy host within the vSphere HA cluster.
  • When an APD condition is detected, a timer is started. After 140 seconds the APD condition is officially declared and the device is marked as APD timed out. Once the 140 seconds have passed, HA starts counting; the default HA timeout is 3 minutes. When those 3 minutes have passed, HA will restart the impacted virtual machines, but you can configure VMCP to respond differently if desired. VMware recommends configuring it to “Power off and restart VMs (conservative)”. (See the timing sketch after this list.)
    • Conservative refers to the likelihood of HA being able to restart the VMs. When set to “conservative”, HA will only restart the VM impacted by the APD if it knows another host can restart it. With “aggressive”, HA will try to restart the VM even if it doesn’t know the state of the other hosts, which could lead to a situation where your VM is not restarted because there is no host that has access to the datastore the VM is located on.
  • It is also good to know that if the APD is lifted and access to the storage is restored before the timeout has passed, HA will not unnecessarily restart the virtual machine, unless you explicitly configure it to do so. If a response is desired even when the environment has recovered from the APD condition, then “Response for APD recovery after APD timeout” should be configured to “Reset VMs”. VMware recommends leaving this setting disabled.
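To summarize the timing described above, here is a small Python sketch. It is not a VMware tool and the function and parameter names are hypothetical; it simply walks through the timeline: 140 seconds until the APD timeout is declared, followed by the default 3 minute VMCP delay before HA acts, with the conservative/aggressive and recovery-response behaviour layered on top.

```python
APD_TIMEOUT = 140          # seconds until the APD condition is declared
VMCP_DELAY = 3 * 60        # default HA (VMCP) delay after the APD timeout

def vmcp_action(apd_duration: int,
                response: str = "conservative",
                peer_host_available: bool = True,
                reset_on_recovery: bool = False) -> str:
    """Illustrative model of the action HA/VMCP takes for a VM affected by
    an APD condition that lasted `apd_duration` seconds."""
    if apd_duration < APD_TIMEOUT + VMCP_DELAY:
        # Storage came back before HA acted: no restart, unless the
        # "Response for APD recovery after APD timeout" is set to Reset VMs
        # and the APD timeout had already been declared.
        if apd_duration > APD_TIMEOUT and reset_on_recovery:
            return "reset VM"
        return "no action"
    if response == "conservative" and not peer_host_available:
        # Conservative: only power off when another host can restart the VM.
        return "no action"
    return "power off and restart VM"

print(vmcp_action(100))                             # no action
print(vmcp_action(600))                             # power off and restart VM
print(vmcp_action(600, peer_host_available=False))  # no action (conservative)
print(vmcp_action(200, reset_on_recovery=True))     # reset VM
```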

ForceAffinePowerOn what is it?

Duncan Epping · Apr 1, 2015 ·

I’ve seen a lot of confusion around the ForceAffinePowerOn setting, and even the VMware documentation is incorrect about what this feature is / does. First and foremost: ForceAffinePowerOn is an advanced DRS setting (yes, I filed a doc bug for it). I’ve seen many people stating it is an HA setting, but it is not. You need to configure it in the advanced settings section of your DRS configuration.

Secondly, ForceAffinePowerOn can be used to ensure VM to VM affinity rules are respected when powering on a VM. It has absolutely nothing to do with VM to VM anti-affinity rules; it only applies to “affinity”.

Let’s be crystal clear (the short sketch after the list illustrates the difference):

  • When ForceAffinePowerOn is set to 0, it means that VM to VM affinity can be dropped if necessary to power on a VM.
  • When ForceAffinePowerOn is set to 1, it means that VM to VM affinity should not be dropped and power-on should fail if the rule cannot be respected.
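As a quick illustration of the difference, the snippet below (plain Python with hypothetical names, not a VMware API) models the power-on decision for a VM that is part of a VM to VM affinity rule:

```python
def power_on_allowed(affinity_rule_satisfiable: bool,
                     force_affine_power_on: int) -> bool:
    """Model the power-on decision for a VM in a VM to VM affinity rule.

    force_affine_power_on mirrors the advanced DRS setting:
      0 -> the affinity rule may be dropped to get the VM powered on
      1 -> the rule must be respected, so power-on fails if it cannot be
    """
    if affinity_rule_satisfiable:
        return True
    return force_affine_power_on == 0  # drop the rule only when set to 0

print(power_on_allowed(affinity_rule_satisfiable=False, force_affine_power_on=0))  # True
print(power_on_allowed(affinity_rule_satisfiable=False, force_affine_power_on=1))  # False
```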

I hope that helps!

DRS rules still active when DRS disabled?

Duncan Epping · Mar 30, 2015 ·

I just received a question about DRS rules and why they are still active when DRS is disabled. I was under the impression I had already blogged about this, but I cannot find it. I know some others did, but they reported this behaviour as a bug… which it isn’t, actually.

Below is a screenshot of the VM/Host Rules screen for vSphere 6.0, which allows you to create rules for clusters… Note that I said “clusters”, not DRS specifically. In 6.0 the wording in the UI has changed to align with the functionality vSphere offers. These are not DRS rules, but rather cluster rules. Whether you use HA or DRS, these rules can be used when either of the two is configured.

Note that not all types of rules will automatically be respected by vSphere HA. One thing you can now also do in the UI is specify whether HA should ignore or respect rules, which is very useful if you ask me and makes life a bit easier:

What does support for vMotion with active/active (a)sync mean?

Duncan Epping · Mar 23, 2015 ·

Having seen so many cool features being released over the last 10 years by VMware, you sometimes wonder what more they can do. It is amazing to see what level of integration we’ve seen between the different datacenter components. Many of you will have seen the announcements around Long Distance vMotion support by now.

When I saw this slide something stood out to me instantly and that is this part:

  • Replication Support
    • Active/Active only
      • Synchronous
      • Asynchronous

What does this mean? Well, first of all “active/active” refers to “stretched storage”, aka a vSphere Metro Storage Cluster. So when it comes to Long Distance vMotion, some changes have been introduced for synchronously stretched storage. (Note that “active/active” storage is not required for Long Distance vMotion.) With stretched storage, writes to a volume can come from both sides at any time and are replicated synchronously. Some optimizations have been made to the vMotion process to avoid writes during switchover, so the process is not delayed by replication traffic.

For active/active asynchronous, the story is a bit different. Here again we are talking about “stretched storage”, but in this case the asynchronous flavour. One important aspect which was not mentioned in the deck is that async requires Virtual Volumes. Now, at the time of writing there is no vendor yet with a VVol capable solution that offers active/active async. But more importantly, is this process any different from the sync process? Yes it is!

During the migration of a virtual machine that uses virtual volumes backed by an “active/active async” configuration, the array is informed that a migration of the virtual machine is taking place and is requested to switch from asynchronous to synchronous replication. This is to ensure that the destination is in sync with the source when the VM is switched over from side A to side B. Besides switching from async to sync, the array is also informed when the migration has completed. This allows the array to switch the “bias” of the VM, for instance; especially in a stretched environment this is important to ensure availability.
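A rough way to picture that hand-shake is sketched below. This is plain Python with a made-up interface, not the actual VASA/VVol API: before the migration the array is asked to switch the VM’s replication group to synchronous, and after the switchover it is told the migration completed so it can move the bias of the VM to the destination site.

```python
class StretchedArray:
    """Hypothetical model of an active/active async array backing a VVol VM."""

    def __init__(self, vm: str):
        self.vm = vm
        self.replication = "async"
        self.bias = "site-A"

    def prepare_migration(self) -> None:
        # vSphere tells the array a migration is starting: switch the VM's
        # replication from asynchronous to synchronous so the destination
        # is guaranteed to be in sync at switchover time.
        self.replication = "sync"

    def complete_migration(self, destination: str) -> None:
        # After switchover the array is informed the migration finished,
        # so it can flip the bias (site ownership) of the VM and, if
        # desired, return to asynchronous replication.
        self.bias = destination
        self.replication = "async"

def long_distance_vmotion(array: StretchedArray, destination: str) -> None:
    array.prepare_migration()
    # ... vMotion copies memory and switches the VM over to the destination ...
    array.complete_migration(destination)

array = StretchedArray("vm01")
long_distance_vmotion(array, "site-B")
print(array.bias, array.replication)  # site-B async
```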

I can’t wait for the first vendor to announce support for this awesome feature!

