VAAI support in vSphere Standard and up as of 6.0!

After some internal discussions over the last months it was decided to move VAAI (vSphere APIs for Array Integration) and Multi-Pathing down to vSphere Standard as of 6.0. The main reason for this was that Virtual Volumes, considered by many to be the natural evolution of VAAI, is also part of vSphere Standard. So if you have vSphere Standard and a VAAI-capable array and are looking to move to 6.0, make sure to check the configuration of your hosts and use this great functionality! Note that VAAI did indeed already work in lower editions, but from a licensing point of view you weren’t entitled to it… I guess many folks never really looked at enabling / disabling it explicitly, but for those who did… now you can use it. More details on what is included with which license can be found here: http://www.vmware.com/au/products/vsphere/compare.html
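By the way, if you want to quickly verify whether the VAAI primitives are enabled on your hosts, you can query the three host advanced settings that control the block-based primitives with PowerCLI. A minimal sketch (a value of 1 means the primitive is enabled, 0 means disabled):

# Check the advanced settings that control the block-based VAAI primitives
Get-VMHost | ForEach-Object {
    Get-AdvancedSetting -Entity $_ -Name "DataMover.HardwareAcceleratedMove",
        "DataMover.HardwareAcceleratedInit", "VMFS3.HardwareAcceleratedLocking" |
        Select-Object @{N="Host";E={$_.Entity}}, Name, Value
}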


This host currently has no network management redundancy

Bumped into this a billion times by now, and I wouldn’t recommend applying this in production, but for your lab, when you need to take clean screenshots, it works great. I’ve mentioned this setting before, but as it was part of a larger article it doesn’t stand out when searching, so I figured I would dedicate a short and simple article to it. Here is what you will need to do if you see the following message in the vSphere Web Client: this host currently has no network management redundancy.

  • Go to your Cluster object
  • Go to Settings
  • Go to “vSphere HA”
  • Click “Edit”
  • Add an advanced setting called “das.ignoreRedundantNetWarning”
  • Set the advanced setting to “true”
  • On each host right click and select “reconfigure for vSphere HA”

This is what it should look like in the UI:

You can also do this with PowerCLI by the way; note that “Stretched-Bluefin-Frimley” is the name of my cluster.

New-AdvancedSetting -Entity Stretched-Bluefin-Frimley -Type ClusterHA -Name "das.ignoreRedundantNetWarning" -Value "true" -Force
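The last manual step can be scripted as well. Here is a quick sketch that calls the underlying API method (ReconfigureHostForDAS, which is what “Reconfigure for vSphere HA” invokes) on every host in the cluster:

# Reconfigure all hosts in the cluster for vSphere HA so the new setting takes effect
Get-Cluster Stretched-Bluefin-Frimley | Get-VMHost | ForEach-Object {
    $_.ExtensionData.ReconfigureHostForDAS_Task() | Out-Null
}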


Deploy VCSA 6.0 firstboot error

I was doing some tests in my lab and while deploying a new VCSA 6.0 I received an error that firstboot was unsuccessful. Not really a great error message if you ask me, but okay. I had already validated DNS twice before I got started, but I checked it again just in case… DNS was all good, so what else could it be? I figured NTP could be another problem, and my friend William Lam confirmed that. I checked whether NTP was configured on my ESXi hosts, and for some reason it was not. So I configured NTP on the hosts, which was straightforward (and scriptable, see the sketch at the end of this post), but what about the VCSA I had deployed? Also not too complicated: I logged in via SSH and did the following:

  • ntp.get
    This will show “Status: Down”
  • ntp.server.add --servers 10.17.0.1
    This configures the VCSA to fetch time from NTP server 10.17.0.1
  • timesync.set --mode NTP
    This makes sure time synchronization is set to NTP
  • ntp.get
    This should now show “Status: Up”

That should do it… By the way, you can simply check “resolv.conf” to see how DNS is configured today, and look at “hosts” for the host name and so on.
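As an aside, the ESXi host part can be scripted too. A rough PowerCLI sketch, reusing the 10.17.0.1 NTP server from the steps above:

Get-VMHost | ForEach-Object {
    # Add the NTP server, then enable ntpd at boot and start it
    Add-VMHostNtpServer -VMHost $_ -NtpServer 10.17.0.1
    Get-VMHostService -VMHost $_ | Where-Object { $_.Key -eq "ntpd" } |
        Set-VMHostService -Policy On | Start-VMHostService
}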

Requirements Driven Data Center

I’ve been thinking about the term Software Defined Data Center for a while now. “Software defined” is a great term, but it seems many would agree that things have been defined by software for a long time now. When talking about SDDC with customers, it is typically described as the ability to abstract, pool and automate all aspects of an infrastructure. These are very important factors, but to me not the most important, as they don’t necessarily speak to the agility and flexibility a solution like this should bring. So what is an even more important aspect?

I’ve had some time to think about this lately, and to me what is truly important is the ability to define requirements for a service and have the infrastructure cater to those needs. I know this sounds really fluffy, but ultimately the service doesn’t care what is running underneath, and typically neither do the business owner and the application owner, as long as all requirements can be met. Key is delivering a service with consistency and predictability. Even more importantly, consistency and repeatability increase availability and predictability, and nothing is more important for the user experience.

When it comes to providing a positive user experience, it is of course key to first figure out what you want and what you need. Typically this information comes from your business partner and/or application owner. Once you know what those requirements are, they can be translated to technical specifications and ultimately drive where the workloads end up. A good example of what this looks like is VMware Virtual Volumes. VVols is essentially requirements-driven placement of workloads. And not just placement, but of course also all the other aspects that determine user experience, like QoS, availability, recoverability and whatever else is desired for your workload.

With Virtual Volumes, placement of a VM (or VMDK) is based on how the policy is constructed and what is defined in it. The Storage Policy Based Management engine gives you the flexibility to define policies any way you like; of course you are limited to what your storage system is capable of delivering, but from the vSphere platform point of view you can create many different variations. If you specify that the object needs to be thin provisioned, or has a specific IO profile, or needs to be deduplicated or… then those requirements are passed down to the storage system, which makes its placement decisions based on them and ensures the demands can be met. As stated earlier, requirements like QoS and availability are passed down as well. This could be things like latency, IOPS and how many copies of an object are needed (number of 9s resiliency). On top of that, when requirements change, or when for whatever reason the SLA is breached, a requirements-driven infrastructure will assess and remediate to ensure requirements are met.
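To make this a bit more concrete, this is roughly what defining such a policy looks like with the SPBM cmdlets in PowerCLI. Note that the capability names below are made up for the example; what you can actually select depends entirely on what your storage system advertises through its VASA provider:

# Capability names are hypothetical; run Get-SpbmCapability to see what your
# array's VASA provider actually advertises
$thin = New-SpbmRule -Capability (Get-SpbmCapability -Name "com.example.storage.thinProvisioned") -Value $true
$iops = New-SpbmRule -Capability (Get-SpbmCapability -Name "com.example.storage.iopsLimit") -Value 1000

# Both rules need to be satisfied for placement to be compliant
New-SpbmStoragePolicy -Name "Gold-Thin" -AnyOfRuleSets (New-SpbmRuleSet -AllOfRules $thin, $iops)

The resulting policy can then be attached to a VM or an individual VMDK, and from that point on the storage system is responsible for placing the object somewhere it can satisfy those rules.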

That is what a requirements driven solution should provide: agility, availability, consistency and predictability. Ultimately your full data center should be controlled through policies and defined by requirements. If you look at what VMware offers today, then it is fair to say that we are closing in on reaching this ideal fast.

vMSC for 6.0, any new recommendations?

I am currently updating the vSphere Metro Storage Cluster best practices white paper, and over the last two weeks I received various questions about whether there are any new recommendations for vMSC with 6.0. I have summarized the recommendations below for your convenience; the white paper is being reviewed and I am updating screenshots, so hopefully it will be done soon.

  • In order to allow vSphere HA to respond to both an APD and a PDL condition, vSphere HA needs to be configured in a specific way: VMware recommends enabling VM Component Protection (VMCP) after the cluster has been created. (For those who prefer scripting, a rough sketch follows after this list.)
  • The configuration for PDL is basic. In the “Failure conditions and VM response” section you configure what the response should be after a PDL condition is detected. VMware recommends setting this to “Power off and restart VMs”. When this condition is detected, a VM will be restarted instantly on a healthy host within the vSphere HA cluster.
  • When an APD condition is detected, a timer is started. After 140 seconds the APD condition is officially declared and the device is marked as APD timed out. Once those 140 seconds have passed, HA starts counting; the default HA time-out is 3 minutes. When the 3 minutes have passed, HA will restart the impacted virtual machines, but you can configure VMCP to respond differently if desired. VMware recommends configuring it as “Power off and restart VMs (conservative)”.
    • “Conservative” refers to the likelihood of HA being able to restart VMs. When set to “conservative”, HA will only restart the VM impacted by the APD if it knows another host can restart it. With “aggressive”, HA will try to restart the VM even if it doesn’t know the state of the other hosts, which could lead to a situation where your VM is not restarted as there is no host that has access to the datastore the VM is located on.
  • It is also good to know that if the APD is lifted and access to the storage is restored before the time-out has passed, HA will not unnecessarily restart the virtual machine, unless you explicitly configure it to do so. If a response is desired even when the environment has recovered from the APD condition, then “Response for APD recovery after APD timeout” should be configured to “Reset VMs”. VMware recommends leaving this setting disabled.
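For those who want to script this, below is a rough sketch of what the above looks like when pushed through the vSphere API with PowerCLI (at the time of writing Set-Cluster does not expose VMCP directly). The property and enum names come from the vSphere 6.0 API reference; the cluster name is simply the example used earlier in this post:

$cluster = Get-Cluster "Stretched-Bluefin-Frimley"

$vmcp = New-Object VMware.Vim.ClusterVmComponentProtectionSettings
$vmcp.VmStorageProtectionForPDL = "restartAggressive"    # "Power off and restart VMs"
$vmcp.VmStorageProtectionForAPD = "restartConservative"  # "Power off and restart VMs (conservative)"
$vmcp.VmTerminateDelayForAPDSec = 180                    # the 3 minute HA time-out mentioned above
$vmcp.VmReactionOnAPDCleared = "none"                    # leave the APD recovery response disabled

$spec = New-Object VMware.Vim.ClusterConfigSpecEx
$spec.DasConfig = New-Object VMware.Vim.ClusterDasConfigInfo
$spec.DasConfig.VmComponentProtecting = "enabled"        # enable VM Component Protection
$spec.DasConfig.DefaultVmSettings = New-Object VMware.Vim.ClusterDasVmSettings
$spec.DasConfig.DefaultVmSettings.VmComponentProtectionSettings = $vmcp

# Merge into the existing cluster configuration ($true = modify rather than replace)
$cluster.ExtensionData.ReconfigureComputeResource_Task($spec, $true) | Out-Null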