vMSC for 6.0, any new recommendations?

Duncan Epping · Apr 15, 2015 ·

I am currently updating the vSphere Metro Storage Cluster best practices white paper. Over the last two weeks I received various questions asking whether there are any new recommendations for vMSC for 6.0. I have summarized the recommendations below for your convenience. The white paper is being reviewed and I am updating screenshots, so hopefully it will be done soon.

In order to allow vSphere HA to respond to both an APD (All Paths Down) and a PDL (Permanent Device Loss) condition, vSphere HA needs to be configured in a specific way. VMware recommends enabling VM Component Protection (VMCP). This needs to be done after the cluster has been created.
The configuration for PDL is basic. In the “Failure conditions and VM response” section you can configure what the response should be after a PDL condition is detected. VMware recommends setting this to “Power off and restart VMs”. When this condition is detected, a VM will be restarted instantly on a healthy host within the vSphere HA cluster.
When an APD condition is detected, a timer is started. After 140 seconds the APD condition is officially declared and the device is marked as APD timed out. Once the 140 seconds have passed, HA starts counting; the default HA time-out is 3 minutes. When the 3 minutes have passed, HA will restart the impacted virtual machines, but you can configure VMCP to respond differently if desired. VMware recommends configuring it to “Power off and restart VMs (conservative)”.
- Conservative refers to the likelihood of HA being able to restart VMs. When set to “conservative” HA will only restart the VM that is impacted by the APD if it knows another host can restart it. In the case of “aggressive” HA will try to restart the VM even if it doesn’t know the state of the other hosts, which could lead to a situation where your VM is not restarted as there is no host that has access to the datastore the VM is located on.
It is also good to know that if the APD is lifted and access to the storage is restored before the time-out has passed, HA will not unnecessarily restart the virtual machine unless you explicitly configure it to do so. If a response is desired even when the environment has recovered from the APD condition, then “Response for APD recovery after APD timeout” should be configured to “Reset VMs”. VMware recommends leaving this setting disabled.
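To make the APD timing a bit more tangible, here is a minimal Python sketch of the timeline described above. It is only an illustration of my reading of the text, not how vSphere HA is implemented: the function name `apd_outcome`, its parameters and the returned strings are all made up for this example, and the only values taken from the recommendations are the 140 second APD timeout and the 3 minute default HA delay.

```python
# Illustrative model only; names and return strings are hypothetical.
APD_TIMEOUT_S = 140                            # device is marked as APD timed out
HA_DELAY_S = 3 * 60                            # default HA delay after the APD timeout
RESTART_AFTER_S = APD_TIMEOUT_S + HA_DELAY_S   # 320 seconds after the APD started


def apd_outcome(outage_s, mode="conservative",
                healthy_host_known=True, reset_on_recovery=False):
    """Return the modelled response for an APD that lasts outage_s seconds.

    mode               -- "conservative" or "aggressive" restart behaviour
    healthy_host_known -- does HA know another host can restart the VM?
    reset_on_recovery  -- models "Response for APD recovery after APD timeout"
                          set to "Reset VMs" (recommended: disabled)
    """
    if outage_s < APD_TIMEOUT_S:
        # Storage came back before the APD timeout was declared.
        return "no action"
    if outage_s < RESTART_AFTER_S:
        # APD timeout declared, but storage recovered before HA responded.
        return "reset VM" if reset_on_recovery else "no action"
    if mode == "conservative" and not healthy_host_known:
        # Conservative: only restart when another host is known to be able to.
        return "no action"
    return "power off and restart VM"


if __name__ == "__main__":
    for seconds in (60, 200, 400):
        print(seconds, "->", apd_outcome(seconds))
```

With the recommended settings (conservative, recovery response disabled) a VM is only restarted when the APD lasts longer than 140 + 180 = 320 seconds and HA knows another host can take it.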

vCenter Server Appliance watchdog

Duncan Epping · Apr 9, 2015 ·

I was reviewing a paper on vCenter availability for 6.0 and it listed a watchdog service which monitors “VPXD” (the vCenter Server service) on the vCenter Server Appliance. I had seen the service before but never really looked into it. With 5.5 the watchdog service (/usr/bin/vmware-watchdog) was only used to monitor vpxd and tomcat, but in 6.0 the watchdog service seems to monitor some more services. I did a “grep” for vmware-watchdog within the 6.0 appliance, and the output below shows the services which are being watched:

ps -ef | grep vmware-watchdog
 root 7398 1 0 Mar27 ? 00:00:00 /bin/sh /usr/bin/vmware-watchdog -s rhttpproxy -u 30 -q 5 /usr/sbin/rhttpproxy -r /etc/vmware-rhttpproxy/config.xml -d /etc/vmware-rhttpproxy
 root 11187 1 0 Mar27 ? 00:00:00 /bin/sh /usr/bin/vmware-watchdog -s vws -u 30 -q 5 /usr/lib/vmware-vws/bin/vws.sh
 root 12041 1 0 Mar27 ? 00:09:58 /bin/sh /usr/bin/vmware-watchdog -s syslog -u 30 -q 5 -b /var/run/rsyslogd.pid /sbin/rsyslogd -c 5 -f /etc/vmware-rsyslog.conf
 root 12520 1 0 Mar27 ? 00:09:56 /bin/sh /usr/bin/vmware-watchdog -b /storage/db/vpostgres/postmaster.pid -u 300 -q 2 -s vmware-vpostgres su -s /bin/bash vpostgres
 root 29201 1 0 Mar27 ? 00:00:00 /bin/sh /usr/bin/vmware-watchdog -a -s vpxd -u 3600 -q 2 /usr/sbin/vpxd

As you can see, vmware-watchdog is run with a couple of parameters, which seem to differ for some services. As it is the most important service, let’s have a look at VPXD. It shows the following parameters:

-a
-s vpxd
-u 3600
-q 2

What the above parameters result in is the following: the service, named vpxd (-s vpxd), is monitored for failures and will be restarted twice (-q 2) at most. If it fails for a third time within 3600 seconds/one hour (-u 3600) the guest OS will be restarted (-a).
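The behaviour described above can be sketched in a few lines of Python. This is a simplified model of my interpretation, not the actual vmware-watchdog script: the function `watchdog_actions`, the sliding-window logic and the action strings are assumptions made for illustration, and what happens without -a once the restart budget is used up is not covered in this post, so the "stop restarting" branch is a pure placeholder.

```python
from collections import deque


def watchdog_actions(failure_times, max_restarts=2, window_s=3600,
                     reboot_guest=True):
    """Model the watchdog response to service failures.

    failure_times -- ascending timestamps (in seconds) at which the service failed
    max_restarts  -- modelled on "-q 2": restarts allowed within the window
    window_s      -- modelled on "-u 3600": length of the window in seconds
    reboot_guest  -- modelled on "-a": reboot the guest OS when the budget is spent
    """
    recent = deque()   # failures that still fall inside the window
    actions = []
    for t in failure_times:
        recent.append(t)
        # Forget failures that happened more than window_s seconds ago.
        while t - recent[0] >= window_s:
            recent.popleft()
        if len(recent) <= max_restarts:
            actions.append("restart service")
        else:
            # Placeholder for the non -a case, which the post does not describe.
            actions.append("reboot guest OS" if reboot_guest else "stop restarting")
            break
    return actions


if __name__ == "__main__":
    # Three failures within one hour: two restarts, then a guest OS reboot.
    print(watchdog_actions([0, 600, 1200]))
    # Third failure falls outside the hour: the service is simply restarted again.
    print(watchdog_actions([0, 600, 5000]))
```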

Note that the guest OS will only be restarted when vpxd has failed multiple times. With other services this is not the case as the “grep” above shows. There are some more watchdog related processes, but I am not going to discuss those at this point as the white paper which is being worked on by Technical Marketing will discuss these in a bit more depth and should be the authoritative resource.
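If you want to compare the parameters per service without reading the raw output, a small parser along these lines could help. Again this is just a sketch: the command lines are copied from the output above, but the helper `parse_watchdog` and the assumption that -s, -u, -q and -b take a value while -a does not are mine, based purely on how the options appear in that output. The -b option is collected but not interpreted, as I have not looked at what it does.

```python
# Command portion of the ps output shown earlier in this post.
PS_LINES = [
    "/bin/sh /usr/bin/vmware-watchdog -s rhttpproxy -u 30 -q 5 /usr/sbin/rhttpproxy -r /etc/vmware-rhttpproxy/config.xml -d /etc/vmware-rhttpproxy",
    "/bin/sh /usr/bin/vmware-watchdog -s vws -u 30 -q 5 /usr/lib/vmware-vws/bin/vws.sh",
    "/bin/sh /usr/bin/vmware-watchdog -s syslog -u 30 -q 5 -b /var/run/rsyslogd.pid /sbin/rsyslogd -c 5 -f /etc/vmware-rsyslog.conf",
    "/bin/sh /usr/bin/vmware-watchdog -b /storage/db/vpostgres/postmaster.pid -u 300 -q 2 -s vmware-vpostgres su -s /bin/bash vpostgres",
    "/bin/sh /usr/bin/vmware-watchdog -a -s vpxd -u 3600 -q 2 /usr/sbin/vpxd",
]

# Assumed option shapes, inferred from the output above.
OPTIONS_WITH_VALUE = {"-s", "-u", "-q", "-b"}
OPTIONS_WITHOUT_VALUE = {"-a"}


def parse_watchdog(line):
    """Return the watchdog options of one command line as a dict."""
    tokens = line.split()
    i = tokens.index("/usr/bin/vmware-watchdog") + 1
    options = {}
    while i < len(tokens):
        token = tokens[i]
        if token in OPTIONS_WITH_VALUE:
            options[token] = tokens[i + 1]
            i += 2
        elif token in OPTIONS_WITHOUT_VALUE:
            options[token] = True
            i += 1
        else:
            # First non-option token: the watched command starts here, so stop.
            # This keeps the "-s" of "su -s /bin/bash" from being picked up.
            break
    return options


if __name__ == "__main__":
    for line in PS_LINES:
        opts = parse_watchdog(line)
        print(opts["-s"], "-u", opts["-u"], "-q", opts["-q"],
              "(reboots guest)" if opts.get("-a") else "")
```

Running this against the lines above shows that vpxd is the only service started with -a.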

** Please do not make changes to ANY of the above parameters as this is totally unsupported. I am merely showing the details for educational purposes and to provide better insight into vCenter availability when it comes to the VCSA. **

Share your Orchestrator workflow through FlowGrab

Duncan Epping · Apr 7, 2015 ·

A while back, when attending/presenting at some VMUGs, I stumbled upon a company called FlowGrab. They are fairly new and pitched their solution to me, and I must say it sounded very interesting. I guess if you want to dumb it down you can label it as a code repository solution for vRO/vCO. FlowGrab describes itself as follows:

FlowGrab provides a versionable code repository and collaboration functionalities for workflow developers and consumers and allows working in your everyday environment by using FlowGrab Plug-in for vRO which connects your vRO directly to FlowGrab.

I think what is important here is the “collaboration” piece. FlowGrab is very focused on creating a community. We’ve had that community in the PowerCLI space for a long time, and people have been sharing scripts forever, and now that can also easily be done through a central location for vRO. The great thing about FlowGrab is that they have a community edition which can be used for free, it has less functionality than the paid version but I must say that the available functionality should be sufficient for most!

If you have an interest in vRO and like to share your work, then now may be the right time. FlowGrab just announced a great contest where you can win an Apple Watch simply by uploading new workflows to their repository.

There’s a great Brownbag session on FlowGrab which I recommend watching if you are interested and want to figure out how this all works and how you can contribute.

Implementing a Hybrid Cloud Strategy white paper

Duncan Epping · Apr 7, 2015 ·

Last week I posted this on the VMware Office of CTO blog, and I figured I would share it with my regular readers here as well. A couple of months ago I stumbled across a great diagram developed by Hany Michael (Consulting Architect, VMware PSO), who is part of the VMware CTO Ambassador program. The CTO Ambassadors are members of a small group of our most experienced and talented customer-facing, individual contributor technologists. The diagram explained an interesting architecture, namely hybrid cloud. After a brief discussion with Hany I decided to reach out to David Hill (Senior Technical Marketing Architect, vCloud Air) and asked if he was interested in getting this work published. Needless to say, David was very interested. Together we worked on expanding the great content that Hany had already developed. Today, the result is published.

The architecture described in this white paper is based on a successful real-world customer implementation. Besides explaining the steps required, it also explains the use case for this particular customer. We hope that you find the paper useful and that it will help you implement or position a hybrid cloud strategy.

Implementing a Hybrid Cloud Strategy

IT has long debated the merits of public and private cloud. Public clouds allow organizations to gain capacity and scale services on-demand, while private clouds allow companies to maintain control and visibility of business-critical applications. But there is one cloud model that stands apart: hybrid cloud. Hybrid clouds provide the best of both worlds: secure, on-demand access to IT resources with the flexibility to move workloads onsite or offsite to meet specific needs. It’s the security you need in your private cloud with the scalability and reach of your public cloud. Hybrid cloud implementations should be versatile, easy to use, and interoperable with your onsite VMware vSphere® environment. Interoperability allows the same people to manage both onsite and offsite resources while leveraging existing processes and tools and lowering the operational expenditure and complexity…

VMware EMEA Online Technology Forum 15th of April

Duncan Epping · Apr 2, 2015 ·

On the 15th of April there is an awesome online event planned called the “Online Technology Forum”. During this day you will hear all about what is new with vSphere 6.0. What can you expect?

Sign up now to this free online event where you will be able to engage in a live Q&A with VMware technical experts, including Joe Baguley, CTO, EMEA; Duncan Epping, Chief Technologist; and Mike Laverick, Senior Cloud Infrastructure Evangelist.

Join your peers at technology updates and a number of self-paced hands-on labs on the technologies driving IT efficiency and business advantage:

vSphere 6 – the Foundation for the Hybrid Cloud

Virtual SAN 6 and Virtual Volumes – What’s New?

Introducing VMware Integrated OpenStack

Enabling Micro-Segmentation with NSX

Introducing Hyper-Convergence with EVO:RAIL

App Volumes – Revolutionising Application Delivery

Full agenda can be found here. Note that these sessions are recorded; however, there are live Q&As: one with Joe Baguley, Mike Laverick and me after the first two sessions, and another at the end of the event with Joe Baguley, Mike Laverick, Richard Munro, Spencer Pitts, Jeremy Van Doorn, Yuval Tenenbaum and me. All speakers (and other experts) will be handling questions via the chat windows throughout the sessions, so make sure to register and dial in on the 15th of April.