vCenter Server Appliance watchdog

I was reviewing a paper on vCenter availability for 6.0 and it listed a watchdog service which monitors “VPXD” (the vCenter Server service) on the vCenter Server Appliance. I had seen the service before but never really looked in to it. With 5.5 the watchdog service (/usr/bin/vmware-watchdog) was only used to monitor vpxd and tomcat but in 6.0 the watchdog service seems to monitor some more services. I did a “grep” of vmware-watchdog within the 6.0 appliance and the below is the outcome, it shows the services which are being watched:

ps -ef | grep vmware-watchdog
 root 7398 1 0 Mar27 ? 00:00:00 /bin/sh /usr/bin/vmware-watchdog -s rhttpproxy -u 30 -q 5 /usr/sbin/rhttpproxy -r /etc/vmware-rhttpproxy/config.xml -d /etc/vmware-rhttpproxy
 root 11187 1 0 Mar27 ? 00:00:00 /bin/sh /usr/bin/vmware-watchdog -s vws -u 30 -q 5 /usr/lib/vmware-vws/bin/vws.sh
 root 12041 1 0 Mar27 ? 00:09:58 /bin/sh /usr/bin/vmware-watchdog -s syslog -u 30 -q 5 -b /var/run/rsyslogd.pid /sbin/rsyslogd -c 5 -f /etc/vmware-rsyslog.conf
 root 12520 1 0 Mar27 ? 00:09:56 /bin/sh /usr/bin/vmware-watchdog -b /storage/db/vpostgres/postmaster.pid -u 300 -q 2 -s vmware-vpostgres su -s /bin/bash vpostgres
 root 29201 1 0 Mar27 ? 00:00:00 /bin/sh /usr/bin/vmware-watchdog -a -s vpxd -u 3600 -q 2 /usr/sbin/vpxd

As you can see vmware-watchdog is ran with a couple of parameters, which seem to different for some services. As it is the most important service, lets have a look at VPXD. It shows the following parameters:

-a
-s vpxd
-u 3600
-q 2

What the above parameters result in is the following: the service, named vpxd (-s vpxd), is monitored for failures and will be restarted twice (-q 2) at most. If it fails for a third time within 3600 seconds/one hour (-u 3600) the guest OS will be restarted (-a).

Note that the guest OS will only be restarted when vpxd has failed multiple times. With other services this is not the case as the “grep” above shows. There are some more watchdog related processes, but I am not going to discuss those at this point as the white paper which is being worked on by Technical Marketing will discuss these in a bit more depth and should be the authoritative resource.

** Please do not make changes to ANY of the above parameters as this is totally unsupported, I am mere showing the details for educational purposes and to provide a better insight around vCenter availability when it comes to the VCSA. **

Implementing a Hybrid Cloud Strategy white paper

Last week I already posted this up on the VMware Office of CTO blog, and I figured I would share it to my regular readers here as well. A couple of months ago I stumbled across a great diagram which was developed by Hany Michael, (Consulting Architect, VMware PSO) who is part of the VMware CTO Ambassador program. The CTO Ambassadors are members of a small group of our most experienced and talented customer-facing, individual contributor technologists. The diagram explained an interesting architecture– namely hybrid cloud. After a brief discussion with Hany I decided to reach out to David Hill (Senior Technical Marketing Architect, vCloud Air) and asked if he was interested in getting this work published. Needless to say, David was very interested. Together we worked on expanding on the great content that Hany had already developed. Today, the result is published.

The architecture described in this white paper is based on a successful real-world customer implementation. Besides explaining the steps required it also explains the use case for this particular customer. We hope that you find the paper useful and that it will help implementing or positioning a hybrid cloud strategy.

Implementing a Hybrid Cloud Strategy

IT has long debated the merits of public and private cloud. Public clouds allow organizations to gain capacity and scale services on-demand, while private clouds allow companies to maintain control and visibility of business-critical applications. But there is one cloud model that stands apart: hybrid cloud. Hybrid clouds provide the best of both worlds: secure, on-demand access to IT resources with the flexibility to move workloads onsite or offsite to meet specific needs. It’s the security you need in your private cloud with the scalability and reach of your public cloud. Hybrid cloud implementations should be versatile, easy to use, and interoperable with your onsite VMware vSphere® environment. Interoperability allows the same people to manage both onsite and offsite resources while leveraging existing processes and tools and lowering the operational expenditure and complexity…

Cloud native inhabitants

When ever I hear the term “cloud native” I think about my kids. It may sounds a bit strange as many of you will think about “apps” probably first when “cloud native” is dropped. Cloud native to me is not about an application, but about a problem which has been solved and a solution which is offered in a specific way. A week or so ago someone made a comment on twitter around how “Generation X” will adopt cloud faster than the current generation of IT admins…

Some even say that “Generation X” is more tech savvy, just look at how a 3 year old handles an iPad, they are growing up with technology. To be blunt… that has nothing to do with the technical skills of the 3 year old kid, but is more about the intuitive user interface that took years to develop. It comes natural to them as that is what they are exposed to from day 1. They see there mom or dad swiping a screen daily, mimicking them doesn’t require deep technical understanding of how an iPad works, they move their finger from right to left… but I digress.

My kids don’t know what a video tape is and even a CD to play music is so 2008, which for them is a lifetime, my kids are cloud native inhabitants. They use Netflix to watch TV, they use Spotify to listen to music, they use Facebook to communicate with friends, they use Youtube / Gmail and many other services running somewhere in the cloud. They are native inhabitants of the cloud. They won’t adopt cloud technology faster, for them it is a natural choice as it is what they are exposed to day in day out.

Startup intro: Rubrik. Backup and recovery redefined

Some of you may have seen the article by The Register last week about this new startup called Rubrik. Rubrik just announced what they are working on and announced their funding at the same time:

Rubrik, Inc. today announced that it has received $10 million in Series A funding and launched its Early Access Program for the Rubrik Converged Data Management platform. Rubrik offers live data access for recovery and application development by fusing enterprise data management with web-scale IT, and eliminating backup software. This marks the end of a decade-long innovation drought in backup and recovery, the backbone of IT. Within minutes, businesses can manage the explosion of data across private and public clouds.

The Register made a comment, which I want to briefly touch on. They mentioned it was odd that a venture capitalist is now the CEO for a startup and how it normally is the person with the technical vision who heads up the company. I can’t agree more with The Register. For those who don’t know Rubrik and their CEO, the choice for Bipul Sinha may come as a surprise it may seem a bit odd. Then there are some who may say that it is a logical choice considering they are funded by Lightspeed… Truth of the matter is that Bipul Sinha is the person with the technical vision. I had the pleasure to see his vision evolve from a couple of scribbles on the whiteboard to what Rubrik is right now.

I still recall having a conversation with Bipul talking about the state of the “backup industry”, and I recall we agreed the different components of a datacenter had evolved over time but that the backup industry was still very much stuck in the old world. (We agreed backup and recovery solutions suck in most cases…) Back when we had this discussion there was nothing yet, no team, no name, just a vision. Knowing what is coming in the near future and knowing their vision I do think this quote from the press release embraces best what Rubrik is working on and it will do:

Today we are excited to announce the first act in our product journey. We have built a powerful time machine that delivers live data and seamless scale in a hybrid cloud environment. Businesses can now break the shackles of legacy and modernize their data infrastructure, unleashing significant cost savings and management efficiencies.

Of course Rubrik would not be possible without a very strong team of founding members. Arvind Jain, Arvind Nithrakashyap and Soham Mazumdar are probably the strongest co-founders one can wish. The engineering team has deep experience in building distributed systems, such as Google File System, Google Search, YouTube, Facebook Data Infrastructure, Amazon Infrastructure, and Data Domain File System. Expectations just raised a couple of notches right?!

I agree that even the statement above is still a bit fluffy so lets add some more details, what are they working on? Rubrik is working on a solution which combines backup software and a backup storage appliance in to a single solution and initially will target VMware environments. They are building (and I hate using this word) a hyperconverged backup solution and it will scale from 3 to 1000s of nodes. Note that this solution will be up and running in 15 minutes and includes the option to age out data to the public cloud. What impressed me most is that Rubrik can discover your datacenter without any agents, it scales-out in a fully automated fashion and will be capable of deduplicating / compressing data but also offer the ability to mount data instantly. All of this through a slick UI or you can leverage the REST APIs , fully programmable end-to-end.

I just went over “instant mount” quickly, but I want to point out that this is not just for “restoring VMs”. Considering the REST APIs you can also imagine that this would be a perfect solution to enable test/dev environments or running Tier 2/3 workloads. How valuable is it to have instant copies of your production data available and test your new code against production without any interruption to your current environment? To throw a buzzword in there: perfectly fit for a devops world and continuous development.

That is about all I can say for now unfortunately… For those who agree that backup/recovery has not evolved and are interested in a backup solution for tomorrow, there is an early access program and I urge you to sign up to learn more but also help shaping the product! The solution is targeting environments of 200 VMs and upwards, make sure you meet those requirements. Read more here and/or follow them on twitter (or Bipul).

Good luck Rubrik, I am sure this is going to be a great journey!

Get your download engines running, vSphere 6.0 is here!

Yes the day is finally there, vSphere 6.0 / SRM / VSAN (and more) finally available. So where do you find it? Well that is simple… here:

Have fun!