
Yellow Bricks

by Duncan Epping


Server

Guess the VMworld band aka #VMworldBand

Duncan Epping · Jul 19, 2013 ·

It is that time of the year again: guess the VMworld band! So let's join the fun. The “first” correct guess will be rewarded with the opportunity to meet the band that you choose before the VMworld party on Wednesday, August 28th at AT&T Park, aka home of the Giants!

Founded in a previous VMworld host city, the lead singer of this band almost joined the FBI #VMworldband http://t.co/dthTFdgijd

— VMware Explore (@VMwareExplore) July 17, 2013

Our second #VMworld band has appeared on CSI, been parodied on SNL and has 6 studio albums #VMworldband http://t.co/QQXeCgbaTx

— VMware Explore (@VMwareExplore) July 17, 2013

https://twitter.com/DuncanYB/status/358229465190502401

https://twitter.com/DuncanYB/status/358232140699271169

Want to attend PuppetConf 22/23 August in San Francisco for free?

Duncan Epping · Jul 18, 2013 ·

Do you want to attend PuppetConf 22/23 August in San Francisco for free? The Puppet Labs folks were kind enough to provide me with 2 registration codes. If you want to attend their conference in San Francisco this year, just drop a comment with why you love Puppet and I will randomly pick two winners.

I had hoped to attend myself, but unfortunately it clashes with another conference for me. I believe they will have over 70 sessions, and they have labs, so it is a great way to learn about Puppet and meet like-minded people! More info can be found on PuppetConf.com. Anyway, if you would like to go but haven’t secured your ticket yet, drop a comment here and maybe you will win one of the two tickets I have available. Good luck.

For those who aren’t lucky enough to win a ticket… I have a discount code which gives you $200 off, just use “DuncanEpping” when you register!

Deadline: July 25th 2013

Testing your infrastructure!

Duncan Epping · Jul 16, 2013 ·

Last week I was helping someone on the VMTN community forums. They were hitting what appeared to be strange HA behavior. After some standard questions this person told me that all VMs were powered down after a network outage. Sound like a familiar problem? Yes, I can hear most of you think: isolation response set to “power off” and no proper network redundancy?

Well, yes and no. They did indeed have the isolation response configured to “power off” all VMs when the host is isolated. They did however have proper network redundancy, so how on earth did this happen? With 2 physical NICs and 2 physical switches, and only 1 of them being impacted, this should not have happened, right?

Wrong! In this case the fail-over from a “vmkernel” perspective worked fine. The first “path” went down, so the second was used for the management vmkernel. All VMs were up and running until this point, and they remained running until… the network connection was restored and the management vmkernel failed back to the original physical NIC. That means the MAC address that had shown up on port 1 popped up on port 2 and then went back to port 1 again. The switch was not impressed, went through the spanning tree process, and traffic was blocked instantly as a result. Now when traffic is blocked bad things can happen, especially when you configure HA to “power off” VMs. Basically, what caused this issue was that spanning tree was not set to the recommended “PortFast” on the switch ports; more details here.
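To put rough numbers on why the failback hurt: under classic spanning tree (IEEE 802.1D), a port that comes up without PortFast typically spends one forward delay in the listening state and one in the learning state (15 seconds each by default) before it forwards traffic. The sketch below is a simplification under that assumption. The function names and the 12 second isolation threshold are hypothetical placeholders, not vSphere defaults, so plug in the values for your own environment.

```python
# Rough model only: classic 802.1D timers and hypothetical helper names.
ASSUMED_ISOLATION_THRESHOLD = 12  # placeholder value, NOT a vSphere default


def blocked_seconds(portfast_enabled: bool, forward_delay: int = 15) -> int:
    """Seconds a freshly linked switch port drops traffic before forwarding."""
    if portfast_enabled:
        return 0  # PortFast skips listening/learning and forwards right away
    return 2 * forward_delay  # listening + learning


def isolation_response_fires(blocked: int, isolation_threshold: int) -> bool:
    """True when the outage lasts at least as long as the isolation threshold."""
    return blocked >= isolation_threshold


for portfast in (False, True):
    outage = blocked_seconds(portfast)
    fires = isolation_response_fires(outage, ASSUMED_ISOLATION_THRESHOLD)
    print(f"PortFast={portfast}: blocked {outage}s, isolation response fires: {fires}")
```

The point of the model is simply that without PortFast the blocked window can easily outlast whatever isolation detection time is in play, while with PortFast the window disappears.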

I knew instantly that this was the reason for this problem, not because I know stuff about HA but because I had seen this many times in the past while testing environments I configured and designed. Not just testing after implementing a new infrastructure, but also testing after making changes to an infrastructure or introducing a new version / feature. I guess this kind of comes back to the “disaster” scenario as well: test it if you want to know if it works as expected. Just a simple example: I want to introduce QoS for my vMotion network and make changes to my physical network. Now what? How do I test these changes? How many times do I run through my test scenarios? What kind of “problems” do I introduce during my tests?
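One way to make those questions concrete is to write the test plan down as data. The sketch below is only an illustration: the component names, the events and the repeat count of 3 are assumptions made up for this example. Note that every "fail" is followed by a "restore", because in the case above it was the failback, not the failure, that triggered the outage.

```python
from itertools import product

# Hypothetical components and events; replace with your own environment.
COMPONENTS = ["vmnic0", "vmnic1", "switch-a", "switch-b"]
EVENTS = ["fail", "restore"]
RUNS = 3  # assumed number of repetitions per scenario


def build_test_plan(components, events, runs):
    """Return one step per component, run and event, with restore after fail."""
    return [
        {"component": component, "run": run, "event": event}
        for component, run, event in product(components, range(1, runs + 1), events)
    ]


plan = build_test_plan(COMPONENTS, EVENTS, RUNS)
for step in plan:
    print(
        f"{step['component']} run {step['run']}: {step['event']} "
        "-> check management network, HA state and VM power state"
    )
```

Keeping the plan as a list makes it easy to rerun the same scenarios after every change and to record what was observed at each step.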

So I guess by now some might wonder why on earth I brought this up… well, the problem above could have been prevented by simply testing the infrastructure when it was implemented and after changes had been introduced, and maybe even on a regular basis. If HA / networking had been tested properly, those VMs would not have been powered off…

Network port diagram for vSphere 5.x

Duncan Epping · Jul 10, 2013 ·

Somehow I missed this one, but as I reviewed the diagram and helped select the right format, I figured I would still share it. This Network port diagram for vSphere 5.x is one awesome resource for those folks who want to get to the bottom of how components interact with each other.

I don’t think there is a lot more I can say about it. Those who love diagrams and like to know the details, make sure to hit: http://kb.vmware.com/kb/2054806

Prepare for the worst…

Duncan Epping · Jul 9, 2013 ·

Over the last couple of months I have been contacted by various folks who thought long and hard about their Business Continuity and Disaster Recovery design. They bought a great backup solution which integrated with vSphere and they replicated their SAN to a second site. In their mind they were definitely prepared for the worst… I agree with that to a certain extent; their design was indeed well thought-out and carefully covered all aspects of BC/DR. From an operational perspective, though, things looked different: the first significant failure occurred and they couldn’t fully recall the steps to recover. That is what my tweet below was inspired by…

https://twitter.com/DuncanYB/status/352832506552262658

Funny thing is that this tweet also triggered some responses like “Go SRM” or “that is where Zerto comes in”, and again I agree that an orchestration layer should be part of your DR plan, but when talking about BC/DR I think it is more about the strategy and the processes that will need to be triggered in a particular scenario. What is involved typically? I am not even going into the business-specific side of things and all the politics that come along with it. Instead, look at your process, take one step back and ask yourself: what if this part of the process fails?

One of the things Lee and I will mention multiple times during our VMworld session on Stretched Clusters is: Test It! Not once, not twice, but many times, and be prepared for the worst to happen. Yes, none of us likes to test the most destructive and disruptive failure scenario, but you can bet that when something goes wrong it will be the scenario you did not test. Although I think SRM, for instance, is a rock solid solution, what if for whatever reason your recovery plan does not work as planned? While testing, make sure you document your recovery plan. Even though you might have a bunch of scripts lying around, who knows if they will work as expected? Some scripts (or SRM type of solutions) have a dependency on certain components / services being up; what if they are not? Besides your BC/DR strategy, a lot of procedures will of course need to be documented. What kind of procedures are we talking about? Just a couple of random ones I would suggest you document, at a bare minimum, while testing your scenarios:

  • Order in which to power-on all physical components in your Datacenter (and power-off)
  • Location of infrastructure related services (AD, DNS, vCenter, Syslogging, NTP, etc), when virtual and on SAN document the datastore for instance
  • Order in which to power-on all infrastructure related services
  • Order in which to power-on all remaining virtual machines / vApps
  • How to get your vCenter Server up and running from the commandline (this will make it a lot easier to get the rest of your VMs up and running)
  • How to power-on virtual machines from the commandline after a failure
  • How to re-register a virtual machine from the commandline after a failure
  • How to mount a LUN from the commandline after a failover
  • How to resignature a LUN from the commandline after a failover
  • How to restore a full datastore
  • How to restore a virtual machine
  • etc etc
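The power-on order items in the list above are essentially a dependency graph. As a minimal sketch, assuming a made-up set of dependencies (yours will differ, and the service names here are only examples), Python's standard `graphlib` module (Python 3.9 or later) can turn "what needs what" into a start order that can be pasted into the documentation:

```python
from graphlib import TopologicalSorter

# Hypothetical example: each service maps to the services it needs first.
DEPENDS_ON = {
    "DNS": set(),
    "NTP": set(),
    "AD": {"DNS", "NTP"},
    "Syslog": {"DNS"},
    "vCenter database": {"AD"},
    "vCenter": {"vCenter database", "DNS"},
}


def power_on_order(depends_on):
    """Return the services ordered so every dependency starts first."""
    return list(TopologicalSorter(depends_on).static_order())


order = power_on_order(DEPENDS_ON)
print("Power-on order:")
for position, service in enumerate(order, start=1):
    print(f"  {position}. {service}")

# Powering off is simply the same list walked backwards.
print("Power-off order:", ", ".join(reversed(order)))
```

If someone accidentally documents a circular dependency, `TopologicalSorter` raises a `CycleError`, which is a useful finding in itself during a test.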

Now I can hear some of you think: why would I document that? I know all of that stuff inside out! Well, what if you are on a holiday or at home sick? Just imagine your junior colleague is by himself when disaster strikes. Does he know in which order the services of that business-critical multi-tier application need to start?

When you do document these, make sure to have a (physical) copy available outside of your infrastructure. Believe me… you wouldn’t be the first to find yourself locked out of a system, trying to find the documents needed to recover it, and then realizing they are stored on the very system you need to recover. Those who have ever been in a total datacenter outage know what I am talking about. I have been in the situation where a full datacenter went down due to a power outage; believe me when I say that bringing up over 300 VMs and all associated physical components without documentation was a living nightmare.
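As a minimal sketch of what such an offline document could contain for the command-line items listed earlier, the snippet below renders a small runbook as plain text that can be printed. The `vim-cmd` and `esxcli` commands shown are believed to apply to the ESXi 5.x shell, but verify them against your own build before relying on them. The uppercase words (VMID, DATASTORE, VMNAME, LABEL) are placeholders, and the `RUNBOOK` structure and `render_runbook` helper are hypothetical.

```python
# Each entry: (procedure title, commands to run in the ESXi shell).
# Uppercase words are placeholders to fill in for your environment.
RUNBOOK = [
    ("Power on a virtual machine", [
        "vim-cmd vmsvc/getallvms",
        "vim-cmd vmsvc/power.on VMID",
    ]),
    ("Re-register a virtual machine", [
        "vim-cmd solo/registervm /vmfs/volumes/DATASTORE/VMNAME/VMNAME.vmx",
    ]),
    ("Mount a replicated LUN, keeping its signature", [
        "esxcli storage vmfs snapshot list",
        "esxcli storage vmfs snapshot mount -l LABEL",
    ]),
    ("Resignature a replicated LUN", [
        "esxcli storage vmfs snapshot list",
        "esxcli storage vmfs snapshot resignature -l LABEL",
    ]),
]


def render_runbook(runbook):
    """Render the procedures as numbered plain text, ready for printing."""
    lines = []
    for number, (title, commands) in enumerate(runbook, start=1):
        lines.append(f"{number}. {title}")
        lines.extend(f"     {command}" for command in commands)
        lines.append("")  # blank line between procedures
    return "\n".join(lines)


text = render_runbook(RUNBOOK)
print(text)
```

Redirect the output to a file, print it, and store the paper copy somewhere that does not depend on the datacenter being up.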

Although you probably get it by now… it is not the tool; a proper strategy, procedures and documentation are the key to success! Just do it.


About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.
