Yellow Bricks

by Duncan Epping


Survey time…

Duncan Epping · Oct 24, 2012 ·

I would like to draw your attention to two surveys. I was asked to help promote these, and I hope you are willing to set aside a couple of minutes to fill them out.

  1. Autoscaling of Applications in vSphere – As the infrastructure layer matures, the IT problems and bottlenecks are moving closer to the application layer. Autoscaling of application workloads is a feature that is important to customers building private clouds. VMware is uniquely positioned to deliver autoscaling of the infrastructure layer using insights into how applications are performing. With technologies like Hyperic APM and orchestration technologies like vCO and DynamicOps, VMware already owns key pieces of the puzzle for delivering a powerful autoscaling solution. We are looking for feedback on what you feel is important for your applications. If you complete this survey you will have a chance of winning one of two $50 Amazon gift cards. (Randomly picked!)
    http://www.surveymethods.com/EndUser.aspx?D0F49881D3908787D0
  2. Project Virtual Reality Check “State of the VDI and SBC union” survey – With so many VDI and SBC deployments out there, the differences are huge. It is only logical to wonder how these real-world VDI and SBC environments are used and how they are built, especially when you consider how quickly the VDI/SBC market is changing. Since the market is driven as much by innovation as by marketing campaigns, there is a clear need to better understand what is really out there.
    http://www.brianmadden.com/blogs/jeroenvandekamp/archive/2012/09/26/announcing-project-vrc-s-quot-state-of-the-vdi-and-sbc-union-quot-survey.aspx

The State of vSphere Clustering by @virtualirfan

Duncan Epping · Oct 23, 2012 ·

The state of vSphere clustering
By Irfan Ahmad

Some of my colleagues at CloudPhysics and I spent years at VMware and were lucky to have participated in one of the most rapid transformations in enterprise technology history. A big part of that is VMware’s suite of clustering features. I worked alongside Carl Waldspurger in the resource management team at VMware, which brought to the world the ESX VMkernel CPU and memory schedulers, DRS, DPM, Storage I/O Control, and Storage DRS, among other features. As a result, I am especially interested in analyzing and improving how IT organizations use clustering.

Over a series of blog posts, I’ll try to provide a snapshot of how IT teams are operationalizing vSphere. One of my co-founders, Xiaojun Liu, and I performed some initial analysis on the broad community dataset that is continually expanding as more virtualization engineers securely connect us to their systems.

First, we segmented our analysis based on customer size. The idea was to isolate the effect of various deployment sizes, including test labs, SMBs, commercial deployments, and large enterprises. Our segmentation was in terms of total VMs in a customer deployment, divided up as 1-50 VMs, 51-200, 201-500, and 501 and up. Please let us know if you believe an alternative segmentation would yield better analysis.
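As a rough illustration of this kind of bucketing (the bucket boundaries come from the post; the input format and names are hypothetical and not CloudPhysics’ actual pipeline), here is a small Python sketch:

```python
# Hypothetical sketch of the deployment-size segmentation described above.
# The bucket boundaries (1-50, 51-200, 201-500, 501+) come from the post;
# the input data structure is an assumption for illustration only.

SEGMENTS = [
    (1, 50, "1-50 VMs"),
    (51, 200, "51-200 VMs"),
    (201, 500, "201-500 VMs"),
    (501, float("inf"), "501+ VMs"),
]

def segment_for(total_vms: int) -> str:
    """Map a deployment's total VM count to a size segment."""
    for low, high, label in SEGMENTS:
        if low <= total_vms <= high:
            return label
    return "unknown"  # e.g. deployments reporting 0 VMs

# Example: deployments as (customer_id, total_vm_count) pairs.
deployments = [("cust-a", 35), ("cust-b", 180), ("cust-c", 1200)]
for customer, vm_count in deployments:
    print(customer, segment_for(vm_count))
```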

Initially we compared the various ESX versions deployed in the field. We found that ESXi 5.0 has already captured the majority of installations in large deployments. However, the 4.0 and 3.5 versions continue to be deployed in the field in small numbers. Version 4.1, on the other hand, continues to be more broadly deployed. If you are still using 4.1, 4.0, or 3.5, we recommend upgrading to 5.0, which provides greatly improved HA clustering, amongst many other benefits. This data shows that the 5.0 version has been broadly adopted by our peers and is user-verified production ready.
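A hypothetical sketch of how version share per size segment could be tallied (the records and field layout below are invented purely for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical host records: (total_vms_in_deployment, esx_version).
hosts = [(35, "4.1"), (180, "5.0"), (1200, "5.0"), (1200, "3.5"), (900, "5.0")]

def size_segment(total_vms: int) -> str:
    # Same buckets as in the segmentation sketch above.
    if total_vms <= 50:
        return "1-50"
    if total_vms <= 200:
        return "51-200"
    if total_vms <= 500:
        return "201-500"
    return "501+"

share = defaultdict(Counter)
for total_vms, version in hosts:
    share[size_segment(total_vms)][version] += 1

for segment, versions in sorted(share.items()):
    total = sum(versions.values())
    breakdown = ", ".join(f"{v}: {c / total:.0%}" for v, c in versions.most_common())
    print(f"{segment} VMs -> {breakdown}")
```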

Next, we looked at cluster sizes. A key question for VMware product managers was often, “How many hosts are there in a typical cluster?” This was a topic of considerable debate, and it is critically important to know when prioritizing features, for example how much emphasis should go into scalability work for DRS.

For the first time, CloudPhysics is able to leverage real customer data to provide answers. The highest frequency cluster size is two hosts per cluster for customers with greater than 500 VMs. Refer to the histogram. This result is surprisingly low and we do not yet know all the contributing reasons, though we can speculate on some of the causes. These may be a combination of small training clusters, dedicated clusters for some critical applications, Oracle clustering license restrictions, or perhaps a forgotten pair of older servers. Please tell us why you may have been keeping your clusters small.
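A minimal sketch of how such a hosts-per-cluster histogram could be derived from inventory data (the input mapping is an assumption; this is not the actual CloudPhysics implementation):

```python
from collections import Counter

# Hypothetical inventory: cluster name -> number of hosts in that cluster.
clusters = {"prod-01": 8, "prod-02": 6, "oracle": 2, "mgmt": 2, "legacy": 2}

histogram = Counter(clusters.values())

# Print the distribution and the most frequent ("highest frequency") cluster size.
for size in sorted(histogram):
    print(f"{size}-host clusters: {histogram[size]}")

most_common_size, count = histogram.most_common(1)[0]
print(f"Most frequent cluster size: {most_common_size} hosts ({count} clusters)")
```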

Despite the high frequency of two-host clusters, we see opportunities for virtualization architects to increase their resource pooling. By pooling together hosts into larger clusters, DRS can do a much better job at placement and providing resource management. That means real dollars in savings. It also allows for more efficient HA policy management since the absorption of spare capacity needed for infrequent host failures is now spread out over a larger set of hosts. Additionally, having fewer clusters makes for fewer management objects to configure, keep in sync with changing policies, etc. This reduces management complexity and makes for a safer and more optimized environment.
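To put some back-of-the-envelope numbers behind the pooling argument, here is a quick calculation assuming a simple N+1 style admission control policy (one host’s worth of capacity reserved per cluster; an illustrative assumption, not the only way HA admission control can be configured):

```python
# Fraction of cluster capacity held back for HA failover, assuming a simple
# N+1 policy: one host's worth of capacity is reserved per cluster.
for hosts_per_cluster in (2, 4, 8, 16):
    reserved = 1 / hosts_per_cluster
    print(f"{hosts_per_cluster:>2} hosts/cluster -> {reserved:.1%} reserved for failover")

# Doubling the cluster size halves the fraction of capacity set aside for the
# same single-host failure tolerance: 50% at 2 hosts versus 12.5% at 8 hosts.
```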

Several caveats arise with regard to the above findings. The first is potential sample bias. For instance, it might be the case that companies using CloudPhysics are more likely to be early adopters, and that early adopters might be more inclined to upgrade to ESX 5.0 faster. Another possible issue is imbalanced dataset composition. It might be that admins are mixing small training or beta labs, official test and development, and production environments in the same environment, thus skewing the findings.

CloudPhysics is the first to provide a method of impartially answering such questions based on real customer data, which should help dampen the controversy.

Xiaojun and I will continue to report back on these topics as the data evolves. In the meantime, the CloudPhysics site is growing with new cards being added weekly. Each card solves daily problems that virtualization engineers have described to us in our Community Cards section. I hope you will take the time to send us your feedback on the CloudPhysics site.

Database clustering support for vCloud Director added in version 5.1!

Duncan Epping · Oct 18, 2012 ·

Those who have been architecting vCloud Director environments from the early days know that this has always been a pain point. I personally have had many discussions with product management and engineering to get support for database clustering like Oracle RAC or Microsoft Cluster Service for MS SQL. Unfortunately neither 1.0 nor 1.5 supported it. So the big question always was: when will database clustering support for vCloud Director be added?

I had a couple of discussions around this again last week and noticed it was still not listed until someone pointed me to the vCAT 3.0 documents. Hidden on page 110 of document “3a Architecting a VMware vCloud.pdf” I found the following statement:

VMware vCloud component database resiliency is provided through database clustering. Microsoft Cluster Service for SQL and Oracle RAC are supported.

Yes, I do realize that this is not a KB article, nor is it even mentioned in the vCloud Director documentation. I have requested that the docs be revised and a KB be created. Hopefully those will follow soon; for now this statement is all we need! When the docs are revised or a KB is published I will add the references to this article.

<update – 18/Oct/2012> KB just got added – http://kb.vmware.com/kb/2037802 </update>

VMworld #NotSupported lightning talk slides – Hacking SRM

Duncan Epping · Oct 18, 2012 ·

I presented this 15-minute talk at VMworld about hacking SRM, or actually hacking the Storage Replication Adapter (SRA), which is part of SRM. I noticed William Lam shared his slides, so I figured I would do the same. This slide deck was based on two articles I did a while back around hacking the SRA; you might want to read them as well. ( 1 , 2 )

I hope they are useful. Once again, thanks to Randy Keener for coming up with this excellent idea and thanks to the brownbag guys for helping host this great initiative. Let’s hope we will see more of this next year at VMworld!

vSphere HA fail-over in action – aka reading the log files

Duncan Epping · Oct 17, 2012 ·

I had a discussion with Benjamin Ulsamer at VMworld and he had a question about the state of a host when both the management network and the storage network are isolated. My answer was that in that case the host will be reported as “dead”, as there is no “network heartbeat” and no “datastore heartbeat”. (more info about heartbeating here) The funny thing is that when you look at the log files you do see “isolated” instead of “dead”. Why is that? Before we answer that, let’s go through the log files and paint the picture:

Two hosts (esx01 and esx02) with a management network and an iSCSI storage network. vSphere 5.0 is used and Datastore Heartbeating is configured. For whatever reason the network of esx02 is isolated (both storage and management, as it is a converged environment). So what can you see in the log files?

Let’s look at “esx02” first:

  • 16:08:07.478Z [36C19B90 info ‘Election’ opID=SWI-6aace9e6] [ClusterElection::ChangeState] Slave => Startup : Lost master
    • At 16:08:07 the network is isolated
  • 16:08:07.479Z [FFFE0B90 verbose ‘Cluster’ opID=SWI-5185dec9] [ClusterManagerImpl::CheckElectionState] Transitioned from Slave to Startup
    • The host recognizes it is isolated and drops from Slave to “Startup” so that it can elect itself as master to take action
  • 16:08:22.480Z [36C19B90 info ‘Election’ opID=SWI-6aace9e6] [ClusterElection::ChangeState] Candidate => Master : Master selected
    • The host has elected itself as master
  • 16:08:22.485Z [FFFE0B90 verbose ‘Cluster’ opID=SWI-5185dec9] [ClusterManagerImpl::CheckHostNetworkIsolation] Waited 5 seconds for isolation icmp ping reply. Isolated
    • Can I ping the isolation address?
  • 16:08:22.488Z [FFFE0B90 info ‘Policy’ opID=SWI-5185dec9] [LocalIsolationPolicy::Handle(IsolationNotification)] host isolated is true
    • No I cannot, and as such I am isolated!
  • 16:08:22.488Z [FFFE0B90 info ‘Policy’ opID=SWI-5185dec9] [LocalIsolationPolicy::Handle(IsolationNotification)] Disabling execution of isolation policy by 30 seconds.
    • Hold off for 30 seconds as “das.config.fdm.isolationPolicyDelaySec” was configured
  • 16:08:52.489Z [36B15B90 verbose ‘Policy’] [LocalIsolationPolicy::GetIsolationResponseInfo] Isolation response for VM /vmfs/volumes/a67cdaa8-9a2fcd02/VMWareDataRecovery/VMWareDataRecovery.vmx is powerOff
    • There is a VM with an Isolation Response configured to “power off”
  • 16:10:17.507Z [36B15B90 verbose ‘Policy’] [LocalIsolationPolicy::DoVmTerminate] Terminating /vmfs/volumes/a67cdaa8-9a2fcd02/VMWareDataRecovery/VMWareDataRecovery.vmx
    • Let’s kill that VM!
  • 16:10:17.508Z [36B15B90 info ‘Policy’] [LocalIsolationPolicy::HandleNetworkIsolation] Done with isolation handling
    • And it is gone, done with handling the isolation
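The sequence above boils down to roughly the following logic on the isolated host. This is a simplified Python sketch of what the log shows, not actual FDM code; the 30-second delay corresponds to the das.config.fdm.isolationPolicyDelaySec setting used in this example, and the ping helper uses Linux-style flags purely for illustration.

```python
import subprocess
import time

ISOLATION_POLICY_DELAY_SEC = 30  # das.config.fdm.isolationPolicyDelaySec in this example

def can_ping(address: str) -> bool:
    """Return True if the isolation address replies to a single ICMP ping."""
    return subprocess.call(["ping", "-c", "1", "-W", "5", address],
                           stdout=subprocess.DEVNULL) == 0

def handle_lost_master(isolation_address: str, vms: dict) -> None:
    """Simplified view of what the esx02 log shows after the master is lost."""
    # Slave -> Startup -> Candidate -> Master: the host elects itself so it can act.
    # It then pings the isolation address to distinguish "I am isolated" from
    # "only the master died".
    if can_ping(isolation_address):
        return  # Not isolated: carry on as the new master.

    # Host is isolated; hold off for the configured delay before acting.
    time.sleep(ISOLATION_POLICY_DELAY_SEC)

    # Apply the per-VM isolation response (here "powerOff", as in the log).
    for vmx_path, isolation_response in vms.items():
        if isolation_response == "powerOff":
            print(f"Terminating {vmx_path}")  # the real agent powers the VM off
```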

Let’s take a closer look at “esx01” and see what this host observes with regard to the management network and storage network isolation of “esx02”:

  • 16:08:05.018Z [FFFA4B90 error ‘Cluster’ opID=SWI-e4e80530] [ClusterSlave::LiveCheck] Timeout for slave @ host-34
    • The host is no longer reporting in; the heartbeats are gone…
  • 16:08:05.018Z [FFFA4B90 verbose ‘Cluster’ opID=SWI-e4e80530] [ClusterSlave::UnreachableCheck] Beginning ICMP pings every 1000000 microseconds to host-34
    • Let’s ping the host itself; it could be that the FDM agent is dead.
  • 16:08:05.019Z [FFFA4B90 verbose ‘Cluster’ opID=SWI-e4e80530] Reporting Slave host-34 as FDMUnreachable
  • 16:08:05.019Z [FFD5BB90 verbose ‘Cluster’] ICMP reply for non-existent pinger 3 (id=isolationAddress)
    • As it is just a two-node cluster, let’s make sure I am not isolated myself. I got a reply, so I am not isolated!
  • 16:08:10.028Z [FFFA4B90 verbose ‘Cluster’ opID=SWI-e4e80530] [ClusterSlave::UnreachableCheck] Waited 5 seconds for icmp ping reply for host host-34
  • 16:08:14.035Z [FFFA4B90 verbose ‘Cluster’ opID=SWI-e4e80530] [ClusterSlave::PartitionCheck] Waited 15 seconds for disk heartbeat for host host-34 – declaring dead
    • There is also no datastore heartbeat so the host must be dead. (Note that it cannot see the difference between a fully isolated host and a dead host when using IP based storage on the same network.)
  • 16:08:14.035Z [FFFA4B90 verbose ‘Cluster’ opID=SWI-e4e80530] Reporting Slave host-34 as Dead
    • It is officially dead!
  • 16:08:14.036Z [FFE5FB90 verbose ‘Invt’ opID=SWI-42ca799] [InventoryManagerImpl::RemoveVmLocked] marking protected vm /vmfs/volumes/a67cdaa8-9a2fcd02/VMWareDataRecovery/VMWareDataRecovery.vmx as in unknown power state
    • We don’t know what is up with this VM, power state unknown…
  • 16:08:14.037Z [FFE5FB90 info ‘Policy’ opID=SWI-27099141] [VmOperationsManager::PerformPlacements] Sending a list of 1 VMs to the placement manager for placement.
    • We will need to restart one VM; let’s provide its details to the Placement Manager
  • 16:08:14.037Z [FFE5FB90 verbose ‘Placement’ opID=SWI-27099141] [PlacementManagerImpl::IssuePlacementStartCompleteEventLocked] Issue failover start event
    • Issue a failover event to the placement manager.
  • 16:08:14.042Z [FFE5FB90 verbose ‘Placement’ opID=SWI-e430b59a] [DrmPE::GenerateFailoverRecommendation] 1 Vms are to be powered on
    • Let’s generate a recommendation on where to place the VM
  • 16:08:14.044Z [FFE5FB90 verbose ‘Execution’ opID=SWI-898d80c3] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/a67cdaa8-9a2fcd02/VMWareDataRecovery/VMWareDataRecovery.vmx on __localhost__ (cmd ID host-28:0)
    • We know where to place it!
  • 16:08:14.687Z [FFFE5B90 verbose ‘Invt’] [HalVmMonitor::Notify] Adding new vm: vmPath=/vmfs/volumes/a67cdaa8-9a2fcd02/VMWareDataRecovery/VMWareDataRecovery.vmx, moId=12
    • Let’s register the VM so we can power it on
  • 16:08:14.714Z [FFDDDB90 verbose ‘Execution’ opID=host-28:0-0] [FailoverAction::ReconfigureCompletionCallback] Powering on vm
    • Power on the impacted VM

That is it, nice right… and this is just a short version of what is actually in the log files. They contain a massive amount of detail! Anyway, back to the question… if not already answered, the remaining host in the cluster sees the isolated host as dead as there is no:

  • network heartbeat
  • response to a ping to the host
  • datastore heartbeat

The only thing the master can do at that point is to assume the “isolated” host is dead.
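Putting the master’s side together, the checks visible in the esx01 log amount to something like the following. Again, this is a simplified sketch of the decision the log walks through, not actual FDM code.

```python
from dataclasses import dataclass

@dataclass
class SlaveStatus:
    network_heartbeat: bool    # FDM heartbeats received over the management network
    responds_to_ping: bool     # ICMP reply from the slave's management address
    datastore_heartbeat: bool  # heartbeat region updated on a heartbeat datastore

def classify_slave(s: SlaveStatus) -> str:
    """Simplified version of the checks visible in the esx01 log."""
    if s.network_heartbeat:
        return "live"                     # nothing to do
    if s.datastore_heartbeat:
        return "isolated-or-partitioned"  # alive, but unreachable over the network
    if s.responds_to_ping:
        return "fdm-agent-unreachable"    # host answers pings, only the agent is gone
    # No network heartbeat, no ping reply, no datastore heartbeat -> declared dead,
    # which is exactly what happens when storage and management are isolated together.
    return "dead"

print(classify_slave(SlaveStatus(False, False, False)))  # -> "dead"
```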

 

** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because it’s currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **

