High latency VPLEX configuration and vMotion optimization

This week someone asked me about an advanced setting to optimize vMotion for VPLEX configurations. This person referred to the vSphere 5.5 Performance Best Practices paper, and specifically the following section:

Add the VMX option (extension.converttonew = “FALSE”) to virtual machine’s .vmx files. This option optimizes the opening of virtual disks during virtual machine power-on and thereby reduces switch-over time during vMotion. While this option can also be used in other situations, it is particularly helpful on VPLEX Metro deployments.

I had personally never heard of this advanced setting, so I did some searching both internally and externally and couldn’t find any references other than the vSphere 5.5 Performance paper. Strange, as you would expect a generic recommendation like this to be mentioned in at least one or two other spots. I reached out to one of the vMotion engineers, and after going back and forth I figured out what the setting is for and when it should be used.

During testing with VPLEX and VMs using dozens of VMDKs in a “high latency” situation, it could take longer than expected before the switchover between hosts completed. First of all, when I say “high latency” we are talking about close to the maximum tolerated for VPLEX, which is around 10ms RTT. When “extension.converttonew” is used, the amount of IO needed during the switchover is limited, and when each IO takes 10ms you can imagine that has a direct impact on the time it takes to switch over. Of course these enhancements were also tested in scenarios where there wasn’t high latency, or where a low number of disks was used, and in those cases the benefits were negligible and the operational overhead of configuring this setting did not outweigh the benefits.

So to be clear, this setting should only be used in scenarios where high latency and a high number of virtual disks results in a long switchover time during migrations of VMs between hosts in a vMSC/VPLEX configuration. I hope that helps.
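For those who do want to apply it, a minimal sketch of adding the option to a VM’s .vmx file from the shell (the scratch copy created with mktemp is purely for illustration; on a real host you would edit the VM’s actual .vmx on the datastore with the VM powered off, or add the parameter via the vSphere Client’s advanced configuration options):

```shell
# Illustrative only: we work on a scratch copy created with mktemp.
# On an actual host, point VMX at the VM's real configuration file,
# e.g. /vmfs/volumes/<datastore>/<vm>/<vm>.vmx (VM powered off).
VMX=$(mktemp)
printf 'displayName = "myvm"\n' > "$VMX"

# Append the option only if it is not already present, so repeated
# runs do not create duplicate entries
if ! grep -q '^extension.converttonew' "$VMX"; then
  echo 'extension.converttonew = "FALSE"' >> "$VMX"
fi

cat "$VMX"
```

The guard around the append matters because a .vmx file with duplicate keys can behave unpredictably; checking first makes the script safe to re-run.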

Shift in focus… Go Storage & Availability OCTO!

Almost a year ago I joined the Office of CTO under Paul Strong. My main focus was SDDC, but I naturally gravitated towards the core platform (vSphere), software defined storage, and topics like availability. Not just my personal preference, but also a commonly requested topic for public speaking engagements. Most VMUG speaking requests I receive are around VSAN, VVols or vSphere HA. Each year I take some time to reflect on where I am, what I do, and where I want to go. This year I asked myself: what really excites me in today’s world of IT/infrastructure? What am I most passionate about? What do I enjoy talking and writing about the most?

Having written books on Virtual SAN and vSphere Clustering, and countless blog posts on the topics of software defined storage, BC/DR and availability, the answer was pretty obvious. I like talking and writing about Virtual SAN, Virtual Volumes and Site Recovery Manager, and it is safe to say that I am a vSphere HA fanboy. I am most passionate about Storage & Availability; that much was obvious.

At an internal event I had a conversation with Charles Fan and Christos Karamanolis. The Storage & Availability BU was considering creating an Office of CTO and they asked if I would be interested in collaborating in some shape or form. For me this was a no-brainer. Knowing what is coming for Virtual SAN and Virtual Volumes (and future products we are working on) I asked myself if collaborating would be the best option or if I should take that next step. The decision was easy, as of this week I have officially joined the Office of CTO of the Storage & Availability BU.

In the Office of CTO I will be responsible for connecting our R&D team with customers, partners and our field. I will be evangelizing software defined storage and availability, primarily in EMEA and APJ. I will focus on defining and communicating VMware’s vision and strategy, and be an active advisor for our product roadmap and portfolio. I couldn’t be more excited; I am super enthusiastic about everything that is to come out of our business unit, and it is extremely energizing, to say the least, to talk to our customers about what we do today and what is coming tomorrow. As a big plus I get to work with my friend Rawlinson Rivera once again, and report to someone I greatly respect, Christos, who will be heading up the team. Make sure to read Christos’s blog post on the team that has been formed and some hints of what you can expect in the future. Let’s get busy!

Thanks Charles, Christos and Paul for this great opportunity!

vSphere Metro Storage Cluster with vSphere 6.0 paper released

I’d already blogged about this on the VMware blog, but I figured I would share it here as well. The vSphere Metro Storage Cluster with vSphere 6.0 white paper has been released. I worked on this paper together with my friend Lee Dilworth; it is an updated version of the paper we did in 2012. It contains all of the new best practices for vSphere 6.0 when it comes to vSphere Metro Storage Cluster implementations, so if you are looking to implement one or upgrade an existing environment, make sure to read it!

VMware vSphere Metro Storage Cluster Recommended Practices

VMware vSphere Metro Storage Cluster (vMSC) is a specific configuration within the VMware Hardware Compatibility List (HCL). These configurations are commonly referred to as stretched storage clusters or metro storage clusters and are implemented in environments where disaster and downtime avoidance is a key requirement. This best practices document was developed to provide additional insight and information for operation of a vMSC infrastructure in conjunction with VMware vSphere. This paper explains how vSphere handles specific failure scenarios, and it discusses various design considerations and operational procedures. For detailed information about storage implementations, refer to documentation provided by the appropriate VMware storage partner.

Horizon View and All-Flash VSAN

I typically don’t do these short posts which simply point to a white paper, but I really liked this paper on the topic of VMware Horizon View and All-Flash VSAN. In the paper it is demonstrated how to build an all-flash VSAN cluster using Dell servers, SanDisk flash and Brocade switches. Definitely a recommended read if you are looking to deploy Horizon View anytime soon.

VMware Horizon View and All Flash Virtual SAN Reference Architecture
This Reference Architecture demonstrates how enterprises can build a cost-effective VDI infrastructure using VMware All Flash Virtual SAN combined with the fast storage IO performance offered by SSDs. The combination of Virtual SAN and all flash storage can significantly improve ROI without compromising on the high availability and scalability that customers demand.

vSphere Metro Storage Cluster with vSphere 5.5

I had a couple of questions around the exact settings for vSphere Metro Storage Clusters with vSphere 5.5. It was the third time in two weeks I shared the same info about vMSC with vSphere 5.5, so I figured I would write a quick blog post making the information a bit easier to find through Google. Below you can find the settings required for a vSphere Metro Storage Cluster with vSphere 5.5. Note that in-depth details around operations / testing can be found in this white paper: version 5.x // version 6.0.

  1. VMkernel.Boot.terminateVMOnPDL = True
  2. Das.maskCleanShutdownEnabled = True 
  3. Disk.AutoremoveOnPDL = 0 

I want to point out that if you migrate from 5.0 or 5.1, the Host Advanced Setting “VMkernel.Boot.terminateVMOnPDL” replaces disk.terminateVMOnPDLDefault (/etc/vmware/settings). Das.maskCleanShutdownEnabled is actually configured to “true” by default as of vSphere 5.1 and later, but personally I prefer to set it anyway so that I know for sure it has been configured accurately. Then there is Disk.AutoremoveOnPDL; this setting is new in vSphere 5.5, as discussed here. Make sure to disable it: as PDLs are likely to be temporary, there is no point removing the devices and then having to do a rescan to have them reappear, as it only slows down your recovery process. (EMC also recommends this by the way, see page 21 of this PDF on vMSC/VPLEX.)
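As a sketch, the two host-side settings can be applied from the ESXi shell with esxcli. Note that Das.maskCleanShutdownEnabled is an HA cluster advanced option, so it is configured on the cluster object in the Web Client, not per host. The advanced-option paths below are what I believe to be the esxcli equivalents of the settings listed above; verify them against your build, and note the script defaults to a dry run that only prints the commands:

```shell
# Dry run by default: $RUN is "echo", so the esxcli commands are printed
# rather than executed. Set RUN="" on an actual ESXi host to apply them.
RUN="echo"

apply_vmsc_settings() {
  # 1. VMkernel.Boot.terminateVMOnPDL = True (replaces
  #    disk.terminateVMOnPDLDefault from vSphere 5.0/5.1)
  $RUN esxcli system settings advanced set -o /VMkernel/Boot/terminateVMOnPDL -i 1
  # 3. Disk.AutoremoveOnPDL = 0, so devices in PDL are not auto-removed
  $RUN esxcli system settings advanced set -o /Disk/AutoremoveOnPDL -i 0
}

OUT=$(apply_vmsc_settings)
echo "$OUT"
```

The dry-run pattern is deliberate: you can review exactly which advanced options would change before pointing the script at a production host.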