Converged compute and storage solutions

Lately I have been looking more and more into converged compute and storage solutions, or “datacenter in a box” solutions as some like to call them. I am a big believer in this concept, as some of you may have noticed. For those who have never heard of these solutions, examples would be Nutanix or Simplivity. I have written about both Nutanix and Simplivity in the past, and for a quick primer on those respective solutions I suggest reading those articles. In short, these solutions run a hypervisor with a software based storage solution that creates a shared storage platform from local disks. In other words, no SAN/NAS required, or as stated… a full datacenter experience in just a couple of U’s.

One thing that has stood out to me over the last 6 months, though, is that Nutanix, for instance, is often tied to VDI/View solutions. In a way I can understand why, as it has been part of their core message / go-to-market strategy for a long time. In my opinion, though, there is no limit to where these solutions can grow and go. Managing storage, or better said your full virtualization infrastructure, should be as simple as creating or editing a virtual machine. That was one of the core principles mentioned during the vCloud Distributed Storage talk at VMworld (vCloud Distributed Storage, by the way, is a VMware software defined storage initiative).

Hopefully people are starting to realize that these so-called Software Defined Storage solutions will fit in most, if not all, scenarios out there today. I’ve had several discussions with people about these solutions and wanted to give some examples of how they could fit into your strategy.

Just a week ago I was having a discussion with a customer around disaster recovery. They wanted to add a secondary site and replicate their virtual machines to that site, but the cost associated with a second storage array was holding them back. After an introduction to converged storage and compute solutions they realized they could step into the world of disaster recovery slowly. These solutions allow them to protect their Tier-1 applications first and expand their DR-protected estate when required. By using a converged storage and compute solution they avoid the high upfront cost, and they can scale out when needed (or when they are ready).

One of the service providers I talk to on a regular basis is planning to create a new cloud service. Their current environment is reaching its limits, and predicting how this new environment will grow in the upcoming 12 months is difficult due to the agile and dynamic nature of the service they are developing. The great thing about a converged storage and compute solution, though, is that they can scale out whenever needed, without a lot of hassle. Typically the only requirement is the availability of 10Gbps ports in your network. For the provider the biggest benefit is probably that services are defined by software. They can up-level or expand their offerings whenever they please or whenever there is demand.

These are just two simple examples of how a converged infrastructure solution could fit into your software defined datacenter strategy. The mentioned vendors, Nutanix and Simplivity, are just two examples out of the various companies offering these solutions. I know of multiple start-ups who are working on similar products, and of course there are the likes of Pivot3 who already offer turnkey converged solutions. As stated earlier, I am personally a big believer in these architectures, and if you are looking to renew your datacenter or are on the verge of a green-field deployment… I highly recommend researching these solutions.

Go Software Defined – Go Converged!

Faking an SSD in your virtualized vSphere lab

I have written about this before (and so has William Lam, so all credit goes to William), but I wanted to note down these commands for my own use, as I find myself digging around for the same commands often these days. So what is my goal? Faking an SSD in my virtualized vSphere lab.

In my lab I have a bunch of virtualized ESXi hosts. Those hosts have multiple disks, and I want to mark one of those disks as SSD. To keep things simple I set things up as follows. Just to point out: I use 0:0 / 1:0 / 2:0 so that each device gets its own controller and is easy to identify:

  • First Disk – ESXi install disk – 5GB – SCSI 0:0
  • Second Disk – Fake SSD – 40GB – SCSI 1:0
  • Third Disk – Large disk – 1TB – SCSI 2:0

When I boot, all disks are recognized as regular disks, and in some cases as non-local. For my testing I need local disks and I need an SSD, so this is what I did to get exactly that. With the first command I mark the “second disk” as SSD and local. With the second command I mark the third disk as local. Next I reclaim the devices so that the new SATP rules are applied.

esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba2:C0:T0:L0 --option "enable_local enable_ssd"
esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba3:C0:T0:L0 --option "enable_local"
esxcli storage core claiming reclaim -d mpx.vmhba2:C0:T0:L0
esxcli storage core claiming reclaim -d mpx.vmhba3:C0:T0:L0

Next you can simply validate whether it has worked by typing the following for devices vmhba2 and vmhba3 (if you replace the 2 with a 3, of course):

esxcli storage core device list --device=mpx.vmhba2:C0:T0:L0

In the output, the “Is Local” and “Is SSD” attributes should now both show “true”. As you can see, faking an SSD is fairly straightforward. Note that even if you have an SSD drive you still might need to do this. In some cases the SSD drive is not recognized and you will need to create a rule for it manually.

Percentage Based Admission Control gives lower VM restart guarantee?

Those who have configured vSphere HA have all seen that section where it asks whether you want to use admission control or not. Of course, if you decide you want to use it (and you should), the next question is which policy you want to use. I have always preferred the “Percentage Based Admission Control” policy. For some reason, though, there are people who think that the percentage based admission control policy rules out large VMs from being restarted or offers a lower guarantee.

The main perception people have is that the percentage based admission control policy gives a lower guarantee that virtual machines will be restarted than the “host failures” admission control policy. So let’s break it down, and I mean BREAK IT DOWN, by using an example.

Example

  • 5 hosts
  • 200GB of Memory in cluster
  • 20GHz of CPU in cluster

If no reservations are set:

Percentage Based will do the following:

  1. The Percentage Based policy will take the total amount of resources and subtract the amount of resources reserved for fail-over. If that percentage is for instance 20%, then 40GB and 4GHz are subtracted, which means 160GB and 16GHz are left.
  2. The reserved resources for every virtual machine that is powered on are subtracted from the outcome of step 1. If no memory reservation is set, then the memory overhead is subtracted instead; if the memory overhead is 200MB, then 200MB is subtracted from the 160GB that was left, resulting in 159.8GB being available. For CPU the default of 32MHz is used.
  3. You can power on virtual machines until the amount of available resources, according to HA Admission Control, is depleted; yes, many VMs in this case (the sketch below runs the numbers).
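
To make this more tangible, here is a quick Python-style sketch of that bookkeeping using the numbers from the example, with an assumed 200MB memory overhead per VM. This is just my illustration of the math, not what HA actually runs under the covers:

# Percentage Based admission control, no reservations set (example numbers only)
total_mem_mb = 200 * 1024                                    # 200GB of memory in the cluster
total_cpu_mhz = 20 * 1000                                    # 20GHz of CPU in the cluster
failover_pct = 20                                            # 20% reserved for fail-over

avail_mem_mb = total_mem_mb * (100 - failover_pct) // 100    # 160GB left for virtual machines
avail_cpu_mhz = total_cpu_mhz * (100 - failover_pct) // 100  # 16GHz left for virtual machines

# With no reservations a powered-on VM only "costs" its memory overhead
# (assumed to be 200MB here) plus the 32MHz CPU default.
per_vm_mem_mb = 200
per_vm_cpu_mhz = 32

print(avail_mem_mb - per_vm_mem_mb)          # roughly 159.8GB left after the first VM
print(min(avail_mem_mb // per_vm_mem_mb,
          avail_cpu_mhz // per_vm_cpu_mhz))  # around 500 VMs before the CPU part runs out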

Host Failures will do the following:

  1. The Host Failures policy will calculate the number of slots. A slot is formed out of two components: memory and CPU. As no reservation is used, the CPU default of 32MHz (with vSphere 5.0 and higher) is used. For memory the largest memory overhead size is used; in this scenario there could be a variety of sizes, let’s say the smallest is 64MB and the largest 300MB. So 300MB will be used for the memory slot size.
  2. Now that the slot size is known, Admission Control will look for the host with the most slots (available resources / slot size) and subtract those slots from the total number of available slots (if one host failure is specified). Every time a VM is started, a slot is subtracted. If a VM is started with a higher memory reservation, we go back to step 1 and the math will need to be done again.
  3. You can power on virtual machines until you are out of slots; again… many VMs (see the sketch below).
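
Again a quick sketch, this time of the slot math, with the same example numbers: 5 hosts of 40GB / 4GHz each, a 300MB memory slot size taken from the largest overhead, the 32MHz CPU default and one host failure tolerated. Purely illustrative:

# Host Failures admission control, no reservations set (example numbers only)
hosts = 5
host_mem_mb = 40 * 1024     # 40GB of memory per host
host_cpu_mhz = 4 * 1000     # 4GHz of CPU per host

slot_mem_mb = 300           # largest memory overhead in the cluster
slot_cpu_mhz = 32           # CPU default with vSphere 5.0 and higher

# A host's slot count is limited by whichever component runs out first.
slots_per_host = min(host_mem_mb // slot_mem_mb, host_cpu_mhz // slot_cpu_mhz)

total_slots = hosts * slots_per_host
usable_slots = total_slots - slots_per_host   # subtract the host with the most slots

print(slots_per_host, total_slots, usable_slots)   # 125 per host, 625 in total, 500 usable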

If reservations are set:

Percentage Based will do the following:

  1. The Percentage Based policy will take the total amount of resources and subtract the amount of resources reserved for fail-over. If that percentage is for instance 20%, then 40GB and 4GHz are subtracted, which means 160GB and 16GHz are left.
  2. The reserved resources for every virtual machine that is powered on are subtracted from the outcome of step 1. So if 10GB of memory was reserved, then 10GB is subtracted, resulting in 150GB being available.
  3. You can power on virtual machines until the available resources are depleted (according to HA Admission Control), but as reservations are used you are “limited” in the number of VMs you can power on (see the sketch below).
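
The same percentage based sketch, but now with one virtual machine carrying a 10GB memory reservation (again, purely illustrative):

# Percentage Based admission control, one VM with a 10GB memory reservation
avail_mem_mb = 200 * 1024 * 80 // 100   # 160GB left after the 20% fail-over reservation

avail_mem_mb -= 10 * 1024               # the VM with the 10GB reservation "costs" the full 10GB
print(avail_mem_mb // 1024)             # 150GB remaining

# Every other VM without a reservation still only costs its memory overhead (assumed 200MB),
# so there is room for plenty of additional VMs.
print(avail_mem_mb // 200)              # hundreds of overhead-only VMs would still fit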

Host Failures will do the following:

  1. The Host Failures policy will calculate the number of slots. A slot is formed out of two components: memory and CPU. As a reservation is used for memory but not for CPU, the CPU default of 32MHz (with vSphere 5.0 and higher) is used. For memory there is a 10GB reservation set, so 10GB will be used for the memory slot size.
  2. Now that the slot size is known, Admission Control will look for the host with the most slots (available resources / slot size) and subtract those slots from the total number of available slots (if one host failure is specified). Every time a VM is started, a slot is subtracted; yes, that is a 10GB memory slot, even if the VM has, for instance, a 2GB reservation. If a VM is started with a higher memory reservation, we go back to step 1 and the math will need to be done again.
  3. You can power on virtual machines until you are out of slots; as a high reservation is set, you will be severely limited (the sketch below shows just how limited)!
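
And here is where the difference really shows: the slot math when that single 10GB reservation drives the slot size. This is also where the “16 to be precise” below comes from (again, illustrative only):

# Host Failures admission control, slot size driven by a 10GB memory reservation
hosts = 5
host_mem_mb = 40 * 1024     # 40GB of memory per host
host_cpu_mhz = 4 * 1000     # 4GHz of CPU per host

slot_mem_mb = 10 * 1024     # the 10GB reservation becomes the memory slot size
slot_cpu_mhz = 32           # CPU default

slots_per_host = min(host_mem_mb // slot_mem_mb, host_cpu_mhz // slot_cpu_mhz)   # 4 slots
usable_slots = hosts * slots_per_host - slots_per_host                           # 20 - 4

print(usable_slots)   # 16 slots, and every powered-on VM consumes one, reservation or not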

Now you can imagine that “Host Failures” errs on the safe side… If you have a single reservation set, the math will be done with that reservation. This means that a single 10GB reservation will impact how many VMs you can power on before HA screams that it is out of resources. But at least you are guaranteed you can power them on, right? Well yes, but realistically speaking people disable Admission Control at this point, as that single 10GB reservation allows you to power on just a handful of VMs (16 to be precise).

But that beats Percentage Based, right? Because if I have a lot of VMs, who says my VM with the 10GB reservation can be restarted? First of all, if there are no “unreserved resources” available on any given host to start this virtual machine, then vSphere HA will ask vSphere DRS to defragment the cluster. As HA Admission Control had already accepted this virtual machine to begin with, chances are fairly high that DRS can solve the fragmentation.

Also, as the percentage based admission control policy uses reservations AND memory overhead… how many virtual machines do you need to have powered on before your VM with a 10GB memory reservation is denied a power-on? It would mean that none of the hosts has 10GB of unreserved memory available. That is not very likely, as it means you would need to power on hundreds of VMs… probably way too many for your environment to ever perform properly. So the chances of hitting this scenario are extremely small.

Conclusion

Although theoretically possible, it is very unlikely you will end up in a situation where one or more virtual machines cannot be restarted when using the Percentage Based Admission Control policy. Even if you are using reservations on all virtual machines, this is unlikely, as the virtual machines were accepted at some point by HA Admission Control and HA will leverage DRS to defragment resources at that point. Also keep in mind that when using reservations on all virtual machines, Host Failures is not really an option, as it skews your numbers by doing the math with the worst-case scenario; a single 10GB reservation can kill your ROI/TCO.

In short: Go Percentage Based!

Isolation detection in vSphere 5.1 versus 5.0

I received a question today from someone who wanted to know the difference in isolation detection between vSphere 5.0 and 5.1. I described this in our book, but I figured I would share it here as well. Note that this is an outtake from the book.

The isolation detection mechanism has changed substantially compared to previous versions of vSphere. The main difference is that HA triggers a master election process before it declares a host isolated. In these timelines, “s” refers to seconds. The following is the timeline for a vSphere 5.0 host:

  • T0 – Isolation of the host (slave)
  • T10s – Slave enters “election state”
  • T25s – Slave elects itself as master
  • T25s – Slave pings “isolation addresses”
  • T30s – Slave declares itself isolated and “triggers” isolation response

For a vSphere 5.1 host this timeline differs slightly due to the insertion of a minimum 30s delay between the host declaring itself isolated and the configured isolation response being applied. This delay can be increased using the advanced option das.config.fdm.isolationPolicyDelaySec.

  • T0 – Isolation of the host (slave)
  • T10s – Slave enters “election state”
  • T25s – Slave elects itself as master
  • T25s – Slave pings “isolation addresses”
  • T30s – Slave declares itself isolated
  • T60s – Slave “triggers” isolation response

Or as Frank would say euuuh show:

[Diagram: isolation detection timeline in vSphere 5.0 versus vSphere 5.1]

When the isolation response is triggered, with both 5.0 and 5.1, HA creates a “power-off” file for any virtual machine it powers off whose home datastore is accessible. Next it powers off (or shuts down) the virtual machine and updates the host’s poweron file. The power-off file is used to record that HA powered off the virtual machine and that HA should therefore restart it. These power-off files are deleted when a virtual machine is powered back on or HA is disabled.

After the completion of this sequence, the master will learn the slave was isolated through the “poweron” file as mentioned earlier, and will restart virtual machines based on the information provided by the slave.

vSphere 5.1 Performance History issue!

I was just made aware of an issue with the vSphere 5.1 performance history charts. The symptoms are straightforward:

  • You see only the last 30 days of performance history in the ‘past year’ view
  • Only one month of performance history appears when trying to see the past year

VMware is currently working on a solution for this problem. It seems the root cause has been identified (a stored procedure purging data) and hopefully a fix will be released soon.

Please subscribe to the following KB to be kept up to date:
http://kb.vmware.com/kb/2042164