Drag and drop vMotion not working with the 5.5 Web Client?

A couple of weeks ago I bumped into this issue where I constantly received a red cross when I wanted to “drag and drop” vMotion a virtual machine using the vSphere 5.5 Web Client. Annoying, as this is something I had been waiting for; I used it all the time with the vSphere Client. Unfortunately it turned out I had stumbled into a bug. Apparently when you do a drag and drop migration, certain scenarios are filtered out to avoid issues. I guess the filter is too aggressive, as today it also filters out drag and drop to a host when no resource pools are used. The screenshot shows what this problem looks like in the UI.

I filed the bug of course, but unfortunately it was too late for the fix to make it into the release. The engineering team has told me they are aiming to fix this in the first update release. So consider this an FYI to avoid getting frustrated when you cannot get this drag and drop thingie to work. The support team just published a KB article on this matter as well.

Start your engines, time to download vSphere 5.5

What a way to start the Sunday / Monday (depending on where you are), right? Yes, the day has finally come… Start your engines, it is time to download vSphere 5.5. Just like last year I decided to make a nice short post with all the links to the required download pages; I hope it makes your life easier!

Core vSphere and automation/tools:

Suite components:

Now start downloading and update those test environments / labs!

Be careful when defining a VM storage policy for VSAN

I was defining a VM storage policy for VSAN and it resulted in something unexpected. You might have read that when no policy is defined within vCenter, VSAN defaults to the following for availability reasons:

  • Failures to tolerate = 1

So I figured I would define a new policy and include “stripe width” in this policy. I wanted to have a stripe width of 2 and “failures to tolerate” set to the default of 1. I figured that as “failures to tolerate” is set to 1 by default anyway, I would not specify it and would just specify the stripe width. Why add rules which already have the correct value, right?

VM storage policy for VSAN

Well that is what I figured, no point in adding it… and this was the result:

Do you notice something in the above screenshot? I do… I see no “RAID 1” mentioned, and all components reside on the same host, esx014 in this case. So what does that mean? It means that when you create a profile and do not specify “failures to tolerate”, it defaults to 0 and no mirror copies are created. This is not the situation you want to find yourself in! So when you define stripe width, make sure you also define “failures to tolerate”. Even better, when you create a VM Storage Policy always include “failures to tolerate”. Below is an example of what my policy should have looked like.

VM storage policy for VSAN

So remember this: When defining a new VSAN VM Storage Policy always include “Number of failures to tolerate”! If you did forget to specify it, the nice thing here is that you can change VM Storage Policies on the fly and apply them directly to your VMs. Cormac has a nice article on this subject!
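To make the gotcha concrete, here is a minimal sketch of the effective values in plain Python. The rule names and the little helper are purely illustrative (they are not SPBM capability identifiers or SDK calls); the defaults simply mimic the behavior described above, where omitting “failures to tolerate” from a defined policy effectively gives you 0 and therefore no mirrors.

    # Illustrative only: rule names are made up, this is not VMware SDK code.
    def effective_policy(rules):
        """Fill in the values VSAN effectively uses for rules you did not specify."""
        defaults_when_omitted = {
            "failuresToTolerate": 0,  # omitted from a defined policy -> no mirror copies!
            "stripeWidth": 1,
        }
        return {**defaults_when_omitted, **rules}

    # The policy from this post: only stripe width defined.
    print(effective_policy({"stripeWidth": 2}))
    # {'failuresToTolerate': 0, 'stripeWidth': 2} -> a single copy, no "RAID 1"

    # What it should have looked like: stripe width AND failures to tolerate.
    print(effective_policy({"stripeWidth": 2, "failuresToTolerate": 1}))
    # {'failuresToTolerate': 1, 'stripeWidth': 2} -> mirrored components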

Isolation / Partition scenario with VSAN cluster, how is this handled?

After explaining how a disk or host failure is handled in a VSAN cluster, it only made sense to take the next step… How are isolations or partitions in a Virtual SAN cluster handled? I guess let's start at the beginning, and I am going to try to keep it simple: first a recap of what we learned in the disk/host failures article.

Virtual SAN (VSAN) has the ability to create mirrors of objects. This ability is defined within a policy (VM Storage Policy, aka Storage Policy Based Management). You can set the option called “failures to tolerate” to anywhere between 0 and 3 at the moment. By default this option is set to 1, which means you will have two copies of your data. On top of that VSAN needs a witness / quorum to help figure out who takes ownership in the case of an event. So what does this look like? Note that in the below diagram I used the terms “vmdk” and “witness” to simplify things; in reality these could be any type of component of a VM.

So what did we learn from this (hopefully) simple diagram?

  • A VM does not necessarily have to run on the same host as where its storage objects are sitting
  • The witness lives on a different host than the components it is associated with in order to create an odd number of hosts involved for tiebreaking under a network partition
  • The VSAN network is used for communication, IO and HA
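To put some rough numbers on the “failures to tolerate” option described above, here is a small back-of-the-envelope sketch (my own simplification, not anything out of VSAN itself): the number of data copies is the setting plus one, and because every copy needs to live on its own host, with witness components on yet other hosts for tiebreaking, you need at least 2n+1 hosts to satisfy a setting of n.

    # Back-of-the-envelope helper, my own simplification and not VSAN code.
    def vsan_mirroring_math(failures_to_tolerate):
        """Return (data copies, minimum hosts) for a given 'failures to tolerate' value."""
        if not 0 <= failures_to_tolerate <= 3:
            raise ValueError("'failures to tolerate' can currently be set between 0 and 3")
        copies = failures_to_tolerate + 1         # the default of 1 means two copies of your data
        min_hosts = 2 * failures_to_tolerate + 1  # each copy on its own host, plus witness(es)
        return copies, min_hosts

    for ftt in range(4):
        copies, hosts = vsan_mirroring_math(ftt)
        print(f"failures to tolerate = {ftt}: {copies} copies, at least {hosts} hosts")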

Let's recap some of the HA changes for a VSAN cluster first, before we dive into the details:

  • When HA is turned on in the cluster, FDM agent (HA) traffic uses the VSAN network and not the Management Network. However, when a potential isolation is detected HA will ping the default gateway (or specified isolation address) using the Management Network.
  • When enabling VSAN ensure vSphere HA is disabled. You cannot enable VSAN when HA is already configured. Either configure VSAN during the creation of the cluster or disable vSphere HA temporarily when configuring VSAN.
  • When there are only VSAN datastores available within a cluster, Datastore Heartbeating is disabled. HA will never use a VSAN datastore for heartbeating: as the VSAN network is already used for network heartbeating, using the same datastore for heartbeating would not add anything.
  • When changes are made to the VSAN network it is required to re-configure vSphere HA!

As you can see the VSAN network plays a big role here, an even bigger one than you might realize, as it is also used by HA for network heartbeating. So what if the host on which the VM is running gets isolated from the rest of the network? The following would happen:

  • HA will detect there are no network heartbeats received from “esxi-01”
  • The HA master will try to ping the slave “esxi-01”
  • HA will declare the slave “esxi-01” unavailable
  • The VM will be restarted on one of the other hosts, “esxi-02” in this case, but that could be any of them, as depicted in the diagram below and roughly sketched in the code after this list
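Here is that decision flow as a rough sketch in plain Python, just to make the sequence explicit. The host names come from the diagram; the functions are hypothetical stand-ins, not the actual FDM agent logic.

    # Hypothetical stand-in for the HA master's decision flow, not actual FDM code.
    def master_handles_missing_heartbeats(slave, ping_slave, restart_vm_elsewhere):
        # 1. No network heartbeats received from the slave over the VSAN network.
        print(f"no network heartbeats received from {slave}")

        # 2. The master tries to ping the slave. Remember: with only VSAN datastores
        #    in the cluster there is no datastore heartbeating to fall back on.
        if ping_slave(slave):
            print(f"{slave} still responds to ping, not declared unavailable")
            return

        # 3. No heartbeats and no ping response: the slave is declared unavailable.
        print(f"{slave} declared unavailable")

        # 4. Its VMs are restarted on one of the remaining hosts.
        restart_vm_elsewhere(slave)

    # Example: esxi-01 is isolated, so its VM gets restarted on esxi-02.
    master_handles_missing_heartbeats(
        "esxi-01",
        ping_slave=lambda host: False,  # the isolated host does not respond
        restart_vm_elsewhere=lambda host: print(f"VM from {host} restarted on esxi-02"),
    )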

Simple right? Before I forget, for these scenarios it is important to ensure that your isolation response is set to power-off. But I guess the question now arises… what if “esxi-01” and “esxi-02” were part of the same partition? What happens then? Well, that is where the witness comes into play. Let's show the diagram first, as that will make it a bit easier to understand!

Now this scenario is slightly more complex. There are two partitions; one of the partitions is running the VM with its VMDK and the other partition has a VMDK and a witness. Guess what happens? Right, VSAN uses the witness to see which partition has quorum, and based on that one of the two will win. In this case Partition-2 has more than 50% of the components of this object and as such is the winner. This means that the VM will be restarted on either “esxi-03” or “esxi-04” by HA. Note that the VM in Partition-1 will not be powered off, even if you have configured the isolation response to do so, as the hosts in this partition will re-elect an HA master and can still see each other!

But what if “esxi-01” and “esxi-04” were isolated, what would happen then? This is what it would look like:

Remember that rule which I slipped into the previous paragraph? The winner is declared based on the percentage of components available within that partition. If the partition has access to more than 50%, it wins. Meaning that when “esxi-01” and “esxi-04” are isolated, either “esxi-02” or “esxi-03” can restart the VM, because 66% of the components reside within this part of the cluster. Nice right?!
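To show that “more than 50% of the components” rule in code form, here is a tiny sketch using the component placement from the diagrams. The placement itself is my assumption based on the two scenarios above, and this is obviously not how VSAN implements it internally.

    # Assumed component placement for the object in the diagrams; not VSAN internals.
    component_placement = {
        "esxi-01": [],                  # runs the VM, holds no components of this object
        "esxi-02": ["vmdk-mirror-1"],
        "esxi-03": ["vmdk-mirror-2"],
        "esxi-04": ["witness"],
    }

    def partition_can_win(partition_hosts):
        """A partition wins if it holds more than 50% of the object's components."""
        total = sum(len(c) for c in component_placement.values())
        in_partition = sum(len(component_placement[h]) for h in partition_hosts)
        return in_partition / total > 0.5

    # Scenario 1: Partition-1 (esxi-01/esxi-02) versus Partition-2 (esxi-03/esxi-04).
    print(partition_can_win(["esxi-01", "esxi-02"]))  # False: 1 of 3 components
    print(partition_can_win(["esxi-03", "esxi-04"]))  # True:  2 of 3, VM restarted here

    # Scenario 2: esxi-01 and esxi-04 are isolated, esxi-02/esxi-03 remain together.
    print(partition_can_win(["esxi-02", "esxi-03"]))  # True: 66% of the components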

I hope this makes isolations / partitions a bit clearer. I realize these concepts will take some getting used to during the first weeks/months… I will try to explore some more (complex) scenarios in the near future.

How VSAN handles a disk or host failure

I have had this question multiple times by now. I wanted to answer it in the Virtual SAN FAQ, but I figured I would need some diagrams and probably more than 2 or 3 sentences to explain this: how are host or disk failures in a Virtual SAN cluster handled? I guess let's start at the beginning, and I am going to try to keep it simple.

I explained some of the basics in my VSAN intro post a couple of weeks back, but it never hurts to repeat this. I think it is good to explain the IO path first before talking about the failures. Let's look at a 4-host cluster with a single VM deployed. This VM is deployed with the default policy, meaning a “stripe width” of 1 and “failures to tolerate” set to 1 as well. When deployed in this fashion the following is the result:

In this case you can see two mirrors of the VMDK and a witness. These VMDK mirrors, by the way, are identical; they are exact copies. What else did we learn from this (hopefully) simple diagram?

  • A VM does not necessarily have to run on the same host as where its storage objects are sitting
  • The witness lives on a different host than the components it is associated with in order to create an odd number of hosts involved for tiebreaking under a network partition
  • The VSAN network is used for communication / IO etc

Okay, so now that we know these facts, it is also worth knowing that VSAN will never place two mirrors on the same host, for availability reasons. When a VM writes, the IO is mirrored by VSAN and will not be acknowledged back to the VM until all mirrors have completed the write. Meaning that in the example above, both the acknowledgement from “esxi-02” and the one from “esxi-03” will need to have been received before the write is acknowledged to the VM. The great thing here, though, is that all writes go to flash/SSD; this is where the write buffer comes into play. At some point in time VSAN will then destage the data to your magnetic disks, but this happens without the guest VM knowing about it…
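Below is a conceptual sketch of that write path in plain Python, just to illustrate the ordering: the write lands in the flash write buffer of every mirror, the VM only gets its acknowledgement once all mirrors have acknowledged, and destaging to magnetic disk happens later, outside the guest IO path. The class and host names are illustrative only, not how VSAN is actually implemented.

    # Conceptual sketch of the mirrored write path described above; not VSAN code.
    class MirrorHost:
        def __init__(self, name):
            self.name = name
            self.write_buffer = []   # flash / SSD write buffer
            self.magnetic_disk = []  # spinning disks behind the buffer

        def write_to_buffer(self, data):
            self.write_buffer.append(data)
            return f"ack from {self.name}"

        def destage(self):
            # Happens "at some point in time", invisible to the guest VM.
            self.magnetic_disk.extend(self.write_buffer)
            self.write_buffer.clear()

    def guest_write(data, mirrors):
        """The write is only acknowledged to the VM after ALL mirrors have acknowledged."""
        acks = [m.write_to_buffer(data) for m in mirrors]
        return f"write acknowledged to the VM after: {acks}"

    mirrors = [MirrorHost("esxi-02"), MirrorHost("esxi-03")]
    print(guest_write("block-42", mirrors))

    # Later, outside the guest IO path, each host destages its buffer to magnetic disk.
    for m in mirrors:
        m.destage()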