7.0 u1

Cleaning up old vSAN File Services OVF files on vCenter Server

Duncan Epping · Oct 3, 2022 ·

There was a question last week about the vSAN File Services OVF Files, the question was about the location where they were stored. I did some digging in the past, but I don’t think I ever shared this. The vSAN File Services OVF is stored on vCenter Server (VCSA) in a folder, for each version. The folder structure looks as show below, basically each version of an OVF has a directory with required OVF files.

root@vcsa-duncan [ ~ ]# ls -lha /storage/updatemgr/vsan/fileService/

total 24K

vsan-health users 4.0K Sep 16 16:09 .

vsan-health root  4.0K Nov 11  2020 ..

vsan-health users 4.0K Nov 11  2020 ovf-7.0.1.1000

vsan-health users 4.0K Mar 12  2021 ovf-7.0.2.1000-17692909

vsan-health users 4.0K Nov 24  2021 ovf-7.0.3.1000-18502520

vsan-health users 4.0K Sep 16 16:09 ovf-7.0.3.1000-20036589

root@vcsa-duncan [ ~ ]# ls -lha /storage/updatemgr/vsan/fileService/ovf-7.0.1.1000/

total 1.2G

vsan-health users 4.0K Nov 11  2020 .

vsan-health users 4.0K Sep 16 16:09 ..

vsan-health users 179M Nov 11  2020 VMware-vSAN-File-Services-Appliance-7.0.1.1000-16695758-cloud-components.vmdk

vsan-health users 5.9M Nov 11  2020 VMware-vSAN-File-Services-Appliance-7.0.1.1000-16695758-log.vmdk

vsan-health users  573 Nov 11  2020 VMware-vSAN-File-Services-Appliance-7.0.1.1000-16695758_OVF10.mf

vsan-health users  60K Nov 11  2020 VMware-vSAN-File-Services-Appliance-7.0.1.1000-16695758_OVF10.ovf

vsan-health users 998M Nov 11  2020 VMware-vSAN-File-Services-Appliance-7.0.1.1000-16695758-system.vmdk

I’ve asked the engineering team, and yes, you can simply delete obsolete versions if you need the disk capacity.

iSCSI IQN may have changed after upgrade to 7.0 U2

Duncan Epping · Jun 29, 2021 ·

Last week I noticed some folks reporting that they had an issue with upgrades to 7.0 U2 from 7.0 U1. The issue they experienced was not being able to access their iSCSI Datastores any longer. I did some digging internally and found out there was a change in how we store the iSCSI IQN when the IQN is randomly created. Now note, this problem only exists for randomly created IQNs, so if you have a custom-named iSCSI IQN then you can stop reading here. If you have a random IQN and also have access control defined for your initiators, then you will want to read this if you are planning on upgrading.

Basically what happens if you upgrade from 7.0 U1 to 7.0U2 and you use a randomly generated IQN is that we regenerate the IQN after a reboot. What does that look like, well on VMTN user Nebb2k8 posted this:

7U1d = iqn.1998-01.com.vmware:labesx06-4ff17c83

7U2a = iqn.1998-01.com.vmware:labesx07:38717532:64

As you can see, the format also changed.

So if you lose (or lost) access after the upgrade, you simply copy the newly generated IQN and add it to the access control list of your storage system for the LUNs it applies to. Make sure to remove the old IQNs. Another option of course is to configure the randomly generated IQN as a custom IQN, this is pretty straightforward as shown below for “vmhba67”. You could create new IQNs, or you could re-use the randomly generated old IQNs if you want to keep them the same.

$ esxcli iscsi adapter get -A vmhba67

 vmhba67
&nbsp; &nbsp;Name: iqn.1998-01.com.vmware:w1-hs3-n2503.eng.vmware.com:452738760:67

$ esxcli iscsi adapter set -A vmhba67 -n iqn.1998-01.com.vmware:w1-hs3-n2503.eng.vmware.com:452738760:67

If you would like to know more about this issue, make sure to read this KB article, or read this article by Jason Massae which also provides some PowerCLI code to get/set the IQN.

Does vSAN Enhanced Durability work when you have a limited number of hosts?

Duncan Epping · Apr 19, 2021 ·

Last week I had a question about how vSAN Enhanced Durability works when you have a limited number of hosts. In this case, the customer had a 3+3+1 stretched cluster configuration, and they wondered what would happen when they would place a host into maintenance mode. Although I was pretty sure I knew what would happen, I figured I would test it in the lab anyway. Let’s start with a high-level diagram of what the environment looks like. Note I use a single VM as an example, just to keep the scenario easy to follow.

In the diagram, we see a virtual disk that is configured to be stretched across locations, and protected by RAID-1 within each location. As a result, you will have two RAID-1 trees each with two components and a witness, and of course, you would have a witness component in the witness location. Now the question is, what happens when you place esxi-host-1 into maintenance mode? In this scenario, vSAN Enhanced Durability will want to create a “durability component”. This durability component is used to commit all new write IO to. This will allow vSAN to resync fast after maintenance mode, and enhances durability as we would still have 2 copies of the (new) data.

However, in the scenario above we only have 3 hosts per location. The question then is, where is this delta component created then? As normally with maintenance mode you would need a 4th host to move data to. Well, it is simple, in this case, what vSAN does is it creates a “durability component” on the host where the witness resides, within the location of course. Let me show you in a diagram, as that makes it clear instantly.

By adding the Durability component next to the witness on esxi-host-3, vSAN enhances durability even in this stretched cluster situation, as it provides a local additional copy of new writes. Now, of course I tested this in my lab. So for those who prefer to see a demo, check out the youtube video below.

Can I vMotion a VM while IO Insight is tracing it?

Duncan Epping · Mar 4, 2021 ·

Today during the Polish VMUG we had a great question, basically, the question was if you can vMotion a VM while vSAN IO Insight is tracing it. I did not know the answer as I had never tried it, so I had to test and validate it in the lab. While testing it became obvious that IO Insight and vMotion are not a supported combination today. Or better said, when you vMotion a VM which has IO Insight enabled and the VM is being traced, then the tracing will stop and you will not be able to inspect the results. When you click on view results you will see the error suggesting that the “monitored VMs might be deleted” as shown below.

For now, if you are tracing a VM for an extended period of time, make sure to override the DRS automation level for that VM so that DRS does not interfere with the tracing. (You can do this on a per VM basis.) I would also recommend informing other administrators to not manually migrate the VM temporarily to avoid the situation where the trace is stopped. You may wonder why this is the case, well it is pretty simple, tracing happens on a host level. We start a user world on the host where the VM is running to trace the IO. If you move the VM, the user world doesn’t know what has happened to the VM unfortunately. For now, who knows if this is something that may change over time… Either way, I would always recommend not migrating VMs while tracing, as that also impacts the data.

Hope that helps, and thank Tomasz for the great question!

VMs which are not stretched in a stretched cluster, which policy to use?

Duncan Epping · Dec 14, 2020 ·

I’ve seen this question popping up regularly. Which policy setting (“site disaster tolerance” and “failures to tolerate”) should I use when I do not want to stretch my VMs? Well, that is actually pretty straight forward, in my opinion, you really only have two options you should ever use:

None – Keep data on preferred (stretched cluster)
None – Keep data on non-preferred (stretched cluster)

Yes, there is another option. This option is called “None – Stretched Cluster” and then there’s also “None – Standard Cluster”. Why should you not use these? Well, let’s start with “None – Stretched Cluster”. In the case of “None – Stretched Cluster”, vSAN will per object decide where to place it. As you hopefully know, a VM consists of multiple objects. As you can imagine, this is not optimal from a performance point of view, as you could end up having a VMDK being placed in Site A and a VMDK being placed in Site B. Which means it would read and write from both locations from a storage point of view, while the VM would be sitting in a single location from a compute point of view. It is also not very optimal from an availability stance, as it would mean that when the intersite link is unavailable, some objects of the VM would also become inaccessible. Not a great situation. What would it look like? Well, potentially something like the below diagram!

Then there’s “None – Standard Cluster”, what happens in this case? When you use “None – Standard Cluster” with “RAID-1”, what is going to happen is that the VM is configured with FTT=1 and RAID-1, but in a stretched cluster “FTT” does not exist, and FTT automatically will become PFTT. This means that the VM is going to be mirrored across locations, and you will have SFTT=0, which means no resiliency locally. It is the same as “Dual Site Mirroring”+”No Data Redundancy”!

In summary, if you ask me, “none – standard cluster” and “none – stretched cluster” should not be used in a stretched cluster.