Episode 004 is out! This time we talk to Cody Hosterman, Director of Product Management at Pure Storage, about Virtual Volumes, aka vVols! Cody shares with us the past, present, and future of vVols. I especially enjoyed his explanations around the benefits of vVols for traditional and cloud-native workloads. It is also great to hear that VMware is working with Pure Storage on designing and developing a stretched cluster capability for vVols-based environments. Listen below, or via Apple, Google, Spotify etc.
What is this Catalog folder on my datastore?
A question popped up on our internal Slack earlier today, and as I didn’t find anything online about it I figured I would write a quick article. When you look at your datastore, you may find various folders. Some you will recognize, like the “.vSphere-HA” folder structure, which is used by vSphere HA; others you may not recognize, like the folder called “catalog” (see screenshot below), which has folders like “shard”, “mutex”, “tidy”, and “vclock” in it. The folder “catalog”, and all folders underneath it, are created automatically when you use First Class Disks (FCDs). FCD uses this folder structure to store its metadata. So please do not remove, delete, or otherwise touch these folders. If you would like to know more about FCD, make sure to read Cormac’s post on it.
Oh, and if you wonder why you are using FCD in the first place: it is often used for Kubernetes “persistent volumes”. So if you are using Tanzu/Kubernetes and have persistent volumes, chances are you are using FCD, which would result in those folders on your datastore. Nothing to worry about. 🙂
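If you are curious which FCDs are behind that metadata, the vSphere API exposes them through the VStorageObjectManager. Below is a minimal, hedged pyVmomi sketch, assuming a vCenter connection; the hostname, credentials, and datastore name are placeholders for your own environment, so verify the calls against your pyVmomi version before relying on this.

```python
# Hedged sketch: list the First Class Disks (FCDs) whose metadata lives in
# the "catalog" folder structure on a datastore. All names are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; use proper certs in prod
si = SmartConnect(host="vcenter.lab.local",          # placeholder vCenter
                  user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)

# Find the datastore that contains the "catalog" folder
# (assumes datacenters sit directly under the root folder)
ds = None
for dc in si.content.rootFolder.childEntity:
    for candidate in dc.datastore:
        if candidate.name == "Datastore01":           # placeholder name
            ds = candidate

# The VStorageObjectManager is the FCD entry point in the vSphere API
vsom = si.content.vStorageObjectManager
for fcd_id in vsom.ListVStorageObject(datastore=ds):
    fcd = vsom.RetrieveVStorageObject(id=fcd_id, datastore=ds)
    print(fcd.config.name, fcd.config.capacityInMB, "MB")

Disconnect(si)
```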
Running ESXi in “Degraded Mode”, what does that mean?
I received a question today, and I didn’t have the answer, so I reached out to one of the developers. The person asking had found the line below in the ESXi documentation, and the question was: what does running ESXi in degraded mode actually mean, and what is the impact?
If a local disk cannot be found, then ESXi 7.0 operates in degraded mode where certain functionality is disabled and the /scratch partition is on the RAM disk, linked to /tmp. You can reconfigure /scratch to use a separate disk or LUN. For best performance and memory optimization, do not run ESXi in degraded mode.
In other words, “degraded mode” is a situation where you are running ESXi with an undesired boot disk configuration. In this case, the boot disk configuration (size, etc.) means that /scratch is not stored on persistent media but rather in RAM, which means that state is lost during a reboot. This could lead to various problems, hence it is called degraded mode or state. Note that running in “degraded” mode could also prevent you from upgrading in the future.
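If you want to verify where /scratch currently lives on a host, you can query the ScratchConfig advanced settings. Here is a hedged pyVmomi sketch connecting straight to a host; the hostname, credentials, and target datastore path are placeholders, and the reconfiguration only takes effect after a reboot.

```python
# Hedged sketch: check where /scratch points on a host. If the current
# location resolves to somewhere under /tmp, scratch is on the RAM disk,
# i.e. the host is running in degraded mode.
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="esxi01.lab.local", user="root",  # placeholders
                  pwd="password", sslContext=ctx)
# Direct host connection: single "ha-datacenter" -> compute resource -> host
host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]

opt_mgr = host.configManager.advancedOption
current = opt_mgr.QueryOptions("ScratchConfig.CurrentScratchLocation")[0].value
print("Scratch currently at:", current)

# Repoint scratch to persistent storage (takes effect after a reboot)
opt_mgr.UpdateOptions(changedValue=[vim.option.OptionValue(
    key="ScratchConfig.ConfiguredScratchLocation",
    value="/vmfs/volumes/Datastore01/.locker")])  # placeholder path
```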
So how do you resolve this problem? Follow the recommendations VMware provides for the ESXi configuration (a toy sketch of this decision logic follows the list):
- An 8 GB USB or SD device and an additional 32 GB local disk. The ESXi boot partitions reside on the USB or SD device and the ESX-OSData volume resides on the local disk.
- A local disk with a minimum of 32 GB. The disk contains the boot partitions and ESX-OSData volume.
- A local disk of 142 GB or larger. The disk contains the boot partitions, ESX-OSData volume, and VMFS datastore.
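Purely as an illustration of the three supported layouts above, here is a toy Python helper that classifies a given configuration. This is my own sketch, not anything the installer exposes; the real partitioning decision is made by ESXi itself.

```python
# Toy helper mirroring the three supported boot layouts (sizes in GB).
# Illustrative only; thresholds taken from the documentation quoted above.
def boot_layout(boot_media_gb: float, is_usb_or_sd: bool,
                extra_local_disk_gb: float = 0) -> str:
    if is_usb_or_sd:
        if boot_media_gb >= 8 and extra_local_disk_gb >= 32:
            return "boot partitions on USB/SD, ESX-OSData on the local disk"
        return "degraded mode: no persistent home for ESX-OSData / scratch"
    if boot_media_gb >= 142:
        return "boot partitions + ESX-OSData + VMFS datastore on one disk"
    if boot_media_gb >= 32:
        return "boot partitions + ESX-OSData on one disk"
    return "degraded mode: disk too small for ESX-OSData"

print(boot_layout(32, is_usb_or_sd=False))
# -> boot partitions + ESX-OSData on one disk
```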
Although not a requirement, I would urge you to read and follow the next sections from the documentation:
- Although an 8 GB USB or SD device is sufficient for a minimal installation, you should use a larger device. The additional space is used for an expanded core dump file and the extra flash cells of a high-quality USB flash drive can prolong the life of the boot media. Use a 32 GB or larger high-quality USB flash drive.
- If you install ESXi on M.2 or other non-USB low-end flash media, delete the VMFS datastore on the device immediately after installation.
If you want to mitigate the situation after upgrading to ESXi 7.0, you can add a new local disk, enable “autoPartition=TRUE”, and reboot. At reboot, the disk will be partitioned and populated for use. The use of this advanced setting, and others which relate to ESXi 7.0, is described in this KB article here.
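For those who prefer to script this, a hedged sketch follows, runnable in the ESXi shell (which ships with a Python interpreter). The esxcli kernel-setting syntax here is per the KB referenced above; verify the flag names against your ESXi build before use.

```python
# Hedged sketch: enable the autoPartition kernel setting via esxcli so a
# newly added blank local disk gets partitioned for ESX-OSData at next boot.
import subprocess

# Enable the setting (a host reboot is required for it to take effect)
subprocess.check_call(["esxcli", "system", "settings", "kernel", "set",
                       "-s", "autoPartition", "-v", "TRUE"])

# Verify the configured value before rebooting
print(subprocess.check_output(
    ["esxcli", "system", "settings", "kernel", "list",
     "-o", "autoPartition"]).decode())
```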
For those wondering, “ESX-OSData” is the partition where we now store the content of what was previously stored in “scratch”, “core”, and “locker”. Niels wrote a deep-dive on the vSphere blog here, go check that out.
VMworld Reveals: HCI Present and Futures (#HCI2733BU)
At VMworld, various cool new technologies were previewed. In this series of articles, I will write about some of those previewed technologies. Unfortunately, I can’t cover them all as there are simply too many. This article is about HCI / vSAN futures, which was session HCI2733BU. For those who want to see the session, you can find it here. This session was presented by Srinivasan Murari and Vijay Ramachandran. Please note that this is a summary of a session discussing the roadmap of VMware’s HCI offering: these features may never be released, the preview does not represent a commitment of any kind, and the features (or their functionality) are subject to change. Now let’s dive into it, what is VMware planning for the future of HCI? Some of the features discussed during this session were also discussed last year; I wrote a summary here for those interested.
Vijay kicked off the session with an overview of the current state of HCI, and more specifically VMware vSAN and Cloud Foundation. Some of the use cases were discussed, and it was clear that today the majority of VMware HCI solutions are running business-critical apps on top. More and more customers are looking to adopt full-stack HCI, as they need an end-to-end story that includes compute, networking, storage, security, and business continuity for all applications running on top of it. As such, VMware’s HCI solution has been focused on lifecycle management and automation of all aspects of the SDDC. This is also the reason why VMware is currently the market leader in this space, with over 20k customers and a market share of over 41%.
CTO2860BU & VIN2183BU: It is all about Persistent Memory
I was going through the list of sessions when I spotted a session on Persistent Memory by Rich Brunner and Rajesh V. Quickly after that I noticed that there was also a PMEM session by the perf team available. I would highly recommend watching both CTO2860BU and VIN2183BU. I would recommend starting with CTO2860BU though, as it gives a great introduction to what PMEM brings to the table. I scribbled down some notes, and they may appear somewhat random, considering I am covering two sessions in one article, but hopefully the main idea is clear.
I think the sub-title of the sessions makes clear what PMEM is about: Storage at Memory Speed. This is what Richard talks about in CTO2860BU during the introduction. I thought this slide explained the difference pretty well, it is all about the access times:
- 10,000,000 ns – HDD
- 100,000 ns – SAS SSD
- 10,000 ns – NVMe
- 50-300 ns – PMEM
- 30-100 ns – DRAM
So that is 10 million nanoseconds vs 50 to 300 nanoseconds. Just to give you an idea, that is roughly the speed difference between the space shuttle and a starfish. But that isn’t the only major benefit of persistent memory. Another huge advantage is that PMEM devices, depending on how they are used, are byte addressable. Compare this to the 512-byte or 4KB / 8KB reads many storage systems require: when you have to change a single byte, you no longer incur that overhead.
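To put numbers on that analogy, here is a quick back-of-the-envelope calculation using the slide’s figures, taking the slow end of each range:

```python
# Back-of-the-envelope: how much slower is each tier than PMEM, using the
# worst-case access times from the slide above?
access_ns = {
    "HDD": 10_000_000,
    "SAS SSD": 100_000,
    "NVMe": 10_000,
    "PMEM": 300,
    "DRAM": 100,
}

for device, ns in access_ns.items():
    print(f"{device:8s} {ns:>12,} ns  (~{ns / access_ns['PMEM']:,.0f}x PMEM)")
# HDD works out to roughly 33,333x the access time of PMEM
```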
As of vSphere 6.7, we have PMEM support. A PMEM device can be accessed as a block device / disk, or it can be accessed as “PMEM”, meaning that in the latter case we serve a virtual PMEM device to the VM and the Guest OS sees it as PMEM. What was also briefly discussed in Richard’s talk were the different types of PMEM. In general, there are four different types, but two are most commonly talked about: NVDIMM-N and Intel Optane. The difference is that NVDIMM-N has DRAM memory backed by NAND, where persistence is achieved by writing to NAND only during shutdown / power-fail, whereas with Intel Optane there is what Intel calls 3D XPoint memory on the DIMM, directly addressable. The other two types mentioned were “DRAM backed by NVMe” and NVDIMM-P; the first was an effort by HPE which has been discontinued, and NVDIMM-P seems to be under development and is expected around 2019.
When discussing the vSphere features that support PMEM, what I found most interesting was the fact that DRS is fully aware of VMs using PMEM during load balancing. It will take this into account, and as the cost is higher for a migration of a PMEM-enabled VM, it will most likely select a VM backed by shared storage instead. Of course, when doing maintenance, DRS will move the VMs with PMEM to a host which has sufficient capacity. Also, FT is fully supported.
In the second session, VIN2183BU, Praveen and Qasim discussed performance details. After a short introduction, they dive deep into performance and how you can take advantage of the technology. First they discuss the different modes in which persistent memory can be exposed to the VM / Guest OS; I am listing these out as they are useful to know.
- vPMEMDisk = exposed to the guest as a regular SCSI/NVMe device; VMDKs are stored on a PMEM datastore
- vPMEM = Exposes the NVDIMM device to the guest in a “passthrough” manner; the guest can use it as a block device or as a byte-addressable direct access device (DAX). This is the fastest mode, and most modern OSes support it (see the DAX sketch after this list)
- vPMEM-aware = This is similar to the mode above, but the difference is that the application understands how to take advantage of vPMEM
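To make the DAX idea concrete, here is a hedged illustration (mine, not from the sessions) of what byte-addressable access looks like from inside a guest: with a filesystem mounted with the dax option on a /dev/pmem device, an application can mmap a file and do plain loads and stores, with no block I/O in the path. The mount point and file name are placeholders.

```python
# Hedged illustration of DAX-style byte-addressable access from a guest OS.
# Assumes /mnt/pmem is a dax-mounted filesystem on a /dev/pmem device.
import mmap
import os

fd = os.open("/mnt/pmem/example.dat", os.O_CREAT | os.O_RDWR)  # placeholder
os.ftruncate(fd, 4096)
buf = mmap.mmap(fd, 4096)

buf[42:45] = b"abc"   # store: touches just these bytes, not a whole block
print(buf[42:45])     # load: reads straight from the mapping

# Note: truly PMEM-aware applications also flush CPU caches (e.g. via
# libpmem's pmem_persist) to guarantee the stores are actually persistent.
buf.close()
os.close(fd)
```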
Next they discussed the various performance tests and comparisons they have done. They tested the various modes and compared them to the performance of an NVMe SSD. What stood out most to me is that both the vPMEM and vPMEM-aware modes provide great performance, up to an 8x performance increase. In the case of vPMEMDisk that is different, and that has to do with the overhead involved: because it is presented as a block device, there is significant IO amplification, which in the case of “4KB random writes” even leads to a throughput that is lower for NVDIMM than it is for NVMe. During the session it was mentioned that both VMware and Intel are looking to optimize their parts of the solution to solve this issue. What was most impressive though wasn’t the throughput but the latency: a 225x improvement was measured for vPMEM and vPMEM-aware compared to NVMe. Although the latency for vPMEMDisk was higher than for vPMEM and vPMEM-aware, it was still significantly lower than for NVMe, and very consistent across reads and writes.
That was just the FIO example; it is followed by examples for various applications, both scale-out and scale-up solutions. What I found interesting were the Redis tests: nice performance gains at a much lower latency, but more importantly, the cost will probably go down when leveraging persistent memory instead of pure DRAM.
Last but not least, tests were conducted around performance during vMotion and the performance of the vMotion process itself. In both cases, using vPMEM or vPMEM-aware can be very beneficial for the application and for the vMotion process.
Both great sessions; again, I highly recommend watching them.