I’ve been conducting VCDX Defense Interviews for a while now. Last week in Las Vegas during PEX something struck me and I guess this post by Frank Denneman is a good example…
On a regular basis I come across NFS based environments where the decision is made to store the virtual machine swap files on local VMFS datastores. Using host-local swap can affect DRS load balancing and HA failover in certain situations. So when designing an environment using host-local swap, some areas must be focused on to guarantee HA and DRS functionality.
Every decision you make has an impact on your design/environment. What exactly does a decision impact? In most cases every decision impacts the following:
- Cost
- Availability
- Performance
The example Frank wrote about (see quote) is a decision that clearly had an impact on all three. Although it might have been a best practice at the time, the decision to follow that best practice still had an impact on the environment. Because it was a best practice, that impact might not have been as obvious. But when listed as follows, I hope you understand why I am writing this article:
- Costs – Reduced costs by moving the .vswp file to local disks.
- Performance – vMotion performance is affected because .vswp files need to be copied from HOST-A to HOST-B.
- Availability – Possibly less availability when the amount of free disk space on local VMFS isn’t sufficient to restart VMs in case of disaster.
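That availability risk can be made concrete: a VM's .vswp file is sized as its configured memory minus its memory reservation, so the local VMFS must have room for the swap files of every VM that might be restarted on that host. A minimal sketch with hypothetical VM sizes:

```python
# Sketch: estimate local-datastore swap demand on one host.
# Rule assumed here: .vswp size = configured memory - memory reservation.

def vswp_size_gb(configured_gb, reservation_gb=0):
    """Swap file size for one VM, in GB (never negative)."""
    return max(configured_gb - reservation_gb, 0)

# Hypothetical VMs that HA might restart on this host after a failure:
# (configured GB, reserved GB)
vms = [(4, 0), (8, 2), (16, 16)]

demand_gb = sum(vswp_size_gb(c, r) for c, r in vms)
print(demand_gb)  # 4 + 6 + 0 = 10 GB of local VMFS needed just for swap
```

If free local disk space is smaller than this total, those VMs cannot power on after a failover, which is exactly the availability hit described above.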
As you can see, a simple decision can have a major impact. Even though it might be a best practice, you will need to think about the possible impact it has and whether this best practice fits your environment and meets your (customer) requirements. Another great example is LUN sizing. What if I were to pick a LUN size at random, say 1TB:
- Cost – With an average VM size of 35GB, a maximum of 20 VMs per datastore, and 20% overhead for vswp files and snapshots, I end up with a maximum usage of 840GB. That leaves 160GB of the 1TB unused!
- Availability – Although the availability of the datastore itself is unaffected, the uptime of your environment might change. When a single datastore fails, you lose 1TB worth of data. Not only will you lose more VMs, restoring will also take longer.
- Performance – Normally I would restrict the LUN size to reduce the number of VMs on a single datastore. More VMs on a datastore means a higher probability of SCSI reservation conflicts.
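The cost arithmetic above can be sketched in a few lines, using the numbers from the example (and treating 1TB as 1000GB, as in the text):

```python
# Sketch of the LUN sizing arithmetic from the example above.
avg_vm_gb = 35
max_vms = 20
overhead_pct = 0.20  # vswp files + snapshots

used_gb = avg_vm_gb * max_vms      # 700 GB of VMDKs
used_gb += used_gb * overhead_pct  # + 140 GB overhead = 840 GB

lun_gb = 1000  # the arbitrarily picked 1 TB LUN
wasted_gb = lun_gb - used_gb

print(used_gb, wasted_gb)  # 840.0 160.0
```

Run this for your own averages and VM counts and the "right" LUN size falls out of the requirements instead of being picked at random.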
The VCDX certification is not about knowing all the technical details; of course that is an essential part of it, but it's about understanding the impact of a decision. It's about justifying your decision based on the impact it has on the environment/design. Know the pros and cons. Even if something is a best practice, it might not necessarily apply to your situation.
Richard says
Funny, Frank has blogged about swap files, and now you elaborate more on this issue… It's funny because I've submitted a technical design within the company I'm working for, and in the design review somebody added the possibility of hosting swap files locally, without any good arguments. Well, now I've got some good arguments for whether to do it or not… 😉
Kory M. says
Ironic post, Duncan, thank you. I'm currently in the process of evaluating our storage here at the independent school I work at. Decisions are being made about what RAID level to apply, how many spindles, how fast the drives should be, the sizes of the drives themselves, etc. I've done a bit of research on where to physically store the non-vmdk files, i.e. the .vswp files, etc. Taking all that into consideration and using valuable resources such as your blog, Gabe's blog, the KB, the community forums, etc. has greatly helped my decision-making process.
I’ve established a “config” LUN for generic file storage and the configuration files on slow 7.5k rpm drives, leaving our faster 15k rpm drives for “high performance” storage. Unfortunately, this means every time I need to do a SvMotion, I have to perform a cold migration, since I need to “break” the VM and move its vmdk files to another LUN. Fortunately, this doesn’t happen very often and obviously doesn’t affect DRS or vMotion. It may not be ideal for all applications, of course, but it seems to be working for us, especially since our storage is so limited.
Thanks for the continued great insight and info, Duncan. I’m learning a lot from you and your site!
daniel says
NFS thin provisioning and/or block-level dedup/compression seems like a sure winner here if storing swap files on a SAN seems pricey; my temp datastore is 248K in size and is currently hosting 55 VM swap files.
Frank Brix Pedersen says
Performance: When running swap on local disks and your VMs are actively swapping, you will actually get worse performance than running on “fast” FC spindles. That’s an argument I haven’t seen here. What about memory reservations? You could save a lot of swap space by giving your VMs reserved memory.
What would be really cool would be local SSD disks for swap storage. Then you could actually use your swap without the performance penalty we see today.
Duncan says
If you use reservations and have strict Admission Control for HA, you will end up with a really conservative number of slots in your cluster, and thus higher TCO and lower ROI.
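Duncan's point can be illustrated: with strict admission control, HA derives its slot size from the largest reservation in the cluster, so a single heavily reserved VM shrinks the slot count every host can offer. A rough sketch of the memory side only (the real algorithm also considers CPU and uses defaults when no reservations exist):

```python
# Rough sketch: HA memory slot counting under strict admission control.
# Slot size is driven by the largest reservation in the cluster, so one
# big reservation reduces the slots every host provides.

def slots_per_host(host_mem_gb, vm_reservations_gb, min_slot_gb=0.25):
    """Memory slots one host provides, given cluster-wide reservations."""
    slot_gb = max(max(vm_reservations_gb, default=0), min_slot_gb)
    return int(host_mem_gb // slot_gb)

host_gb = 64
print(slots_per_host(host_gb, [1, 1, 1]))   # 64 slots
print(slots_per_host(host_gb, [1, 1, 16]))  # one 16 GB reservation -> 4 slots
```

Fewer slots means HA admits fewer VMs per host, which is where the higher TCO and lower ROI come from.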
Doug says
Excellent posting, Duncan! Everything has a cost. What are you willing to trade for what you want/need? 🙂