I receive a lot of hits on an old article about aligning your VMDKs. That article doesn’t actually explain why alignment is important, only how to do it. The how is not actually as important in my opinion. I do however want to take the opportunity to list some of the options you have today to align your VMs’ VMDKs. Keep in mind that some require a license(*) or a login:
- UberAlign by Nick Weaver
- mbralign by NetApp(*)
- vOptimizer by Vizioncore(*)
- GParted (Free tool, Thanks Ricky El-Qasem).
First let’s explain why alignment is important. Take a look at the following diagram:
In my opinion there is no need to discuss VMFS alignment. Everyone creates their VMFS via vCenter (and if you don’t, you should!), which means it is automatically aligned and you won’t need to worry about it. However, you will need to worry about the Guest OS. Take Windows 2003: by default, when you install the OS your partition is misaligned. (Both Windows 7 and Windows 2008 create aligned partitions, by the way.) Even when you create a new partition it will be misaligned. As you can clearly see in the diagram above, every cluster will span multiple chunks. Well, actually it depends. I guess that’s the next thing to discuss, but first let’s show what an aligned OS partition looks like:
I would recommend everyone to read this document. Although it states at the beginning it is obsolete it still contains relevant details! And I guess the following quote from the vSphere Performance Best Practices whitepaper says it all:
The degree of improvement from alignment is highly dependent on workloads and array types. You might want to refer to the alignment recommendations from your array vendor for further information.
Now you might wonder why some vendors are more affected by misalignment than others. The reason for this is the block size on the back end. For instance, NetApp uses a 4KB block size (correct me if I am wrong). If your filesystem uses a 4KB block size (or cluster size, as Microsoft calls it) as well, this basically means every single IO will require the array to read or write two blocks instead of one when your VMDKs are misaligned, as the diagrams clearly show.
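To make the two-blocks-instead-of-one point concrete, here is a minimal Python sketch that counts how many back-end blocks a single guest IO overlaps. It assumes the classic 63-sector (32,256-byte) MBR starting offset for the misaligned case and a 64KB offset for the aligned case; the function name `blocks_touched` is made up for illustration and is not part of any vendor tool.

```python
KB = 1024

def blocks_touched(offset_bytes, io_size_bytes, block_size_bytes):
    """Count the back-end blocks that one IO overlaps."""
    first_block = offset_bytes // block_size_bytes
    last_block = (offset_bytes + io_size_bytes - 1) // block_size_bytes
    return last_block - first_block + 1

# Assumed starting offsets: 63 sectors of 512 bytes vs. a 64KB boundary.
MISALIGNED_START = 63 * 512
ALIGNED_START = 64 * KB

# One 4KB guest cluster on an array with 4KB back-end blocks.
print(blocks_touched(MISALIGNED_START, 4 * KB, 4 * KB))  # prints 2
print(blocks_touched(ALIGNED_START, 4 * KB, 4 * KB))     # prints 1
```

The misaligned cluster starts 3,584 bytes into a back-end block, so it always spills into the next one.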
Now when you take, for instance, an EMC Clariion it’s a different story. As explained in this article, which might be slightly outdated, Clariion arrays use a 64KB chunk size to write their data, which means that not every Guest OS cluster is misaligned and thus EMC Clariion is less affected by misalignment. Now this doesn’t mean EMC is superior to NetApp, I don’t want to get Vaughn and Chad going again ;-), but it does mean that the impact of misalignment is different for every vendor and array/filer. Keep this in mind when migrating and/or creating your design.
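The "not every cluster is misaligned" point can be illustrated with a rough sketch that counts what fraction of 4KB guest clusters cross a back-end chunk boundary. It again assumes a 32,256-byte partition offset; `straddling_fraction` is a hypothetical helper, and the model ignores caching and striping behaviour that real arrays add on top.

```python
KB = 1024
PARTITION_OFFSET = 63 * 512  # assumed legacy MBR offset (32,256 bytes)

def straddling_fraction(partition_offset, cluster_size, chunk_size, clusters=4096):
    """Fraction of guest clusters whose bytes fall in two back-end chunks."""
    straddling = 0
    for index in range(clusters):
        start = partition_offset + index * cluster_size
        end = start + cluster_size - 1
        if start // chunk_size != end // chunk_size:
            straddling += 1
    return straddling / clusters

# 4KB clusters on 4KB back-end blocks: every cluster straddles a boundary.
print(straddling_fraction(PARTITION_OFFSET, 4 * KB, 4 * KB))   # prints 1.0
# 4KB clusters on 64KB chunks: only one cluster in sixteen straddles.
print(straddling_fraction(PARTITION_OFFSET, 4 * KB, 64 * KB))  # prints 0.0625
```

In this simplified model a larger chunk size means fewer clusters cross a boundary, although each crossing involves a larger chunk.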
Andrew Mitchell says
Great article. The other problem you can run into with a mixture of aligned and misaligned file systems within your guests is that block-level storage dedupe cannot work as efficiently.
One minor point – Although VMFS volumes created via the VI Client are aligned correctly, the same cannot be said for the local VMFS volume that can be created by the ESX installer.
FYI, the Windows 2008/7 installer aligns partitions by default.
Can anyone comment about HP MSA SANs’ block etc.??
I believe it’s either 64 or 128…I think 128??
We don’t have high I/O and I have always aligned Win2k3 servers to 64 with diskpart, both the system/OS and data drives…which can be done after the partition is created and before any data is written to the partition.
Thank you, Tom
Duncan– Thanks for the post. Not sure I get why you say Clariion’s are less affected, though. 64K chunk still means guest OS is misaligned across blocks, no?
(Sidebar: Linux guest LVM partition alignment is a tricky area not often discussed. Here’s a great post on this; although discussing SSD, the concept’s the same: http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/)
Vaughn Stewart says
Great discussion, mind if I add a few points?
As I don’t want to hijack your blog I’ll share the key points, but I have elaborated on this discussion on my blog for anyone interested: http://tinyurl.com/ykn62w3
You are absolutely spot on, alignment is critical. The lack of alignment results in an array retrieving more data than the VM is requesting. This results in inefficiencies on the array that lead to requiring more storage hardware resources to serve a workload.
Did You Know
Misalignment can be found with VMFS datastores and inside of VMs.
I’ll skip this part as your post is speaking to VMDKs (so hit my blog for more)
Regarding Misalignment within Virtual Machines, to my knowledge the only storage vendors to publish content around the importance of alignment have been EMC & NetApp. In fact, EMC has even updated their incorrect data around the lack of needing to align on NFS (kudos to Chad Sakac of EMC for getting this erroneous data corrected).
I would suggest that anyone interested in understanding more on this issue read the NetApp technical report TR-3747 (http://media.netapp.com/documents/tr-3747.pf). The content in this document was reviewed and approved by VMware, Microsoft, Citrix, and NetApp.
As for the VI3 document that Duncan referenced, I have some concerns. First, it only recommends aligning the virtual disks other than the VM’s system drive. The reasoning listed is based on the view that the system drive does not have a high I/O requirement. While I can agree on the merits of this point, I disagree with the recommendation. Aligning system drives is hard to accomplish once the system is deployed; I believe this may be closer to the actual reason for the recommendation.
Misalignment also impacts other technologies like data deduplication. As this is a NetApp-only construct I’ve covered it on my blog. This is to allow us to communicate in general terms on this issue.
I believe it is fair to say the premise of misalignment impacting NetApp more than other arrays is oversimplified; allow me to elaborate.
To begin, what GOS are you running? I know Windows rather well, and as such I will speak to this OS family.
First, modern GOS types like Windows 7, Vista, and 2008 implement GPT versus MBR and as such have a 1MB starting partition offset (versus the traditional 32,256 bytes with MBR). The 1MB offset is aligned and optimized for every storage array vendor, protocol, and platform. I would like to thank Microsoft for listening to their storage partners’ input when they began engineering GPT.
If your VMs run Windows NT – 2003 then they most likely are misaligned due to the default starting offset of 32,256 found with MBR partitions. Also, if you upgrade a VM from one of these versions to Windows 7 or 2008 the starting partition offset will remain unchanged.
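A quick way to see why a 1MB offset works broadly while 32,256 bytes does not is to test both against a few chunk sizes. The sketch below is illustrative only: the chunk sizes are example values that come up in this discussion, not an authoritative list of vendor settings, and `is_aligned` is a hypothetical helper.

```python
KB = 1024

# Starting offsets discussed above.
OFFSETS = {
    "legacy MBR (63 sectors)": 63 * 512,  # 32,256 bytes
    "1MB": 1024 * KB,
}

# Example back-end block/chunk sizes (assumed values for illustration).
CHUNK_SIZES = [4 * KB, 8 * KB, 64 * KB, 128 * KB]

def is_aligned(offset_bytes, chunk_bytes):
    """A partition is aligned when its start is a whole multiple of the chunk."""
    return offset_bytes % chunk_bytes == 0

for name, offset in OFFSETS.items():
    for chunk in CHUNK_SIZES:
        state = "aligned" if is_aligned(offset, chunk) else "misaligned"
        print(f"{name:>24} vs {chunk // KB:>3}KB chunk: {state}")
```

Because 1MB is a multiple of every power-of-two chunk size up to 1MB, one offset covers all of these cases, whereas 32,256 is not a multiple of any of them.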
See TR-3747 for more on this point. It is also covered in TR-3428 (VI3) & TR-3749 (vSphere).
So Why Bother to Align Existing VMs? It’s simple, consider GOS partition alignment a standard for clouds and virtual infrastructures. When you align you ensure the best performance for VMs on any storage platform, over any storage protocol whether it is an internal cloud or an external cloud provider. Isn’t one of the goals of server virtualization hardware independence?
I have discussed this in the past: http://tinyurl.com/nu9yet
What Array and Storage File System are You Storing VMs on?
A NetApp array stores data in a 4KB block whether the data is served via SAN or NAS. So if the GOS partition is misaligned then should a VM make a 4KB read request we will read 2 4KB blocks (or 8KB). Most data reads aren’t that small and as such don’t have a 100% read overhead (4KB / 8KB). Say the VM makes a 1MB read request, then we would retrieve 1MB plus an additional 4KB block. In this case the overhead on the array is less than 1%. I would ‘guesstimate’ that most non-busy VMs make requests more in the range of 32KB to 128KB, and as such the overhead for misalignment would be around 7% to 10%.
Storage arrays from other vendors store data in other block, or chunk, sizes. Say your array stores data in a 64 KB block (Maybe EMC can confirm this is the size of the storage chunk used in a Symmetrix with LUNs). In this configuration if the GOS partition is misaligned then should a VM make a 4KB read request we will read a 64KB block. As I’ve stated before most data reads aren’t that small, so let’s consider a 1MB read request. In this case the array would retrieve 1MB plus an additional 64KB block. In this case the overhead on the array is around 1%. So if we consider my premise that many non-busy VMs make requests in the 32KB to 128KB range the overhead with a misaligned 64KB block would be between 200% and 50%.
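The arithmetic behind these percentages can be sketched with a deliberately simple model: a misaligned read pulls in exactly one extra back-end block, so the overhead is the block size divided by the request size. This model is an assumption made for illustration. It ignores caching, prefetch, and write coalescing, so its output will not necessarily match the rough estimates quoted above or the behaviour of any real array.

```python
KB = 1024

def misalignment_overhead(request_bytes, block_bytes):
    """Extra data read as a fraction of the request, assuming one extra block."""
    return block_bytes / request_bytes

REQUEST_SIZES = [4 * KB, 32 * KB, 128 * KB, 1024 * KB]
BLOCK_SIZES = [4 * KB, 64 * KB]

for block in BLOCK_SIZES:
    for request in REQUEST_SIZES:
        overhead = misalignment_overhead(request, block)
        print(f"{block // KB:>2}KB block, {request // KB:>4}KB request: "
              f"{overhead:.1%} extra")
```

Under this model the overhead shrinks as requests grow and grows with the back-end block size, which is the general shape of the argument above.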
While ESX/ESXi does aggregate its read requests, they are ‘decoupled’ when they hit the array and as such do actually experience these read inefficiencies.
Bottom line: align the GOS partitions in your VMs. It will serve you best in the long run.
Duncan thanks for raising awareness on this topic and to Chad for correcting the errors found with Celerra NFS datastores.
I cover a few additional thoughts on the bottom line, including strategies and tools, on my blog: http://tinyurl.com/ykn62w3
@Tom : It’s not about high I/O. It is about consolidated workloads and the effect it will have as a whole. When you are running 1000 VMs on a single array with medium IO the impact of misalignment will probably be higher than 100 VMs with high I/O.
@duncan: In our case it’s three (3) hosts and probably always less than 50 VMs later on, far less presently, we are a tiny SMB environment compared to what you probably work with daily. 🙂 I learned about alignment early on and try always to apply it. Thank you…anyone can answer about HP MSA SAN alignment block/chunk/etc.??
OT: Why yellow bricks?? The yellow brick road??
no clue about the HP. I’m not an HP storage guru… Check the about section for a hint on the name.
Poor monkeys…they must be cold!! 🙂
The HP comment is more for other blog readers.
Thank you, Tom
Duncan Epping says
Thanks Vaughn for the extensive comment! It really adds value to this discussion and my article in general. It’s a shame I feel like a storage amateur now. 🙁
Anyway, I would personally also recommend to align everything. OS + DATA. Why not if you can? Not doing it will have an impact. Even if the OS is not producing a lot of IOs in total, 500 VMs with a misaligned OS disk, will create overhead no matter what. Avoiding that overhead will mean you can run more VMs on the same device.
I did not think of the impact of the read/write pattern of the VM/app indeed. And I agree it will most likely read larger chunks than the 4KB mentioned, and 1MB is probably closer for many workloads, but I also exaggerated to get the point across.
Do the NetApp tools work in multi-vendor environments? ie: Can I use the mbrscan and mbralign on both NetApp and Clariion LUNs, even though base chunk size is different?
Duncan Epping says
Yes they do work for every vendor Matt.
Vaughn Stewart says
@Duncan – Thanks for raising the awareness of this topic and for the ping to chime in. We are working hard to demystify the ‘black box’ that is storage. I’d expect storage for the cloud to look radically different in 3 years than it does today, and as such it looks like we’ll have plenty to discuss, share, and integrate!
@Duncan – Thanks for the info.
Apparently a login alone isn’t enough for MBRALIGN. You can create a free account, but so far it doesn’t appear that is enough on its own.
I hope I’m missing something, otherwise the only ‘free’ way to do it is the GParted method, which isn’t too bad unless your VM has tons of data.
Apparently this is no longer needed on HP EVAs running certain versions of XCS code. There are a few forum posts to support this, but I can’t find anything official from HP.
The links to documents in those forum posts are no longer valid, and I can’t find the EVA best practices guide on the HP website. Can anyone confirm this either way?
Ron Davis says
I even spoke to my NetApp sales rep. Without support, they won’t share the mbralign scripts with you.
I know when they first were released they were meant to be free for all, and vendor neutral. I guess they are only vendor neutral.
Dave M says
Great post Duncan! So, how would this play out if your vCenter server is going to be virtual? You would initially have to create at least one VMFS volume through the VI client so you can get your vCenter server built. So, I’m guessing you’d want to use FDISK to create that first VMFS, get your vCenter VM up and running, and then use vCenter to create the remaining VMFS volumes. Sound about right?
To answer Tom, the default stripe size for a volume on a MSA 1000 or 1500 is dependent on the RAID level used:
RAID 0 can use stripe sizes 8, 16, 32, 64, 128, and 256 KB (Default: 128 KB)
RAID 1 can use stripe sizes 8, 16, 32, 64, 128, and 256 KB (Default: 128 KB)
RAID 5 can use stripe sizes 8, 16, 32, and 64 KB (Default: 16 KB)
RAID 6 can use stripe sizes 8, 16, 32, and 64 KB (Default: 16 KB)
The MSA 2000 series has a concept of chunk size. The default chunk size is 64 Kbytes, but can be set to 16k or 32k at time of volume creation. The number can be adjusted to improve performance based on expected workload: large chunks are generally more effective for sequential reads while smaller may be better for random.
The stripe size in an MSA2000 is the chunk size * the number of physical data disks that make up a vdisk.
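As a small worked example of that last formula, the sketch below multiplies the chunk size by the number of data disks. The parity counts per RAID level are standard RAID arithmetic rather than anything taken from HP documentation, and both function names are hypothetical.

```python
# Disks' worth of capacity consumed by parity, per RAID level (standard RAID math).
PARITY_DISKS = {0: 0, 5: 1, 6: 2}

def data_disks(raid_level, total_disks):
    """Number of disks in the vdisk that hold data rather than parity."""
    if raid_level not in PARITY_DISKS:
        raise ValueError(f"unsupported RAID level in this sketch: {raid_level}")
    count = total_disks - PARITY_DISKS[raid_level]
    if count < 1:
        raise ValueError("not enough disks for this RAID level")
    return count

def full_stripe_kb(chunk_kb, raid_level, total_disks):
    """Stripe size = chunk size * number of physical data disks."""
    return chunk_kb * data_disks(raid_level, total_disks)

# Default 64KB chunk on a 5-disk RAID 5 vdisk: 4 data disks.
print(full_stripe_kb(64, 5, 5))  # prints 256
# 16KB chunk on a 6-disk RAID 6 vdisk: 4 data disks.
print(full_stripe_kb(16, 6, 6))  # prints 64
```

A guest partition offset that is a multiple of the chunk size is what matters for the alignment discussed here; the full stripe size is mostly relevant for large sequential writes.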
A different approach that can be taken to correct this would be to re-convert the virtual machines (V2V) with PlateSpin Migrate.
Some minimal downtime is introduced by the cutover process (shut-down/start-up) from the old to the new VMs, but we’ve had good feedback from many customers who have used us for this purpose.
So with Windows 2008 I only have to align additional partitions like D: etc…?
I’m still not sold on aligning the boot/OS partition. It is just more trouble than it is worth — unless you follow the practice of dropping everything on C:
Windows 2008 will automatically align any partitions you create from Disk Manager.
Andrew Mitchell says
Not sure how it’s more trouble than it’s worth. You do it once, create a template then don’t have to worry about it again.
PS: I believe the original VMware recommendation *not* to align boot partitions was inherited from EMC who found that, on physical systems, certain combinations of operating systems and SCSI/HBA cards wouldn’t boot if using other than the default offset. This found its way into the general EMC SAN config best practice guides (but didn’t specify physical or virtual), which then found their way into VMware’s best practices for deploying on EMC storage. It then became folklore……
that’s correct Andrew, it’s even an urban myth that corruption could be caused by alignment. I’ve researched it a while back and never found any piece of evidence supporting it. (except for vague rumours on the internet)
I’m not saying *not* to do it, and I don’t think it breaks anything or should be considered a bad practice — I just question the benefits. With respect to the history, #1 was that most admins didn’t understand, or had never heard of the misalignment issue, and #2 it is much more of a pain to align the boot partition on EACH physical machine prior to installing the OS than it is to simply deploy a ‘correct’ VM template.
Since I read about this issue during an Exchange 2000 deployment, I’ve always ensured that my data volumes are aligned. I’ve never really worried about the OS. Does the OS partition really require high I/O aside from initial boot? I can see a benefit if you boot a large portion of your VMs simultaneously and frequently, as may be the case with virtual desktops or when doing large deployments into a cloud.
But, since I’d only have to do it once in my VM template, and it doesn’t hurt anything, why not do it, right? 🙂
Chad Sakac says
Disclosure – EMCer here.
There’s no harm in aligning boot. In most cases there’s less benefit than with the disks that more often absorb a lot of IO (aka “data disks”), but it doesn’t hurt.
It’s notable that all arrays (that I’m aware of) benefit from alignment, but that the effect of misalignment depends a great deal on architecture.
For example (and I won’t bring out non-EMC examples to avoid speaking about others like others have in the comment thread) the effect of misalignment on EMC arrays:
– as high (worst case testing) as 50% on Celerra NFS use cases (the underlying filesystem has a 8K extent size)
– as high (worst case testing) as 15% on CLARiiON and less in the Symmetrix block use cases (which is determined by stripe size: usually 64K in the CLARiiON case, and effectively much larger (768K or larger) in the Symmetrix case).
This is largely (in the EMC case) the effective minimum IO size of the underlying architectural model, and its relation to the stripe and target IO sweet spot (large sequential IO or a whole whackload of small random IO). This testing data is available on Powerlink if folks are interested.
It’s also affected by the write caching architecture of the array. Larger cache models enable more write coalescing (which means batching together IOs to avoid partial stripe writes and other things to minimize IOs).
I will note that Vaughn’s calculations above are incorrect, and that alignment is more critical on EMC’s NAS than it is on EMC’s SAN. I won’t presume to describe the effect on NetApp, and will let customers speak for themselves.
Long and short – alignment is always a good thing. Whether it’s a critical thing depends on your array. Follow your array vendor’s recommendation.
Andre Leibovici says
Aren’t we perhaps missing the important message here?
That alignment is important and that it can be implemented is clear, and everybody agrees. The adverse effects on different storage platforms are also understood to a certain level.
Most organizations already have Windows NT,2K,2k3 and XP deployed and I doubt they will ever touch those archeological production VMs to squeeze some additional performance from the array.
Windows 2K8 reached general retail availability on February 27, 2008
Windows 7 reached general retail availability on October 22, 2009
The message should be “Unless your App does NOT support the above two Guest OS” ONLY USE Windows 2K8 and 7.
That will save us all a lot of headache in the future!
So you are going to tell an enterprise environment with 1000s of VMs to do a full migration to W2K8 or W7?
The message is: Alignment is important, factor it in!
Andre Leibovici says
You are right, alignment is important. No doubt about that.
I am not saying that organizations should do a full migration. As I mentioned, IMHO they will not touch their production environments in most cases.
I meant that moving forward, if possible, the choice should be a Guest OS that aligns itself to the blocks/clusters. I believe that this should be the message, to avoid performance issues in environments managed by less experienced administrators.
Lots of good stuff here. One thing to add is that if you are using something like NetApp, features beyond the performance of the frame or dedupe are also impacted. It is a set of building blocks of technology. If the disks are not aligned, dedupe doesn’t really function so well, and replication bandwidth is much higher than it needs to be.
Align the disks and these issues go away.
We’ve taken to aligning our disks. There can be some issue with windows drive letters and the MBR Align. MBRAlign also has limitations which don’t allow us to use it, such as an inability to deal with LVM. For those systems, you’ll have to roll your own option B.
Ken Gould says
Going back to the original post on creating VMFS volumes from the GUI. It absolutely creates them aligned. And creating them from the command line in an aligned fashion is possible, though cumbersome, involving fdisk scripts etc before creation of the VMFS.
Personally, as someone who creates lots of VMFS partitions on test and dev systems on a daily basis, I would love to see the command line tools become capable of native alignment. Creating 50 volumes manually via the GUI involves a lot of clicking, and waiting, and clicking. Scripting is so much faster and more reusable in the long run, and less error prone as a result.
Isn’t that where PowerCLI, or the API for that matter, comes into play?
1. How to align the C: drive (system drive)?
2. What if I align to the wrong size; is there any impact on the system? Could performance get worse than not aligning at all?
3. If I have 2 hard disk drives (RAID 1) and 2 partitions (C: and D: drives), do I need to align them?
Mark Verhagen says
Are you certain NetApp’s MBRALIGN tool requires a license – I’ve simply extracted it from the bundle and run it raw – not sure how you would license it? Insight/confirmation please. Great article btw!
hi duncan, have you heard about Paragon’s PAT (Paragon Alignment Tool) which has just been released: https://www.paragon-software.com/business/partition-alignment/?
Jamie Pratt says
I’m still lost with Vmorz on how to do the c: drive! – (I’m fine with gparted on linux though!?) … i guess you just create the partition in windows setup, dump out and use a third-party util? … another, final question – say i align my vms at 32k (or whatever) for yadda-yadda vendor’s array… if we moved to a different array/vendor, would things be misaligned or otherwise degraded/changed unless we did some sort of lun migrate at the array level to compensate? hrm.. perhaps i need to go on back to EMC’s “STF 101” course, eh Chad? LOL… 😉 – Great work as always Duncan, and to the rest of you as well – thanks for keeping us lowly customers in the loop! 🙂
so it’s safe to say that in the aspect of alignment that the more VM guests that share a disk within a larger raid group can have detrimental impact to any other of the LUN’s within that raid group?
Is it necessary to take down all the VMs to run the MBRScan as I have read? Is there a way to align as you go or prevent misalignment in the first place? We just rolled out a brand new 3170 clustered environment and we are being told by support it is out of alignment. Our read latency is off the chart, but we have 100s of VMs already on this SAN. Is there a workaround or better tool?
Duncan Epping says
another tool would be http://vizioncore.com/product/voptimizer-pro
Domenico Viggiani says
What about Linux filesystems and volume managers (ext3, ext4, LVM)?
Is Red Hat 6 “aligned” by default as 2008/7?
Great Post. But I have a question, just to understand the requirement of the VMFS/VMDK alignment.
By default, if a VMFS is created using vCenter, it gives you an option to change the block size. The default block size is 1 MB. Does it mean that on any VMFS formatted with a 1 MB block size, the maximum VMDK size it can hold is 256 GB?
What if an admin created all the VMFS volumes with the default block size, which is 1 MB, and stored 400 GB and 800 GB VMDKs on them? Do those VMDKs need to be moved to another VMFS datastore formatted with a larger block size, such as 4 MB?
If you create a VMFS volume with a 1MB blocksize you will NOT be able to store files larger than 256GB so the scenario you describe can never occur.
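The 256GB ceiling for a 1MB block size is the only figure stated here. The sketch below assumes the limit scales linearly with block size (2MB, 4MB and 8MB giving roughly 512GB, 1TB and 2TB), which is the commonly cited VMFS-3 behaviour but should be checked against VMware’s documentation; the exact limits may be slightly lower than these round numbers. The helper names are hypothetical.

```python
# Assumed VMFS-3 block sizes and a linear rule of 256GB per 1MB of block size.
VMFS3_BLOCK_SIZES_MB = (1, 2, 4, 8)
GB_PER_MB_OF_BLOCK = 256

def max_file_gb(block_size_mb):
    """Approximate largest file a datastore with this block size can hold."""
    return GB_PER_MB_OF_BLOCK * block_size_mb

def smallest_block_size_mb(vmdk_gb):
    """Smallest assumed block size that can hold the VMDK, or None if none can."""
    for block_size_mb in VMFS3_BLOCK_SIZES_MB:
        if vmdk_gb <= max_file_gb(block_size_mb):
            return block_size_mb
    return None

for vmdk_gb in (200, 400, 800):
    print(f"{vmdk_gb}GB VMDK needs at least a "
          f"{smallest_block_size_mb(vmdk_gb)}MB block size")
```

Under these assumptions the 400GB and 800GB disks from the question would need 2MB and 4MB block sizes respectively, so they could never have been created on a 1MB datastore in the first place.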
So, it’s recommended to create the vmfs block size based on the vmdk size correct?
Or just use a large one so you will never hit this issue….
Or just use a large one so you will never hit this issue…
You mean large VMFS block size?
Yes largest block size out there will prevent this from happening
Thanks a lot Duncan, appreciate your answers. 🙂
We are migrating to a new storage array and we’ll need to provision around 100 new LUNs. We have it scripted with PowerCLI but the question of VMFS alignment has come up.
Does anyone have a link to a doc or a script that explains how to add new datastores in bulk and achieve the proper alignment?
Nicholas Weaver says
Dropping a comment here but I just released a free tool for handling this: http://nickapedia.com/2011/11/03/straighten-up-with-a-new-uber-tool-presenting-uberalign/
Great post as always Duncan 🙂
Matt Mancini says
Lots of great comments, however one misconception is Windows 2008 is aligned by default. This is not always the case. From Microsoft’s site….
“Windows Server 2008 and Windows Vista: New Partitions
In Windows Vista as well as Windows Server 2008, partition alignment is usually performed by default. The default for disks larger than 4 GB is 1 MB; the setting is configurable and is found in the registry at the following location:
However, if OEM setups are delivered (for example, with recovery partitions), even fresh installations of Windows Server 2008 having partitions with undesirable partition starting offsets have been observed.
Whatever the operating system, confirm that new partitions are properly aligned.”
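One way to confirm alignment rather than assume it is to read the partition table directly. The following is a minimal sketch that parses the four primary entries of a classic MBR sector and checks each starting offset against a 64KB boundary. It assumes 512-byte sectors and an MBR (not GPT) disk, and it runs against a synthetic sector built in memory; in practice the first 512 bytes would come from a raw disk image or device, which is left out here. All function names are made up for illustration.

```python
import struct

SECTOR_BYTES = 512           # assumption: 512-byte logical sectors
PARTITION_TABLE_OFFSET = 446  # MBR partition table starts at byte 446
ENTRY_BYTES = 16              # each of the 4 primary entries is 16 bytes

def mbr_partition_offsets(mbr):
    """Return starting byte offsets of the non-empty primary partitions."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    offsets = []
    for index in range(4):
        base = PARTITION_TABLE_OFFSET + ENTRY_BYTES * index
        partition_type = mbr[base + 4]  # type 0 means the entry is unused
        start_lba = struct.unpack_from("<I", mbr, base + 8)[0]
        if partition_type != 0:
            offsets.append(start_lba * SECTOR_BYTES)
    return offsets

def make_test_mbr(start_lbas):
    """Build a synthetic MBR sector for demonstration purposes only."""
    mbr = bytearray(512)
    for index, lba in enumerate(start_lbas):
        base = PARTITION_TABLE_OFFSET + ENTRY_BYTES * index
        mbr[base + 4] = 0x07  # arbitrary non-zero partition type
        struct.pack_into("<I", mbr, base + 8, lba)
    mbr[510:512] = b"\x55\xaa"  # boot signature
    return bytes(mbr)

def is_aligned(offset_bytes, boundary_bytes=64 * 1024):
    return offset_bytes % boundary_bytes == 0

for start_lba in (63, 2048):
    for offset in mbr_partition_offsets(make_test_mbr([start_lba])):
        state = "aligned" if is_aligned(offset) else "MISALIGNED"
        print(f"partition starts at byte {offset}: {state}")
```

A start at sector 63 gives 32,256 bytes and is reported as misaligned, while a start at sector 2048 gives exactly 1MB and passes.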
I would also like to add: check the alignment of ALL your VMs regularly. Just because you checked your alignment years ago doesn’t mean it’s still aligned. It just makes good sense to ensure you’re optimal.
More info go here – http://vmexplorer.com/2012/01/24/never-assume-windows-2008-is-aligned-out-of-the-box/
Michael Potter says
It seems a lot of people are using Windows ‘Volume’ and ‘Partition’ interchangeably. The following Microsoft article suggests that this isn’t accurate:
My question is, does anyone know how to check volume alignment on a dynamic disk?
At the end of the article (I linked above) it describes that you can find volume alignment with ‘DMDIAG’ (copied from a Windows 2003 box) and ‘relative sectors’. But after running ‘DMDIAG’ I can not for the life of me figure out what exactly that means or how to get to the conclusion outlined in the article.
I for one will not just trust that Microsoft did it correctly. I want proof. I would greatly appreciate it if anyone can help me.
Michael Potter says
So I called Microsoft and after a couple of days of working it out with the tech I have my answer.
The reason I wasn’t able to find the relative sectors on my DMDIAG output was because I manually created the striped volume with DISKPART.
If you want to be able to verify the volume alignment with DMDIAG, you must create the dynamic disk with the Windows built-in disk manager tool (diskmgmt.msc). I don’t know why it didn’t report on a DISKPART-created volume; it just didn’t (a Microsoft feature, perhaps).
Once my volume was created with diskmgmt.msc I was able to run DMDIAG and confirm that all of my striped volumes created on dynamic disks were aligned to 2048. All tools that look at partition offset will report that the disk is not aligned. This is due to Microsoft moving the partition alignment to 32256.
Now all I have to worry about is application alignment and allocation unit sizing.
NetApp Robot says
The right url is http://media.netapp.com/documents/tr-3747.pdf
Your link is broken.
The diagrams no longer show up for this article. I’ve tried viewing on Win7 in IE8 and also on iPad….
In our environment, we have a VM named abc and it has 3 VMDKs, named abc-flat.vmdk, disk1.vmdk and disk2.vmdk.
While performing mbralign for disk1 and disk2 I am getting the message below and am not able to run the alignment. Please help me out.
The same vmdk was found in multiple vmx files.
Please give explicit path to vmdk file location
Scott Peter says
Thanks for excellent article and great comments on virtualization using vmware