I was asked about this a couple of times over the last few days, so I figured it might be an interesting topic. This is also described in our book, in the Datastore Cluster chapter, but I decided to rewrite it and put some of it into a table to make it easier to digest. Let's start off with the table and then explain the why/where/what… Keep in mind that this is my opinion and not necessarily the best practice or recommendation of your storage vendor. When you implement Storage DRS, make sure to validate this against their recommendations. I have marked the areas where I feel caution needs to be taken with (*).
| Capability | Mode | Space Metric | I/O Metric |
| --- | --- | --- | --- |
| Thin Provisioning | Manual | Yes (*) | Yes |
| Deduplication | Manual | Yes (*) | Yes |
| Replication | Manual (*) | Yes | Yes |
| Auto-tiering | Manual | Yes | No (*) |
Yes, you are reading that correctly: Storage DRS can be enabled with all of them, and even with the I/O metric enabled, except for auto-tiering. Although I listed “Manual” for all of them, I believe that in some of these cases Fully Automated mode would be perfectly fine. As it will of course depend on the environment, I would suggest starting out in Manual mode when any of these four storage capabilities are used, to see what the impact is after applying a recommendation.
First of all, “Manual mode”… what is it? Manual mode basically means that Storage DRS will make recommendations when the configured thresholds for latency or space utilization have been exceeded. It will also provide placement recommendations during the provisioning process of a virtual machine or a virtual disk. In other words, when you set Storage DRS to Manual you will still benefit from it, as it will monitor your environment for you and, based on that, recommend where to place or migrate virtual disks.
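To make the mechanics concrete, here is a minimal Python sketch of that decision logic. It is purely illustrative, not actual Storage DRS code; the thresholds mirror the vSphere 5.0 defaults (80% space utilization, 15 ms latency).

```python
# Conceptual sketch of how manual-mode Storage DRS surfaces recommendations.
# Illustrative only -- not the actual Storage DRS implementation.

SPACE_THRESHOLD = 0.80        # default: 80% of datastore capacity used
LATENCY_THRESHOLD_MS = 15.0   # default: 15 ms I/O latency

def evaluate_datastore(name, used_fraction, latency_ms):
    """Return human-readable recommendations; in Manual mode these are
    presented to the admin rather than applied automatically."""
    recommendations = []
    if used_fraction > SPACE_THRESHOLD:
        recommendations.append(
            f"{name}: space utilization {used_fraction:.0%} exceeds "
            f"{SPACE_THRESHOLD:.0%} -- consider migrating a virtual disk")
    if latency_ms > LATENCY_THRESHOLD_MS:
        recommendations.append(
            f"{name}: latency {latency_ms} ms exceeds "
            f"{LATENCY_THRESHOLD_MS} ms -- consider I/O load balancing")
    return recommendations

for rec in evaluate_datastore("datastore-01", 0.86, 12.0):
    print(rec)
```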
In the case of thin provisioning I would like to expand a bit. Before migrating virtual machines, I would recommend verifying that the “dead space” left behind on the source datastore after the migration can be reclaimed through the UNMAP primitive that is part of VAAI.
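To illustrate what that dead space is, here is a small conceptual model in Python. The class and the numbers are assumptions for the sake of the example, not a vSphere or array API.

```python
# Conceptual model of dead space on a thin-provisioned LUN after a
# Storage vMotion. All names and numbers are illustrative assumptions.

class ThinLun:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.allocated_gb = 0  # blocks the array has physically backed

    def write_vmdk(self, size_gb):
        self.allocated_gb += size_gb

    def svmotion_away(self, size_gb):
        # VMFS frees the space, but without VAAI UNMAP the array still
        # considers the blocks allocated: this is the "dead space".
        return size_gb

    def unmap(self, dead_gb):
        # The UNMAP primitive tells the array it may reclaim the blocks.
        self.allocated_gb -= dead_gb

lun = ThinLun(capacity_gb=2048)
lun.write_vmdk(500)
dead = lun.svmotion_away(500)
print(f"dead space: {dead} GB, array still reports {lun.allocated_gb} GB allocated")
lun.unmap(dead)
print(f"after UNMAP the array reports {lun.allocated_gb} GB allocated")
```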
Deduplication is a difficult one. The question is: will the deduplication process be as efficient after the migration as it was before? Will it be able to deduplicate the same amount of data? There is always a chance that it will not… But then again, do you really care all that much about it when you are running out of disk space on your datastore or are exceeding your latency threshold? Those are very valid reasons to move a virtual disk, as both can lead to degradation of service.
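A quick back-of-the-envelope example of why efficiency can drop; the ratios are assumed, purely for illustration:

```python
# Illustrative arithmetic: a VMDK that dedupes well on its source datastore
# may consume far more physical space after migration if the destination
# holds little similar data. The ratios below are assumptions.

logical_gb = 100
dedup_ratio_source = 5.0    # assumed 5:1 on the source pool
dedup_ratio_dest = 1.25     # assumed 1.25:1 on the destination pool

print(f"physical on source: {logical_gb / dedup_ratio_source:.0f} GB")      # 20 GB
print(f"physical on destination: {logical_gb / dedup_ratio_dest:.0f} GB")   # 80 GB
```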
In an environment where replication is used, care should be taken when applying balancing recommendations. The reason is that the full virtual disk that is migrated will need to be replicated again after the migration. This temporarily leads to an “unprotected state”, and as such it is recommended to migrate replicated virtual disks only during scheduled maintenance windows.
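If you want to be strict about this, you could gate recommendations on a window yourself. The sketch below is a hypothetical policy check, not a built-in Storage DRS feature; the window times are assumptions.

```python
# Hypothetical policy check (not a vSphere feature): only apply Storage DRS
# migration recommendations for replicated VMs inside a maintenance window,
# since the full VMDK must re-replicate and the VM is unprotected meanwhile.

from datetime import datetime, time

MAINTENANCE_START = time(1, 0)  # assumed window start: 01:00
MAINTENANCE_END = time(5, 0)    # assumed window end: 05:00

def may_apply(vm_is_replicated: bool, now: datetime) -> bool:
    if not vm_is_replicated:
        return True
    return MAINTENANCE_START <= now.time() <= MAINTENANCE_END

print(may_apply(True, datetime(2011, 8, 5, 2, 30)))   # True: inside the window
print(may_apply(True, datetime(2011, 8, 5, 14, 0)))   # False: re-replication would leave the VM unprotected during production hours
```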
Auto-tiering arrays have been a hot debate lately. Not many seem to agree with my stance, but up until today no one has managed to give me a solid argument for why I would not want to enable Storage DRS on auto-tiering solutions. Yes, I fully understand that when I move a virtual machine from datastore A to datastore B, the virtual machine will more than likely end up on relatively slow storage, and the auto-tiering solution will need to optimize the placement again. However, when you are running out of disk space, what would you prefer: downtime or a temporary slowdown? In the case of I/O balancing this is different, and in a follow-up post I will explain why it is not supported.
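As a teaser for that follow-up, the sketch below hints at the problem with the I/O metric on sub-LUN auto-tiering arrays: the observed latency depends on which blocks happen to sit on which tier, so identical workloads can produce very different samples. The tier latencies are assumed numbers.

```python
# Why I/O-metric balancing is tricky on sub-LUN auto-tiering arrays:
# observed latency depends on block placement across tiers, so the
# measurement is not a stable signal. Latencies below are assumptions.

TIER_LATENCY_MS = {"ssd": 1.0, "fc": 8.0, "sata": 20.0}

def observed_latency(block_placement):
    """Weighted average latency for a workload whose hot blocks are
    spread across tiers; block_placement maps tier -> fraction."""
    return sum(TIER_LATENCY_MS[tier] * frac
               for tier, frac in block_placement.items())

# The same VM, before and after the array's tiering engine rebalances:
print(observed_latency({"ssd": 0.1, "sata": 0.9}))  # 18.1 ms: looks "slow"
print(observed_latency({"ssd": 0.7, "sata": 0.3}))  # 6.7 ms: looks "fast"
```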
** This article is based on vSphere 5.0 information **
cwjking says
Well, given that even in my environment today we don't use DRS set to “fully automated”, I don't suspect we will with Storage DRS either. Currently we are evaluating our licensing costs – the big topic of the week. I like the fact that now I wouldn't have to look at datastores and figure out where to place a VM or migrate one to. Hell, that saves me a lot of time. SLA-driven would be great, but we don't tier our storage or offer different SLAs to hosted customers. Yes, we are very old school ;).
Duncan Epping says
Really, not even set to “fully automated”? Time to welcome 2011 🙂
cwjking says
Yeah, tell me about it. I believe in the functionality of vSphere fully. It's mostly old-school methodology: in the past certain features were impactful, so they assumed it was better to avoid issues and leave them off. In vSphere 4 and onward there have been dramatic improvements to DRS. Also, like you have stated repeatedly… people have a misconception of HA and DRS.
Duco Jaspars says
Duncan, the problem with auto-tiering, as far as I understood, is caused by the inconsistent relationship between throughput and latency on some auto-tiered arrays.
Especially with sub-LUN tiering this relationship is not constant: some blocks of a LUN might be on SSD and others on SATA, so no consistent measurements can be made, which could lead to bad recommendations or actions.
Saw a nice table in a document which I will forward to you.
Duncan Epping says
I know what you are referring to; let me slightly rephrase this article and explain in another post where this is coming from, Duco. Thanks.
craig says
The only concern you may face is virtual machines with large data volumes. Is it practical to let the system keep moving them around between different volumes or LUNs? I think it still needs time to prove this point.
Duncan Epping says
It is not like Storage DRS will just keep moving things around; that is not how it works… There is a risk/benefit analysis as part of Storage DRS. It is only invoked once every 8 hours, and on top of that a move will only be recommended when there is a benefit for at least 24 hours. Even then, the default setting is “Manual”: it is you as the admin who decides whether to move it or not!
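In pseudo-form, the filter looks roughly like this (a conceptual sketch, not actual Storage DRS code; the function and its inputs are made up for illustration):

```python
# Conceptual sketch of the risk/benefit filter described above.
# The function and its inputs are illustrative, not Storage DRS internals.

INVOCATION_INTERVAL_HRS = 8     # load balancing runs once every 8 hours
MIN_BENEFIT_DURATION_HRS = 24   # benefit must persist at least this long

def should_recommend(projected_benefit_hrs, migration_cost, projected_gain):
    """A move is only surfaced when the projected benefit outlasts
    24 hours and outweighs the cost of the Storage vMotion itself."""
    return (projected_benefit_hrs >= MIN_BENEFIT_DURATION_HRS
            and projected_gain > migration_cost)

print(should_recommend(30, migration_cost=5, projected_gain=12))  # True
print(should_recommend(10, migration_cost=5, projected_gain=12))  # False: benefit too short-lived
```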
Louw Pretorius says
Is there a window period within which Storage DRS monitors latency/capacity before taking action, to prevent things like continuous storage movement (storage ping-pong)?
Louw Pretorius says
OK, found my answer.
“I/O load is evaluated by default every 8 hours currently with a default latency threshold of 15ms.”
“If the benefit doesn’t last for at least 24 hours, Storage DRS will not make the recommendation.”
From: http://www.ntpro.nl/blog/archives/1787-vSphere-5-Video-Storage-DRS.html
David says
IoLoadBalancingAlwaysUseCurrent – “Always use current stats for IO load balancing”.