Why the world needs Software Defined Storage

Yesterday I was at a Software Defined Datacenter event organized by IBM and VMware. The famous Cormac Hogan presented on Software Defined Storage and I very much enjoyed hearing about the VMware vision and of course Cormac's take on it. Coincidentally, last week I read this article by long-time community guru Jason Boche on VAAI and the number of VMs per volume, and after a discussion yesterday (at the event) with a customer about their operational procedures for provisioning new workloads, I figured it was time to write down my thoughts.

I have seen many different definitions of Software Defined Storage so far, and I guess there is some truth in all of them. Before I explain what it means to me, let me describe the challenges people commonly face today.

In a lot of environments managing storage and the associated workloads is a tedious task. It is not uncommon to see large spreadsheets with long lists of LUNs, IDs, capabilities, groupings and whatever else is relevant to them and their workloads. These spreadsheets are typically used to decide where to place a virtual machine or virtual disk. Based on the requirements of the application a specific destination will be selected. On top of that, a selection will need to be made based on the currently available disk space of a datastore and of course the current IO load. You do not want to randomly place your virtual machine and find out two days later that you are running out of disk space… Well, that is if you have a relatively mature provisioning process. Of course it is also not uncommon to just pick a random datastore and hope for the best.

To be honest, I can understand why many people randomly provision virtual machines. Keeping track of virtual disks, datastores, performance, disk space and other characteristics… it is simply too much, and boring. Didn't we invent computer systems to do these repeatable, boring tasks for us? That leads us to the question: where and how should Software Defined Storage help you?

A recurring theme in many "Software Defined" solutions presented by VMware is:

Abstract, Pool, Automate.

This also applies to Software Defined Storage in my opinion. These are three basic requirements that a Software Defined Storage solution should meet. But what does that mean and how does it help you? Let me try to make some sense out of that nice three-word marketing slogan:

Software Defined Storage should enable you to provision workloads to a pool of virtualized physical resources based on service level agreements (defined in a policy) in an automated fashion.

I understand that is a mouthful, so let's elaborate a bit more. Think about the challenges I described above… or what Jason described with regards to "VMs per Volume" and how various components can impact your service level. A Software Defined Storage (SDS) solution should be able to intelligently place virtual disks (virtual machines / vApps) based on the policy selected for the object (virtual disk / machine / appliance). These policies typically contain characteristics of the provided service level. On top of that, a Software Defined Storage solution should take risks / constraints into account, meaning that you don't want your workload to be deployed to a volume that is running out of disk space, for instance.

What about those characteristics, what are those? They could be anything; here are two simple examples to make it a bit more obvious (with a small sketch after the list):

  • Does your application require recoverability after a disaster? –> SDS selects a destination which is replicated, or instructs the storage system to create a replicated object for the VM
  • Does your application require a certain level of performance? –> SDS selects a destination that can provide this performance, or instructs the storage system to reserve storage resources for the VM

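To make the policy-driven placement idea a little more concrete, here is a minimal Python sketch of what "select a destination based on a policy, while respecting constraints" could look like. All of the names (Datastore, Policy, pick_datastore) and the capability strings are made up for illustration; no actual product works exactly like this.

```python
from dataclasses import dataclass

@dataclass
class Datastore:
    name: str
    capabilities: set        # e.g. {"replicated", "ssd"}
    free_gb: float
    avg_latency_ms: float

@dataclass
class Policy:
    required_capabilities: set  # what the service level demands
    max_latency_ms: float       # simple performance characteristic
    min_free_gb: float          # risk/constraint: keep capacity headroom

def pick_datastore(policy: Policy, vm_size_gb: float, datastores: list) -> Datastore:
    """Return a datastore that satisfies the policy, or raise if none can."""
    candidates = [
        ds for ds in datastores
        if policy.required_capabilities <= ds.capabilities      # capability match
        and ds.avg_latency_ms <= policy.max_latency_ms           # performance SLA
        and ds.free_gb - vm_size_gb >= policy.min_free_gb        # don't run it out of space
    ]
    if not candidates:
        raise RuntimeError("No datastore can satisfy this policy")
    # Prefer the candidate with the most free space left after placement
    return max(candidates, key=lambda ds: ds.free_gb - vm_size_gb)

# Example: a VM that needs replication and reasonable latency
datastores = [
    Datastore("gold-01",   {"replicated", "ssd"}, free_gb=800,  avg_latency_ms=4),
    Datastore("silver-01", {"replicated"},        free_gb=300,  avg_latency_ms=12),
    Datastore("bronze-01", set(),                 free_gb=2000, avg_latency_ms=25),
]
policy = Policy(required_capabilities={"replicated"}, max_latency_ms=15, min_free_gb=100)
print(pick_datastore(policy, vm_size_gb=60, datastores=datastores).name)  # -> gold-01
```

The point of the sketch is not the code itself but the shift in responsibility: the administrator defines the service level once in a policy, and the placement decision (including the "don't fill up the datastore" safety check) is made by software instead of by a spreadsheet.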
Now this all sounds a bit abstract, but I am purposely trying to avoid using product or feature names. Software Defined Storage is not about a particular feature, product or storage system. Although I dropped the word policy, note that enabling Profile Driven Storage within vCenter Server does not by itself give you a Software Defined Storage solution. It shouldn't matter either (to a certain extent) whether you are using EMC, NetApp, Nimbus, a VMware software solution or any of the thousands of other storage systems out there. Any of those systems, or even a combination of them, should work in the software defined world. To be clear, in my opinion (today) there isn't such a thing as a Software Defined Storage product; it is a strategy. It is a way of operating that particular part of your datacenter.

To be fair, there is a huge difference between various solutions. There are products and features out there that will enable you to build a solution like this and transform the way you manage your storage and provision new workloads. Products and features that will allow you to create a flexible offering. VMware has been and is working hard to be a part of this space, vSphere Replication / Storage DRS / Storage IO Control / Virsto / Profile Driven Storage are part of the “now”, but just the beginning… Virtual Volumes, Virtual Flash and Distributed Storage have all been previewed at VMworld and are potentially what is next. Who knows what else is in the pipeline or what other vendors are working on.

If you ask me, there are exciting times ahead. Software Defined Storage is a big part of the Software Defined Data Center story and you can bet this will change datacenter architecture and operations.

** There are two excellent articles on this topic, the first by Bill Earl and the second by Christos Karamanolis; make sure to read their perspectives. **

Write-Same vs XCopy when using Storage vMotion

I had a question last week about Storage vMotion and when Write-Same vs XCopy is used. I was confident I knew the answer, but I figured I would do some testing. So what exactly was the question, and what scenario did I test?

Imagine you have a virtual machine with a "lazy zero thick" disk and an "eager zero thick" disk. When initiating a Storage vMotion while preserving the disk format, would the pre-initialized blocks in the "eager zero thick" disk be copied through XCopy, or would "write-same" (aka zero out) be used?

So that is what I tested. I created a virtual machine with two disks, one "lazy zero thick" and about half filled, and the other "eager zero thick". I did a Storage vMotion to a different datastore (keeping the same format as the source) and checked esxtop while the migration was ongoing:

CLONE_WR = 21943
ZERO = 2

In other words, when preserving the disk format the "XCopy" command (CLONE_WR) is issued by the hypervisor. The reason for this is that when doing a Storage vMotion and keeping the disk format the same, the copy command is issued per chunk and the hypervisor does not read the blocks before the command is sent to the array. Hence the hypervisor doesn't know these are "zero" blocks in the "eager zero thick" disk and simply offloads the copy to the array.

Of course it would be interesting to see what happens if I specify during the migration that all disks need to become "eager zero thick"; remember, one of the disks was "lazy zero thick":

CLONE_WR = 21928
ZERO = 35247

It is clear that in this case the blocks are zeroed out (ZERO). As there is a range of blocks which isn't used by the virtual machine yet, the hypervisor ensures these blocks are zeroed so that they can be used immediately when the virtual machine needs them… as that is what the admin requested: "eager zero thick", aka pre-zeroed.
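To summarize the two test results, here is a tiny Python sketch of my mental model of which offload primitive is used for a given block during the Storage vMotion. This is purely illustrative; the function name and parameters are made up, and this is obviously not actual hypervisor code.

```python
def primitive_for_block(block_allocated: bool, convert_to_eager_zero: bool) -> str:
    """Which VAAI primitive my tests suggest is used for a block during SvMotion."""
    if not convert_to_eager_zero:
        # Disk format preserved: the hypervisor never inspects the data, it just
        # asks the array to copy every chunk -> XCopy, counted as CLONE_WR in esxtop.
        return "XCOPY (CLONE_WR)"
    if block_allocated:
        # Blocks that actually contain data are still copied via XCopy.
        return "XCOPY (CLONE_WR)"
    # Blocks the VM never wrote to are pre-zeroed on the destination so they are
    # immediately usable -> write-same, counted as ZERO in esxtop.
    return "WRITE_SAME (ZERO)"

# The two scenarios tested above:
print(primitive_for_block(block_allocated=False, convert_to_eager_zero=False))  # XCOPY (CLONE_WR)
print(primitive_for_block(block_allocated=False, convert_to_eager_zero=True))   # WRITE_SAME (ZERO)
```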

For those who want to play around with this, check esxtop and then the VAAI stats. I described how to do that in this article.

How to disable Datastore Heartbeating

I have had this question multiple times now: how do I disable datastore heartbeating? Personally, I don't know why you would ever want to do this… but as multiple people have asked, I figured I would write it down. There is no "disable" button unfortunately, but there is a work-around. Below are the steps you need to take to disable datastore heartbeating.

vSphere Client:

  • Right-click the Cluster object
  • Click “Edit Settings”
  • Click “Datastore Heartbeating”
  • Click “Select only from my preferred datastores”
  • Do not select any datastores

Web Client:

  • Click the Cluster object
  • Click “Manage” tab
  • Click “vSphere HA”
  • Click the “Edit” button on the right side
  • Click “Datastore Heartbeating”
  • Click “Select only from my preferred datastores”
  • Do not select any datastores

It is as simple as that… However, let me stress that this is not something I would recommend doing. Only do this when you are troubleshooting and need it disabled for whatever reason, and please make sure to enable it again when you are done.
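If you prefer to script the work-around instead of clicking through the UI, something along these lines should do it with pyVmomi. Treat this as a sketch: the cluster name and connection details are placeholders, and the property names (hBDatastoreCandidatePolicy, heartbeatDatastore) are taken from the vSphere API as I remember them, so verify them against the API reference for your version before using this.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

# Placeholder connection details
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

# Find the cluster by name (placeholder name)
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Cluster01")
view.DestroyView()

# Mimic the UI work-around: only heartbeat against "preferred" datastores,
# and select none of them.
das = vim.cluster.DasConfigInfo()
das.hBDatastoreCandidatePolicy = "userSelectedDs"
das.heartbeatDatastore = []

spec = vim.cluster.ConfigSpecEx(dasConfig=das)
WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
Disconnect(si)
```

When you are done troubleshooting, set hBDatastoreCandidatePolicy back to its original value so heartbeating works again.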

Vote for the top virtualization Blogs / 2012 looking back!

Yes, it is that time of the year again… vSphere-land.com’s voting for the top virtualization blogs has started again. Of course I am hoping to end up somewhere at the top of the list again, but I realize like no one else that this is not a given. The competition is once again huge; there are a couple of newcomers who have published some outstanding work. Personally I am a huge fan of Cormac Hogan’s work and I hope he will make it into the top 10 this year, I sure as hell voted for him! Of course I expect my friends Frank Denneman, Alan Renouf, Massimo Referre and William Lam to also hit the top 10.

I am hoping each of you will select your top 10 blogs again based on quality, relevance, longevity and frequency. (I personally find the length of an article irrelevant; content is King!) I always use the yearly voting to look back at what happened over the last 12 months. What happened in 2012, what has kept me busy?

For me 2012 started with Partner Exchange, where I presented a 2-hour workshop on how to design a cloud infrastructure, together with Dave Hill (he is on the list as well, so vote for him :-)) and guests Frank Denneman and Chris Colotti. On top of that, Chris Colotti and I presented a DR solution for vCloud Director based environments. This is something that people had been waiting on for a long time. I came up with the process/concept for this and also published a whitepaper on the topic. The same concept can be used for VMware View environments by the way, white paper out soon! I published a whole bunch of other white papers (1, 2) this year as part of my Tech Marketing responsibilities, of which the vMSC Best Practices paper is probably the most read and best received. I managed to get a couple of sessions approved at VMworld and had a blast presenting with my buddy Lee Dilworth. Being a VMworld “Expert” was once again an awesome experience; I especially enjoyed the group discussions. I also presented at a couple of VMUGs (Belgium, Ireland, UK) and, last but not least, published the vSphere 5.1 Clustering Deepdive… which happened to be the best-selling book at VMworld San Francisco.

Along the way I managed to crank out an article or two hundred, and my blog went down during the vSphere 5.1 launch due to the massive amount of traffic. I can tell you that my hosting company was surprised; they thought it was a DDoS attack, but then they figured out it was just a massive number of people hitting my site on the same day. I guess thanks for that :-)

I did want to list my top 10 articles from the last 12 months, in no particular order:

Thanks again to Eric Siebert, who spends a MASSIVE amount of time going through the votes, filtering out discrepancies and making sure it is all done in a fair manner! Make sure to bookmark his website, add it to your RSS reader and follow him on Twitter. So what are you waiting for? Head on over and take the survey!

vSphere HA 5.x restart attempt timing

I wrote about how vSphere HA 5.x restart attempt timing works a long time ago, but there still appears to be some confusion about it. I figured I would clarify this a bit more; I don’t think I can make it simpler than this:

  • Initial restart attempt
  • If the initial attempt failed, a restart will be retried 2 minutes after the previous attempt
  • If that attempt failed, a restart will be retried 4 minutes after the previous attempt
  • If that attempt failed, a restart will be retried 8 minutes after the previous attempt
  • If that attempt failed, a restart will be retried 16 minutes after the previous attempt

After the fifth failed attempt the cycle ends. Well, that is, unless a new master host is selected (for whatever reason) between the first and the fifth attempt. In that case, we start counting again, meaning that if a new master is selected after attempt 3, the new master will start with the “initial restart attempt”.
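For those who prefer to see it as code rather than a list, here is a quick Python sketch of the schedule above (the function name is mine; the numbers come straight from the list):

```python
def restart_attempt_offsets(max_attempts: int = 5) -> list:
    """Minutes after the failure at which each vSphere HA 5.x restart attempt fires,
    per the schedule above: an initial attempt, then retries 2, 4, 8 and 16 minutes
    after the previous one. A new master election resets the whole cycle."""
    offsets = [0]          # initial restart attempt
    delay = 2
    for _ in range(max_attempts - 1):
        offsets.append(offsets[-1] + delay)
        delay *= 2         # 2 -> 4 -> 8 -> 16 minutes
    return offsets

print(restart_attempt_offsets())  # [0, 2, 6, 14, 30]
```

In other words, the fifth and final attempt happens roughly 30 minutes after the initial one, assuming no new master election in between.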

Or as Frank Denneman would say:

[Image: Frank Denneman’s diagram of the vSphere HA 5.x restart attempt timing]