As part of my role I very often review design documents that other consultants/architects have written, not only those of VMware employees but also those of external people. On top of that, of course, I also see a lot of VCDX application packages pass by. Something struck me the other day when I was doing my third review in just a couple of hours: I started thinking about the designs I had reviewed so far and noticed there was a common theme.
Before I get started I want to make sure everyone understands that I believe there’s a very strong value to using standardized templates / frameworks. So don’t misinterpret this article.
I know that many of you are consultants/architects and leverage the Plan & Design kit that VMware PSO created, or have an internally developed template that might or might not be based on this P&D kit. (If you’re a VMware Partner and wonder what this kit is, log in to the partner portal and look around!) The Plan & Design kit is basically a template, although the hot word these days is framework, that lays out the foundation for a vSphere 4.x design. I guess “framework” or “template” already reveals how it should be used, but lately I have been noticing, and yes, even in VCDX submissions, that people are trying to cut corners and skip sections or use the defaults. I guess by now most of you are thinking “well, that doesn’t apply to me”, but let’s be honest here: when you use the same template for years you start to get lazy. I know I do.
While there is absolutely nothing wrong with using the template and adopting the best practices mentioned in it, this only holds when they are used in the right context. The framework that VMware provides, for instance, contains many examples of how you could implement something, and the ones provided are usually the best practice. That doesn’t necessarily mean, though, that this best practice meets your customer’s requirements or can be used given the constraints of this environment/customer. Just to give an example of something that I see in 90% of the designs I review:
- Max number of VMs per datastore: 15
- Datastore size: 500GB
- Justification: To reduce SCSI reservations
This used to be a best practice and probably a very valid design decision in most cases. However, over the last three versions the locking mechanism has been significantly improved. On top of that, even more recently VAAI was introduced, which reduced the risks further. Along the way the number 15 got bumped up to 20-25, depending on the workload and the RTO. Based on those technology changes your best practice and template should have been updated, or should at a minimum explain what the “new” reason is for sticking with these values.
Every single time you write a new design, challenge your decisions: go over these best practices and make sure they still apply. Every time a new version of the product is released, validate the best practices and standardized design decisions and change them accordingly to benefit from the new features.
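One way to make that review habit concrete is to record each standardized design decision together with the product version it was last validated against, and then flag anything stale or unjustified whenever a new release comes out. The following is only a minimal sketch in Python and is not part of the Plan & Design kit: the `DesignDecision` structure, its field names, and the `stale_decisions` helper are all hypothetical, and the version numbers and the DRS entry are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class DesignDecision:
    # All fields are hypothetical; adapt them to your own template.
    name: str
    value: str
    justification: str
    validated_against: str  # product version the decision was last reviewed for


def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '4.1' into (4, 1) for comparison."""
    return tuple(int(part) for part in version.split("."))


def stale_decisions(decisions, current_version):
    """Return decisions with no justification or validated for an older release."""
    current = parse_version(current_version)
    return [
        d for d in decisions
        if not d.justification.strip()
        or parse_version(d.validated_against) < current
    ]


# Illustrative entries only; the first two mirror the example from the post.
decisions = [
    DesignDecision("Max VMs per datastore", "15", "To reduce SCSI reservations", "3.5"),
    DesignDecision("Datastore size", "500GB", "", "4.1"),
    DesignDecision("DRS automation level", "Fully automated", "Customer requirement", "4.1"),
]

for d in stale_decisions(decisions, "4.1"):
    print(f"Review: {d.name} = {d.value} (last validated for {d.validated_against})")
```

The check itself is trivial; the point is that every default carried over from a template gets an explicit justification and a "last validated" marker, so a new release forces a conscious decision to keep or change it.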
Matt Liebowitz says
Great points Duncan. I work for a partner and use the document you’re talking about and know exactly what you mean. For the most part we try to fill in every section of that document and/or take out sections that are not relevant to the design. We also often have to do a search/replace from “ESX” to “ESXi” throughout the document. 🙂
One other point – if you used a website/document/KB article as a reference for your design, make sure you put it in the appendix at the end. The default references in the appendix are good, but more than likely you used more than that. Give credit where credit is due.
Matt
Chris Kranz says
I totally agree. I used the Plan & Design kit as the basis of my application, but as I started to get into the detail, I realised that there was actually a lot more content that needed including on top of the standard template. Additionally, some aspects simply didn’t fit my design, as the template makes universal assumptions to create a “one size fits all” type template. This really isn’t good enough for a VCDX application, which needs to be very detailed around the solution you are putting together.
I’d say the Plan & Design kit is a great template to get you started, and it really helped as I was stuck on where to start, but a template is meant to be customised, and with the VCDX there is no cutting corners. I still spent 100+ hours on my VCDX application, and that was while using templates and previous docs and diagrams I had produced.
Considering the time and expense of this exercise, there really should be minimal corner cutting.
Richard Diphoorn says
Duncan, good post, and in my opinion there sure is a need for a template, but often this limits the “artistic” side of creating a design.
I think every design is different for each customer (certainly there are more common components in a design) because every customer is unique in my opinion. Therefore, I think a “framework/methodology/template” should be used, but only to push somebody in the right mindset for creating a design which applies to a customer specific situation.
I mean, if anybody could just create a design from a template, why bother to hire VMware PSO consultants then? 😉
Vladan says
I use this template too, since I work for a VMware Partner. I find it good and exhaustive, but I always add my personal touch and some additional links to KB articles, and also links to vladan.fr if relevant articles on my blog apply to the design.
I don’t know about you guys (in Holland), but I had to translate it into French, since my customers’ English is mostly very poor….
But good template to start with…
John says
EMC recommends making pools of storage for VMFS datastores set to the 2TB (-512b) max.
What’s a safe number of VMs doing that with VAAI?
Doug says
In a perfect world, we just fill in some blanks and automation spits out a set of nice documents. We’re not there yet — unless you consider the ‘automation’ to be us. 🙂 I’m working with the HealthAnalyzer right now and that is a great tool for baseline data collection, review, and aggregation. I have issues with the report it spits out, but I deal with it because the tool saves me a lot of time getting to that point.
For the P&D, I appreciate the documentation templates for much the same reason: they save a lot of work getting started. From there, we have to ask the right questions and evaluate the answers against relevant and current ‘best practices’ to determine what should be done for the actual design.
That said, one of the challenges most of us face is finding out when key components change and how that change impacts a design. For example, I keep up on release notes and read a lot, but citing Duncan’s example here, it is almost impossible for me to determine that VMFS improvements or VAAI change the recommended VM:Datastore ratio (specifically) from 15:1 to 20-25:1. I would assume that there would be SOME improvement, but quantifying that is not a simple task for those who don’t know what exact changes were made.
The big takeaway from my perspective is that technology changes constantly and documentation/design ‘frameworks’ can’t be expected to remain up to date — especially sufficiently complete ones. It takes a lot of work to get those things developed and at least the same amount of work to review/update them continuously. To Duncan’s point, they’re a good starting point (much better than starting from scratch each time!), but should not be considered a complete solution by themselves.
CianoKuraz says
Best practices are like the Theory of Relativity…”it depends” 🙂
Matt M aka vmexplorer says
I agree that best practices are a good starting point.. However it looks like the real question should be “What is your process to improve your process?” Sounds funny but it’s true… Things aren’t static and your “framework” is no exception… Just – Write It, Use It, Review It, Improve It, and Repeat…
Jason Boche says
I was fortunate (or unfortunate) enough to not have access to the standard VMware design documents. As a result, my design documentation was written from scratch which is probably somewhat of a headache for the VCDX panel and a risk to the candidate.
Russell says
I’ve actually completely chucked the VMware P&D kit because I don’t think it scales beyond helping a QA department understand the best approach to rolling out VMware.
Last year at PEX during the PTAB, I made a big point about how the current kit as it stands promotes a lot of “Find/Replace” design and doesn’t prompt the consultant at any point in time to do actual requirements gathering or technology research (your case in point is VAAI.)
I could go on and on about this topic though (I’ve spent several hours in heated debates with VMware PSO consultants and engagement managers over what this should be). I think it is definitely something that needs to be addressed when we start talking about taking people to the cloud (public, private, or a hybrid approach).
Travis Wood says
I have reviewed a lot of designs done using the P&D kit. I often comment on the standard design decisions asking for justifications why this option was chosen when I know it was from the template. Also since 4.1 every design that’s come through has not included any reference to NIOC, SIOC and VAAI – so I have always sent back asking why these are not going to be used.
Brad Payne says
Reading this post piqued my curiosity about this template, but I don’t see it in the partner portal. Is it available to all TAP partners?
Chris Colotti says
At the end of the day it is about justification as well as tying your requirements, constraints, and assumptions to the design. I cannot count how many times I have seen VCDX applications where the design decision was “To use DRS or not to use DRS”. Although this is a valid question let’s be realistic…..give me a darn good reason why you would not and make that a constraint of the design which would be an interesting talking point. However if it is just because it is a best practice, then that is not very interesting.
Maybe the constraint from the customer is that their CMDB requires the application/VM to be tied to a hostname, not a cluster, and therefore DRS needs to be disabled. THAT is a good example of a constraint adding a requirement to the DRS design, because it directly affects whether DRS is used at all or run in fully automated mode, and it is the type of real consideration an Architect needs to address. It also affects the operational procedures for host downtime and maintenance.
Another point to make is that your design does NOT have to be free of mistakes. Mine was not when I defended it, but you have to understand the flaws in your design and properly address them.
Using the toolkit as a guide is completely valid, but be creative with your design and make it something interesting to talk about. If it is a cookie-cutter template, it’s just going to be boring, frankly.
Doug Youd says
Hi Duncan,
This is a particularly timely comment for me. In my environment we have just recently engaged VMware PSO for a design engagement. The thought behind it was that we could elevate our internal staff’s understanding of the virtualization platform by getting in external help on the design… and who better than VMware themselves to do the work?
What we got back was just the PSO template, with a few values changed. The quality was so bad that in some cases it actually contradicted itself. We’re dealing with the issue, but it’s still frustrating that I now have to fight to get the design fleshed out.
Pretty disappointing to hear that this level of laziness also extends to VCDX applicants.
-Doug
Richard Powers says
Yes, a very timely post. I’m challenging the 500 GB standard LUN that EMC says is optimal for my VMware environment on a new v-MAX we are putting into place. But I do have an HDS 9990 on which I wouldn’t put more than 16 VMs on a single LUN, and I still have a mix of 1.2 TB LUNs for large VMs and 500 GB LUNs for smaller VMs. For me it depends on the known limitations I have to work with at my company.
-Rich
Alex Tanner says
Hi Richard,
I have this conversation almost daily with our customers.
Prior to v4 and certainly to 4.1 with VAAI – EMC did have some fairly clear cut best practices around Datastore size.
These practices were outlined in our various platform techbooks, a great resource if you have not read them.
The guide value was based on the view that yes, you could grow the Datastore to 2TB (minus a bit), but that the temptation for a lot of VMware Admins, who are not necessarily storage folks, would be to try and fit as many VMs on that Datastore as possible. This could create issues due to some of the design elements of VMFS that can manifest themselves in the extreme, i.e. the serial queueing of I/O and the metadata locking (as Duncan says, hugely improved in 4.0 and 4.1 with VAAI).
The other big change was the move from placing Tier2/3 ‘craplications’ en masse in a big datastore in the past to, in recent times, a belief that this approach could also be used with demanding Tier 1 applications.
So as ESX hosts have been able to drive more I/O, rather than telling customers “look, if you build a datastore of 1TB+ and shove 20-30 heavy storage I/O VMs on it, it will probably fall over” (as that 1TB Datastore could be based on just 5 x R5 600GB disks), we have suggested (in the past at least) that we keep the Datastore size down. That limits the temptation for VMadmins to dump large numbers of performant VMs on a badly designed Datastore backend architecture that is being accessed by increasingly large numbers of ESX hosts, exacerbating some of the architectural characteristics of VMFS mentioned above.
Clearly, if the customer and the Solution Architect understand the I/O profiles of the VMs and build in some headroom (deploying new technologies like extended Cache and Fully Automated Storage Tiering), the risks can be alleviated and bigger Datastores proposed.
If you listen to the Gospel (from an EMC perspective at least) according to Chad Sakac, he says go big (correct design principles adhered to, of course).
The other thing to bear in mind is that best practices are just that, a best practice (not a “do it this way or we will not support your environment”), and those less fluent in the ways of storage and VMware will often cling to those prescriptions as they avoid trouble. I know it made my life easier when I didn’t have to design, only prescribe a storage outcome.
Best Practices also tend to linger on – especially in the high end around VMAX where getting it wrong used to be a lot more work to correct than the mid-tier – that is now changing and I agree with you, folks need to be brought up to date as to what is possible and sensible
I would direct you and your contacts at EMC to Chad’s blog
http://virtualgeek.typepad.com/virtual_geek/2009/03/vmfs-best-practices-and-counter-fud.html
At least within the vSpecialist team we embrace the view that, with the correct design principles for storage observed, large datastores are possible and suggested.
Many thanks
Alex Tanner
Rick Boyett says
At the risk of sounding dumb, can anyone point me in the direction of the kit? I have looked all over Partner Central with no luck. I did a search under Content and found nothing.
Thanks in advance
David E. Moss says
The location for this material is on the Partner web site (salesforce). Click on the Content tab up top, and then search for:
Plan and Design for vSphere 4 Services Kit R1.zip
Cheers!
Evan says
Thanks David.
I am logged into my partner account and have tried searching for this but it is not there. Is this something you can send me?
Regards,
Evan
Jason says
Hi.
I’m also logged on to the partner website but cannot locate ‘Plan and Design for vSphere 4 Services Kit R1.zip’ under the Content tab. Would someone steer me in the right direction?
Thanks,
Jason