
Yellow Bricks

by Duncan Epping



vSphere 5 Coverage

Duncan Epping · Aug 6, 2011 ·

I just read Eric's article about all the topics he has covered around vSphere 5 over the last couple of weeks, and as I just published the last article I had prepared, I figured it would make sense to post something similar. (Great job by the way Eric, I always enjoy reading your articles and watching your videos!) Although I hit roughly 10,000 unique views per day on average during the first week after the launch, and still see around 7,000 a day currently, I have the feeling that many people were focused on the licensing changes rather than all the new and exciting features that were coming up. Now that the dust has somewhat settled, it makes sense to re-emphasize them. Over the last 6 months I have been working with vSphere 5 and exploring these features. My focus for most of those 6 months was completing the book, but of course I wrote a large number of articles along the way, many of which ended up in the book in some shape or form. This is the list of articles I published. If you feel there is anything I left out that should have been covered, let me know and I will try to dive into it. I can't make any promises though, as with VMworld coming up my time is limited.

  1. Live Blog: Raising The Bar, Part V
  2. 5 is the magic number
  3. Hot off the press: vSphere 5.0 Clustering Technical Deepdive
  4. vSphere 5.0: Storage DRS introduction
  5. vSphere 5.0: What has changed for VMFS?
  6. vSphere 5.0: Storage vMotion and the Mirror Driver
  7. Punch Zeros
  8. Storage DRS interoperability
  9. vSphere 5.0: UNMAP (VAAI feature)
  10. vSphere 5.0: ESXCLI
  11. ESXi 5: Suppressing the local/remote shell warning
  12. Testing VM Monitoring with vSphere 5.0
  13. What’s new?
  14. vSphere 5.0 vMotion Enhancements
  15. vSphere 5.0: vMotion enhancement, tiny but very welcome!
  16. ESXi 5.0 and Scripted Installs
  17. vSphere 5.0: Storage initiatives
  18. Scale Up/Out and impact of vRAM?!? (part 2)
  19. HA Architecture Series – FDM (1/5)
  20. HA Architecture Series – Primary nodes? (2/5)
  21. HA Architecture Series – Datastore Heartbeating (3/5)
  22. HA Architecture Series – Restarting VMs (4/5)
  23. HA Architecture Series – Advanced Settings (5/5)
  24. VMFS-5 LUN Sizing
  25. vSphere 5.0 HA: Changes in admission control
  26. vSphere 5 – Metro vMotion
  27. SDRS and Auto-Tiering solutions – The Injector

Once again, if there is something you feel I should be covering, let me know and I'll try to dig into it. Preferably something that none of the other blogs have published, of course.

vSphere 5 – Metro vMotion

Duncan Epping · Aug 3, 2011 ·

I received a question last week about higher latency thresholds for vMotion… A rumor was floating around that vMotion would support RTT latency of up to 10 milliseconds instead of 5. (RTT = Round Trip Time.) Well, this is partially true. With vSphere 5.0 Enterprise Plus it is true; with any edition below Enterprise Plus the supported limit remains 5 milliseconds RTT. Is there a technical reason for this?

There's a new component in vMotion which is only enabled with Enterprise Plus, and that component is what we call 'Metro vMotion'. This feature enables you to safely vMotion a virtual machine across a link with up to 10 milliseconds RTT. The technique used is common practice in networking and is described in a bit more depth here.

In the case of vMotion the standard socket buffer size is around 0.5MB. Assuming a 1GbE network (or 125MBps), the bandwidth-delay product dictates that we can support roughly 5ms of RTT delay without a noticeable bandwidth impact. With the "Metro vMotion" feature, the socket buffers are dynamically resized based on the observed RTT over the vMotion network. So, if you have a 10ms delay, the socket buffers will be resized to 1.25MB, allowing the full 125MBps throughput. Without "Metro vMotion", over the same 10ms link, you would get around 50MBps of throughput.
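
For those who like to see the arithmetic, here is a minimal sketch of the bandwidth-delay product reasoning above (plain Python, function names are mine, and the buffer and link numbers are simply the examples from this post):

# Throughput of a TCP-like stream is capped at (socket buffer size) / (RTT).
def max_throughput_mbps(buffer_mb, rtt_ms):
    return buffer_mb / (rtt_ms / 1000.0)

# Buffer needed to keep a link of a given speed full at a given RTT.
def required_buffer_mb(link_mbps, rtt_ms):
    return link_mbps * (rtt_ms / 1000.0)

# Standard vMotion: ~0.5MB socket buffer on a 1GbE (~125MBps) link.
print(max_throughput_mbps(0.5, 5))   # ~100MBps, so ~5ms RTT is roughly the limit
print(max_throughput_mbps(0.5, 10))  # ~50MBps, the figure mentioned above
# Metro vMotion resizes the buffer based on the observed RTT instead:
print(required_buffer_mb(125, 10))   # 1.25MB keeps 125MBps flowing at 10ms RTT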

Is that cool or what?

vSphere 5.0 HA: Changes in admission control

Duncan Epping · Aug 3, 2011 ·

I just wanted to point out a couple of changes to HA admission control in vSphere 5.0. Although they might seem minor, they are important to keep in mind when redesigning your environment. Let's discuss each of the admission control policies and list the changes underneath.

  • Host failures cluster tolerates
    Still uses the slot algorithm. The major change here is that you can now configure a value larger than 4 hosts. The 4-host limit was imposed by the Primary/Secondary node concept; as this constraint has been lifted, it is now possible to select a value of up to 31. So in the case of a 16-host cluster you can set the value to 15. (Yes, you could even set it to 31 as the UI doesn't limit you, but that wouldn't make sense, would it…) Another change is the default slot size for CPU: it used to be 256MHz and has been decreased to 32MHz.
  • Percentage as cluster resources reserved
    This admission control policy has been overhauled and it is now possible to select separate percentages for CPU and memory. In other words, you can set CPU to 30% and memory to 25%. The algorithm hasn't changed and this is still my preferred admission control policy! (A rough sketch of the math is shown below this list.)
  • Specify Failover host
    Allows you to select multiple hosts instead of just one. For instance, in an 8-host cluster you can specify two hosts as designated failover hosts. Keep in mind that these hosts will not be used during normal operations!
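
As promised above, here is a rough sketch of how the percentage-based check works (plain Python, my own simplified names; the real HA calculation works with the reservations and memory overhead of powered-on VMs and has a few more subtleties, which the deepdive covers):

def failover_capacity(total, reserved):
    # Fraction of cluster capacity not claimed by reservations (0..1).
    return (total - reserved) / total

def admission_ok(cpu_total_mhz, cpu_reserved_mhz, cpu_pct,
                 mem_total_mb, mem_reserved_mb, mem_pct):
    # vSphere 5.0 lets you configure the CPU and memory percentages separately.
    return (failover_capacity(cpu_total_mhz, cpu_reserved_mhz) >= cpu_pct / 100.0 and
            failover_capacity(mem_total_mb, mem_reserved_mb) >= mem_pct / 100.0)

# Hypothetical 4-host cluster with CPU set to 30% and memory to 25%:
print(admission_ok(4 * 20000, 35000, 30, 4 * 98304, 190000, 25))  # True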

For more details on admission control I would like to refer you to the HA deepdive (not updated for 5.0 yet) or my book on vSphere 5.0 Clustering, which, for instance, contains many examples of how to correctly set the percentage.

Hot off the press: vSphere 5.0 Clustering Technical Deepdive

Duncan Epping · Jul 12, 2011 ·

** Update: Available now: paperback full | paperback black & white **

After months of hard work the moment is finally there: the release of our new book, vSphere 5.0 Clustering Technical Deepdive! When we started working on, or better said planning, an update of the book, we never realized the amount of work required. Be aware that this is not a minor update. This book covers HA (a full rewrite, as HA has been rewritten for 5.0), DRS (mostly rewritten to focus on resource management) and Storage DRS (new!). Besides these three major pillars we also decided to add what we call supporting deepdives: vMotion, Storage vMotion, Storage I/O Control and EVC. This resulted in roughly 50% more content (totaling 348 pages) than the previous book. It is also worth noting that every single diagram has been recreated, and are they cool or what?

Before I give you the full details I want to thank a couple of people who have helped us tremendously and without whom this publication would not have been possible. First of all I would like to thank my co-author Frank "Mr Visio" Denneman for all his hard work. Frank and I would also like to thank our VMware management team for supporting us on this project. Doug "VEEAM" Hazelman, thanks for writing the foreword! A special thanks goes out to our technical reviewers and editors: Doug Baer, Keith Farkas and Elisha Ziskind (HA Engineering), Anne Holler, Irfan Ahmad and Rajesekar Shanmugam (DRS and SDRS Engineering), Puneet Zaroo (VMkernel scheduling), Ali Mashtizadeh and Gabriel Tarasuk-Levin (vMotion and Storage vMotion Engineering), Doug Fawley and Divya Ranganathan (EVC Engineering). Thanks for keeping us honest and for contributing to this book.

As promised in the multiple discussions we had around our 4.1 HA/DRS book, we wanted to make sure to offer multiple options straight away. While Frank finalized the printed copy, I worked on formatting the ebook. Besides the black & white printed version we are also offering a full color version of the book and a Kindle version. The black & white version sells for $29.95, the full color for $49.95 and the Kindle for an ultra cheap price: $9.95. Needless to say, we recommend the Kindle version. It is cheap, full color and portable, or should we say virtual… who doesn't love virtual? On a side note, we weren't planning on doing a black and white release, but due to the extremely high production costs of the full color print we decided to offer it as an extra service. Before I give the full description, here are the direct links to where you can buy the book. (Please note that Amazon hasn't listed our book yet; it seems like an indexing issue and should hopefully be resolved soon.) For those who cannot wait to order the printed copy, check out Createspace or Comcol.

Amazon:
eBook (Kindle) – $9.99
(price might vary based on location as Amazon charges extra for delivery)
Black & White Paper – $29.95
Full Color Paper – $49.95

Createspace:
Black & White Paper – $29.95
Full Color Paper – $49.95

For the EMEA folks, comcol.nl has offered to distribute it again; the paper black & white version can be found here, and the full color version here.

VMware vSphere 5.0 Clustering Technical Deepdive zooms in on three key components of every VMware-based infrastructure and is by no means a "how to" guide. It covers the basic steps needed to create a vSphere HA and DRS cluster and to implement Storage DRS, but more importantly it explains the concepts and mechanisms behind HA, DRS and Storage DRS, which will enable you to make well-educated decisions. This book takes you into the trenches of HA, DRS and Storage DRS and gives you the tools to understand and implement, for example, HA admission control policies, DRS resource pools, Datastore Clusters and resource allocation settings. On top of that, each section contains basic design principles that can be used for designing, implementing or improving VMware infrastructures, and fundamental supporting features like vMotion, Storage I/O Control and much more are described in detail for the very first time.

This book is also the ultimate guide to prepare for any HA, DRS or Storage DRS-related question or case study that might be presented during the VMware VCDX, VCP and/or VCAP exams.

Coverage includes:
– HA node types
– HA isolation detection and response
– HA admission control
– VM Monitoring
– HA and DRS integration
– DRS imbalance algorithm
– Resource Pools
– Impact of reservations and limits
– CPU Resource Scheduling
– Memory Scheduler
– DPM
– Datastore Clusters
– Storage DRS algorithm
– Influencing SDRS recommendations

Be prepared to dive deep!

Pick it up, leave a comment and of course feel free to make those great mugshots again and ping them over via Facebook or our Twitter accounts! For those looking to buy in bulk (> 20) contact clusteringdeepdive@gmail.com.

Which metric to use for monitoring memory?

Duncan Epping · Apr 29, 2011 ·

** PLEASE NOTE: This article was written in 2011 and discusses how to monitor memory usage, which is different from memory/capacity sizing. For more info on "active memory" read this article by Mark A. **

This question has come up several times over the last couple of weeks, so I figured it was time to dedicate an article to it. People have always been used to monitoring memory usage in a specific way, mainly by looking at the "consumed memory" stats. This always worked fine until ESX(i) 3.5 introduced the aggressive use of large pages. In the 3.5 timeframe this only applied to AMD processors that supported RVI; with vSphere 4.0, support for Intel's EPT was added. Every architectural change has an impact. In this case the impact is that TPS (transparent page sharing) does not collapse these so-called large pages. (Discussed in depth here.) This unfortunately left many people with the feeling that there was no real benefit to these large pages, or even worse, the perception that large pages are the root of all evil.

After having several discussions with customers, fellow consultants and engineers, we managed to figure out why this perception was floating around. The answer was actually fairly simple: metrics. When monitoring memory, most people look at the following section of the host's Summary tab:

However, in the case of large pages this metric isn't actually that relevant. I guess that doesn't only apply to large pages but to memory monitoring in general, although as explained it used to be an indication. The metric to monitor is "active memory". Active memory is what the VMkernel believes is currently being actively used by the VM. This is an estimate calculated by a form of statistical sampling, and that estimate will most definitely come in handy when doing capacity planning. Active memory is, in our opinion, what should be used to analyze trends. Kit Colbert has also hammered on this during his Memory Virtualization sessions at VMworld. I guess the following screenshot is an excellent example of the difference between "consumed" and "active". Do we need to be worried about "consumed"? Well, I don't think so; monitoring "active" is probably more relevant at this point! However, it should be noted that "active" represents a 5-minute time slot. It could easily be that the first 5-minute value observed is the same as the second, yet different blocks of memory were touched. So it is an indication of how active the VM is. Nothing more than that.
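
To make that last point a bit more tangible, here is a toy sketch of working-set estimation through sampling (plain Python, purely illustrative, and certainly not how the VMkernel actually implements it):

import random

TOTAL_PAGES = 1_000_000   # pretend the VM has ~4GB of 4KB pages
SAMPLE_SIZE = 100         # only a small random subset is inspected per interval

# Pretend the guest touched 20% of its pages during this 5-minute interval.
touched = set(random.sample(range(TOTAL_PAGES), int(TOTAL_PAGES * 0.20)))

sample = random.sample(range(TOTAL_PAGES), SAMPLE_SIZE)
hits = sum(1 for page in sample if page in touched)

estimated_active_mb = (TOTAL_PAGES * hits / SAMPLE_SIZE) * 4 / 1024
print(f"Estimated active memory: ~{estimated_active_mb:.0f}MB")
# Note: the next 5-minute interval could report a similar number even though a
# completely different set of pages was touched, which is exactly the caveat above.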

