vSphere Metro Storage Cluster with vSphere 5.5

I had a couple of questions recently about the exact settings for vSphere Metro Storage Clusters with vSphere 5.5. It was the third time in two weeks I shared the same info about vMSC with vSphere 5.5, so I figured I would write a quick blog post to make the information a bit easier to find through Google. Below you can find the settings required for a vSphere Metro Storage Cluster with vSphere 5.5. Note that in-depth details around operations / testing can be found in this white paper: version 5.x // version 6.0.

  1. VMkernel.Boot.terminateVMOnPDL = True
  2. Das.maskCleanShutdownEnabled = True 
  3. Disk.AutoremoveOnPDL = 0 

I want to point out that if you migrate from 5.0 or 5.1, the host advanced setting “VMkernel.Boot.terminateVMOnPDL” replaces disk.terminateVMOnPDLDefault (/etc/vmware/settings). Das.maskCleanShutdownEnabled is actually configured to “true” by default as of vSphere 5.1 and later, but personally I prefer to set it anyway so that I know for sure it has been configured correctly. Then there is Disk.AutoremoveOnPDL; this setting is new in vSphere 5.5 as discussed here. Make sure to disable it: as PDLs are likely to be temporary, there is no point in removing the devices and then having to do a rescan to have them reappear, it only slows down your recovery process. (EMC also recommends this by the way, see page 21 of this PDF on vMSC/VPLEX.)
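For those who prefer to script this, below is a minimal PowerCLI sketch of how these three settings could be applied. The cluster name is just an example, and I would double-check the values against the white paper for your own environment.

```powershell
# Minimal sketch, assuming an existing PowerCLI connection; "StretchedCluster" is an example name.
$cluster = Get-Cluster "StretchedCluster"

# Host level settings (VMkernel.Boot.* changes only take effect after a host reboot)
foreach ($esx in ($cluster | Get-VMHost)) {
    Get-AdvancedSetting -Entity $esx -Name "VMkernel.Boot.terminateVMOnPDL" |
        Set-AdvancedSetting -Value $true -Confirm:$false
    Get-AdvancedSetting -Entity $esx -Name "Disk.AutoremoveOnPDL" |
        Set-AdvancedSetting -Value 0 -Confirm:$false
}

# das.maskCleanShutdownEnabled is a cluster level (HA) advanced option
New-AdvancedSetting -Entity $cluster -Type ClusterHA `
    -Name "das.maskCleanShutdownEnabled" -Value "true" -Confirm:$false
```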

What happens to VMs when a cluster is partitioned?

I had this question this week around what happens to VMs when a cluster is partitioned. The funny thing is that with questions like these it seems like everyone is thinking the same thing at the same time. I had the question on the same day from a customer running traditional storage who had experienced a network failure across racks, and from a customer running Virtual SAN who just wanted to know how this situation was handled. The question boils down to this: what happens to the VM in “Partition 1” when the VM is restarted in “Partition 2”?

The same can be asked for a traditional environment, the only difference being that you wouldn’t see those “disk groups” at the bottom but a single datastore. In that case a VM can be restarted when a disk lock is lost… What happens to the VM in Partition 1 that has lost access to its disk? Does the isolation response kick in? Well, if you have vSphere 6.0 then potentially VMCP can help, because if you have a single datastore and you’ve lost access to it (APD) then the APD response can be triggered. But if you don’t have vSphere 6.0, don’t have VMCP configured, or run VSAN, what would happen? Well first of all, it is a partition scenario and not an isolation scenario. On both sides of the partition HA will have a master and hosts will be able to ping each other, so there is absolutely no reason to invoke the “isolation response” as far as HA is concerned. The VM will be restarted in Partition 2 while you still have it running in Partition 1; you will either need to kill it manually in Partition 1, or you will need to wait until the partition is lifted. When the partition is lifted the kernel will realize it no longer holds the lock (as it has lost it to another host) and it will kill the impacted VMs instantly.

Virtual SAN enabling PeaSoup to simplify cloud

This week I had the pleasure of talking to fellow Dutchy Harold Buter. Harold is the CTO for PeaSoup and we had a lively discussion about Virtual SAN and why PeaSoup decided to incorporate Virtual SAN in their architecture. What struck me was the fact that PeaSoup Hosting was brought to life partly as a result of the Virtual SAN release. When we introduced Virtual SAN, Harold and his co-founder realized that this was a unique opportunity to build something from the ground up while avoiding the big upfront costs typically associated with legacy arrays. How awesome is that: a new product that leads to new ideas and, in the end, a new company and product offering.

The conversation of course didn’t end there, so let’s get into some more details. We discussed the use case first. PeaSoup is a hosting / cloud provider. Today they have two clusters running based on Virtual SAN: a management cluster which hosts all components needed for a vCloud Director environment, and a resource cluster. The great thing for PeaSoup was that they could start out with a relatively low investment in hardware and scale fast when new customers come on board or when existing customers require new hardware.

Talking about hardware: PeaSoup looked at many different configurations and vendors and for their compute platform decided to go with Fujitsu RX300 rack mount servers. Harold mentioned that these were by far the best choice for them in terms of price, build quality and service. Personally it surprised me that Fujitsu came out as the cheapest option; it didn’t surprise me that Fujitsu’s service and build quality were excellent though. Spec-wise the servers have 800GB SSDs, 7200 RPM NL-SAS disks, 256GB of memory and of course two CPUs (Intel 2620 v2, 6 cores).

Harold pointed out that the only downside of this particular Fujitsu configuration was the fact that it only came with a disk controller that is limited to “RAID 0”, no passthrough. I asked him if they experienced any issues around that and he mentioned that they have had one disk failure so far, and that it resulted in having to reboot the server in order to recreate a RAID-0 set for the new disk. Not too big of a deal for PeaSoup, but of course if possible he would prefer to avoid the need for this reboot. The disk controller, by the way, is based on the LSI 2208 chipset and it is one of the things PeaSoup was very thorough about, making sure it was supported and that it had a high queue depth. The “HCL” came up multiple times during the conversation and Harold felt that although doing a lot of research up front and creating a scalable and repeatable architecture takes time, it also results in a very reliable environment with predictable performance. For a cloud provider, reliability and user experience are literally your bread and butter; they couldn’t afford to “guess”. That was also one of the reasons they selected a VSAN Ready Node configuration as a foundation and tweaked it where their environment and anticipated workload required it.

Key take away: RAID-0 works perfectly fine during normal usage; only when disks need to be replaced is a slightly different operational process required.

“Anticipated” is a keyword once again, as it has been in many of the conversations I’ve had before: it is often unknown what kind of workloads will run on top of these infrastructures, which means that you need to be flexible in terms of scaling up versus scaling out. Virtual SAN provides just that to PeaSoup. We also spoke about the networking aspect. For a cloud provider running vCloud Director and Virtual SAN, networking is a big aspect of the overall architecture. I was interested in knowing what kind of switching hardware was being used. PeaSoup uses Huawei 10GbE switches (CE6850), and each server is connected with at least 4 x 10GbE ports to these switches. PeaSoup dedicated two of these ports to Virtual SAN, which wasn’t a requirement from a load perspective (or from VMware’s point of view), but they preferred this level of redundancy and performance while having a lot of room to grow. Resiliency and future-proofing are key for PeaSoup. Price versus quality was also a big factor in the decision to go with Huawei switches; Huawei in this case had the best price/quality ratio.
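As a side note, tagging a dedicated port for Virtual SAN traffic is straightforward to script. Below is a rough PowerCLI sketch; the host name, vSwitch, port group and IP details are made up for illustration and are not PeaSoup’s actual configuration.

```powershell
# Rough sketch: create a VMkernel adapter dedicated to Virtual SAN traffic.
# Host name, vSwitch, port group and IP below are illustrative only.
$esx = Get-VMHost "esx01.example.local"
$vss = Get-VirtualSwitch -VMHost $esx -Name "vSwitch1"
New-VMHostNetworkAdapter -VMHost $esx -VirtualSwitch $vss -PortGroup "VSAN" `
    -IP "172.16.10.11" -SubnetMask "255.255.255.0" -VsanTrafficEnabled $true
```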

Key take away: It is worth exploring different network vendors and switch models. Prices vary greatly between vendors and models, which could lead to substantial cost savings without impacting service / quality.

Their host and networking configuration is well documented and can easily be repeated when more resources are needed. They even have discounts / pricing documented with their suppliers, so they can quickly assess what is needed, when it is needed, and what the cost will be. I also asked Harold if they were offering different storage profiles to provide their customers a choice in terms of performance and resiliency. So far they offer two different policies to their customers (a quick sketch of how policies like these can be defined follows below the list):

  • Failures to tolerate = 1  //  Stripe Width = 2
  • Failures to tolerate = 1  //  Stripe Width = 4
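For reference, this is roughly how policies along these lines can be defined with the PowerCLI SPBM cmdlets. The policy names are made up, and the cmdlet syntax should be verified against the PowerCLI documentation for your version.

```powershell
# Hedged sketch: define VSAN storage policies with FTT=1 and stripe width 2 / 4.
# Policy names are examples only.
$ftt = Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate"
$sw  = Get-SpbmCapability -Name "VSAN.stripeWidth"

New-SpbmStoragePolicy -Name "FTT1-SW2" -AnyOfRuleSets (
    New-SpbmRuleSet -AllOfRules @(
        (New-SpbmRule -Capability $ftt -Value 1),
        (New-SpbmRule -Capability $sw  -Value 2)
    )
)

New-SpbmStoragePolicy -Name "FTT1-SW4" -AnyOfRuleSets (
    New-SpbmRuleSet -AllOfRules @(
        (New-SpbmRule -Capability $ftt -Value 1),
        (New-SpbmRule -Capability $sw  -Value 4)
    )
)
```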

So far it appears that not too many customers are asking for higher availability; they recently had their first request and it looks like the offering will include “FTT=2” alongside “SW=2 / 4” in the near future. On the topic of customers, they mentioned they have a variety of different customers using the platform, ranging from companies in the business of media conversion and law firms to a company which sells “virtual private servers” on top of their platform.

Before we wrapped up I asked Harold what the biggest challenge for them was with Virtual SAN. Harold mentioned that although they were a very early adopter and use it in combination with vCloud Director, they have had no substantial problems. What may have been the most challenging in the first months was figuring out the operational processes around monitoring. PeaSoup is a happy Veeam customer and they decided to use Veeam ONE to monitor Virtual SAN for now, but in the future they will also be looking at the vR Ops Virtual SAN management pack, and potentially create some custom dashboards in combination with Log Insight.

Key take away: Virtual SAN is not like a traditional SAN, new operational processes and tooling may be required.

PeaSoup is an official reference customer for Virtual SAN by the way; you can find the official video below and the slide deck of their PEX session here.

PernixData announcements at #VFD5

Today PernixData presented at Virtualization Field Day 5. Excellent presentation by Satyam once again. This week I was fortunate to catch up with Frank Denneman to discuss what was going to be announced and what can be expected in the near future. I want to make it clear that no expectations were given around release dates, so don’t expect this to drop next week.

There were 4 key announcements:

  1. PernixData Architect – A better way to design, operate, and optimize data centers
  2. PernixData Cloud – Making enterprise IT more transparent
  3. PernixData FVP
    1. Freedom – Yes, “free” is the key word here!
    2. New features / functionality…

I am going to go through this in a different order than the deck, as I want to cover some of the changes with regard to FVP first. Satyam spoke about a new thing called “FVP Freedom“. FVP Freedom is a free version of FVP which can be used by anyone in any environment. Of course there are some constraints / limitations, and these are:

  • Up to 128GB DFTM cluster for write through acceleration
  • Community support

However, you can use FVP Freedom for an unlimited number of hosts and an unlimited number of VMs. Note that “DFTM” stands for Distributed Fault Tolerant Memory. This means that FVP Freedom gives you memory (read) caching only, no SSD caching, with a 128GB limit per cluster. I think this is huge, and it is a very smart way of getting people to test your solution and run it in production. So what can you do with 128GB? Well, of course they tested this, and they were able to increase the VSImax users from 181 to 328 with that 128GB of memory on 2 hosts. You may wonder why they took this approach, because what does giving it away for free bring them? Well, that will be obvious when you read the other announcements.

Besides a free version of FVP, some enhancements to the current version were also announced. For me, support for vSphere 6.0 and VVols were the two major items. On top of that, new “phone home” functionality is built in, which allows for better and more pro-active support. What also stood out was the new standalone UI. This means that you will be taken out of the Web Client to a standalone HTML5+JS based interface. You may wonder why they did this; that is where the two new product announcements come into play. FVP is still a Windows installable by the way. I had hoped they would announce an appliance, which would lower complexity in terms of installation and management, but maybe next time, who knows.

PernixData Architect was the first announcement. It is a piece of software that enables you to monitor your infrastructure (storage focussed of course) and make educated decisions based on the information and even the recommendations provided. So what are we talking about in terms of metrics? PernixData Architect (for now?) is focussed on storage, not just from a cluster point of view, but also at the host and virtual machine level. What is the latency a virtual machine is experiencing? How many IOPS does this VM do on average? What is the throughput? What is the read/write ratio? What are common block sizes? All the things you would like to know when designing, scaling and sizing your storage infrastructure, and of course when using FVP.

Besides the details above you can, for instance, also see what the active working set is for any of your VMs. You can even get recommendations on how to configure FVP: you may have it set to write-back, but if you are mainly serving reads from cache you may want to change that. It will also give you other recommendations, for example around networking.

You can imagine that with all the metrics and info they are gathering they will be able to provide many more recommendations in the future. I can see those dashboards expanding fast, and I think it is valuable for everyone to understand how their workloads are behaving. On Twitter some comments were made about vR Ops and CloudPhysics. Not a fair comparison, as Pernix is focussed on “just” storage for now. Personally I hope they will start tying in other aspects like memory, CPU and networking, as I don’t think customers want to be stuck with 2 or 3 monitoring solutions.

Now that you have all that data, what can you do with it? Well, that is where PernixData Cloud comes in. PernixData Cloud can give you insight into how you are doing compared to others in the industry with similar environments, or even with different environments. Those running PernixData Architect can feed their data into the cloud analytics platform and do an analysis on it. But what if they don’t? How useful is this cloud analytics platform going to be? Well, here is the catch: when you use FVP Freedom, one of the requirements will be to upload your statistics and environmental details into PernixData Cloud. So, what kind of data can you get out of it? Let me give you two visual examples, as that shows immediately why this is valuable:

Both of the above examples demonstrate what PernixData Cloud Insights can give you: data that is going to help in making purchasing decisions, and I can see how it could also be useful in the future for making design decisions. (Here is what others did to achieve X.) The best example is the top screenshot: not sure which flash device to buy? What are others buying? What can you expect out of it in terms of latency/throughput/IOPS? Cloud Insights will enable you to make educated decisions based on real-life environments instead of on fact sheets, which always appear to be misleading.

All in all, exciting news / announcements from PernixData at Virtualization Field Day 5. Nice work guys, and thanks Mr Denneman for taking the time to have a chat with me and thanks Mr Foskett for streaming the event live!

Conservative vs Aggressive for VMCP APD response

I had just finished writing the vMSC 6.0 Best Practices paper, which is about to be released, when a question came in. The question was around the APD scenario and whether the response to an APD should be set to aggressive or conservative. It’s a good question and my instinct immediately says: conservative… But should it be configured that way in all cases? If so, why on earth do we even have the aggressive method? That got me thinking. (By the way, make sure to read this article by Matt Meyer on VMCP on the vSphere blog, good post!) But before I spill the beans, what is aggressive / conservative in this case, and what is this feature again?

VM Component Protection (VMCP) is new in 6.0 and it allows vSphere to respond to a scenario where hosts have lost access to a storage device. (Both PDL and APD.) In previous releases vSphere was already capable of responding to PDL scenarios, but the settings weren’t really exposed in the UI; that has changed with 6.0, and the APD response has been added at the same time. Great feature if you ask me, especially in stretched environments, as it will help during certain failure scenarios. [Read more…]
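For those who want to set this outside the Web Client, here is a rough sketch of how the default APD response could be set to conservative through the vSphere API with PowerCLI. The type and value names follow the vSphere 6.0 API reference, the cluster name is just an example, and this is an illustration rather than a recommendation, so verify it against the official documentation before using it.

```powershell
# Rough sketch, for illustration only: enable VMCP and set the APD response
# to the conservative restart policy on a cluster ("StretchedCluster" is an example).
$clusterView = Get-Cluster "StretchedCluster" | Get-View

$spec = New-Object VMware.Vim.ClusterConfigSpecEx
$spec.DasConfig = New-Object VMware.Vim.ClusterDasConfigInfo
$spec.DasConfig.VmComponentProtecting = "enabled"
$spec.DasConfig.DefaultVmSettings = New-Object VMware.Vim.ClusterDasVmSettings
$spec.DasConfig.DefaultVmSettings.VmComponentProtectionSettings = `
    New-Object VMware.Vim.ClusterVmComponentProtectionSettings
# Values documented for the APD response include "restartConservative" and "restartAggressive"
$spec.DasConfig.DefaultVmSettings.VmComponentProtectionSettings.VmStorageProtectionForAPD = "restartConservative"

# Apply the change to the existing cluster configuration (modify = $true)
$clusterView.ReconfigureComputeResource($spec, $true)
```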