
Yellow Bricks

by Duncan Epping


virtual san

How Virtual SAN enables IndonesianCloud to remain competitive!

Duncan Epping · Jun 2, 2015 ·

Last week I had the chance to catch up with one of our Virtual SAN customers. I connected with Neil Cresswell through Twitter and, after going back and forth, we got on a conference call. Neil showed me what they had created for the company he works for, a public cloud provider called IndonesianCloud. No need to tell you where they are located, as the name kind of reveals it. Neil is the CEO of IndonesianCloud by the way, and very, very passionate about IT/technology and VMware. It was great talking to him, and before I forget I want to say thanks for taking time out of your busy schedule, Neil, I very much appreciate it!

IndonesianCloud is a three-year-old cloud service provider and part of the vCloud Air Network, focused on the delivery of enterprise-class hosting services to their customers. Their customers primarily run mission-critical workloads in IndonesianCloud’s three-DC environment, which means that stability, reliability and predictability are really important.

Having operated a “traditional” environment (servers plus legacy storage) for a long time, Neil and his team felt it was time for a change. They needed something which was much more fit for purpose, was robust and reliable, and was capable of providing capacity as well as great performance. On top of that, from a cost perspective it needed to be significantly cheaper; the traditional environment they were maintaining just wasn’t allowing them to remain competitive in their dynamic and price-sensitive market. Several different hyperconverged and software-based offerings were considered, but in the end they settled on Virtual SAN.

Since the Virtual SAN platform was placed into production two months ago, they have deployed over 450 new virtual machines onto their initial 12 node cluster. In addition, migration of another 600 virtual machines from one of their legacy storage platforms to their Virtual SAN environment is underway. While talking to Neil I was mostly interested in some of the design considerations, some of the benefits but also potential challenges.

From a design stance Neil explained how they decided to go with SuperMicro Fat Twin hardware, 5 x NL-SAS drives (4TB) and Intel S3700 SSDs (800GB) per host. Unfortunately no affordable larger SSDs were available, and as such the environment has a lower cache-to-capacity ratio than preferred. Still, when looking at the cache hit rate for reads, it is more or less steady around 98-99%. PCIe flash was also looked at, but didn’t fit within the budget. These SuperMicro systems were on the VSAN Ready Node list, and this was one of the main reasons for Neil and the team to pick them: having a pre-validated configuration, which is guaranteed to be supported by all parties, was seen as a much lower risk than building their own nodes. Then there is the network; IndonesianCloud decided to go with HP networking gear after having tested various products. One of the reasons for this was the better overall throughput, better multicast performance, and lower price per port. The network is 10GbE end to end, of course.

Key take away: There can be a substantial performance difference between the various 10GbE switches, so do your homework!
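To put that cache-to-capacity remark in numbers, here is a minimal sketch, assuming one 800GB S3700 and five 4TB NL-SAS drives per host as described above (for simplicity it compares cache against raw capacity; the sizing guideline discussed further down this page works against used capacity before FTT):

```python
# Rough cache-to-capacity calculation for the node layout described above.
# Assumption: one 800GB caching SSD and 5 x 4TB NL-SAS capacity drives per host.

ssd_cache_gb = 800
capacity_drives = 5
capacity_drive_tb = 4

raw_capacity_gb = capacity_drives * capacity_drive_tb * 1000   # 20,000 GB per host
cache_ratio = ssd_cache_gb / raw_capacity_gb

print(f"Raw capacity per host : {raw_capacity_gb / 1000:.0f} TB")
print(f"Cache-to-capacity     : {cache_ratio:.1%}")            # roughly 4%
```

Even at roughly 4% against raw capacity, the read cache hit rate Neil reported sits at 98-99%, which illustrates that the working set matters more than the raw ratio alone.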

The choice to deploy 4TB NL-SAS drives was a little risky; IndonesianCloud needed to balance the performance, capacity and price ratios. Luckily, having already run their existing cloud platform for three years, they had a history of IO information readily available. Using this GB/IOPS historical information meant that IndonesianCloud was able to make a calculated decision that 4TB drives with an 800GB SSD would provide the perfect combination of performance and capacity. With very good cache hit rates, Neil would like to deploy larger SSDs when they become available, as he believes that cache is a great way to minimise the impact of the slower drives. Equally, the write performance of the 4TB drives was a concern. Using the default VSAN stripe size of 1 meant that, at most, only two drives were able to service write de-stage requests for a given VM, and due to the slow speed of the 4TB drives this could have an impact on performance. To mitigate this, IndonesianCloud performed a series of internal tests that baselined different stripe sizes to find a good balance of performance. In the end a stripe size of 5 was selected, and it is now being used for all workloads. As a nice side effect, this also helps in situations where reads are coming from disk. By the way, the best way to think about the combination of stripe size and failures to tolerate is RAID 1E (mirrored stripes).

Key take away: Write performance of large NL-SAS drives is low; striping can help improve performance.
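To make the stripe-width reasoning concrete, here is a minimal illustrative sketch (this is not an official VSAN formula; it simply counts the capacity drives that can hold data components of a mirrored object, assuming FTT=1 and ignoring witness components):

```python
# Illustrative only: with FTT=1 a VM object is mirrored into two replicas,
# and each replica is striped across 'stripe_width' capacity drives, so more
# stripes means more spindles available to absorb write de-staging.

def destage_drives(stripe_width: int, failures_to_tolerate: int = 1) -> int:
    replicas = failures_to_tolerate + 1      # number of mirror copies of the object
    return replicas * stripe_width           # capacity drives holding data components

print(destage_drives(stripe_width=1))        # 2 drives  (default policy)
print(destage_drives(stripe_width=5))        # 10 drives (the policy IndonesianCloud chose)
```

Which is also why the combination reads like RAID 1E: mirrored copies, each striped across multiple drives.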

IndonesianCloud has standardised on a 12 node Virtual SAN cluster, and I asked why, given that Virtual SAN 5.5 U1 supports up to 32 nodes (64 with 6.0 even). Neil’s response was that 12 nodes is what comprises an internal “zone”, and that customers can balance their workloads across zones to provide higher levels of availability. Having all nodes in a single cluster, whilst possible, was not considered the best fit for a service provider that is all about containing risk. 12 nodes also maps to approximately 1000 VMs, which is what they have modelled the financial costs against, so 1000 VMs deployed on the 12 node cluster would consume CPU/Memory/Disk at the same ratio, effectively ensuring maximum utilisation of the asset.
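The article does not spell out the per-VM sizing model, but a back-of-the-envelope sketch, assuming the node layout described earlier (5 x 4TB per host), FTT=1 across the board, and no allowance for slack space or metadata overhead, shows roughly what a 12-node, 1000-VM zone provides per VM:

```python
# Back-of-the-envelope capacity budget for one 12-node "zone".
# Assumptions (not stated in the article): 5 x 4TB NL-SAS per host, FTT=1
# mirroring everywhere, ~1000 VMs per cluster, no slack space or overhead.

hosts, drives_per_host, drive_tb = 12, 5, 4
ftt, vms = 1, 1000

raw_tb = hosts * drives_per_host * drive_tb       # 240 TB raw
usable_tb = raw_tb / (ftt + 1)                    # ~120 TB after mirroring

print(f"Raw capacity   : {raw_tb} TB")
print(f"Usable (FTT=1) : {usable_tb:.0f} TB")
print(f"Per-VM budget  : {usable_tb * 1000 / vms:.0f} GB")   # ~120 GB per VM
```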

If you look at the workloads IndonesianCloud customers run, they range from large databases and time-sensitive ERP systems to webservers and streaming TV CDN services, and they are even running airline ERP operations for a local carrier… All of these VMs belong to external paying customers by the way, and all of them are mission critical for those customers. On top of Virtual SAN some customers even have other storage services running; one of them, for instance, is running SoftNAS on top of Virtual SAN to offer shared file services to other VMs. A vast range of different applications, with different IO profiles and different needs, but all satisfied by Virtual SAN. One thing that Neil stressed was that the ability to change the characteristics (failures to tolerate) specified in a profile was key for them, as it allows for a lot of flexibility and agility.

I did wonder, with VSAN being relatively new to the market, if they had concerns in terms of stability and recoverability. Neil actually showed me their comprehensive UAT testing plan and the results, and they were very impressed by how VSAN handled these tests without any problem. Tests ranged from pulling drives and failing network interfaces and switches through to removing full nodes from the cluster, all performed whilst simultaneously running various burn-in benchmarks. No problems whatsoever were experienced, and as a matter of fact the environment has been running great in production (don’t curse it!!).

Key take away: Testing, Testing, Testing… Until you feel comfortable with what you designed and implemented!

When it comes to monitoring though, the team did want to see more detail than what is provided out of the box; especially because it is a new platform, they felt this gave them a bit more assurance that things were indeed going well and it wasn’t just their perception. They worked with one of VMware’s VR Ops rock stars, Iwan Rahabok, on creating custom dashboards with all sorts of data, ranging from cache hit ratio to latency per spindle to any type of detail you want at a per-VM level. Of course they start with a generic dashboard which then allows them to drill down; any outlier is noted immediately, and leveraging VR Ops and these custom dashboards they can drill deep whenever they need to. What I loved most is how relatively easy it is for them to extend their monitoring capabilities: during our WebEx Iwan felt he needed some more specifics on a per-VM basis and added these details to VR Ops literally within minutes. IndonesianCloud has been kind enough to share a custom dashboard they created with which they can catch a rogue VM easily: when a single VM, and it can be any VM, generates excessive IOPS, it triggers a spike right away in the overall dashboard.

I know I am heavily biased, but I was impressed, not just with Virtual SAN, but even more so with how IndonesianCloud has implemented it, how it is changing the way IndonesianCloud manages their virtual estate, and how it enables them to compete in today’s global market.

All-flash VSAN configuration example

Duncan Epping · Mar 31, 2015 ·

I was talking to a customer this week who was looking to deploy various 4 node VSAN configurations. They needed a solution which would provide them performance and wanted to minimize the moving components due to the location and environmental aspects of the deployment; all-flash VSAN is definitely a great choice for this scenario. I looked at various server vendors and, based on their requirements (and budget), provided them a nice configuration (in my opinion) which comes in at slightly less than $45K.

What I found interesting is the price of the SSDs, especially the “capacity tier”, as the price is very close to that of 10K RPM SAS. I selected the Intel S3500 as the capacity tier as it was one of the cheapest devices listed on the VMware VSAN HCL. It will be good to track GB/$ for new entries on the HCL that will be coming soon; so far the S3500 seems to be the sweet spot. It also seems that, from a price point perspective, the 800GB devices are the most cost effective at the moment. The S3500 also performs well, as demonstrated in this paper by VMware on VSAN scaling/performance.

This is what the bill of materials looked like, and I can’t wait to see it deployed:

  • Supermicro SuperServer 2028TP-HC0TR – 2U TwinPro2
  • Each node comes with:
    • 2 x Eight-Core Intel Xeon Processor E5-2630 v3 2.40GHz 20MB Cache (85W)
    • 256 GB in 8 DIMMs at 2133 MHz (32GB DIMMs)
    • 2 x 10GbE NIC ports
    • 1 x 4
    • Dual 10-Gigabit Ethernet
    • LSI 3008 12G SAS

That is a total of 16TB of flash-based storage capacity, 1TB of memory and 64 cores in a mere 2U. The above price is based on a simple online configurator and does not include any licenses; a very compelling solution if you ask me.
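As a quick sanity check of those totals, here is a small sketch; note that the per-node flash line-up is not fully listed above, so the assumption of 5 x 800GB S3500 capacity devices per node is mine, chosen because it matches the 16TB total and the 800GB sweet spot mentioned earlier:

```python
# Sanity check of the cluster totals quoted above.
# Assumption (not in the BOM as listed): 5 x 800GB S3500 capacity SSDs per node.

nodes = 4
cpus_per_node, cores_per_cpu = 2, 8
memory_gb_per_node = 256
capacity_ssds_per_node, ssd_gb = 5, 800

print(f"Total cores    : {nodes * cpus_per_node * cores_per_cpu}")                   # 64
print(f"Total memory   : {nodes * memory_gb_per_node / 1024:.0f} TB")                # 1 TB
print(f"Total capacity : {nodes * capacity_ssds_per_node * ssd_gb / 1000:.0f} TB")   # 16 TB
```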

All Flash VSAN – One versus multiple disk groups

Duncan Epping · Mar 11, 2015 ·

A while ago I wrote this article on the topic of “one versus multiple disk groups“. The summary was that you can start with a single disk group, but that from a failure domain perspective having multiple disk groups is definitely preferred. Also from a performance stance there could be a benefit.

So the question now is, what about all-flash VSAN? First of all, the same rules apply: 5 disk groups max, each disk group with 1 SSD for caching and a maximum of 7 devices for capacity. There is something extra to consider though. It isn’t something I was aware of until I read the excellent Design and Sizing Guide by Cormac. It states the following:

In version 6.0 of Virtual SAN, if the flash device used for the caching layer in all-flash configurations is less than 600GB, then 100% of the flash device is used for cache. However, if the flash cache device is larger than 600GB, then only 600GB of the device is used for caching. This is on a per-disk group basis.

Now for the majority of environments this won’t really be an issue, as they typically don’t hit the above limit, but it is good to know when doing your design/sizing exercise. The recommendation of a 10% cache to capacity ratio still stands, and this is used capacity before FTT. If you have a requirement for a total of 100TB, then with FTT=1 that is roughly 50TB of usable capacity, which means you will need a total of at most 5TB of flash. That is 5TB of flash in total; with 10 hosts that would be 500GB per host, which is below the limit. But with 5 hosts that would be 1TB per host, which is above the 600GB mark. Would that result in 400GB per host being unused?

Well, not entirely. Although the write buffer has a max size of 600GB, the remainder of the capacity is used by the SSD for endurance, so it will cycle through those cells… and endurance is also mainly what you are sizing for when it comes to the write buffer; that 10% is mainly about endurance. So you have a couple of options: you can create multiple disk groups to use the full write-buffer capability if you feel you need it, or you can trust VSAN and the flash “internals” to do what they need to do… I have customers doing both, and I have never heard a customer “complain” about the all-flash write-buffer limit… 600GB is a lot to fill up.
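To tie the numbers in this example together, here is a minimal sizing sketch; it assumes a single disk group per host (with multiple disk groups the 600GB write-buffer limit applies per group, which is exactly the first option mentioned above):

```python
# Sizing sketch for the example in the text: 100TB raw requirement, FTT=1,
# the ~10% cache-to-used-capacity guideline, and the 600GB per-disk-group
# write-buffer limit in all-flash VSAN 6.0.

WRITE_BUFFER_LIMIT_GB = 600          # per disk group in all-flash 6.0

def cache_per_host_gb(raw_tb: float, ftt: int, hosts: int, cache_ratio: float = 0.10) -> float:
    usable_tb = raw_tb / (ftt + 1)                    # capacity left after mirroring
    return usable_tb * 1000 * cache_ratio / hosts     # flash needed per host

for hosts in (10, 5):
    per_host = cache_per_host_gb(raw_tb=100, ftt=1, hosts=hosts)
    buffer_used = min(per_host, WRITE_BUFFER_LIMIT_GB)   # assumes one disk group per host
    print(f"{hosts} hosts: {per_host:.0f} GB flash per host, "
          f"{buffer_used:.0f} GB usable as write buffer")
```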

Virtual SAN and ESXTOP in vSphere 6.0

Duncan Epping · Feb 12, 2015 ·

Today I was fiddling with ESXTOP to see if anything was new for vSphere 6.0. Considering the massive number of metrics it already holds, it is difficult to find things which stand out or are new. One thing did stick out though, which is a new display for Virtual SAN. I haven’t found much detail around this new section in ESXTOP to be honest, but then again I guess most of it speaks for itself. If you are in ESXTOP and press “x” you will go to the VSAN screen. When you then press “f” you have the option to add “fields”; I enabled them all and the below is the result:

[Screenshot: Virtual SAN fields in ESXTOP]

It isn’t a huge amount of detail yet, but being able to see the number of reads, the number of writes and the average latency per host is useful for sure. Also, what has my interest are “RECOWR/s” and “MBRECOWR/s”. These refer to “recovery writes”, which is the resync of components that were somehow impacted by a failure. If for whatever reason RVC or the VSAN Observer is unavailable, then it may be worth peeking at ESXTOP to see what is going on.

HP ConvergedSystem 200–HC EVO:RAIL available now!

Duncan Epping · Feb 11, 2015 ·

Yesterday I was informed by the EVO:RAIL team that the HP ConvergedSystem 200–HC EVO:RAIL is available (shipping) as of this week. I haven’t seen much around the additional pieces HP is including, but I was told that they are planning to integrate HP OneView. HP OneView is a management/monitoring solution that gives you a great high-level overview of the state of your systems but at the same time enables you to dive deep when required. Depending on the version included, HP OneView can also do things like firmware management, which is very useful in a Virtual SAN environment if you ask me. I do know that many people have been waiting for HP to start shipping, as it appears to be a preferred vendor for many customers. In terms of configuration, the HP solution is very similar to what we have already seen out there:

  • 4 nodes in 2U each containing:
    • 2 x Intel® E5-2620 v2 six-core CPUs
    • 192 GB memory
    • 1 x SAS 300 GB 10k rpm drive ESXi boot device
    • 3 x SAS 1.2 TB 10k rpm drive (VSAN capacity tier)
    • 1 x 400 GB MLC enterprise-grade SSD (VSAN performance tier)
    • 1 x H220 host bus adapter (HBA) pass-through controller
    • 2 x 10GbE NIC ports
    • 1 x 1GbE IPMI port for remote (out-of-band) management

As soon as I find out more around integration of other components I will let you folks know.

