
Yellow Bricks

by Duncan Epping


VMworld Sessions with vSAN Customers

Duncan Epping · Sep 10, 2018 ·

At VMworld there were various sessions with vSAN customers, many of whom I have met in some shape or form over the past couple of years. These sessions contain some great use cases and stories. Considering they are “hidden” in 60-minute sessions, I figured I would write about them and share the link where applicable.

In the Monday keynote, a couple of great vSAN quotes and customers were also mentioned. Not sure everyone spotted this, but it is definitely something I felt was worth sharing, as these were powerful stories and use cases. First of all, the vSAN numbers were shared: with 15k customers and adoption within 50% of the Global 2000 within 4 years, I think it is fair to say that our business unit is doing great!

In the Make-A-Wish Foundation video I actually spotted the vSAN management interface; although it was not explicitly mentioned, it was still very cool to see that vSAN is used. As their CEO mentioned, it was great to get all that attention after they appeared on national television, but it also resulted in a big website outage. The infrastructure is being centralized and new infra and security policies are being put into place: “working with VMware enables us to optimize our processes and grant more wishes”.

Another amazing story was Mercy Ships. This non-profit operates the largest NGO hospital ship, bringing free medical care to different countries in Africa. And not just medical care: they also provide training to local medical staff so they can continue providing the help needed in these areas of the world. They are now building their next-generation ship, which is going live in 2020, and VMware and Dell EMC will be a big part of it. As Pat said: “it is truly amazing to see what they do with our technology”. Currently they use VxRail, Dell Isilon, etc. on their ships as part of their infrastructure.

The first session I watched was a session by our VP of Product Management and VP of Development. I actually attended this session in person at VMworld; as a result of technical difficulties they started 20 minutes late, hence the session is “only” 40 minutes. The session is titled “HCI1469BU – The Future of vSAN and Hyperconverged Infrastructure“. In it, David Selby has a section of about 10 minutes in which he talks about the vSAN journey Honeywell went through. (If you are just interested in David’s section, skip to the 9:30 mark.) David explains how Honeywell had various issues with SAN storage causing outages of 3k+ VMs, which, as you can imagine, is very costly. In 2014 Honeywell started testing with a 12-node cluster for their management VMs, which for them was a low-risk cluster. The test was successful and they quickly started to move VMs over to vSAN in other parts of the world. Just to give you an idea:

  • US Delaware, 11k VMs on vSAN
  • US Dallas, 500 VMs on vSAN
  • NL Amsterdam, 12k VMs (40% on vSAN, 100% by the end of this year!)
  • BE Brussels, 1000 VMs (20% on vSAN, 100% by the end of this year!)

That is a total of roughly 24,500 VMs on vSAN, with close to 1.7PB of capacity, expected to reach around 2.5PB by the end of this year. All running all-flash vSAN on the Dell PowerEdge FX2 platform, by the way! Many different types of workloads run on these clusters: apps ranging from MS SQL, Oracle, Horizon View, Hadoop, and chemical simulation software to everything else you can think of. What I found interesting is that they are running their Connexo software on top of vSAN; in this particular case the data of 5,000,000 smart energy meters in homes in a European country lands on vSAN. Yes, that is 5 million devices sending data to the Honeywell environment, being stored and analyzed on vSAN.
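The per-site figures above add up to the quoted total; a quick tally (site names and VM counts taken from the list above):

```python
# Honeywell's vSAN footprint per site, as listed above
vms_per_site = {
    "US Delaware": 11_000,
    "US Dallas": 500,
    "NL Amsterdam": 12_000,
    "BE Brussels": 1_000,
}

total_vms = sum(vms_per_site.values())
print(total_vms)  # 24500, matching the "roughly 24,500 VMs" above
```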

David also explained how they are leveraging IBM Cloud with vSAN to run chemical plant simulators so operators of chemical plants can be trained. IBM Cloud also runs vSAN, and Honeywell uses this so they can leverage the same tooling and processes on-premises as well as in IBM Cloud. A great quote: “performance has gone through the roof, applications load in 3 seconds instead of 4 minutes, they received helpdesk tickets as users felt applications were loading too fast”. David works closely with the vSAN team on the roadmap. He had a long list of features he wanted in 2014, all of which have been released by now; there are a couple of things he would still like to see addressed and, as mentioned by Vijay, they will be worked on in the future.

A session I watched online was “HCI1615PU – vSAN Technical Customer Panel on vSAN Experiences“. This panel session was hosted by Peter Keilty from VMware and had various customers: William Dufrin – General Motors, Mark Fournier – US Senate Federal Credit Union, Alex Rodriguez – Rent-A-Center, and Mariusz Nowak – Oakland University. I always like these customer panels as you get some great quotes and stories, which are not scripted.

First, each of the panel members introduced themselves, followed by an intro of their environment. Let me quickly give you some stats of what they are doing/running:

  • General Motors – William Dufrin
    • Two locations running vSAN
    • 13 vCenter Server instances
    • 700+ physical hosts
    • 60 Clusters
    • 13,000+ VMs

William mentioned they started with various 4-node vSAN clusters; now they by default roll out a minimum of 6 or 12 nodes, depending on the use case. They have server workloads and VDI desktops running, and here we are talking thousands of desktops. They are not using stretched vSAN yet, but it is something they will potentially be evaluating in the future.

  • US Senate Federal Credit Union – Mark Fournier
    • Three locations running vSAN (remote office locations)
    • 2 vCenter Instances
    • 8 hosts
    • 3 clusters
      • one cluster with 4 nodes, and then two 2-node configurations
    • Also using VVols!

What is interesting is that Mark explained how they started virtualizing only 4 years ago, which is not something I hear often. I guess change is difficult within the US Senate Federal Credit Union. They are leveraging vSAN in remote offices for availability/resiliency purposes at a relatively low cost (ROBO licensing). They run all-flash, although this is overkill for them, as resource requirements are relatively low. A funny detail is that vSAN all-flash is outperforming the traditional all-flash storage solution in their primary data center; they are now considering moving some workloads to the branches to leverage the available resources and perform better. They are also a big user of vSAN Encryption, which, considering this is a federal organization, was to be expected, leveraging HyTrust as their key management solution.

  • Rent-A-Center – Alex Rodriguez
    • One location using vSAN
    • 2 vCenter Server instances
    • 16 hosts
    • 2 clusters
    • ~1000 VMs

Alex explained that they run VxRail, which for them was the best choice: a flawless and very smooth implementation, which is a big benefit for them. They mainly use it for VDI and published applications. They tested various other hyper-converged solutions, but VxRail was clearly better than the rest. They are running a management cluster and a dedicated VDI cluster.

  • Oakland University – Mariusz Nowak
    • Two locations
    • 1 vCenter Server instance
    • 12 hosts
    • 2 clusters
    • 400 VMs

Mariusz explained the challenges around storage costs. When vSAN was announced in 2014, Mariusz was instantly intrigued and started reading and learning about it. In 2017 they implemented vSAN and moved all VMs over, except for some Oracle VMs, for licensing reasons. Mariusz leverages a lot of enterprise functionality in their environment, ranging from stretched clusters and dedupe and compression all the way to encryption, due to compliance/regulations. Interestingly enough, Oakland University runs a stretched cluster with < 1ms RTT, pretty sweet.

Various questions then came in, some interesting questions/answers or quotes:

  • “vSAN Ready Node and ROBO licensing is extremely economical, it was very easy to get through the budget cycle for us and set the stage for later growth”
  • “The Storage Policy Based Management framework allows for tagging virtual disks with different sets of rules and policies. When we implemented that, we crafted different policies for SolidFire and vSAN to leverage the different capabilities of each platform.” (reworded for readability)
  • QUESTION: What were some of the hurdles and lessons learned?
    • Alex: We started with a very early version, VSPEX BLUE, and the most challenging part for us back then was updating, going from one version to the other. Support, however, was phenomenal.
    • William: Process and people! It is not the same as traditional storage; you use a policy-based management framework on object-based storage, which means different procedures. Networking was also a challenge in the beginning: consistent MTU settings across hosts and network switches are key!
    • Mariusz: We are not using Jumbo Frames right now as we can’t enable it across the cluster (including the witness host), but with 6.7 U1 this problem is solved!
    • Mark: What we learned is that dealing with different vendors isn’t always easy. Also, ROBO licensing makes a big difference in terms of price point.
  • QUESTION: Did you test different failure scenarios with your stretched cluster? (reworded for readability)
    • Mariusz: We did various failure scenarios. We unplugged the full network of a host and watched what happened. No issues, vSphere/vSAN failed over VMs with no performance issues.
  • QUESTION: How do you manage upgrades of vSphere and firmware?
    • Alex: We do upgrades and updates through VxRail Manager and VUM. It downloads all the VIBs and does a rolling upgrade and migration. It works very well.
    • Mark: We leverage both vSphere ROBO as well as vSAN ROBO. One disadvantage is that vSphere ROBO does not include DRS, which means you don’t have “automated maintenance mode”. This results in the need to migrate VMs and place hosts into maintenance mode manually. But as this is a small environment, it is not a huge problem currently; we can probably script it through PowerCLI.
    • Mariusz: We have Ready Nodes, which is more flexible for us, but it means upgrades are a bit more challenging. But VMware has promised more is coming in VUM soon. We use Dell Plugins for vCenter so that we can do firmware upgrades etc from a single interface (vCenter).

The last session I watched was “HCI3691PUS – Customer Panel: Hyper-converged IT Enabling Agility and Innovation“, which appeared to be a session sponsored by Hitachi, with ConAgra Brands and Norwegian Cruise Line as two reference customers. Matt Bouges works for ConAgra Brands as an Enterprise Architect, and Brian Barretto works for Norwegian Cruise Line as a Virtualization Manager.

First Matt discussed why ConAgra moved towards HCI, which was all about scaling and availability as well as business restructuring: they needed a platform that could scale with their business needs. For Brian and Norwegian Cruise Line it was all about cost. Their SAN/storage architecture was very expensive, and when a new scalable solution (HCI) emerged at the time, they explored it and found that the cost model was in their favor. As they run data centers on the ships as well, they need something that is agile; note that these ships are huge, basically floating cities, with redundant data centers on board some of them. (Note they have close to 30 ships, so a lot of data centers to manage.) Simplicity and rack space were also huge deciding factors for both ConAgra and Norwegian Cruise Line.

Norwegian Cruise Line mentioned that they also still use traditional storage, as does ConAgra. It is great that you can do this with vSAN: keep your “old investment” while building out the new solution. Over time most applications will move over, though. One thing they feel is missing with hyper-converged is the ability to run large memory configurations or large storage capacity configurations. (Duncan: Not sure I entirely agree, limits are very close to non-HCI servers, but I can see what they are referring to.) One thing to note from an operational aspect is that certain types of failures are completely different, and handled completely differently, in an HCI world; that is definitely something to get familiar with. Another thing mentioned was the opportunity for HCI at the edge: a nice small form factor should be possible and should allow running 10-15 VMs. It removes the need for converged infra, or traditional storage in general, in those locations, especially now that compute/processing and storage requirements go up at the edge due to IoT and data analytics happening “locally”.

That was it for now, hope you find this useful!

VSAN everywhere with Computacenter

Duncan Epping · Jun 1, 2016 ·

This week I had the pleasure of having a chat with Marc Huppert (VCDX181). Marc works for Computacenter in Germany as a Senior Consultant and Category Leader VMware Solutions, and primarily focuses on datacenter technology. I noticed a tweet from Marc that he was working on a project where they will be implementing a 60-site ROBO deployment. It is one of those use cases I don’t get to see too often in Europe, so I figured I would drop him a note. We had a conversation of about an hour, and below is the story of this project, and some of the other projects Marc has worked on.

Marc mentioned that he has been involved with VSAN for about 1.5 years now. At first they did intensive testing internally to see what VSAN could do for their customers and looked at the various use cases for their customer base. They quickly discovered that combining VSAN with Fusion-io resulted in a very powerful combination: not only extremely reliable (Marc mentioned he has never seen a Fusion-io card fail), but also extremely well performing. They did comparisons between Fusion-io and regular SATA-connected SSDs and performance literally doubled, and not just for reads; writes also showed a big difference. One of the other reasons for considering PCIe-based flash is to have the maximum number of disk slots available for the capacity tier. It all makes sense to me. Right now, for current projects, NVMe-based flash by Intel is being explored, and I am very curious to see what Marc’s experience is going to be like in terms of performance, reliability and the operational aspects compared to Fusion-io.

Which brings us to the ROBO project, as this, surprisingly enough, is the project where NVMe will be used. Marc mentioned that this customer, a large company, has over 60 locations, all connected to a central (main) datacenter. Each location will be equipped with 2 hosts, and depending on the size of the location and the number of VMs needed, a particular VSAN configuration can be selected:

  • Small – 1 NVMe device + 5 disks, 128GB RAM
  • Medium – 2 NVMe devices + 10 disks, 256GB RAM
  • Large – 3 NVMe devices + 15 disks, 384GB RAM

Yes, that also leaves room to grow when desired, as every disk group can go up to 7 disks. From a compute point of view the configurations do not differ too much besides memory config and disk capacity; the CPU is actually the same, to keep operations simple. In terms of licensing, the vSphere ROBO and VSAN ROBO editions are being leveraged, which provides a great scalable and affordable ROBO infrastructure, especially when coming from a two-node configuration with a traditional storage system per location. Not just for the price point, but primarily for the day-2 management.
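To see why these configurations leave room to grow: each vSAN disk group pairs one cache device (here an NVMe card) with up to 7 capacity disks, so every extra NVMe device opens another disk group. A minimal sketch of the headroom per configuration (device counts taken from the list above):

```python
# A vSAN disk group holds 1 cache device + up to 7 capacity disks
MAX_CAPACITY_DISKS_PER_GROUP = 7

# Small/Medium/Large configurations as listed above
configs = {
    "Small": {"nvme": 1, "disks": 5},
    "Medium": {"nvme": 2, "disks": 10},
    "Large": {"nvme": 3, "disks": 15},
}

# Free capacity-disk slots left in each configuration
headroom = {
    name: c["nvme"] * MAX_CAPACITY_DISKS_PER_GROUP - c["disks"]
    for name, c in configs.items()
}
print(headroom)  # {'Small': 2, 'Medium': 4, 'Large': 6}
```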

When demonstrating VSAN to their customer, this is what impressed the customer the most. They have two people managing the entire virtual and physical estate: that is 60 locations (120 nodes) plus the main datacenter, which houses ~5,000 VMs and many physical machines. You can imagine that they spend a lot of time in vCenter, and they prefer to manage things end to end from that same spot; they definitely don’t want to be switching between different management interfaces. Today they manage many small storage systems for their ROBO locations, and they immediately realised that VSAN in a ROBO configuration would significantly reduce the time they spend managing those locations.

And that is just the first step; next up would be the DMZ. They have a separate compute cluster as it stands right now, but it unfortunately connects back to the same shared storage system their production is running on. They fully understand the risk, but never wanted to incur the large cost associated with a storage system dedicated to their DMZ, not just from a capex but also from an opex point of view. With VSAN the economics change, making a fully isolated and self-contained DMZ compute and storage cluster dead simple to justify, especially when combining it with NSX.

One awesome customer if you ask me, and I am hoping they will become a public VSAN reference at some point in the future, as it is a great testimony to what VSAN can do. We briefly discussed other use cases Marc had seen out in the field, and Horizon View, management clusters and production came up, which is very similar to what I see. Marc also mentioned that there is a growing interest in all-flash, which is not surprising considering the dollar-per-GB cost of SAS is very close to flash these days.

Before we wrapped up, I asked Marc if he had any challenges with VSAN itself, and what he felt was most complex. Marc mentioned that sizing is a critical aspect and that they have spent a lot of time in the past figuring out which configurations to offer to customers. Today the process they use is fairly straightforward: select a Ready Node configuration, swap the SATA SSD for PCIe-based flash or NVMe, and increase or decrease the number of disks. Fully supported, yet still flexible enough to meet all the demands of his customers.

Thanks Marc for the great conversation; I am looking forward to meeting up with you at VMworld. (PS: Marc has visited all VMworld events so far in both the US and EMEA, a proper old-timer you could say :-))

You can find him on Twitter or on his blog: http://www.vcdx181.com

How is Virtual SAN doing? 3500 customers reached!

Duncan Epping · Apr 21, 2016 ·

Are you wondering how Virtual SAN is doing? The recent earnings announcement revealed that… Virtual SAN is doing GREAT! Over 3500 customers so far (21 months after the release!) and 200% year-over-year growth. I loved how Pat Gelsinger described Virtual SAN: “VMware’s simple enterprise grade native storage for vSphere”. It doesn’t get more accurate and to the point than that, and that is how people should look at it: vSphere native storage, it just works. Just a couple of things I wanted to grab from the earnings call (transcript here) that I think stood out with regards to VSAN:

and I – having been three years at EMC as a storage company, part of it is it just takes a while to get a storage product mature, right, and that – we have crossed the two-year cycle on VSAN now. The 6.2 release, as I would say, checks all the boxes with regard to key features, capabilities and so on, and we are, I’ll say right on schedule, right, we’re seeing the inflection point on that business, and the 6.2 release really hit the mark in the marketplace very well.

I’d say we’re clearly now seen as number one in a hyper-converged infrastructure space, and that software category we think is going to continue to really emerge as a powerful trend in the industry.

I think Zane mentioned a large financial services company. We had a large EMEA retailer, a large consumer goods manufacturer, a large equipment engines company, and each one of these is really demonstrating the power of the technology.

We also had good transactional bookings as well, so it wasn’t just in big deals but also transactional performance was good. So the channel participation is increasing here.

So we really left Q1 feeling really good about this area, and I’m quite bullish about its growth potential through the year and 2017 and beyond.

I think I don’t need to add anything other than… Go VSAN!

VSAN enabling Sky to be fast / responsive / agile…

Duncan Epping · Nov 30, 2015 ·

Over the last couple of months I’ve been talking to a lot of VSAN customers. A while ago I had a very interesting use case with a deployment on an Oil Platform. This time it is a more traditional deployment: I had the pleasure of talking to James Cruickshank who works for Sky. Sky is Europe’s leading entertainment company, serving 21 million customers across five countries: Italy, Germany, Austria, the UK and Ireland.

James is part of Sky’s virtualisation group which primarily focusses on new technologies. In short, the team figures out if a technology will benefit Sky, how it works, how to implement it and how to support it. He documents all of his findings then develops and delivers the solution to the operations team.

One of the new products James is working with is Virtual SAN. The project started in March and Sky has a handful of VSAN-ready clusters in each of its strategic data centres. These clusters currently have ESXi 5.5 hosts with one 400GB SSD and 4 x 4TB NL-SAS drives, all connected over 10GbE, which is a significant amount of capacity per host. The main reason for that is that Sky has a requirement to run with FTT=2 (for those who don’t know, this means that a 1TB disk will consume ~3TB). James anticipates VSAN 6 will be deployed with a view to delivering production workloads in Q1 2016.
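The FTT=2 overhead mentioned above follows directly from how vSAN mirroring works: FTT=n keeps n+1 full copies of each object. A minimal sketch of the arithmetic (RAID-1 mirroring assumed, witness component overhead ignored):

```python
def raw_capacity_tb(usable_tb: float, ftt: int) -> float:
    """Raw vSAN capacity consumed by mirrored data: one copy per failure
    to tolerate, plus the original (FTT + 1 copies in total)."""
    return usable_tb * (ftt + 1)

print(raw_capacity_tb(1.0, 2))  # 3.0 -> a 1TB disk consumes ~3TB, as noted above
```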

We started talking about the workloads Sky had running and what some of the challenges were for James. I figured that, considering the size of the organisation and the number of workloads it has, getting all the details must not have been easy. James confirmed that it was difficult to get an understanding of the IO profile and that he spent a lot of time developing representative workloads. James mentioned that when he started his trial the VSAN Assessment Tool wasn’t available yet, and that it would have saved him a lot of time.

So what is Sky running? For now, mainly test/dev workloads. These clusters are used by developers for short-term usage, to test what they are building and then trash the environment, all of which is enabled through vRealize Automation: request one or more VMs, deploy on a VSAN cluster, and done. So far in Sky’s deployment all key stakeholders are pleased with the technology as it is fast and responsive, and for the ops team in particular it is very easy to manage.

James mentioned that recently he has been testing both VSAN 5.5 and 6.0. He was so surprised about the performance increase that he re-ran his test multiple times, then had his colleagues do the same, while others reviewed the maths and the testing methodology. Each time they came to the same conclusion; there was an increase in excess of 60% performance between 5.5 and 6.0 (using a “real-world” IO profile), an amazing result.

Last question for me was around some of the challenges James faced. The first thing he said was that he felt the technology was fantastic. There were new considerations around the design/sizing of their VSAN hosts, the increased dependency on TCP/IP networks and the additional responsibilities for storage placed within the virtualisation operations team. There were also some minor technical challenges, but these were primarily from an operational perspective, and with vSphere / VSAN 5.5. In some cases he had to use RVC, which is a great tool, but as it is CLI based it does have a steep learning curve. The HealthCheck plugin has definitely helped a lot with 6.0 to improve this.

Another thing James wanted to call out is that in the current VSAN host design Sky uses an SSD to boot ESXi, as VSAN hosts with more than 512GB RAM cannot boot from SD card. This means the company is sacrificing a disk slot which could have been used for capacity, when it would prefer to use SD for boot if possible to optimise hardware config.

I guess it is safe to say that Sky is pleased with VSAN and in future the company is planning on adopting a “VSAN first” policy for a proportion of their virtual estate. I want to thank Sky, and James in particular, for taking the time to talk to me about his experience with VSAN. It is great to get direct feedback and hear the positive stories from such a large company, and such an experienced engineer.

 

About the author

Duncan Epping is a Chief Technologist in the Office of CTO of the Cloud Platform BU at VMware. He is a VCDX (# 007), the author of the "vSAN Deep Dive", the “vSphere Clustering Technical Deep Dive” series, and the host of the "Unexplored Territory" podcast.
