Last week I had the pleasure of catching up with Rob Alford from United Utilities. For those who don’t know, United Utilities is the largest listed water company in the UK, based in the North West of England.
I met Rob at VMworld last year, so I knew they were working with Virtual SAN. In fact, they bought Virtual SAN on day 1 of GA, which makes them one of the first VSAN customers in Europe for sure, and probably worldwide. As I mentioned, they are a utilities company, and as you can imagine they run truly mission-critical workloads; service disruption is not something they can afford. And yes, some of these workloads sit on top of Virtual SAN. I found it interesting to hear the approach United Utilities took. Simply put: a “Virtual SAN first” approach, a policy very similar to the “virtualization first” approach many of you adopted years back.
I wondered what got Rob and his team interested in Virtual SAN. Rob mentioned they have a relatively large environment (2500+ VMs) and are big believers in “converged infrastructure”. They’ve been running on converged infrastructure for a long time and feel that hyper-converged is the next step, one that will offer them more flexibility at a lower cost. (In fact, implementing VSAN resulted in a 50-60% cost reduction for United Utilities.) The big challenge with “traditional converged solutions” for them was most definitely the huge up-front cost and scale. Virtual SAN lets them scale up within a host as well as scale out one host at a time, making it far more flexible than any other solution out there. Rob mentioned they had looked at other hyper-converged solutions but weren’t convinced by running a Virtual Storage Appliance on top of vSphere; they wanted to keep things simple and as close to the hypervisor as possible. When they first heard about Virtual SAN they were sold: they loved the concept of having a storage solution tightly integrated with their virtualization platform.
United Utilities was already a big VMware customer, and one of the bigger challenges they faced (and are still facing to a certain extent) was more organizational than technical. They have a traditional IT organization that is very much siloed. They have a big VMware team with a lot of in-depth knowledge, and Virtual SAN helped them get around these silos. It did mean the team had to think about things in a different way, as Virtual SAN introduces some new concepts. Nothing they couldn’t overcome, but they did notice that components such as disk controllers suddenly became really important.
Key takeaway: Spend time up front researching hardware options, and educate yourself and your team, as the importance of certain components may change.
I asked Rob how Virtual SAN was holding up and whether anything had surprised him. He told me they have a billing system which serves between 6 and 7 million customers. Their billing cycle is a slow process: running on a physical machine with a high-end enterprise storage system, it takes about 22 hours to complete end-to-end. When they started testing the same billing process in a virtual environment, while still using the same storage platform, they were able to reduce those 22 hours to 14 to 16 hours. A big win for Rob and the team. Of course, the team then wanted to know what would happen when the same workload ran on top of Virtual SAN. They were hoping for at least the same result as with the high-end enterprise storage solution, and I think it is fair to say they were shocked when the exact same run completed in under 3 hours. Yes, going from 22 hours down to under 3 hours is a huge win indeed.
When I asked Rob what their environment looked like, he said they have over 2500 VMs still on legacy converged infrastructure and over 650 VMs on Virtual SAN. The plan is to migrate those 2500 VMs within the next 12 months. As they are a very traditional company, they also still have many physical machines which they hope to virtualize soon. It is a big undertaking, and of course the “Virtual SAN first” policy applies here.
I asked Rob what their Virtual SAN hosts look like. Rob said they have clusters of up to 8 hosts. They’ve always tried to keep failure domains small and applied the same logic to Virtual SAN; strictly speaking, from a VSAN perspective this isn’t needed, but I can understand why they do it. Each host in those clusters has 5 disk groups, with each disk group having a 200GB SSD and three 1.2TB SAS disks. That is 15 capacity disks per host (plus the 5 SSDs), with a total of 18TB of raw capacity and 1TB of flash. Currently United Utilities uses the Dell R730XD for their VSAN clusters, but as Rob stressed, the main reason for going hyper-converged and software-based is to have the ability to change hardware whenever they want. All logic should reside in software, and the software solution should not limit them to a couple of options but instead give them the flexibility to choose whatever they want.
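For readers who want to check the per-host numbers above, here is a small Python sketch of the raw-capacity arithmetic. It only multiplies out the disk-group layout Rob described, using decimal units (1TB = 1000GB) and ignoring any VSAN storage-policy overhead such as mirroring. The names (DiskGroup, host_totals) are hypothetical, and the 8-host total assumes a full-size cluster of identical hosts, which is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskGroup:
    """One VSAN disk group: a single flash cache device plus capacity disks."""
    cache_gb: int          # flash cache device size, decimal GB
    capacity_disks: int    # number of SAS capacity disks in the group
    capacity_disk_gb: int  # size of each capacity disk, decimal GB


def host_totals(group: DiskGroup, groups_per_host: int) -> dict:
    """Raw per-host totals; no FTT/mirroring or other overhead is subtracted."""
    return {
        "capacity_disks": group.capacity_disks * groups_per_host,
        "capacity_gb": group.capacity_disks * group.capacity_disk_gb * groups_per_host,
        "flash_gb": group.cache_gb * groups_per_host,
    }


# Layout from the interview: 5 disk groups, each a 200GB SSD + three 1.2TB SAS disks.
group = DiskGroup(cache_gb=200, capacity_disks=3, capacity_disk_gb=1200)
host = host_totals(group, groups_per_host=5)

# Hypothetical: a full 8-host cluster built from identical hosts.
HOSTS_PER_CLUSTER = 8
cluster_raw_tb = host["capacity_gb"] * HOSTS_PER_CLUSTER / 1000

print(host)  # {'capacity_disks': 15, 'capacity_gb': 18000, 'flash_gb': 1000}
print(f"flash-to-capacity ratio: {host['flash_gb'] / host['capacity_gb']:.1%}")
print(f"raw capacity for an 8-host cluster: {cluster_raw_tb:.0f} TB")
```

Run as-is, this confirms 15 capacity disks, 18TB raw capacity and 1TB of flash per host, a flash-to-capacity ratio of roughly 5.6%, and 144TB of raw capacity for a full 8-host cluster. Usable capacity will be lower once a storage policy is applied.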
Key takeaway: Hardware should be easily replaceable. All logic should reside in software to avoid lock-in as much as possible.
We also discussed how United Utilities provides extra availability for the services that need it. Rob mentioned that some of their services are clustered at the application level, while others are not; for the latter, United Utilities uses vSphere HA and Veeam Replication, not just for DR but also for workload mobility. Nice solution if you ask me, and once again not tied to any hardware platform.
Before we wrapped up, Rob mentioned one other thing he was impressed by: when they started working with Virtual SAN, they had issues with the driver/firmware of their disk controller. What surprised Rob was how engaged the engineering team was; direct contact with engineers to work out how a problem can be resolved is not something you experience very often.
I want to thank Rob for taking the time to provide more insight into their journey and infrastructure, and I think it is safe to say that Virtual SAN is breaking down silos for United Utilities!