Over the last couple of months I have been talking to many Virtual SAN customers. After speaking to so many customers and hearing so many special use cases and configurations, I'm not easily impressed. I must say, though, that halfway through my conversation with Steffan Hafnor Røstvig from TeleComputing I was seriously impressed. Before we get to that, let's first look at the background of Steffan Hafnor Røstvig and TeleComputing.
TeleComputing is one of the oldest service providers in Norway. They started out as an ASP with a lot of Citrix expertise, and in recent years they have evolved into a service provider rather than just an application provider. TeleComputing's customer base consists of more than 800 companies and in excess of 80,000 IT users. Customers typically have between 200 and 2,000 employees, so these are significant companies. In the Stavanger region a significant portion of the customer base is in the oil business or delivers services to it. Besides managed services, TeleComputing also has its own datacenter in which it manages and hosts services for customers.
Steffan is a solutions architect but started out as a technician. He told me he still does a lot of hands-on work, and he also supports sales and pre-sales when needed. The office he works in has about 60 employees, and Steffan's core responsibility is virtualization, mostly VMware based! Note that TeleComputing is much larger than those 60 employees: they have about 700 employees worldwide, with offices in Norway, Sweden and Russia.
Steffan told me he was first introduced to Virtual SAN when it had just launched. Many of their offshore installations used what they call a "datacenter in a box" solution, which was based on an IBM BladeCenter. It was a great solution for its time, but it came with challenges: cost, rack size, and also reliability. Swapping parts isn't always easy either, and that is one of the reasons they started exploring Virtual SAN.
For Virtual SAN they are no longer using blades; instead they switched to rack-mounted servers. Considering the low number of VMs typically running in these offshore environments, a fairly "basic" 1U server can be used. With 4 hosts you now take up only 4U, instead of the 8 or 10U a typical blade system requires. The hosts themselves are Lenovo x3550 M4s, each with one 200GB Intel S3700 SSD, six IBM 900GB 10K RPM drives, 64GB of memory, two 6-core Intel E5-2630 CPUs, and an M5110 SAS controller. The small footprint is especially important in the type of environments they support, and on top of that the cost is significantly lower for 4 rack mounts versus a full BladeCenter. What do I mean by "type of environments"? Well, as I said, offshore, but more specifically oil platforms! Yes, you are reading that right: Virtual SAN is being used on oil platforms.
For these environments 3 hosts are actively used and a 4th host is there simply to serve as a "spare". If anything fails in one of the hosts the components can easily be swapped, and if needed even the whole host can be swapped out. Even with a spare host the environment is still much cheaper than the original blade architecture. I asked Steffan whether these deployments were used by staff on the platform or remotely. Steffan explained that staff "locally" can only access the VMs, while TeleComputing manages the hosts; rent-an-infrastructure, or infrastructure as a service, is the best way to describe it.
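For a sense of scale, here is a rough back-of-the-envelope capacity sketch for such a cluster. The host and disk counts come from the specs above; the "usable is roughly half of raw" figure is my assumption based on a default failures-to-tolerate (FTT=1) mirroring policy, and real usable space would be lower still once slack space for rebuilds is reserved.

```python
# Back-of-the-envelope raw/usable capacity for the 3 active VSAN hosts.
# Assumption (not from the article): default FTT=1 mirroring policy,
# which stores every object twice and so roughly halves raw capacity.

hosts = 3               # active hosts; the 4th is a cold spare
disks_per_host = 6      # 900GB 10K RPM capacity drives per host
disk_gb = 900

raw_gb = hosts * disks_per_host * disk_gb
usable_gb = raw_gb / 2  # FTT=1 mirroring

print(f"raw: {raw_gb} GB, usable (FTT=1): {usable_gb:.0f} GB")
# → raw: 16200 GB, usable (FTT=1): 8100 GB
```

Plenty of headroom, in other words, for the small number of VMs these platforms run.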
So how does that work? They use a central vCenter Server in their datacenter, with the remote Virtual SAN clusters connected via a satellite link. The virtual infrastructure is thus managed completely from a central location, and not just the virtual layer: the hardware is monitored as well. Steffan told me they use the vendor ESXi image and as a result get all of the hardware notifications within vCenter Server. A single pane of glass is key when you are managing many environments like these; plus it eliminates the need for a 3rd-party hardware monitoring platform.
Another thing I was interested in was how the hosts were connected; considering the special location of the deployment, I figured there would be constraints here. Steffan mentioned that 10GbE is very rare in these environments and that they have standardized on 1GbE. The number of connections is also limited: today they have 4 x 1GbE per server, of which 2 are dedicated to Virtual SAN. The use of 1GbE wasn't really a concern; the number of VMs is typically relatively low, so the expectation was (and testing and production have confirmed) that 2 x 1GbE would suffice.
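A setup with dedicated VSAN uplinks like this could be sketched with esxcli roughly as follows. This is a minimal illustration, not TeleComputing's actual configuration: the vSwitch, vmnic, vmk and IP names below are all hypothetical placeholders.

```shell
# Hypothetical example: dedicate two 1GbE uplinks (vmnic2/vmnic3) to VSAN.
# All names and addresses are illustrative, not from the article.

# Standard vSwitch with the two uplinks reserved for VSAN traffic
esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic2
esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic3

# Port group and VMkernel interface for VSAN
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=VSAN
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=VSAN
esxcli network ip interface ipv4 set --interface-name=vmk1 \
  --ipv4=192.168.10.11 --netmask=255.255.255.0 --type=static

# Tag the VMkernel interface for VSAN traffic
esxcli vsan network ipv4 add --interface-name=vmk1
```

Keeping vMotion and Management on the other two uplinks is what makes 2 x 1GbE workable here.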
As we were wrapping up our conversation I asked Steffan what he learned during the design and implementation, besides all the great benefits already mentioned. Steffan said they quickly learned how critical the disk controller is, and that you need to pay attention to which driver you are using in combination with a certain firmware version. The HCL is leading and should be strictly adhered to. When Steffan started with VSAN the Health Check plugin hadn't been released yet, unfortunately, as that could have helped with some of the challenges. Another caveat Steffan mentioned is that when single-device RAID-0 sets are used instead of passthrough, you need to make sure to disable write caching. Lastly, Steffan stressed the importance of separating traffic streams when 1GbE is used: do not combine VSAN with vMotion and Management, for instance. vMotion by itself can easily saturate a 1GbE link, which could mean it pushes out VSAN or Management traffic.
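On the write-caching caveat: the M5110 is an LSI-based controller, so on a controller like this the logical drive cache policy can typically be set with MegaCLI. A hedged sketch (the exact binary name, MegaCli vs MegaCli64, and adapter/drive selectors depend on the system; verify against your controller's documentation before use):

```shell
# Illustrative MegaCLI invocations for single-device RAID-0 sets behind VSAN.
# Assumption: an LSI-based controller such as the M5110; run on the ESXi host
# or via the vendor's offline tooling. -LAll/-aAll target all logical drives
# on all adapters; narrow these selectors in a real environment.

# Set logical drives to write-through (disables controller write caching)
MegaCli64 -LDSetProp WT -LAll -aAll

# Use direct I/O rather than cached I/O
MegaCli64 -LDSetProp -Direct -LAll -aAll

# Verify the resulting cache policy
MegaCli64 -LDGetProp -Cache -LAll -aAll
```

With passthrough mode none of this is needed, which is one reason passthrough is generally preferred when the controller supports it.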
It is fair to say that this is by far the most exciting and special use case I have heard for Virtual SAN. I do know there are other really interesting use cases out there, though, as I have heard about installations on cruise ships and trains as well. Hopefully I will be able to track those down and share those stories with you. Thanks Steffan and TeleComputing for your time and great story, much appreciated!
Vaughn Stewart (@vStewed) says
Beautiful use case for VSAN and HCI
Hi Duncan, do you think there is a business case for hyperconverged boxes in a large enterprise customer that has already invested in SAN storage and switches and can upgrade the storage vs buying expensive Nutanix boxes? Or is the business case only for ROBO sites?
Duncan Epping says
I think there are plenty of scenarios where hyperconverged would fit, especially a solution which has a lower entry price than Nutanix. I've spoken to plenty of customers now who are moving from traditional storage to VSAN. I actually have an article about that coming up as well.
Jon Retting says
I think it's an amazing initiative, and thank you for the excellent write-up. Personally I would be stressing big time over 1GbE for VSAN. It is very difficult to deliver according to expectations when 1GbE is a factor. Just to clarify, they are using 3 VSAN hosts, and a fourth node for parts? I can understand using 4 VSAN hosts and a fifth for spare, but the reason to have fewer than 4 VSAN hosts escapes me. Obviously the solution will work. There seem to be a lot of concessions, probably mostly centered around real estate. Thanks, -Retting
Duncan Epping says
Keep in mind that they have a limited number of VMs running on these oil platforms. It is not like we are talking 500TB worth of data and hundreds of VMs.
I’ve spoken with various customers doing VSAN with 1GbE, and during normal operations most hardly see a difference between 1GbE and 10GbE. The difference comes during re-syncs when a server breaks.
Keep in mind that these guys have limited space on a rig, so if they can do with less… they will do with less.
Jon Retting says
Well, that makes sense. So probably under 40 VMs running. I don't think they will have much luck with VDP, as I can't seem to get it working reliably without 10GbE. Considering the extreme space limitations, I wonder what their recovery space looks like. Hopefully they have an extreme WAN connection and replicate to an on-shore stack. All in all, an oil rig is a very fun problem to tackle; so many nuances make the project that much better. Thanks again for sharing!
Steffan Hafnor Røstvig says
Thank you for commenting!
We prefer using a fifth ESXi host, with local storage, running a single VM with backup software. We use image backups and traditional agent-based backups. All backups are stored onsite, but also replicated on-shore over whatever link is available. If the platform is on the move, a VSAT connection is often the only alternative.
Jon Retting says
“If the platform is on the move”, that’s awesome. Left me thinking about water buoys, and the possibility of creating a mesh/repeater Wifi over distance buoy system. On-board battery, with solar/current charging. As I said really fun project to think about. Cheers, -Jon
Have you considered a VCE Vblock as a solution? It has redundancy built in and works with cloud infrastructures. As a turnkey solution it reduces our staffing load, plus the RCM software support model means they test all patches for us and release only image packages.
Duncan Epping says
Not sure I am following your comment. You think deploying a VBlock in this scenario would be what the customer is looking for?