Two weeks ago I spoke with Anthony Spiteri about Virtual SAN: how he uses it and why he uses it. For those who don’t know Anthony, he is an architect at a service provider called Zettagrid, an avid blogger, and he spends some time on Twitter now and then. Make sure to bookmark his blog and follow him on Twitter; he is a smart guy. I wanted to chat with him to understand why they selected VSAN as the storage solution for their management environment.
Anthony mentioned that when he joined Zettagrid they weren’t using dedicated management clusters. As most of you who manage larger infrastructures know, separating production workloads from the management stack can be very useful. You don’t want your management solution contending for CPU/memory resources, and you surely don’t want a production outage to impact your management cluster, a storage outage for instance. That is exactly what happened in Anthony’s case: a storage outage took out (some of) the management components, which in turn made it impossible to figure out what was going on, a situation you never want to encounter as a service provider. Luckily they managed to figure it out relatively quickly, but it did make them see that a change was needed.
What better time to introduce a new concept like hyper-converged and create a self-contained management environment? Anthony mentioned that he had looked at two different platforms but decided to go for VSAN. The reason was straightforward: they ran a large number of tests and simply couldn’t break it. It just worked, and it worked in a dead easy way, which also meant that once it was taken into production the learning curve would be tiny for the operational guys.
As a hardware platform Dell FX2 is used. I am a big fan of this platform and fully understand why they picked it: 4 nodes in 2U, which even includes switching, so for VSAN this means you can keep the traffic in the chassis with these smaller “4 node management” pods. Zettagrid decided to deploy 3 of these pods, and each of them will run services like vCenter Server, vCloud Director, SQL, AD, Veeam Backup etc. Nice solution if you ask me.
We also spoke about pricing. Although it is not part of my responsibilities, it is always interesting to see how a solution works out from a TCO/ROI stance. I still recall exchanging some messages with Anthony about the VSPP pricing, and he mentioned it was on the high side. The recent pricing changes definitely make VSAN a no-brainer for Service Providers. The points have been cut in half and billing is now based on what is “used” rather than what is “allocated”, and believe me (actually, believe Anthony), that makes a huge difference! Such a big difference that Anthony said they will definitely be looking at VSAN for their Cloud Resources as well.
Thanks Anthony for taking the time. Always good to hear back from customers.
PS: Coincidentally, there is an official VSAN reference story coming out soon as well. I will link to it as soon as I have received it.
Duncan, with respect: we are currently in the phase of bringing a 10 node VSAN cluster to life. The concept was validated by VMware, we began the implementation, and we directly ran into several issues which broke our cluster day after day.
Some of them could be fixed by simply but carefully reading the VMware knowledge base. You might need plenty of time because there are a lot of KBs regarding VSAN. It is in fact positive to have the pitfalls documented. But we have also been introduced to several advanced parameters which are not documented.
During troubleshooting we engaged VMware Business Critical Support, because our old storage system is heavily overloaded, which is a show stopper for nearly all IT projects in our company.
Load testing and log uploads every day. Today, after 41 days of troubleshooting, VMware engaged an escalation team because the logs revealed an error message which points to a memory leak.
We also looked at other products and we liked VSAN. But it is not as easy as the tech marketing blogosphere hypes it to be.
Please provide some real world feedback, and please also engage in posts at VMTN which may not sound like a success story at first glance.
Duncan Epping says
Hi Daniel, I talk to customers on a daily basis and I do follow various threads on VMTN, however my role doesn’t allow me to be as active a community member as I used to be. (However I am still in the top three, which should say something.) I know there have been challenges with certain disk controllers. Let me drop you an email and get some more details around the challenges you are facing to figure out how we can move you forward. Thanks for being a customer, and thanks for providing feedback.
Anders Hansen says
Funny to read this post. We are about to do exactly the same as Zettagrid, deploying a VSAN (stretched) cluster for our management VMs. We are also a service provider and had an incident in our infrastructure last year which resulted in our monitoring system being partly down. It’s a 10 node cluster on HPE DL380 Gen9 hardware. Looking forward very much to seeing it in action.