Yellow Bricks

by Duncan Epping

VMware

Virtual SAN going offshore

Duncan Epping · Aug 17, 2015 ·

Over the last couple of months I have been talking to many Virtual SAN customers. After having spoken to so many customers and having heard many special use cases and configurations I’m not easily impressed. I must say that halfway through the conversation with Steffan Hafnor Røstvig from TeleComputing I was seriously impressed. Before we get to that, let’s first look at the background of Steffan Hafnor Røstvig and TeleComputing.

TeleComputing is one of the oldest service providers in Norway. They started out as an ASP with a lot of Citrix expertise. In recent years they’ve evolved into more of a service provider than an application provider. TeleComputing’s customer base consists of more than 800 companies and in excess of 80,000 IT users. Customers typically have between 200 and 2,000 employees, so significant companies. In the Stavanger region a significant portion of the customer base is in the oil business or delivers services to the oil business. Besides managed services, TeleComputing also has their own datacenter, which they manage and host services in for customers.

Steffan is a solutions architect but started out as a technician. He told me he still does a lot of hands-on, but besides that also supports sales / pre-sales when needed. The office he is in has about 60 employees. And Steffan’s core responsibility is virtualization, mostly VMware based! Note that TeleComputing is much larger than those 60 employees, they have about 700 employees worldwide with offices in Norway, Sweden and Russia.

Steffan told me he was first introduced to Virtual SAN when it had just launched. Many of their offshore installations used what they call a “datacenter in a box” solution, which was based on IBM BladeCenter. A great solution for that time, but there were some challenges with it. Cost was a factor, as were rack size and reliability. Swapping parts isn’t always easy either, and that is one of the reasons they started exploring Virtual SAN.

For Virtual SAN they are no longer using blades but have instead switched to rack mounted servers. Considering the low number of VMs that typically run in these offshore environments, a fairly “basic” 1U server can be used. With 4 hosts you now only take up 4U, instead of the 8 or 10U a typical blade system requires. Rack space is very important in the type of environments they support, and on top of that the cost is significantly lower for 4 rack mounts versus a full bladecenter. Before I forget, the hosts themselves are Lenovo x3550 M4s with one 200GB Intel S3700 SSD and 6 IBM 900GB 10K RPM drives. Each host has 64GB of memory, two Intel E5-2630 6 core CPUs and an M5110 SAS controller. What do I mean by type of environments? Well, as I said, offshore, but more specifically oil platforms! Yes, you are reading that right, Virtual SAN is being used on oil platforms.

For these environments 3 hosts are actively used and a 4th host is just there to serve as a “spare”. If anything fails in one of the hosts the components can easily be swapped, and if needed even the whole host could be swapped out. Even with a spare host the environment is still much cheaper than the original blade architecture. I asked Steffan if these deployments were used by staff on the platform or remotely. Steffan explained that staff “locally” can only access the VMs, and that TeleComputing manages the hosts; rent-an-infrastructure or infrastructure as a service is the best way to describe it.
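
As a rough back-of-the-envelope check on what such a cluster offers, the sketch below adds up the raw capacity of the three active hosts using the drive counts mentioned above. The function names and the mirroring assumption (two full copies of every object, so one failure is tolerated) are mine for illustration, not something Steffan specified. The spare host is assumed to sit outside the cluster, and metadata overhead and slack space are ignored.

```python
def raw_capacity_gb(hosts, drives_per_host, drive_gb):
    """Total raw capacity of the capacity tier, in GB."""
    return hosts * drives_per_host * drive_gb


def mirrored_capacity_gb(raw_gb, copies=2):
    """Capacity left when every object is stored `copies` times (an assumption)."""
    return raw_gb / copies


# 3 active hosts, each with 6 x 900GB drives (figures from the text above).
raw = raw_capacity_gb(hosts=3, drives_per_host=6, drive_gb=900)
print(raw)                        # 16200
print(mirrored_capacity_gb(raw))  # 8100.0
```

So roughly 16TB raw, or about 8TB if everything is mirrored, which fits the low number of VMs these environments typically run.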

So how does that work? Well, they use a central vCenter Server in their datacenter and added the remote Virtual SAN clusters, which are connected via a satellite connection. The virtual infrastructure as such is completely managed from a central location. Not just the virtual layer, the hardware is being monitored as well. Steffan told me they use the vendor ESXi image and as a result get all of the hardware notifications within vCenter Server. A single pane of glass is key when you are managing many environments like these. Plus it also eliminates the need for a 3rd party hardware monitoring platform.

Another thing I was interested in was how the hosts were connected; considering the special location of the deployment I figured there would be constraints here. Steffan mentioned that 10GbE is very rare in these environments and that they have standardized on 1GbE. Even the number of connections is limited, and today they have 4 x 1GbE per server, of which 2 are dedicated to Virtual SAN. The use of 1GbE wasn’t really a concern: the number of VMs is typically relatively low, so the expectation was (and testing and production have confirmed) that 2 x 1GbE would suffice.

As we were wrapping up our conversation I asked Steffan what he learned during the design/implementation, besides all the great benefits already mentioned. Steffan said that they learned quickly how critical the disk controller is and that you need to pay attention to which driver you are using in combination with a certain version of the firmware. The HCL is leading, and should be strictly adhered to. When Steffan started with VSAN the Healthcheck plugin unfortunately hadn’t been released yet, as that could have helped with some of the challenges. Another caveat that Steffan mentioned was that when single device RAID-0 sets are being used instead of passthrough you need to make sure to disable write-caching. Lastly, Steffan mentioned the importance of separating traffic streams when 1GbE is used. Do not combine VSAN with vMotion and Management, for instance. vMotion by itself can easily saturate a 1GbE link, which could mean it pushes out VSAN or Management traffic.
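
To get a feel for why a single vMotion can occupy a 1GbE link, here is a small sketch that estimates how long a bulk transfer keeps a link busy at line rate. The function name, the `efficiency` parameter and the 16GB figure are hypothetical and only there for illustration. Real vMotion traffic depends on memory change rate and protocol overhead, so treat the numbers as a best case.

```python
def transfer_seconds(data_gb, link_gbps, efficiency=1.0):
    """Seconds needed to move data_gb gigabytes over a link_gbps link.

    efficiency is the fraction of line rate actually achieved (an assumption).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    # 1 gigabyte = 8 gigabits
    return (data_gb * 8) / (link_gbps * efficiency)


# Hypothetical VM with 16GB of memory to copy.
print(transfer_seconds(16, 1))   # 128.0
print(transfer_seconds(16, 10))  # 12.8
```

More than two minutes of a fully used 1GbE link for one modest migration is a long time for VSAN or Management traffic to compete for bandwidth, which is why keeping the streams on separate links matters here.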

It is fair to say that this is by far the most exciting and special use case I have heard for Virtual SAN. I know though there are some other really interesting use cases out there as I have heard about installations on cruise ships and trains as well. Hopefully I will be able to track those down and share those stories with you. Thanks Steffan and TeleComputing for your time and great story, much appreciated!

Shift in focus… Go Storage & Availability OCTO!

Duncan Epping · Jul 8, 2015 ·

Almost a year ago I joined the Office of CTO under Paul Strong. My main focus was SDDC, but I naturally gravitated towards the core platform (vSphere), software defined storage and topics like availability. Not just my personal preference, but also a commonly requested topic for public speaking engagements. Most VMUG speaking requests I receive are around VSAN, VVols or vSphere HA. Each year I take some time to reflect on where I am, what I do, and where I want to go. This year I asked myself what really excites me in today’s world of IT/infrastructure. What am I most passionate about? What do I enjoy talking and writing about the most?

Having written books on Virtual SAN and vSphere Clustering, and countless blog posts on the topic of software defined storage, BC/DR and availability, it was pretty obvious what I am most passionate about. I like talking and writing about Virtual SAN, Virtual Volumes and Site Recovery Manager, and it is safe to say that I am a vSphere HA fanboy. I am most passionate about Storage & Availability, that much was obvious.

At an internal event I had a conversation with Charles Fan and Christos Karamanolis. The Storage & Availability BU was considering creating an Office of CTO and they asked if I would be interested in collaborating in some shape or form. For me this was a no-brainer. Knowing what is coming for Virtual SAN and Virtual Volumes (and future products we are working on) I asked myself if collaborating would be the best option or if I should take that next step. The decision was easy, as of this week I have officially joined the Office of CTO of the Storage & Availability BU.

In the Office of CTO I will be responsible for connecting our R&D team with customers, partners and our field. I will be evangelizing software defined storage and availability, primarily in EMEA and APJ. I will focus on defining and communicating VMware’s vision and strategy, and be an active advisor for our product roadmap and portfolio. I couldn’t be more excited. I am super enthusiastic about all that is to come out of our business unit, and it is extremely energizing, to say the least, to talk to our customers about what we do today and what is coming tomorrow. As a big plus I get to work with my friend Rawlinson Rivera once again, and report in to someone I greatly respect, namely Christos, who will be heading up the team. Make sure to read Christos’s blog post on the team that has been formed and some hints of what you can expect in the future. Let’s get busy!

Thanks Charles, Christos and Paul for this great opportunity!

vSphere Metro Storage Cluster with vSphere 6.0 paper released

Duncan Epping · Jul 8, 2015 ·

I’d already blogged about this on the VMware blog, but I figured I would share it here as well. The vSphere Metro Storage Cluster with vSphere 6.0 white paper has been released. I worked on this paper together with my friend Lee Dilworth; it is an updated version of the paper we did in 2012. It contains all of the new best practices for vSphere 6.0 when it comes to vSphere Metro Storage Cluster implementations, so if you are looking to implement one or upgrade an existing environment make sure to read it!

VMware vSphere Metro Storage Cluster Recommended Practices

VMware vSphere Metro Storage Cluster (vMSC) is a specific configuration within the VMware Hardware Compatibility List (HCL). These configurations are commonly referred to as stretched storage clusters or metro storage clusters and are implemented in environments where disaster and downtime avoidance is a key requirement. This best practices document was developed to provide additional insight and information for operation of a vMSC infrastructure in conjunction with VMware vSphere. This paper explains how vSphere handles specific failure scenarios, and it discusses various design considerations and operational procedures. For detailed information about storage implementations, refer to documentation provided by the appropriate VMware storage partner.

Synchronet leverages Virtual SAN to provide scale, agility and reduced costs to their customers

Duncan Epping · Jun 11, 2015 ·

This week I had the pleasure of talking to John Nicholson, who works for one of our partners (Synchronet out of Houston). John has been involved with various Virtual SAN implementations and designs and I felt that it would make for an interesting conversation. John in my opinion is a true datacenter architect: he has a good understanding of all aspects and definitely has a lot of experience with different storage platforms (both traditional and hyper-converged). I did not mention it during our conversation, but the answers John gave to some of the questions were most definitely VCDX-level. (If you can find the time, do it John :-)) Below is John’s bio, make sure to follow him on twitter:

John Nicholson vExpert (2013-2015) is the manager of client services for Synchronet.  He oversees the professional services who deploy cutting edge virtualization, VDI, and storage solutions for customers as well as the managed services who keep these environments running smoothly.  He enjoys a deep dive into the syslog, and can telepathically sense slow and undersized storage.

The first customer / project we discussed was a Virtual SAN environment for a construction company. The environment was built on top of Dell R720s and they have 400GB flash capacity in each node and 7x 1.2TB 10K RPM drives. In this environment MS SQL and Exchange are running on top of Virtual SAN. The SQL database is used for ~ 1000 customers as part of a real time bidding and tracking solution. As you can imagine, reliability and predictable performance are key in this environment. Also hosted on Virtual SAN is their ERP system, and it is also used for the development environment for their end-customer applications.

What was interesting with this particular project is that there were some strange performance anomalies. As you can imagine, Virtual SAN being a new product was a suspect, but after troubleshooting the environment they found out that there was a driver/firmware mismatch for the 10GbE Intel NICs they were using. Further investigation revealed that all types of traffic were impacted. John wrote about it on their corporate blog here, worth reading if you are using the Intel X540 10GbE NICs.

Key take away: Always verify driver / firmware combination and compatibility as it can have an impact.

What pleased John and the customer the most is probably the performance Virtual SAN is providing. Especially when it comes to latency, or should I say the lack of latency, as they are hitting sub millisecond numbers. They’ve been so happy running Virtual SAN in their environment that they’ve just purchased new hosts, and a DR site with VSAN is being implemented this week. The DR site will be used at first to test VSAN 6.0, and when it has proven stable and reliable the production environment will be upgraded to 6.0 and the DR site will be configured for DR purposes leveraging vSphere Replication. I asked John how they went about advising the customer to leverage a virtual replication technology which is asynchronous, and John mentioned that as part of their advisory/consultancy services they have business analysts on board who assess what the cost of downtime is, map out the cost of availability and decide on a solution based on that outcome. The same applies to de-duplication by the way: what is the price of disk, what is the dedupe ratio, does it make sense in your environment?

While discussing this project John mentioned that he has worked with customers in the past which had two or three IT folks, one of whom was a dedicated storage admin, primarily because of the complexity of the environment and the storage system. In today’s world, with solutions like Virtual SAN, that isn’t needed any longer and the focus of IT people should be enabling the business.

During our discussion about networking John mentioned that Synchronet has a long history with IP based storage solutions (primarily iSCSI), and based on their experience top grade switches were an absolute must when deploying these types of storage. While talking to some of the Virtual SAN engineers John asked how Virtual SAN would handle switches which have a lower “PPS” (packets per second). The Virtual SAN team mentioned that VSAN was less prone to the common issues faced in iSCSI/NFS environments. John, being the techie that he is, was of course skeptical and wanted to test this for himself. The results were published in this white paper, and it is fair to say that John and his team were impressed with how Virtual SAN handled itself, even with relatively cheap switches. For the majority of Virtual SAN deployments their typical customer setup leverages 2 VMkernel interfaces, each connected to a different switch so that traffic isn’t going outside of the switch. This is what it would look like for those interested:

Host 1 / NIC-A —> Switch-A
Host 2 / NIC-A —> Switch-A
Host 3 / NIC-A —> Switch-A
Host 1 / NIC-B —> Switch-B
Host 2 / NIC-B —> Switch-B
Host 3 / NIC-B —> Switch-B
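
The same layout can be written down as data, which makes it easy to check that no single switch is a single point of failure. This is only a sketch: the function names are hypothetical, and it models cabling only, not VMkernel configuration or how VSAN picks a path.

```python
HOSTS = ["Host 1", "Host 2", "Host 3"]
UPLINKS = {"NIC-A": "Switch-A", "NIC-B": "Switch-B"}


def build_layout(hosts, uplinks):
    """Map every (host, NIC) pair to the switch that NIC is cabled to."""
    return {(host, nic): switch
            for host in hosts
            for nic, switch in uplinks.items()}


def hosts_with_path(layout, failed_switch):
    """Hosts that still have at least one uplink if failed_switch goes down."""
    return {host for (host, _nic), switch in layout.items()
            if switch != failed_switch}


layout = build_layout(HOSTS, UPLINKS)
for switch in sorted(set(UPLINKS.values())):
    survivors = hosts_with_path(layout, switch)
    print(switch, "down:", len(survivors), "of", len(HOSTS), "hosts still connected")
```

Because every host has one NIC on each switch, losing either switch leaves all three hosts connected through the other one.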

The second project John mentioned was for a software startup in the healthcare space. They’ve been doing a lot of mergers and acquisitions. Initially they wanted to get 6 VMs up and running but with the ability to scale-up and scale-out when needed. I found the “scale-up” comment interesting, so I asked what John was referring to. In this scenario the server configuration used was SuperMicro Fat Twin, initially deployed with 3 hosts using a single socket, 800GB of flash capacity and two NL-SAS drives. As the company started growing the number of virtual machines of course increased, and they have over 70 VMs running currently, simply achieved by adding an additional CPU in each box and adding some drives. The question I had was what about flash capacity compared to disk capacity? John said that they started out with flash overprovisioned simply to allow them to scale-up when required. Especially in the merger and acquisition space, where the growth pattern is unknown, this is a huge advantage of a solution like Virtual SAN, which allows you to both scale-out and scale-up when required. Compared to traditional storage systems this model worked very well as they avoided the huge up front cost (50K USD – 100K USD is not uncommon). On top of that, John said that with the majority of storage systems a big discount is given during the initial purchase, but when it is time to add a disk shelf that discount has magically disappeared. Also, with traditional storage systems you can fairly easily reach the limits of a storage controller and be stuck with a system which can’t scale to the size you need. Another problem that disappears when leveraging VSAN.

Key take away: Large upfront costs can be avoided while offering flexibility in terms of scaling and sizing

Synchronet isn’t just an implementation partner by the way; they also do managed services, and one of the things they do, for instance, is monitor customer environments leveraging Log Insight. This includes monitoring Virtual SAN, and they’ve created custom dashboards so that they can respond to issues, for instance when a snapshot removal has failed, and solve the problem before an issue arises as a result of it. They can go as far as monitoring the raw syslog feeds if needed, but each time a problem occurs in any environment it is recorded and custom dashboards and warnings are created so that every customer immediately benefits from it. For some customers they even do full management of the vSphere environment.

We had some small talk about VDI. John mentioned that VSAN is great for PoCs and small test environments because it is easy to get in there, use it and then grow it as soon as the PoC / test has completed. Especially the price per desktop licensing is really handy as it keeps the cost down initially, and at the same time the customer knows what it is paying and getting. From an architectural point of view John mentioned that the majority of their customers use non-persistent desktops and as such the Virtual SAN environment looks different than the traditional server VM environments. Typically less disk capacity and higher flash capacity to ensure performance.

Before we wrapped up there was one thing I was interested in knowing, which was whether they tweaked any of the Virtual SAN related settings (within the storage policy or, for instance, advanced settings). John mentioned that they would tweak the number of stripes per VM from 1 to 3 by default. This is primarily to speed up backups with Virtual SAN 5.5; preliminary tests are showing though that with Virtual SAN 6.0 and the new snapshotting mechanism this isn’t needed any longer. While talking about striping John also mentioned that for their hosting services the one thing that stood out to him is that Virtual SAN was performing so well that the customers paying for a lower tier of storage were actually getting a lot more storage performance resources than they paid for. Storage policies were used to ensure that a tier 2 VM wouldn’t receive more resources than a tier 1 VM. Pretty neat problem to have I guess.

Key take away: Increasing stripe-width with Virtual SAN 5.5 can have a positive impact on performance. With 6.0 this appears to be no longer needed.

The last thing John wanted to mention was the VIP Tool (https://vip.vmware.com/). He said it helped them immensely in figuring out how much data was active and in designing / sizing Virtual SAN environments for customers. I think it is fair to say that John (and Synchronet) has had huge success introducing Virtual SAN to their customers and deploying it where applicable. Thanks John for taking the time, and thanks for being a great VMware and Virtual SAN advocate!

VSAN and large VMDKs on relatively small disks?

Duncan Epping · Jun 4, 2015 ·

Last week and this week I received a question, and as it was the second time in a short time I figured I would share it. The question was around how VSAN places a VMDK which is larger than the disks. Let’s look at a diagram first as that will make it obvious instantly.

If you look at the diagram you see these stripes. You can define the number of stripes in a policy if you want. In the example above, the stripe width is 2. This is not the only time when you can see objects being striped though. If an object (a VMDK for instance) is larger than 255GB, VSAN will create multiple stripes for this object. Also, if a physical disk is smaller than the size of the VMDK, VSAN will create multiple stripes for that VMDK. These stripes can be located on the same host, as you can see in the diagram, but can also be spread across hosts. Pretty cool right.
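
To make the splitting rules concrete, here is a minimal sketch that estimates how many stripes a single copy of a VMDK ends up with. It is a simplified model and not VSAN’s actual placement algorithm: the function name is hypothetical, the per-component size cap is passed in as a parameter rather than taken from the product, a disk is assumed to be completely empty, and replicas and witness components are ignored.

```python
import math


def estimate_components(vmdk_gb, smallest_disk_gb, max_component_gb, stripe_width=1):
    """Rough lower bound on the number of stripes for one copy of a VMDK."""
    if min(vmdk_gb, smallest_disk_gb, max_component_gb) <= 0 or stripe_width < 1:
        raise ValueError("sizes must be positive and stripe_width >= 1")
    # A stripe can be no larger than the cap and must fit on a single disk.
    chunk_gb = min(max_component_gb, smallest_disk_gb)
    needed_for_size = math.ceil(vmdk_gb / chunk_gb)
    # The policy's stripe width acts as a minimum; size limits can only raise it.
    return max(stripe_width, needed_for_size)


CAP_GB = 255  # assumption: per-component size limit, adjust for your version

print(estimate_components(100, 900, CAP_GB, stripe_width=2))  # 2
print(estimate_components(1000, 900, CAP_GB))                 # 4
print(estimate_components(1000, 200, CAP_GB))                 # 5
```

The first call is the policy-driven case from the diagram, the second shows a large VMDK being split because of the size cap, and the third shows small 200GB disks forcing even more stripes.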

About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.

Copyright Yellow-Bricks.com © 2026