Startup News Flash part 6

Here we are again, and just a short one this time: Startup News Flash part 6. VMworld Europe is around the corner, so I expect a bit more news next week; I know of at least one company revealing what they have been working on… So what happened in the world of flash/startups over the last three weeks?

Fresh:

My buddies over at Tintri just announced two new products. The first is the Tintri VMstore T600 series, with the T620 providing 13.5 TB of usable capacity and the T650 providing 33.5 TB of usable capacity, allowing you to run up to 2000 VMs on the T650 (the T620 goes up to 500 VMs). What is unique about Tintri is how they designed their system, FlashFirst and VM-aware as they call it, allowing for sub-millisecond latencies with over 99% of IO coming out of flash, and of course VM-granular quality of service and data management (snapshots, cloning, and replication). The second announcement is all about management: Tintri Global Center. Let me take a quote from their blog, as it says it all: “The first release of Tintri Global Center can administer up to 32 VMstore systems and their resident VMs. Future versions will add additional control beyond monitoring and reporting with features — such as policy based load balancing and REST APIs to facilitate customized automation/scripts involving a combination of features across multiple VMstore systems such as reporting, snapshots, replication, and cloning.”

Atlantis seems to be going full steam ahead, recently announcing partnerships with NetApp and Violin. I guess what struck me personally about these announcements is that they bring “all flash arrays” (AFAs) and “memory caching” together, and it makes you wonder where you benefit the most from which. It is kind of like a supersized menu at McD: after ordering I always wonder if it was too much. But to be honest I have to read the menu in more detail, and maybe even try it out, before I draw that conclusion. I do like the concept of AFAs and I love the concept of Atlantis… It appears that Atlantis is bringing in functionality which these solutions are lacking for now, and of course crazy performance. If anyone has experience with the combination, feel free to chime in!

Some older news:

  • Nothing to do with technology, but more about validation of technology and a company. Vaughn Stewart, former NetApp executive, announced he joined Pure Storage as their Chief Evangelist. Pure Storage went all out and created an awesome video which you can find in this blog post. Nice move, Vaughn, and congrats Pure Storage.
  • The VSAN Beta went live last week and the community forums opened up. If you want to be a part of this, don’t forget to sign up!

 

Designing your hardware for Virtual SAN

Over the past couple of weeks I have been watching the VMware VSAN Community Forum, and Twitter, with close interest. One thing that struck me was the type of hardware people used to test VSAN on. In many cases this is the type of hardware one would use at home, for their desktop. Now I can see why that happens, I mean something new / shiny and cool is released and everyone wants to play around with it, but not everyone has the budget to buy the right components… And as long as that is for “play” only that is fine, but lately I have also noticed that people are looking at building an ultra-cheap storage solution for production, but guess what?

Virtual SAN reliability, performance and overall experience are determined by the sum of the parts

You say what? Not shocking, right? But it is something you will need to keep in mind when designing a hardware / software platform. Simple things can impact your success: first and foremost check the HCL, and think about components like:

  • Disk controller
  • SSD / PCIe Flash
  • Network cards
  • Magnetic Disks

Some thoughts around this, for instance the disk controller. You could leverage a 3Gb/s on-board controller, but when attaching let’s say 5 disks and a high performance SSD to it, do you think it can still cope, or would a 6Gb/s PCIe disk controller be a better option? Or even the 12Gb/s that some controllers offer for SAS drives? Not only can this make a difference in terms of the number of IOps you can drive, it can also make a difference in terms of latency! On top of that, there will be a difference in reliability…
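To make that a bit more tangible, here is a back-of-the-envelope sanity check in Python. All the per-device throughput numbers are my own illustrative assumptions (not vendor specs), and treating the controller link as one shared bottleneck is a simplification:

```python
# Back-of-the-envelope sanity check: can a shared controller keep up with the
# devices behind it? Numbers are illustrative assumptions, not vendor specs,
# and treating the controller link as one shared bottleneck is a simplification.

LINK_EFFICIENCY = 0.8  # rough usable fraction after encoding/protocol overhead

def usable_controller_mb_s(link_gbit: float) -> float:
    """Approximate usable controller throughput in MB/s."""
    return link_gbit * 1000 / 8 * LINK_EFFICIENCY

# Assumed best-case per-device throughput
hdd_mb_s = 120   # one 7.2k SATA disk, sequential
ssd_mb_s = 450   # one decent 6Gb/s SSD
demand = 5 * hdd_mb_s + ssd_mb_s  # 5 spindles + 1 SSD behind one controller

for label, link in [("3Gb/s on-board", 3.0), ("6Gb/s PCIe HBA", 6.0), ("12Gb/s SAS", 12.0)]:
    supply = usable_controller_mb_s(link)
    verdict = "OK" if supply >= demand else "potential bottleneck"
    print(f"{label:15s}: ~{supply:4.0f} MB/s vs ~{demand} MB/s demand -> {verdict}")
```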

I guess the next component is the SSD / Flash device, and this one is hopefully obvious to each of you. But don’t let those performance tests you see on Tom’s or Anandtech fool you, there is more to an SSD than just sheer IOps. For instance durability: how many full writes per day, and for how many years, can your SSD handle? Some of the enterprise grade drives can handle 10 full writes or more per day for 5 years. You cannot compare that with some of the consumer grade drives out there, which obviously will be cheaper but will also wear out a lot faster! You don’t want to find yourself replacing SSDs every year at random times.
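A quick way to compare endurance ratings is to translate “X full writes per day for Y years” into total data written over the drive’s life. The drive size and ratings below are just illustrative assumptions:

```python
# Rough endurance comparison: how much total data written must an SSD survive
# for a given rating? The figures below are illustrative assumptions.

def lifetime_writes_tb(capacity_gb: float, drive_writes_per_day: float, years: float) -> float:
    """Total data written over the drive's rated life, in TB."""
    return capacity_gb * drive_writes_per_day * 365 * years / 1000

# Enterprise-grade eMLC: 10 full drive writes per day, for 5 years
print(f"enterprise 400GB: ~{lifetime_writes_tb(400, 10, 5):.0f} TB written")

# Many consumer drives are rated closer to a fraction of a drive write per day
print(f"consumer   400GB: ~{lifetime_writes_tb(400, 0.3, 5):.0f} TB written")
```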

Of course network cards are a consideration when it comes to VSAN. Why? Well, because I/O will more than likely hit the network. Personally, I would rule out 1GbE… Or you would need to go for multiple cards and ports per server, but even then I think 10GbE is the better option here. Most 10GbE cards are of decent quality, but make sure to check the HCL and any recommendations around configuration.
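A simple back-of-the-envelope calculation (with an assumed 90% link efficiency, purely for illustration) shows why the NIC speed matters for things like resync, cloning or vMotion traffic:

```python
# Quick feel for why NIC speed matters: time to push a chunk of VSAN traffic
# (a resync, a large clone, vMotion, ...) over the wire. Illustrative only; real
# throughput depends on protocol overhead and contention with other traffic types.

def transfer_hours(data_gb: float, link_gbit: float, efficiency: float = 0.9) -> float:
    """Time in hours to transfer data_gb gigabytes over a link of link_gbit Gbit/s."""
    seconds = data_gb * 8 / (link_gbit * efficiency)
    return seconds / 3600

for nic in (1, 10):
    print(f"{nic:2d}GbE: ~{transfer_hours(2000, nic):.1f} hours to move 2 TB")
```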

And last but not least, magnetic disks… Quality should always come first here. I guess this goes for all of the components, I mean you are not buying an empty storage array either and filling it up with random components, right? Think about what your requirements are. Do you need 10k / 15k RPM, or does 7.2k suffice? SAS vs SATA vs NL-SATA? Also, keep in mind that performance comes at a cost (capacity typically). Another thing to realize: high capacity drives are great for… yes, adding capacity indeed, but keep in mind that when IO needs to come from disk, the number of IOps you can drive and your latency will be determined by these disks. So if you are planning on increasing the “stripe width” then it might also be useful to factor this in when deciding which disks you are going to use.
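To get a feel for what that means, here is a rough estimate of the read IOps a magnetic tier can sustain once IO misses the flash layer; the per-disk numbers are rules of thumb I am assuming for illustration, not vendor figures. A larger stripe width only helps if there are enough spindles behind it:

```python
# Ballpark estimate of the read IOps the magnetic tier can deliver once IO misses
# the flash layer. Per-disk numbers are rough rules of thumb, not vendor specs.

IOPS_PER_DISK = {"7.2k NL-SAS": 80, "10k SAS": 140, "15k SAS": 180}

def backend_read_iops(disk_type: str, disks_per_host: int, hosts: int) -> int:
    """Aggregate random-read IOps the spinning disks can roughly sustain."""
    return IOPS_PER_DISK[disk_type] * disks_per_host * hosts

# Example: a 3-host cluster with 5 magnetic disks per host
for disk_type in IOPS_PER_DISK:
    print(f"{disk_type:11s}: ~{backend_read_iops(disk_type, 5, 3)} IOps from disk")
```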

I guess to put it differently, if you are serious about your environment and want to run a production workload then make sure you use quality parts! Reliability, performance and ultimately your experience will be determined by these parts.

<edit> Forgot to mention this, but soon there will be “Virtual SAN” ready nodes… This will make your life a lot easier I would say.

</edit>

Where is that Yellow-Bricks dude hanging out during VMworld?

Some people have been asking what my agenda looks like, when my sessions are and if there are any specific social events I am likely to attend… Well here you go:

My sessions / group discussions:

Social events I may attend, and no… unfortunately I cannot get you tickets to these:

  • Sunday, Oct 13, evening: nothing planned
  • Monday, Oct 14, evening: VMUG party, VMware Ireland, Pernix Data
  • Tuesday, Oct 15, evening: CTO party, VMware Benelux, Veeam
  • Wednesday, Oct 16, evening: VMworld party

Hoping I will be able to attend the following sessions / discussion groups:

  • Wednesday, Oct 16, 12:30 – 13:30 – Hall 8.0, Room C1 – Group Discussion: Stretched Clusters with Lee Dilworth
  • Wednesday, Oct 16, 12:30 – 13:30 – Hall 8.0, Room F2 – Session: Performance and Capacity Management of DRS Clusters with Anne and Ganesha (both VMware engineers)
  • Wednesday, Oct 16, 14:00 – 15:00 – Hall 8.0, Room C2 – Group Discussion: VSAN with Cormac Hogan
  • Wednesday, Oct 16, 15:30 – 16:30 – Hall 8.0, Room C2 – Group Discussion: Disaster Recovery and Replication with Ken Wernerburg
  • Wednesday, Oct 16, 15:30 – 16:30 – Hall 8.0, Room D4 – Session: Storage DRS: Deep Dive and Best Practices to Suit Your Storage Environments with Mustafa and Sachin (both VMware engineers)
  • Wednesday, Oct 16, 17:00 – 18:00 – Hall 8.0, Room A3 – Session: Building a Google-like infrastructure for the enterprise with Raymon Epping
  • Thursday, Oct 17, 9:00 – 10:00 – Hall 8.0, Room C1 – Group Discussion: Software Defined Storage with Rawlinson Rivera and Cormac Hogan
  • Thursday, Oct 17, 10:30 – 11:30 – Hall 8.0, Room G3 – Session: DRS: New Features, Best Practices and Future Directions (VMware engineer)

HA Futures: Pro-active response

We all know (at least I hope so) what HA is responsible for within a vSphere cluster. Although it is great that vSphere HA responds to a failure of a host / VM / application, and in some cases even your storage device, wouldn’t it be nice if vSphere HA could pro-actively respond to conditions which might lead to a failure? That is what we want to discuss in this article.

What we are exploring right now is the ability for HA to avoid unplanned downtime. HA would detect specific (health) conditions that could lead to catastrophic failures and pro-actively move virtual machines off that host. You could for instance think of a situation where 1 out of 2 storage paths goes down. Although that does not directly impact the virtual machines from an availability perspective, it could be catastrophic if the second path goes down as well. So, in order to avoid ending up in this situation, vSphere HA would vMotion all the virtual machines to a host which does not have a failure.

This could of course also apply to other components like networking, or even memory or CPU. You could potentially have a memory DIMM which is reporting specific issues that could impact availability, which in turn could trigger HA to pro-actively move all potentially impacted VMs to a different host.
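Just to illustrate the idea, here is a purely hypothetical sketch of such a severity-based policy; none of the condition names or functions reflect an actual vSphere HA interface, they are made up for this example only:

```python
# Purely hypothetical sketch of the idea above: rank partial-failure conditions
# by severity and decide whether to evacuate a host. None of these names reflect
# an actual vSphere HA API; they are made up for illustration only.

from dataclasses import dataclass
from typing import List

SEVERITY = {
    "storage_path_redundancy_lost": 3,
    "nic_redundancy_lost": 2,
    "dimm_correctable_errors": 2,
    "fan_failure": 1,
}

@dataclass
class HostHealth:
    name: str
    conditions: List[str]

def should_evacuate(host: HostHealth, threshold: int = 3) -> bool:
    """Evacuate when the worst reported condition reaches the threshold."""
    return any(SEVERITY.get(c, 0) >= threshold for c in host.conditions)

host = HostHealth("esxi-01", ["storage_path_redundancy_lost"])
print(should_evacuate(host))  # True -> vMotion VMs to a healthy host
```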

A couple of questions we have for you:

  1. When such partial host failures occur today, how do you address these conditions? When do you bring the host back online?
  2. What level of integration do you expect with management tools? In other words, should we expose an API that your management solution can consume, or do you prefer this to be a stand-alone solution using a CIM provider for instance?
  3. Should HA treat all health conditions the same? I.e., always evacuate all VMs from an “unhealthy” host?
  4. How would you like HA to compare two conditions? E.g., a fan failure on host 1 versus a network path failure on host 2?

Please chime in,

Virtual SAN news flash pt 1

I had a couple of things I wanted to write about with regard to Virtual SAN which I felt weren’t beefy enough to dedicate a full article to, so I figured I would combine a couple of newsworthy items and create a Virtual SAN news flash article / series.

  • I was playing with Virtual SAN last week and I noticed something I hadn’t noticed before… I was running vSphere with an Enterprise license and I added the Virtual SAN license to my cluster. After adding the Virtual SAN license, all of a sudden I had the Distributed Switch capability on the cluster I had licensed for VSAN. Now I am not sure what this will look like when VSAN goes GA, but for now those who want to test VSAN and want to use the Distributed Switch can do so. Use the Distributed Switch to guarantee bandwidth (leveraging Network IO Control) to Virtual SAN when combining different types of traffic like vMotion / Management / VM traffic on a 10GbE pair. I would highly recommend starting to play around with this and getting experience with it, especially because vSphere HA traffic and VSAN traffic are combined on a single NIC pair and you do not want HA traffic to be impacted by replication traffic.
  • The Samsung SM1625 SSD series (eMLC) has been certified for Virtual SAN. It comes in sizes ranging between 100GB and 800GB and can do up to 120k IOps random read… Nice to see the list of supported SSDs expanding; I will try to get my hands on one of these at some point to see if I can do some testing.
  • Most people by now are aware of the challenges there have been with the AHCI controller. I was just talking with one of the VSAN engineers, who mentioned that they have managed to do a full root cause analysis and pinpoint the root of this problem. There is currently a team working on solving it, things are looking good, and hopefully a new driver will be released soon. When it is, I will let you know, as I realize that many of you use these controllers in your home lab.