vSwitch Traffic Shaping, what is what?

I was troubleshooting an issue where vMotion would constantly time out, and I had no clue where it was coming from, so I started digging. In this case the environment was using a regular vSwitch and 10GbE networking, as unfortunately the Distributed vSwitch was not an option for this environment. When I took a closer look I noticed that some form of traffic shaping was applied: traffic shaping was enabled, the peak value was specified, and the rest was left at the default values… and unfortunately this is exactly what caused the problem.

So when it comes to vSwitch Traffic Shaping, what is what? There are three settings you can configure per portgroup:

  • Average Bandwidth – specified in Kbps
  • Peak Bandwidth – specified in Kbps
  • Burst Size – specified in KB

So if you have a 10Gbps NIC port for your traffic, this means you have a total of 10,485,760 Kbps. When you enable vSwitch Traffic Shaping, by default “Average Bandwidth” is set to 100,000 Kbps, “Peak Bandwidth” to 100,000 Kbps and “Burst Size” to 102,400 KB. So what does that mean? Well, it means that if you enable it and do not change the values, the traffic is limited to 100,000 Kbps. 100,000 Kbps is… yes, roughly 100Mbps, even less to be more precise: 97.6Mbps. Which is not a lot indeed, and not even a supported configuration for vMotion.

So what if I simply bump up the Peak Bandwidth to, let’s say, 5Gbps, as I do not want vMotion to ever consume more than half of the NIC port? (Note: vSwitch traffic shaping applies to egress, aka outbound, traffic only.) Well, setting the Peak Bandwidth sounds like it may do something, but probably not what you would hope for, as this is how the settings are applied:

By default the traffic stream will get what is specified as the “Average Bandwidth”. However, it is possible to exceed this when needed by specifying a higher “Peak Bandwidth” value. Your traffic will be allowed to burst until the value of “Burst Size” has been exceeded. In other words, in the above example where only the Peak Bandwidth is increased, this leads to the following: by default the traffic is limited to 100Mbps, but it can peak to 5Gbps for only 100MB worth of data traffic. As you can imagine, in the case of vMotion, where the full memory content of a VM is transferred, that 100MB is hit within a second, after which the vMotion process is throttled back to 100Mbps, the remainder of the VM memory takes ages to copy, and the vMotion eventually times out.
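To put a rough number on that, take as a purely hypothetical example a VM with 32GB of memory. At roughly 97.6Mbps the effective copy rate is about 12MB per second, so after the 100MB burst is spent there is still a very long wait ahead:

echo "$(( 32 * 1024 / 12 / 60 )) minutes to copy 32GB at ~12MB/s"

which prints roughly 45 minutes, way beyond what the vMotion process will tolerate.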

So if you apply traffic shaping using your vSwitch, make sure to think through the numbers. In the above scenario, for instance, specifying 5Gbps for both the Average and the Peak Bandwidth would have been the desired configuration.
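For completeness, here is a minimal sketch of how you could check and apply such a policy from the ESXi command line. This assumes a standard vSwitch portgroup named “vMotion” (adjust to your own portgroup name), and the option names below are from memory, so verify them with --help before relying on them.

Check the shaping policy currently applied to the portgroup:
esxcli network vswitch standard portgroup policy shaping get -p vMotion

Set both Average and Peak Bandwidth to 5Gbps (5 x 1024 x 1024 = 5,242,880 Kbps); with Average equal to Peak the Burst Size hardly matters, so the default of 102,400 KB is kept here:
esxcli network vswitch standard portgroup policy shaping set -p vMotion --enabled true --avg-bandwidth 5242880 --peak-bandwidth 5242880 --burst-size 102400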

New beta of the vSphere 5.5 U1 Hardening Guide released

Mike Foley just announced the release of the new beta of the vSphere 5.5 U1 Hardening Guide. Note that it is still labeled a “beta” as Mike is still gathering feedback; the document should be finalized in the first week of June.

For those concerned about security, this is an absolute must-read! As always, before implementing ANY of these recommendations, make sure to try them in a test cluster first and validate the expected functionality of both the vSphere platform and the virtual machines and applications running on top of it.

Nice work Mike!

Updating LSI firmware through the ESXi commandline

I received an email this week from one of my readers / followers on Twitter who had gone through the effort of upgrading his LSI controller firmware. He shared the procedure with me as unfortunately it wasn’t well documented. I hope this will help others in the future; I know it will help me, as I was about to look into exactly the same thing for my VSAN environment. Thanks for sharing this, Tom!

— copy / paste from Tom’s document —

We do quite a bit of virtualization and storage validation and performance testing in the Taneja Group Labs (http://tanejagroup.com/). Recently, we were performing some tests with VMware’s VSAN and due to some performance issues we were having with the AHCI controllers on our servers we needed to revise our environment to add some LSI SAS 2308 controllers and attach our SSD and HDDs to the LSI card. However our new LSI SAS controllers didn’t come with the firmware mandated by the VSAN HCL (they had v14 and the HCL specifies v18) and didn’t recognize the attached drives.  So we set about updating LSI 2308 firmware. Updating the LSI firmware is a simple process and can be accomplished from an ESXi 5.5 U1 server but isn’t very well documented. After updating the firmware and rebooting the system the drives were recognized and could be used by VSAN. Below are the steps I took to update my LSI controllers from v14 to v18. [Read more...]

The Compatibility Guides are now updated with VSAN and vFlash info!

For those wanting to play with Virtual SAN (VSAN) and vSphere Flash Read Cache (vFRC / vFlash), the compatibility guides are being updated at the moment. Hit the following URL to find out what is currently supported and what is not:

  • vmware.com/resources/compatibility/
  • For vSphere Flash Read Cache:
    • Select “VMware Flash Read Cache” from the drop down list titled “What are you looking for”.
    • Hit “update and view results”
  • For Virtual SAN:
    • Select “Virtual SAN (beta)” from the drop down list titled “What are you looking for”
    • Select “ESXi 5.5” and click “Next”
    • Select a category (Server, I/O Controller, HDD, SSD); at the time of writing only Server was available
    • Select the type of server and click “Next”
    • A list of supported servers is now presented

I know both lists are short today; this is an ongoing effort, and I know many vendors are now wrapping up and submitting their test reports. More will be added over the course of the next couple of weeks, so keep coming back to the compatibility guide.

vSphere 5.5 nuggets: Change Disk.SchedNumReqOutstanding per device!

Always wanted to change Disk.SchedNumReqOutstanding per device instead of per host? Well, now with vSphere 5.5 you can! I didn’t know about this either, but my colleague Paudie pointed it out. It is a useful feature when you have several storage arrays and you need to tweak these values. Now, let’s be clear… I do not recommend tweaking this, but in case you need to, you can now do it per device using esxcli.

Get the currently configured value for a specific device:
esxcli storage core device list --device <device>

Set the value for a specific device:
esxcli storage core device set -d <device> -O <value between 1-256>
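
For example, assuming a made-up device identifier and an arbitrary value of 64, purely to illustrate the syntax (the current value shows up in the “list” output as “No of outstanding IOs with competing worlds”):

esxcli storage core device list -d naa.60a98000572d54724a34642d71325763
esxcli storage core device set -d naa.60a98000572d54724a34642d71325763 -O 64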