I was reading Scott’s article about using dedicated clusters for management applications, which was quickly followed by a bunch of quotes turned into an article by Beth P. from Techtarget. Scott mentions that he had posed the original question on Twitter: are people doing dedicated management clusters, and if so, why?
As he mentioned, only a few responded, and the reason for that is simple: hardly anyone is doing dedicated management clusters these days. The few environments that I have seen doing it were large enterprise environments or service providers where this was part of an internal policy. Basically, in those cases a policy would state that “management applications cannot be hosted on the platform it is managing”, and some even went a step further, where these management applications were not even allowed to be hosted in the same physical datacenter. Scott’s article was quickly turned into an “availability concerns” article by Techtarget, to which I want to respond. I am by no means a vShield expert, but I do know a thing or two about the product and the platform it is hosted on.
I’ll use vShield Edge and vShield Manager as an example, as Scott’s article mentions vCloud Director, which leverages vShield Edge. This means that vShield Manager needs to be deployed in order to manage the edge devices. I was part of the team that was responsible for the vCloud Reference Architecture, and also part of the team that designed and deployed the first vCloud environment in EMEA. Our customer had their worries as well about the resiliency of vShield Manager and vShield Edge, but as they are virtual they can easily be “protected” by leveraging vSphere features. One thing I want to point out though: if vShield Manager is down, vShield Edge will continue to function, so no need to worry there. I created the following table to display how vShield Manager and vShield Edge can be “protected”.
Product | vShield Manager | VMware HA | VM Monitoring | VMware FT |
vShield Manager | Yes (*) | Yes | Yes | Yes |
vShield Edge | Yes (*) | Yes | Yes | Yes |
Not only can you leverage these standard vSphere technologies, there is more that can be leveraged:
- Scheduled live clone of vShield Manager through vCenter
- Scheduled configuration backup of vShield Manager (*)
Please don’t get me wrong here, there are always ways to get locked out, but as Edward Haletky stated: “In fact, the way vShield Manager locks down the infrastructure upon failure is in keeping with longstanding security best practices” (quote from Beth P’s article). I also would not want my door to be opened up automatically when there is something wrong with my lock. The trick though is to prevent a “broken lock” situation from occurring, and to utilize vSphere capabilities in such a way that the last known state can be safely recovered if it does occur.
As always, an architect/consultant will need to work with all the requirements and constraints and, based on the capabilities of a product, come up with a solution that offers maximum resiliency. With the options mentioned above, you can’t tell me that VMware doesn’t provide these capabilities.
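To make the lock analogy a bit more concrete, here is a minimal Python sketch of a fail-closed enforcement point. It is purely illustrative: the `Firewall` class, its `sync` and `allows` methods and the port numbers are all made up for this example and say nothing about how vShield is actually implemented. It only shows the two behaviours discussed above: a failed sync with the manager does not wipe the last known rules, and without any known-good rules everything is denied.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Firewall:
    """Toy model of a fail-closed enforcement point (hypothetical, not vShield code)."""

    # Last rule set successfully received from the manager; None until the first sync.
    last_known_rules: Optional[Set[int]] = None

    def sync(self, rules_from_manager: Optional[Set[int]]) -> None:
        # None models "manager unreachable": keep whatever we already have.
        if rules_from_manager is not None:
            self.last_known_rules = set(rules_from_manager)

    def allows(self, port: int) -> bool:
        # Fail closed: with no known-good rules, deny everything.
        if self.last_known_rules is None:
            return False
        return port in self.last_known_rules


fw = Firewall()
never_synced = fw.allows(443)       # no rules yet, so the door stays shut

fw.sync({443})                      # manager pushes a rule set
fw.sync(None)                       # manager goes down afterwards
still_open = fw.allows(443)         # last known state keeps working
still_shut = fw.allows(22)          # and nothing extra is opened up
```

In this toy model the door never opens just because the lock's manager is gone; it simply keeps enforcing the last state it knew about.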
Itzik Reich says
Hi,
Scott and I encountered the same problem at the same time without knowing about each other. The thing is that with the current design, vCenter CAN be a single point of failure, and not just for vShield Zones. I also had the circular effect with the vDS when my lab went down and my vCenter was hosted as a VM: I had to attach my vCenter to a legacy vSwitch, power it on and move it to a vDS. Granted, a lab environment is not a prod environment, but these issues are a single point of failure.
My personal recommendation (although I know it’s not VMware’s own recommendation) is to put vCenter on a physical host with VMware Heartbeat replicating it to a VM.
It’s all about isolating the management layer from the ones it’s managing.
Duncan Epping says
Not sure what the vDS issue was you experienced, but as long as you don’t need to make changes vDS should work normally without having vCenter available.
Although Heartbeat is great and a separate management cluster even better this usually only works for large enterprise environments.
As mentioned, there are many methods of providing extra resiliency.
Mike Wronski says
FT for the Edge appliance? Does that work? Assuming the guests behind Edge are located on multiple hosts in the cluster, there will already be an Edge appliance providing the protected port group on each host in the cluster. What happens if TWO Edge appliances appear on a single host due to an FT failover? I haven’t tested this, but it looks like FT won’t work for Edge.
Duncan Epping says
Yes it does work Mike. First of all, there is only a single edge device deployed. So not one per Host, but one per “network segment”. Also, the secondary VM is not doing I/O. So if the Primary fails the Secondary will take over.
Ben Thomas says
The cached data is all well and good until there is a catastrophic failure and HA moves VC to another server. I don’t know many admins who keep track of where their VC was registered, so in this scenario (assuming the default vDS port group config, as most are) vCenter won’t be able to be brought online because it cannot grant itself a port. I can’t count the number of support calls VMware gets on this, which is why I usually end up recommending that people use a standard switch for their management port groups. Better safe than sorry, I guess.
Ben Thomas says
oops, clicked reply on the wrong comment thread, sorry!
Itzik Reich says
The circular issue with the vCenter / vDS is that when your vCenter is hosted on a vDS port and your entire cluster goes down, you will not be able to bring it up…
Duncan Epping says
Why not? When it is configured all data is cached locally and even when vCenter goes down VMs can still be powered on and connect.
Yuri Semenikhin says
The only impact vCenter has on the vDS is that you can’t make changes to an already configured vDS without vCenter; the network will still be working.
Doug Youd says
I’d like to see some evidence of this. I’m currently designing a blade-based vSphere 4.1 environment that will only have a pair of redundant 10 Gbit adapters per host for all network traffic, hence requiring a single vSwitch.
The intention was to use a vDS with QoS and explicit failover policies surrounding the management traffic.
If there is evidence to prove that the vDS won’t work in the event of a failure of the host that was hosting the vCenter guest, I’d like to see it.
Cheers,
Doug
Duncan Epping says
I tested this in the past and it worked fine. So I would like to see “evidence” as well.
Eric O'Callaghan says
So does HA/FT only apply to vShield Manager/Edge where vShield App is not deployed? Or has 5.0 changed that?
According to Michael Webster:
“Even though vShield Manger has a single vCPU you can’t use VMware Fault Tolerence to protect it when implementing vShield App, this is because it uses linked clones and snapshots as part of the deployment process for the vShield Firewall Service Appliance virtual machines. This limitation doesn’t apply to vShield Edge (which is best practice as per the VMware vCloud Architecture Toolkit).”
http://longwhiteclouds.com/2011/09/09/vshield-app-design-for-the-enterprise/
Cheers,
Eric
Duncan Epping says
I am not sure I understand the consideration…. you would have to ask Michael for more details on this.