UCS with disjointed L2 Domains

How do we deal with disjointed L2 domains in UCS?

To start, what’s a disjointed L2 domain? This is where you have two Ethernet “clouds” that never connect, but must be accessed by the same UCS Fabric Interconnect. Take, for example, a multi-tenant scenario where multiple customers’ servers sit within the same UCS cluster but must access different L2 domains.

How do we ensure that all traffic from Customer A’s blades only goes to their cloud, while traffic from Customer B’s blades only goes to theirs?

The immediately obvious answer is to use UCS pin groups to tie each customer’s interfaces (through their vNIC configuration) to the uplinks that go to their cloud. Unfortunately, this only solves half of the problem.

In the default operational mode of the Fabric Interconnects (called Ethernet Host Virtualizer, sometimes called End Host Virtualizer), only one uplink is used to receive multicast or broadcast traffic. EHV mode assumes a single L2 fabric on the uplinks (VLAN considerations notwithstanding). So in this example, only broadcasts or multicasts from one of the two fabrics would be accepted. Blades pinned to the other fabric would never see broadcasts (ARP requests, for example) from their own cloud, which is obviously a problem.
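To make the broadcast problem concrete, here is a minimal toy model in Python. It is only a sketch of the behavior described above, not UCS code: the `Uplink` and `EndHostModeFI` names are hypothetical, and the rule that the first uplink becomes the single broadcast receiver is an assumption made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplink:
    name: str
    l2_domain: str  # which upstream Ethernet "cloud" this port connects to


class EndHostModeFI:
    """Toy model of a Fabric Interconnect in EHV mode (hypothetical, not UCS code)."""

    def __init__(self, uplinks):
        if not uplinks:
            raise ValueError("need at least one uplink")
        self.uplinks = list(uplinks)
        # Assumption for illustration: the first uplink is chosen as the
        # single port that receives broadcast and multicast traffic.
        self.broadcast_receiver = self.uplinks[0]

    def accepts_broadcast(self, uplink):
        """Broadcasts/multicasts are accepted only on the designated receiver."""
        return uplink == self.broadcast_receiver

    def reachable_broadcast_domains(self):
        """L2 domains whose broadcasts actually make it into the FI."""
        return {u.l2_domain for u in self.uplinks if self.accepts_broadcast(u)}


fi = EndHostModeFI([
    Uplink("eth1/1", "customer-a"),
    Uplink("eth1/2", "customer-b"),
])
heard = fi.reachable_broadcast_domains()
print(heard)
```

In this model, pinning still steers each customer’s unicast traffic to the right uplink, but only one of the two clouds ever has its broadcasts heard.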

The only way to get around this is to put the Fabric Interconnects into Ethernet Switching mode. This causes the Fabric Interconnect to behave as a standard L2 switch, including spanning tree considerations. Now uplinks can receive broadcasts and multicasts regardless of the fabrics they are connected to. This does, however, increase the administrative overhead of the Fabric Interconnects and reduce your flexibility in uplink configuration: all ports going into the same L2 domain must now be port-channeled in order to use their bandwidth, since spanning tree would otherwise block the redundant links.

To me, a better option would be to leave the Fabric Interconnects in EHV mode and use another L2 switch upstream to perform the split between fabrics.

This configuration allows the Fabric Interconnect to remain in EHV mode and has the upstream L2 switches performing the split between the L2 domains.  ACLs can be configured on the L2 switches as necessary to isolate the networks, something that cannot be done on the Fabric Interconnect regardless of mode.

Both of these scenarios assume that the two customer L2 clouds use different VLAN numbering, since there’s no capacity in UCS to distinguish between the same VLAN numbers on either fabric. There are certainly L3 and other translation tricks that you could use to accommodate this, but that’s an entirely different post.  🙂
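The VLAN assumption is easy to check ahead of time. A minimal sketch in Python, where the function name and the VLAN lists are made up for illustration:

```python
def overlapping_vlans(domain_a, domain_b):
    """Return the VLAN IDs used in both L2 domains, sorted ascending."""
    return sorted(set(domain_a) & set(domain_b))


# Hypothetical VLAN assignments for the two customer clouds.
customer_a_vlans = [10, 20, 30]
customer_b_vlans = [30, 40, 50]

conflicts = overlapping_vlans(customer_a_vlans, customer_b_vlans)
if conflicts:
    print("VLAN IDs used in both clouds:", conflicts)
```

Any VLAN ID that shows up in the result would need to be renumbered or translated before both clouds could share the same UCS cluster.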

19 thoughts on “UCS with disjointed L2 Domains”

  1. Alton – not presently. Future code releases may allow the selection of which uplink is used for broadcasts, including the ability to accept broadcast on multiple ports (for just this exact scenario).

    1. Mostly because it complicates the infrastructure, introduces STP, and makes the upstream switch configuration more complicated. With the new port types in UCSM 1.4, there’s really only one use case I can think of for Ethernet switch mode, and that’s this disjointed L2 domain scenario.

  2. Thank you Dave.

    That’s exactly my case and most of my customers’.

    Besides the complexity (which you are going to have anyway if you have to introduce two L2 switches), is there any other reason? Performance problems? In our datacenter we have several switch domains (service, fw, no-fw, monitoring…) and we always have to connect at least two to any system. Introducing a second level of switching is expensive and complicates the picture again, and since switch mode is not recommended by Cisco, it makes me feel uncomfortable. When I have to present this system, this is the biggest problem I face.

    Thanks. Great blog, by the way!

    1. It’s odd that most of your customers’ environments are like that. Most of the environments I see don’t have the disjointed L2 scenario, at least not down at the access layer (where the UCS Fabric Interconnect sits). The reason that switch mode isn’t the recommended deployment method is that it isn’t required for most installations, and just complicates the installation and configuration. There’s no performance impact to running in switch mode, so if it’s required for a particular situation (such as yours), it’s perfectly OK to run that way. I guess what I’m driving at is only use it if you *have* to. I know some networking engineers gravitate towards that mode because it’s more familiar – that’s what I’m trying to avoid.
      Once you’re running in switch mode, you have to start thinking about the Fabric Interconnect like a switch (STP considerations, etc), but it isn’t configurable like a traditional switch.
      Except in very small deployments, the cost of one or two “real” switches to handle the L2 breakout shouldn’t be a significant impact to the cost of the overall project.

  3. The deeper I go into UCS, the bigger the problems I find when thinking about a hosting provider implementation… I mean, for example, MAC pools: how can you make sure that you are not duplicating MACs in an environment with hundreds of customers and thousands of servers when introducing UCS systems? I am talking about a very big hosting service provider where MAC control is not taken into account at all, and now it becomes a very important issue to consider.

    1. Can’t really see how this is any more of a problem with UCS than with any other tech in a massive provider environment, but…

      UCS itself won’t enforce/verify duplicate MACs across multi-cluster deployments, but there are lots of ways to solve that problem. First and foremost, most UCS deployments are recommended to be deployed using some scheme in the MAC/WWN/UUID pools that identify and keep unique the particular cluster they are a part of. It’s not difficult to lay out such a scheme, but it does require some forethought. Every SP I’ve ever dealt with has excelled in this type of planning.

      There are also 3rd party products (BladeLogic, Ionix, etc) that will manage multiple clusters and ensure uniqueness in pools/identifiers.
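The per-cluster pool scheme mentioned in the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `cluster_mac` helper is hypothetical, the `00:25:B5` prefix is a placeholder for whatever prefix your pools use, and the byte layout (cluster ID, then fabric, then index) is just one possible convention.

```python
def cluster_mac(cluster_id, fabric, index, prefix="00:25:B5"):
    """Build a MAC whose 4th byte identifies the cluster and 5th the fabric.

    Hypothetical layout: <prefix>:<cluster>:<0A or 0B>:<index>.
    """
    if not 0 <= cluster_id <= 0xFF:
        raise ValueError("cluster_id must fit in one byte")
    if fabric not in ("A", "B"):
        raise ValueError("fabric must be 'A' or 'B'")
    if not 0 <= index <= 0xFF:
        raise ValueError("index must fit in one byte")
    fabric_byte = 0x0A if fabric == "A" else 0x0B
    return f"{prefix}:{cluster_id:02X}:{fabric_byte:02X}:{index:02X}"


# Two clusters, both fabrics, four addresses each.
macs = [
    cluster_mac(cluster, fabric, i)
    for cluster in (1, 2)
    for fabric in ("A", "B")
    for i in range(4)
]
print(macs[0])
```

Because the cluster ID is baked into every address, two clusters can never hand out the same MAC as long as each cluster gets its own ID.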

  4. One of the design principles of UCS was to keep it easy for server admins. In UCS, the access switches are called “Fabric Interconnects” because they don’t expose all the switch knobs, so if you put the FIs in switch mode, you won’t have the ability to fine-tune STP. Of course, you can do that from the next-level switches, but at that point you are increasing complexity.

  5. The new 2.0 code allows disjointed L2 Domains in Endhost Mode. But I am not going to race to check it out. Will be working with it in the lab for a while first. But the hope is to get off our disjointed L2 Domains.

    Craig

  6. Hi, in my environment I have to connect both of my FIs to only one network switch.
    Kindly confirm which mode I have to use?

    1. That wouldn’t be a disjointed L2 domain, unless you were intentionally providing only a subset of VLANs to each of the FIs.
