Correction to L2 Forwarding Rules post

I posted here about the L2 forwarding rules when UCS is in EHV mode.   Several readers have pointed out a flaw in the logic I posted, which was taken from Cisco’s DCUCI course.   In Cisco’s defense, I did write that course.   🙂

At issue is how UCS deals with unknown unicast frames. The other post incorrectly states that an unknown unicast frame received from a server port would be flooded out all other server ports participating in the same VLAN. This is not the case.

The logic behind EHV mode is that it is impossible to have an unknown unicast address “behind” or “south” of the Fabric Interconnect.   All adapter MAC addresses are known, either because they were assigned in the service profile or inventoried (if using “derived” values).    For MAC addresses that are generated within a host, say for a virtual machine, the assumption is that at creation (or arrival through vMotion, etc) the MAC address will be announced using gratuitous ARP or other traffic generation techniques and the Fabric Interconnect can learn the address through normal L2 methods.

So to clarify, an unknown unicast frame received from a server port will be forwarded out ONLY that interface’s pinned uplink. Otherwise, all traffic destined for MAC addresses outside of UCS (such as the MAC address of a default gateway, for example) would also get flooded internally – which would not be a good thing.
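To make the corrected rule concrete, here’s a minimal Python sketch of the decision for a unicast frame arriving on a server port. The MAC table, pinning table, and port names are all hypothetical illustrations – this models the behavior described above, not UCS internals.

```python
# Hypothetical tables modeling the behavior described above (not UCS internals).
mac_table = {"00:25:b5:00:00:1a": "server-1/1"}   # MACs the FI knows: vNIC-assigned or learned VM MACs
pinned_uplink = {"server-1/1": "uplink-1", "server-1/2": "uplink-2"}

def forward_unicast_from_server(ingress_port, dst_mac):
    """Return the single egress port for a unicast frame arriving on a server port."""
    if dst_mac in mac_table:
        # Known ("south of the FI") MAC: switched locally to its server port
        return mac_table[dst_mac]
    # Unknown unicast (e.g. the default gateway's MAC, which is never learned):
    # sent ONLY out this interface's pinned uplink, never to the other server ports
    return pinned_uplink[ingress_port]

print(forward_unicast_from_server("server-1/1", "00:0c:29:aa:bb:cc"))  # -> uplink-1
```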

UCS with disjointed L2 Domains

How do we deal with disjointed L2 domains in UCS?

To start, what’s a disjointed L2 domain?  This is where you have two Ethernet “clouds” that never connect, but must be accessed by the same UCS Fabric Interconnect.   Take, for example, a multi-tenant scenario where we have multiple customers’ servers within the same UCS cluster that must access different L2 domains.

How do we ensure that all traffic from Customer A’s blade only goes to their cloud, while Customer B’s blades only connect to their cloud?

The immediately obvious answer is to use UCS pin groups to tie each customer’s interfaces (through their vNIC configuration) to the uplinks that go to their cloud.   Unfortunately, this only solves half of the problem.

In the default operational mode of the Fabric Interconnects (called Ethernet Host Virtualizer, sometimes called End Host Virtualizer), only one uplink is used to receive multicast or broadcast traffic.   EHV mode assumes a single L2 fabric on the uplinks (VLAN considerations notwithstanding).  So in this example, only broadcasts or multicasts from one of the two customer clouds would be accepted.   Obviously, this is a problem.
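Here’s a rough Python sketch of that limitation. The single “broadcast receiver” uplink election and the port names are illustrative simplifications of the behavior described above, not the actual UCS implementation.

```python
# Illustrative model: EHV accepts broadcast/multicast from upstream on one uplink only.
uplinks = {
    "uplink-1": "customer-A-cloud",   # pinned for Customer A's vNICs
    "uplink-2": "customer-B-cloud",   # pinned for Customer B's vNICs
}

broadcast_receiver = "uplink-1"       # EHV designates a single uplink for broadcast/multicast

def accept_broadcast(ingress_uplink):
    """Broadcast/multicast frames from the upstream network are accepted on one uplink only."""
    return ingress_uplink == broadcast_receiver

print(accept_broadcast("uplink-1"))   # True  - Customer A's ARPs, DHCP, etc. get in
print(accept_broadcast("uplink-2"))   # False - Customer B's broadcasts are dropped
```

Pinning handles the egress direction, but nothing in the pin group configuration changes which uplink is allowed to receive broadcast or multicast traffic – which is why it only solves half the problem.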

The only way to get around this is to put the Fabric Interconnects into Ethernet Switching mode.   This causes the Fabric Interconnect to behave as a standard L2 switch, including spanning tree considerations.  Now uplinks can receive broadcasts and multicasts regardless of the fabrics they are connected to.   This does, however, increase the administrative overhead of the Fabric Interconnects and reduce your flexibility in uplink configuration, since all ports going into the same L2 domain must now be channeled together in order to use their bandwidth.

To me, a better approach would be to leave the Fabric Interconnects in EHV mode, and use another L2 switch to perform the split between fabrics, such as the following:

This configuration allows the Fabric Interconnect to remain in EHV mode and has the upstream L2 switches performing the split between the L2 domains.  ACLs can be configured on the L2 switches as necessary to isolate the networks, something that cannot be done on the Fabric Interconnect regardless of mode.

Both of these scenarios assume that each of the two customer L2 clouds is using different VLAN numbering, since there’s no capacity in UCS to distinguish between the same VLAN number appearing in both clouds.   There are certainly L3 and other translation tricks that you could use to accommodate this, but that’s an entirely different post.  🙂

L2 Forwarding Rules in UCS EHV Mode

After a recent comment from @robertquast, it occurred to me that there’s quite a bit of confusion about the way that UCS Fabric Interconnects handle layer-2 forwarding in the default and recommended Ethernet Host Virtualizer mode.

The basic idea behind EHV is that the Fabric Interconnect appears to the upstream network as if it were a host and not a layer-2 switch.  To accomplish this (and not require our old friend Spanning Tree Protocol), the Fabric Interconnect employs a number of L2 forwarding rules and deja vu checks to prevent the creation of L2 loops.  If you know how VMware vSwitches operate with multiple active uplinks, you’ve already got a pretty good understanding of EHV.

EHV separates L2 links into two classes (at least in UCS) – border ports and server ports.  Border ports connect to outside L2 switching not under the control of UCS, while server ports connect to the chassis in the system (and in turn the blades).

Due to the way that UCS provides connectivity to the blades, through the use of vNIC objects, the switching infrastructure already understands every “physical” NIC that exists “below” the Fabric Interconnect.  Therefore, it has no need to learn those addresses (or expire them), since it can track the physical connectivity and knows when a MAC address is or is not available.   The exception to this is virtual MAC addresses that are not managed by the vNIC objects – specifically, those created or managed by a virtualization or clustering solution (e.g. VMware).   These addresses are learned and then aged out by traditional L2 switch mechanisms, through a configurable timeout.   See my other post regarding MAC address aging for more details.
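A small sketch of the two kinds of entries described above might help – statically known vNIC MACs that never age, and dynamically learned VM or cluster MACs that do. The class, field names, and aging value are invented for illustration; the real knob is the MAC aging setting covered in the other post.

```python
import time

class MacTable:
    """Toy model of the FI's view: static vNIC MACs plus learned VM MACs that age out."""

    def __init__(self, aging_seconds=600):           # illustrative aging value only
        self.static = {}                              # vNIC MACs from service profiles / inventory
        self.dynamic = {}                             # learned MACs: mac -> (server_port, last_seen)
        self.aging = aging_seconds

    def add_vnic(self, mac, server_port):
        # Never ages: UCS already tracks the physical connectivity of every vNIC
        self.static[mac] = server_port

    def learn(self, mac, server_port):
        # VM or cluster MACs are learned from traffic, like a traditional L2 switch
        self.dynamic[mac] = (server_port, time.time())

    def lookup(self, mac):
        if mac in self.static:
            return self.static[mac]
        entry = self.dynamic.get(mac)
        if entry and time.time() - entry[1] < self.aging:
            return entry[0]
        return None   # unknown: an aged-out (silent) VM MAC, or a MAC north of the FI

table = MacTable()
table.add_vnic("00:25:b5:00:00:1a", "server-1/1")   # known as long as the blade is present
table.learn("00:50:56:aa:bb:cc", "server-1/1")      # a VM MAC that can age out
```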

Uplinks are handled very differently.   Most notably, in EHV mode, the Fabric Interconnects do not ever learn MAC addresses that exist on the uplink switches.   Each physical blade port is pinned to a physical uplink port (or Port Channel). Frames for unknown unicast MAC addresses received on border ports are dropped.   Frames for known unicast MAC addresses received on border ports are first subjected to a deja vu check to ensure that the frame arrived on the destination MAC address’s pinned uplink; assuming a successful check, the frame is forwarded to the destination port.
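The deja vu / pinning check on border ports is easier to see in a short sketch. Again, the table structures and port names are hypothetical – this only models the rule just described, not the actual implementation.

```python
# Hypothetical tables; only "south of the FI" MACs are ever present in mac_table.
mac_table = {"00:25:b5:00:00:1a": "server-1/1"}
pinned_uplink = {"server-1/1": "uplink-1"}

def forward_from_border(ingress_uplink, dst_mac):
    """Return the egress server port for a unicast frame from a border port, or None to drop it."""
    server_port = mac_table.get(dst_mac)
    if server_port is None:
        return None                                   # unknown unicast: dropped
    if pinned_uplink[server_port] != ingress_uplink:
        return None                                   # arrived on the wrong uplink: dropped (deja vu check)
    return server_port                                # known MAC, correct uplink: forwarded

print(forward_from_border("uplink-1", "00:25:b5:00:00:1a"))  # -> server-1/1
print(forward_from_border("uplink-2", "00:25:b5:00:00:1a"))  # -> None
```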

The problem arises when a silent VM’s MAC address is aged out of the MAC table in the Fabric Interconnect.   If an outside L2 device generates a frame destined for the now-unknown unicast MAC, and the Fabric Interconnect receives the frame on a border port, the Fabric Interconnect will drop the frame.   This essentially creates a black hole for that MAC address until the VM generates some traffic of its own.  Note that a device inside the Fabric Interconnect L2 domain sending a frame to the silent VM’s MAC address doesn’t break the black hole either: that unknown unicast frame is forwarded out only the sender’s pinned uplink (see the flow summary below), and if the upstream network sends it back down it arrives on a border port and is dropped as unknown unicast.  Once the silent VM does transmit (a gratuitous ARP after vMotion, for example), the Fabric Interconnect will re-learn the address and traffic will flow normally – which is why the MAC aging timeout mentioned above matters.

So the basic flow…

Frames arriving on border ports:

To known unicast -> forwarded to appropriate server port

To unknown unicast -> dropped

To broadcast -> forwarded to all server ports in VLAN

Frames arriving on server ports:

To known unicast -> forwarded to appropriate server port (remember we only learn addresses “south” of FI)

To unknown unicast -> forwarded  only to pinned uplink port

To broadcast -> forwarded to all server ports in VLAN and out only the pinned uplink port (as with unknown unicast, only the pinned uplink is used)
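Putting the whole table together, here is one last hedged sketch that restates the flow above as a single decision function. As before, the port names and table layouts are invented for illustration, and both the deja vu/pinning check on border ports and the single broadcast-receiver uplink election are left out for brevity.

```python
# Illustrative restatement of the flow above; not UCS internals.
mac_table = {"00:25:b5:00:00:1a": "server-1/1"}                 # only "south of the FI" MACs
pinned_uplink = {"server-1/1": "uplink-1", "server-1/2": "uplink-1"}
server_ports_in_vlan = ["server-1/1", "server-1/2"]             # server ports in the frame's VLAN

def egress_ports(ingress_port, dst_mac, is_broadcast=False):
    """Return the list of ports the frame is sent out of (an empty list means it is dropped)."""
    if ingress_port.startswith("uplink"):                       # frame arrived on a border port
        if is_broadcast:
            return server_ports_in_vlan                         # broadcast -> all server ports in VLAN
        if dst_mac in mac_table:
            return [mac_table[dst_mac]]                         # known unicast -> its server port
        return []                                               # unknown unicast -> dropped
    # frame arrived on a server port
    if is_broadcast:
        others = [p for p in server_ports_in_vlan if p != ingress_port]
        return others + [pinned_uplink[ingress_port]]           # all other server ports + pinned uplink only
    if dst_mac in mac_table:
        return [mac_table[dst_mac]]                             # known unicast (always south of the FI)
    return [pinned_uplink[ingress_port]]                        # unknown unicast -> pinned uplink only

print(egress_ports("uplink-1", "00:25:b5:00:00:1a"))            # -> ['server-1/1']
print(egress_ports("server-1/1", "ff:ff:ff:ff:ff:ff", True))    # -> ['server-1/2', 'uplink-1']
```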

Questions?  Discussion?

***NOTE*** Edited on June 7th, 2010 to correct an error in the description of behavior of unknown unicast messages received on server ports.