After a recent comment from @robertquast, it occurred to me that there's quite a bit of confusion about how UCS Fabric Interconnects handle layer-2 forwarding in the default (and recommended) Ethernet Host Virtualizer (EHV) mode.
The basic idea behind EHV is that the Fabric Interconnect appears to the upstream network as if it were a host and not a layer-2 switch. To accomplish this (and not require our old friend Spanning Tree Protocol), the Fabric Interconnect employs a number of L2 forwarding rules and deja vu checks to prevent the creation of L2 loops. If you know how VMware vSwitches operate with multiple active uplinks, you’ve already got a pretty good understanding of EHV.
EHV separates L2 links into two classes (at least in UCS) – border ports and server ports. Border ports connect to outside L2 switching that is not under UCS control, while server ports connect to the chassis in the system (and, in turn, the blades).
Because UCS provides connectivity to the blades through vNIC objects, the switching infrastructure already knows about every "physical" NIC that exists "below" the Fabric Interconnect. It therefore has no need to learn those addresses (or expire them), since it can track the physical connectivity and knows when a MAC address is or is not available. The exception is virtual MAC addresses that are not managed by vNIC objects – specifically, those created or managed by a virtualization or clustering solution (e.g. VMware). These addresses are learned, and later aged out, through traditional L2 switch mechanisms with a configurable timeout. See my other post regarding MAC address aging for more details.
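To make that distinction concrete, here is a toy model in Python. This is purely my own sketch (the class, field, and method names are invented, and the aging value is just a stand-in for the configurable timeout), not anything resembling actual Fabric Interconnect code.

```python
import time

# Toy model of the FI's MAC table in EHV mode: vNIC-managed MACs are known
# a priori and never age, while MACs created by a virtualization or
# clustering solution are learned from traffic and expire after a timeout.
class MacTable:
    def __init__(self, aging_seconds):
        self.aging_seconds = aging_seconds  # the configurable aging timeout
        self.static = {}   # vNIC-managed: mac -> server port (never ages)
        self.dynamic = {}  # learned: mac -> (server port, last-seen timestamp)

    def install_vnic(self, mac, server_port):
        # UCS created this vNIC, so it already knows where the MAC lives;
        # there is nothing to learn and nothing to expire.
        self.static[mac] = server_port

    def learn(self, mac, server_port):
        # VM/cluster MACs are learned just like on a traditional L2 switch.
        self.dynamic[mac] = (server_port, time.time())

    def lookup(self, mac):
        if mac in self.static:
            return self.static[mac]
        if mac in self.dynamic:
            server_port, last_seen = self.dynamic[mac]
            if time.time() - last_seen <= self.aging_seconds:
                return server_port
            del self.dynamic[mac]  # aged out, traditional-switch style
        return None  # unknown unicast, as far as the FI is concerned
```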
Uplinks are handled very differently. Most notably, in EHV mode the Fabric Interconnects never learn MAC addresses that exist on the uplink switches. Each physical blade port is pinned to a physical uplink port (or Port Channel). Frames for unknown unicast MAC addresses received on border ports are dropped. Frames for known unicast MAC addresses received on border ports are first subjected to a deja vu check, which verifies that the frame arrived on the destination MAC address's pinned uplink; assuming a successful check, the frame is forwarded to the destination server port.
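Here is a rough sketch of those border-port checks, again in illustrative Python. The function and structure names are mine; the pinning table simply maps each server port to its pinned uplink (or Port Channel):

```python
def border_port_ingress(dst_mac, arriving_uplink, mac_table, pinning):
    # mac_table: mac -> server port (only "southbound" MACs ever appear here)
    # pinning:   server port -> pinned uplink port (or Port Channel)
    server_port = mac_table.get(dst_mac)
    if server_port is None:
        # The FI never learns MACs from the uplink side, so an unknown
        # destination is simply dropped, never flooded toward the servers.
        return "drop: unknown unicast"
    if pinning[server_port] != arriving_uplink:
        # Deja vu check: the frame must arrive on the uplink that the
        # destination's server port is pinned to.
        return "drop: failed deja vu check"
    return "forward to " + server_port

# Example: the blade is pinned to uplink-1, but the frame shows up on
# uplink-2, so the deja vu check drops it.
mac_table = {"00:25:b5:00:00:0a": "server-port-1"}
pinning = {"server-port-1": "uplink-1"}
print(border_port_ingress("00:25:b5:00:00:0a", "uplink-2", mac_table, pinning))
```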
The problem arises when a silent VM's MAC address ages out of the MAC table in the Fabric Interconnect. If an outside L2 device (say, a router whose ARP cache entry is still valid) generates a frame destined for the now-unknown unicast MAC, and the Fabric Interconnect receives that frame on a border port, the Fabric Interconnect will drop it. This essentially creates a black hole for that MAC address unless the VM generates some traffic *or* some device broadcasts a frame on that VLAN, typically an ARP request for the VM's IP address. Broadcasts *are* delivered to every server port in the VLAN, regardless of which side of the Fabric Interconnect they arrive on, so the request does reach the silent VM. The VM should respond, at which point the Fabric Interconnect will re-learn the address and traffic will flow normally.
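The black-hole timeline is easier to see stepped through in code. This is a hypothetical walkthrough (the MAC, port labels, and table layout are all invented for illustration):

```python
# A silent VM's dynamically learned MAC has just aged out of the FI's table.
vm_mac = "00:50:56:aa:bb:cc"  # a VMware-assigned MAC, not vNIC-managed
mac_table = {}                # the aged-out entry is gone

# 1. An outside router (ARP cache still valid) sends a unicast frame to the
#    VM's MAC. It arrives on a border port, the destination is unknown, and
#    the FI drops it: the black hole.
assert mac_table.get(vm_mac) is None

# 2. Some device broadcasts an ARP request for the VM's IP. Broadcasts are
#    delivered to every server port in the VLAN, so the VM hears it.
# 3. The VM's ARP reply arrives on its server port and the FI re-learns it.
mac_table[vm_mac] = "server-port-7"

# 4. Unicast to the VM is known again and forwarded normally.
assert mac_table.get(vm_mac) == "server-port-7"
```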
So the basic flow (also sketched in code after the list)…
Frames arriving on border ports:
To known unicast -> forwarded to appropriate server port
To unknown unicast -> dropped
To broadcast -> forwarded to all server ports in VLAN
Frames arriving on server ports:
To known unicast -> forwarded to appropriate server port (remember we only learn addresses “south” of FI)
To unknown unicast -> forwarded only to pinned uplink port
To broadcast -> forwarded to all server ports in VLAN and to the pinned uplink port only (as with unknown unicast, never all uplinks)
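For those who prefer code to prose, here is the whole table as one illustrative Python function. It's my own sketch of the rules above (the port names and data structures are invented), not Cisco's implementation:

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

def forward(dst_mac, arriving_port, port_class, mac_table, pinning, vlan_server_ports):
    # port_class:        "border" or "server"
    # mac_table:         mac -> server port (only "southbound" MACs)
    # pinning:           server port -> pinned uplink port (or Port Channel)
    # vlan_server_ports: every server port in the frame's VLAN
    dest = mac_table.get(dst_mac)

    if port_class == "border":
        if dst_mac == BROADCAST:
            return list(vlan_server_ports)        # flood south, never back north
        if dest is None:
            return []                             # unknown unicast: drop
        if pinning[dest] != arriving_port:
            return []                             # deja vu check failed: drop
        return [dest]                             # known unicast

    # server port
    if dst_mac == BROADCAST:
        others = [p for p in vlan_server_ports if p != arriving_port]
        return others + [pinning[arriving_port]]  # all server ports + pinned uplink
    if dest is None:
        return [pinning[arriving_port]]           # unknown unicast: pinned uplink only
    return [dest]                                 # known unicast: locally switched

# A couple of examples with invented ports:
table = {"00:25:b5:00:00:0a": "srv-1"}
pins = {"srv-1": "up-1", "srv-2": "up-2"}
print(forward("00:25:b5:00:00:0a", "up-1", "border", table, pins, ["srv-1", "srv-2"]))
# -> ['srv-1']
print(forward(BROADCAST, "srv-2", "server", table, pins, ["srv-1", "srv-2"]))
# -> ['srv-1', 'up-2']
```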
Questions? Discussion?
***NOTE*** Edited on June 7th, 2010 to correct an error in the description of behavior of unknown unicast messages received on server ports.