After a recent comment from @robertquast, it occurred to me that there’s quite a bit of confusion about the way that UCS Fabric Interconnects handle layer-2 forwarding in the default and recommended Ethernet Host Virtualizer mode.
The basic idea behind EHV is that the Fabric Interconnect appears to the upstream network as if it were a host and not a layer-2 switch. To accomplish this (and not require our old friend Spanning Tree Protocol), the Fabric Interconnect employs a number of L2 forwarding rules and deja vu checks to prevent the creation of L2 loops. If you know how VMware vSwitches operate with multiple active uplinks, you’ve already got a pretty good understanding of EHV.
EHV separates L2 links into two classes (at least in UCS) – border ports and server ports. Border ports connect to outside L2 switching not under the control of UCS, while server ports connect to the chassis in the system (and in turn the blades).
Due to the way that UCS provides connectivity to the blades, through the use of vNIC objects, the switching infrastructure already understands every “physical” NIC that exists “below” the Fabric Interconnect. Therefore, it has no need to learn those addresses (or expire them), since it can track the physical connectivity and knows when a MAC address is or is not available. The exception to this is virtual MAC addresses that are not managed by the vNIC objects – specifically, those created or managed by a virtualization or clustering solution (e.g. VMware). These addresses are learned and then aged out by traditional L2 switch mechanisms, through a configurable timeout. See my other post regarding MAC address aging for more details.
Uplinks are handled very differently. Most notably, in EHV mode, the Fabric Interconnects do not ever learn MAC addresses that exist on the uplink switches. Each physical blade port is pinned to a physical uplink port (or Port Channel). Unknown unicast MAC addresses received on border ports are dropped. Known unicast MAC addresses received on border ports are first subjected to a deja vu check to ensure that the frame arrives on the destination MAC address’ pinned uplink, and assuming a successful check, the frame is forwarded to the destination port.
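The border-port handling described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not how the Fabric Interconnect is implemented; the class name, the port names, and the MAC addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BorderPortModel:
    """Toy model of unicast handling on border ports in EHV mode.

    Assumption: every known MAC maps to (server port, pinned uplink).
    """
    known: dict = field(default_factory=dict)

    def receive_on_border(self, dst_mac: str, ingress_uplink: str) -> str:
        entry = self.known.get(dst_mac)
        if entry is None:
            # Nothing is learned from uplinks, so an unknown destination
            # is simply dropped rather than flooded.
            return "drop: unknown unicast"
        server_port, pinned_uplink = entry
        if ingress_uplink != pinned_uplink:
            # Deja vu check: the frame must arrive on the uplink that the
            # destination MAC is pinned to.
            return "drop: deja vu check failed"
        return f"forward to {server_port}"


# Hypothetical example: one blade vNIC pinned to uplink1.
fi = BorderPortModel(known={"00:25:b5:00:00:0a": ("veth1", "uplink1")})
on_pinned = fi.receive_on_border("00:25:b5:00:00:0a", "uplink1")
on_other = fi.receive_on_border("00:25:b5:00:00:0a", "uplink2")
unknown = fi.receive_on_border("00:50:56:aa:bb:cc", "uplink1")
```

Here the same destination MAC is accepted on its pinned uplink and rejected on any other uplink, which is what lets the Fabric Interconnect run multiple active uplinks without Spanning Tree.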
The problem arises when a silent VM’s MAC address is aged out of the MAC table in the Fabric Interconnect. If an outside L2 device generates a frame destined for the now-unknown unicast MAC, and the Fabric Interconnect receives the frame on a border port, the Fabric Interconnect will drop the frame. This essentially creates a black hole for that MAC address until the VM generates some traffic. A unicast frame sent to the silent VM’s MAC address by a device inside the Fabric Interconnect L2 domain does not help here: per the correction noted below, an unknown unicast frame received on a server port is forwarded only to the sender’s pinned uplink, not flooded to the other server ports. Once the VM sends a frame of its own (for example, in reply to a broadcast, which still reaches all server ports in the VLAN), the Fabric Interconnect re-learns the address and traffic flows normally.
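To make the timing of the black hole concrete, here is a minimal sketch of a learned-MAC table with aging. The class, the 300-second timeout, the port name, and the MAC address are assumptions chosen for illustration; they are not UCS defaults or UCS code.

```python
class LearnedMacTable:
    """Toy table for MACs learned on server ports (e.g. VM MACs)."""

    def __init__(self, aging_seconds: float):
        self.aging_seconds = aging_seconds
        self.entries = {}  # MAC -> (server port, time last seen as a source)

    def learn(self, mac: str, port: str, now: float) -> None:
        # Called whenever a frame with this source MAC arrives on a server port.
        self.entries[mac] = (port, now)

    def lookup(self, mac: str, now: float):
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen > self.aging_seconds:
            del self.entries[mac]  # entry has aged out
            return None
        return port


def border_frame(table: LearnedMacTable, dst_mac: str, now: float) -> str:
    """Unicast frame arriving on a border port: forward if known, else drop."""
    port = table.lookup(dst_mac, now)
    return f"forward to {port}" if port else "drop"


VM_MAC = "00:50:56:aa:bb:cc"                 # hypothetical VM MAC
table = LearnedMacTable(aging_seconds=300)   # arbitrary example timeout

table.learn(VM_MAC, "veth7", now=0)          # VM sends a frame at t=0
before_aging = border_frame(table, VM_MAC, now=100)
after_aging = border_frame(table, VM_MAC, now=400)   # VM silent for 400 s
table.learn(VM_MAC, "veth7", now=410)        # VM finally sends something
after_relearn = border_frame(table, VM_MAC, now=420)
```

Between the moment the entry ages out and the moment the VM next transmits, every unicast frame arriving from upstream for that MAC is dropped, which is the window the aging timeout controls.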
So the basic flow…
Frames arriving on border ports:
To known unicast -> forwarded to appropriate server port
To unknown unicast -> dropped
To broadcast -> forwarded to all server ports in VLAN
Frames arriving on server ports:
To known unicast -> forwarded to appropriate server port (remember we only learn addresses “south” of FI)
To unknown unicast -> forwarded only to pinned uplink port
To broadcast -> forwarded to all server ports in VLAN and only pinned uplink port (same as unknown unicast)
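The flow above can also be written as a small decision function. This is a sketch under stated assumptions: a single VLAN, hypothetical port names and MAC addresses, the deja vu check on border ports left out, and the originating server port excluded from broadcast flooding (normal L2 behavior, not something stated in the list).

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def egress_ports(ingress_kind, ingress_port, dst_mac,
                 mac_table, server_ports, pinned_uplink):
    """Return the set of ports a frame is sent out of (single VLAN).

    mac_table:     MAC -> server port (only addresses south of the FI)
    server_ports:  all server ports in the VLAN
    pinned_uplink: server port -> uplink it is pinned to
    """
    if ingress_kind == "border":
        if dst_mac == BROADCAST:
            return set(server_ports)
        if dst_mac in mac_table:
            return {mac_table[dst_mac]}
        return set()  # unknown unicast from upstream is dropped

    if ingress_kind == "server":
        uplink = pinned_uplink[ingress_port]
        if dst_mac == BROADCAST:
            # Other server ports in the VLAN plus the sender's pinned uplink.
            return (set(server_ports) - {ingress_port}) | {uplink}
        if dst_mac in mac_table:
            return {mac_table[dst_mac]}
        return {uplink}  # unknown unicast goes only to the pinned uplink

    raise ValueError(f"unknown ingress kind: {ingress_kind!r}")


# Hypothetical topology used for the examples.
MAC_A, MAC_B = "00:25:b5:00:00:0a", "00:25:b5:00:00:0b"
ROUTER_MAC = "00:00:0c:07:ac:01"  # lives upstream, so never learned
mac_table = {MAC_A: "veth1", MAC_B: "veth2"}
server_ports = {"veth1", "veth2", "veth3"}
pinned_uplink = {"veth1": "uplink1", "veth2": "uplink2", "veth3": "uplink1"}


def run(kind, port, dst):
    return egress_ports(kind, port, dst, mac_table, server_ports, pinned_uplink)
```

For example, `run("server", "veth1", ROUTER_MAC)` yields only `{"uplink1"}`: traffic to an upstream router is always "unknown unicast" from the Fabric Interconnect's point of view, yet it still leaves on the correct port without being flooded to other blades.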
Questions? Discussion?
***NOTE*** Edited on June 7th, 2010 to correct an error in the description of behavior of unknown unicast messages received on server ports.
Hey Dave – I understand the flows, but here is my big question: how and when will this be a big deal? What scenarios do I need to watch out for and plan around to make sure this isn’t an issue?
Thank you!
I guess my comments on Twitter were less about this not being an issue and more about avoiding a global recommendation to disable MAC address aging in UCS. Personally, I would try to confine VM networks to a UCS cluster and tune the MAC aging to match the gateway router. If memory serves, most Windows OSes have a 10-minute aging timeframe, so it would seem prudent to adjust UCS to match; however, I would suspect less chatty *nix flavors to be more susceptible to the issue.
I would just be worried about the MAC address table filling up after disabling aging, running for a year straight and then kaboom.
Very relevant post and brings attention to an important design consideration. As another very good blog post said recently it’s all about knowing the consequences of your design choices.
Does anybody know which HW and release support that feature?
@max Which feature?
Hi!
About the “Unknown Unicast – From Server Ports” behavior, something seems strange. Given the following assertions:
– All unknown unicast traffic from server ports gets flooded to all server ports in the same VLAN + the pinned border port.
– No learning ever happens on the border ports.
=> If no learning ever happens on uplink/border ports, then I guess the Layer 3 router MAC address, upstream, is… unknown to the UCS… and that makes sense, because after all, we don’t care: the correct output port is used anyway. That’s fine…
=> If the router MAC address is unknown, then… are we really sure that frames to unknown destination MAC addresses are flooded to all server ports? If yes, that would mean that all traffic going OUT of the UCS (to an unknown MAC, including the L3 router outside the UCS, since we don’t learn on border ports) is flooded to all server ports?
Thank you!
@creis –
Great point. This is something that’s come up in other discussions as well – and my writeup on that point (as well as the official Cisco course material) is incorrect. The short answer is that an unknown unicast frame received on a server port would *not* be forwarded to the other server ports, only to the uplink. I plan a post here shortly that will clarify the issue. Thanks for reading and commenting!
Sorry to dig up an old post, but I have an interesting scenario that I’m trying to find the answer to…
Frames arriving on border ports:
To known unicast -> forwarded to appropriate server port
What happens to a frame that arrives on interconnect A’s border port that is destined for a known address on fabric B’s server port? (i.e. a vNIC which is pinned to fabric B without fabric failover)
Thanks in advance.
@Chris – It would be dropped. Remember that despite the shared management interface, each FI acts as an independent L2 forwarder. Excepting malicious circumstances, the only reason that such a frame would arrive is that the upstream switching had not yet learned the location of the MAC in question, so it was flooding it (as it should with an unknown unicast, from its perspective)… standard L2 forwarding would ensure that it also reached FI-B and the appropriate border port. Now, if you had some strange configuration where each FI was connected to different L2 networks, it could get complicated – and require switching mode – but in any case, FI-A would not forward the frame “South”.
Dave
In end host mode, if server-1 in chassis-1 sends a broadcast packet, will that packet only hit the border ports (the ports connecting to the upstream switch), or will it also get broadcast to server ports (the ports where the IOMs of other chassis, say chassis-2, are connected)? Will the Fabric Interconnect do local switching, or will that have to be switched by an external switch?
The Fabric Interconnect will flood the broadcast frame out the upstream port to which the originating vNIC is pinned, as well as to any vNIC on the fabric on the same VLAN. In short, yes, the FI will handle local flooding, and rely on the upstream switch to handle anything external to the FI.