8Gb Fibre Channel or 10Gb Ethernet w/ FCoE?

Update 2011/01/31 – I’ve added new thoughts and comments on the subject here: http://www.unifiedcomputingblog.com/?p=234

Which is better?   Which is faster?

I’ve been stuck on this one for a while.   I’m traditionally a pure fibre channel kind of guy, so I’ve been pretty convinced that traditional FC was here to stay for a while, and that FCoE – as much as I believe in the technology – would probably be limited to the access and aggregation layers for the near term.   That is, until someone pointed out to me the encoding mechanisms used by these two technologies and the effective data rates they allow.   I’m not sure why it never occurred to me before, but it hit me like the proverbial ton of bricks this week.

First off, a quick review of encoding mechanisms.   Any time we’re transmitting or storing data, we encode it in some form or another – generally to include some type of checksum so that we can detect errors in reading or receiving the data.   I remember the good old days when I discovered that RLL hard drive encoding was 50% more efficient than MFM encoding, and with just a new controller and a low-level format, my old 10MB drive (yes, that’s ten whopping megabytes, kids – ask your parents what a megabyte was) suddenly became 15MB!   Well, we’re about to embark on a similar discovery.

1, 2, 4, and 8 Gb Fibre Channel all use 8b/10b encoding – 8 bits of data get encoded into 10 bits of transmitted information, with the extra two bits used for data integrity.   So if the link is 8Gb, how much do we actually get to use for data, given that 2 out of every 10 bits aren’t “user” data?   FC link speeds are somewhat of an anomaly in that they’re actually faster than the stated link speed would suggest.   Original 1Gb FC actually runs at 1.0625Gb/s, and each generation has kept this base rate and multiplied it: 8Gb FC is 8 × 1.0625, for an actual line rate of 8.5Gb/s.   8.5 × 0.80 = 6.8, so an 8Gb FC link carries 6.8Gb/s of usable bandwidth.

10GE (and 10G FC, for that matter) uses 64b/66b encoding.   For every 64 bits of data, only 2 bits are used for integrity checks.   While this theoretically lowers the overall protection of the data and increases the amount of data discarded in case of a failure, the actual number of frames discarded due to failed serialization/deserialization is minuscule.    For a 10Gb link using 64b/66b encoding, that leaves 96.97% of the bandwidth for user data, or about 9.7Gb/s.

So 8Gb FC = 6.8Gb usable, while 10Gb Ethernet = 9.7Gb usable.   Even if I were able to use all of the bandwidth available on an 8Gb FC port (which is very unlikely at the server access layer), with 10GE running FCoE, I’d still have room for 3 gigabit Ethernet-class “lanes”.   How’s that for consolidation?
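The arithmetic above is simple enough to check in a few lines. This is just the post’s own math restated as code – the helper function name is mine:

```python
# Usable bandwidth for 8Gb FC (8b/10b) vs. 10Gb Ethernet (64b/66b).

def usable_gbps(line_rate_gbps, data_bits, total_bits):
    """Usable data rate given an encoding of data_bits per total_bits."""
    return line_rate_gbps * data_bits / total_bits

# 8Gb FC actually signals at 8 x 1.0625 = 8.5 Gb/s on the wire,
# and 8b/10b leaves 80% of that for user data.
fc8 = usable_gbps(8 * 1.0625, 8, 10)

# 64b/66b leaves 64/66 = ~96.97% of the bandwidth for user data.
ge10 = usable_gbps(10, 64, 66)

print(f"8Gb FC usable:  {fc8:.2f} Gb/s")   # 6.80
print(f"10GE usable:    {ge10:.2f} Gb/s")  # 9.70
```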

10Gb FC has the same usable bandwidth, and without the overhead (albeit a small 2% or so) of FCoE, but you don’t get the consolidation benefits of using the same physical link for your storage and traditional Ethernet traffic.

I’m sold.

Fantastic post on statelessness : HP VirtualConnect vs. Cisco UCS

M. Sean McGee posted this great comparison of VirtualConnect and UCS.   I’ve often struggled to give students a clear picture of the differences – HP will tell you that “VirtualConnect is just as good, and we’ve been doing it for years!”   Well, yes… it does some things similarly, and you can’t argue the timeframe.   But UCS does a lot more – and until now, I didn’t have a great source that directly compared them.   From now on, all I have to do is send them to M. Sean McGee’s post!

The State of Statelessness

UCS Manager 1.2(1) Released

As a full UCS bundle (including all code – from the lowliest baseboard management controller to the UCS Manager in all its process-preserving glory), Cisco has released version 1.2(1).

Full release notes are available here.

To summarize, this release adds support for the soon-to-be-shipping “M2” versions of the UCS blades, which support the Intel Xeon 5600-series processors (Westmere), including the 6-core version of the Nehalem lineage.   There are also numerous bug fixes (as expected at this stage of the product’s life), including many on my list of “slightly annoying but still ought to be fixed” bugs.

Panduit Completes 7 Meter SFP+ Copper Cables!

Panduit is now shipping their 7 meter SFP+ copper twinax cables.   They’re not officially on the Cisco compatibility list yet, but once they are, this opens up some additional UCS expansion options.   The jump from 5 meters to 7 meters may not seem like a lot, but that’s another rack and a couple more chassis… who couldn’t use another 32 or 48 blades while still keeping the cabling infrastructure cheap?

My understanding is that the cables work just fine in UCS and Nexus 5000 configurations, but aren’t yet officially supported by Cisco.

L2 Forwarding Rules in UCS EHV Mode

After a recent comment from @robertquast, it occurred to me that there’s quite a bit of confusion about the way that UCS Fabric Interconnects handle layer-2 forwarding in the default and recommended Ethernet Host Virtualizer mode.

The basic idea behind EHV is that the Fabric Interconnect appears to the upstream network as if it were a host and not a layer-2 switch.  To accomplish this (and not require our old friend Spanning Tree Protocol), the Fabric Interconnect employs a number of L2 forwarding rules and deja vu checks to prevent the creation of L2 loops.  If you know how VMware vSwitches operate with multiple active uplinks, you’ve already got a pretty good understanding of EHV.

EHV separates L2 links into two classes (at least in UCS) – border ports and server ports.  Border ports connect to outside L2 switching not under the control of UCS, while server ports connect to the chassis in the system (and in turn the blades).

Due to the way that UCS provides connectivity to the blades, through the use of vNIC objects, the switching infrastructure already understands every “physical” NIC that exists “below” the Fabric Interconnect.  Therefore, it has no need to learn those addresses (or expire them), since it can track the physical connectivity and knows when a MAC address is or is not available.   The exception is virtual MAC addresses that are not managed by the vNIC objects – specifically, those created or managed by a virtualization or clustering solution (e.g. VMware).   These addresses are learned and then aged out by traditional L2 switch mechanisms, through a configurable timeout.   See my other post regarding MAC address aging for more details.

Uplinks are handled very differently.   Most notably, in EHV mode, the Fabric Interconnects do not ever learn MAC addresses that exist on the uplink switches.   Each physical blade port is pinned to a physical uplink port (or Port Channel). Unknown unicast MAC addresses received on border ports are dropped.   Known unicast MAC addresses received on border ports are first subjected to a deja vu check to ensure that the frame arrives on the destination MAC address’ pinned uplink, and assuming a successful check, the frame is forwarded to the destination port.

The problem arises when a silent VM’s MAC address is aged out of the MAC table in the Fabric Interconnect.   If an outside L2 device generates a frame destined for the now-unknown unicast MAC, and the Fabric Interconnect receives the frame on a border port, the Fabric Interconnect will drop the frame.   This essentially creates a black hole for that MAC address unless the VM generates some traffic *or* a device inside the Fabric Interconnect L2 domain generates a frame destined for the silent VM’s MAC address.  In that case, the Fabric Interconnect *will* broadcast the unknown unicast frame to all server ports (in the same VLAN, of course) and that particular blade’s pinned uplink.  The VM should respond to the frame, at which point the Fabric Interconnect will re-learn the address and traffic will flow normally.

So the basic flow…

Frames arriving on border ports:

To known unicast -> forwarded to appropriate server port

To unknown unicast -> dropped

To broadcast -> forwarded to all server ports in VLAN

Frames arriving on server ports:

To known unicast -> forwarded to appropriate server port (remember we only learn addresses “south” of FI)

To unknown unicast -> forwarded only to pinned uplink port

To broadcast -> forwarded to all server ports in VLAN and only pinned uplink port (same as unknown unicast)
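The flow above can be sketched as a simple lookup function. To be clear, the data structures and names here are mine for illustration – this isn’t Cisco’s internal implementation, and the deja vu check on border-port unicast is omitted for brevity:

```python
# Illustrative sketch of the EHV forwarding rules described above.
BROADCAST = "ff:ff:ff:ff:ff:ff"

def forward(port_type, dst_mac, mac_table, server_ports_in_vlan, pinned_uplink):
    """Return the set of ports a frame is forwarded to (empty set = dropped).

    port_type: "border" or "server" (where the frame arrived)
    mac_table: known MACs -> server port; all entries live "south" of the FI
    """
    if dst_mac == BROADCAST:
        if port_type == "border":
            # Broadcast from outside: flood to server ports in the VLAN only.
            return set(server_ports_in_vlan)
        # Broadcast from a blade: all server ports in the VLAN,
        # plus only that blade's pinned uplink.
        return set(server_ports_in_vlan) | {pinned_uplink}
    if dst_mac in mac_table:
        # Known unicast: deliver to the one server port it lives on.
        return {mac_table[dst_mac]}
    # Unknown unicast:
    if port_type == "border":
        return set()             # dropped - the FI never floods inward
    return {pinned_uplink}       # from a blade: pinned uplink only
```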

Questions?  Discussion?

***NOTE*** Edited on June 7th, 2010 to correct an error in the description of behavior of unknown unicast messages received on server ports.

Tolly Report

A lot of people have been asking me what I think of the recently released Tolly report comparing the bandwidth of the HP and Cisco blade solutions.

The short answer is, I don’t think much of it.   It’s technically sound, and the results it reports are perfectly reasonable – for the conditions of the tests they performed.   In keeping with Tolly’s charter, the tests were repeatable, documented, and indisputable.  The problem is, the results of the testing only tell half the story.   The *conclusions* they reach, on the other hand, aren’t as defensible.

It’s really not necessary to get into a point by point rebuttal.   At least not for me.   I’m sure Cisco will be along any minute to do just that.

The facts that Tolly established during the report aren’t groundbreaking or surprising.   Essentially, the tests were built to demonstrate the oversubscription of links between UCS chassis and Fabric Interconnects, which Cisco is quite willing to disclose at 2:1.  These tests were paired with HP comparisons of blade-to-blade traffic on the same VLAN, which in HP architectures keeps the traffic local to the chassis.  The interesting thing there is that if the traffic were between the same blades but in different VLANs, the Cisco and HP solutions would have performed identically (assuming the same aggregation-layer architecture).   What makes that interesting is that the Tolly report’s figures depend on a specific configuration and test scenario – the Cisco values won’t change (or get any worse) no matter how you change the variables.  The HP values will vary widely.

And that, my friends, is where I see the true benefit of Cisco’s architecture.   Change the conditions of the test repeatedly, and you’ll get the same results.

I’m not faulting Tolly in this case.  Not at all.  They were asked to test a certain set of conditions, and did so thoroughly and presumably accurately.  It’s just that you can’t take that set of data and use it to make any kind of useful comparison between the platforms.   The real world is much more complicated than a strictly controlled set of test objectives.   Do we really think that HP went to Tolly and asked for a fair fight?   Of course not.

MAC forwarding table aging on UCS 6100 Fabric Interconnects

I was recently forwarded some information on the MAC table aging process in the UCS 6100 Fabric Interconnects that I thought was very valuable to share.

Prior to this information, I was under the impression (and various documentation had confirmed) that the Fabric Interconnect never ages MAC addresses – in other words, it understands where all the MAC addresses are within the chassis/blades, and therefore has no need to age-out addresses.   In the preferred Ethernet Host Virtualizer mode, it also doesn’t learn any addresses from the uplinks, again, so no need to age a MAC address.

So what about VMware and the virtual MAC addresses that live behind the physical NICs on the blades?

Well, as it turns out, the Fabric Interconnects do age addresses, just not those assigned by UCS Manager to a physical NIC (or a vNIC on a Virtual Interface Card – aka Palo).

On UCS code releases prior to 1.1, learned addresses age out in 7200 seconds (120 minutes), and the timeout is not configurable.

On UCS code releases of 1.1 and later, learned addresses age out in 7200 seconds (120 minutes) by default, but can be adjusted in the LAN Uplinks Manager within UCS Manager.

Why do we care?   Well, if a VM (from which we’ve learned an address) has gone silent for whatever reason, we may end up purging its address from the forwarding table after 120 minutes… which means it’s unreachable from the outside world, since we’ll drop any frame that arrives on an uplink destined for an unknown unicast MAC address.   Only if the VM generates some outbound traffic will we re-learn the address and be able to accept traffic on the uplinks for it.

So if you have silent VMs and have trouble reaching them from the outside world, you’ll want to upgrade to the latest UCS code release and adjust the MAC aging timeout to something very high (or possibly never).

Moving UCS Service Profile between UCS Clusters

@SlackerUK on Twitter asked about moving Service Profiles between UCS clusters.

In short, it’s not currently possible with UCS Manager without a bit of manual work.

First, create a “logical” backup from UCS Manager.  This creates an XML file containing all of the logical configuration of UCS Manager, including your service profiles.   Find the service profile you want, and remove everything else from the backup.  You can then import that XML file into another UCS Manager instance.  Be aware that everything in that XML gets imported, including identifiers – so make sure you’re OK with that, or remove the original service profile first to eliminate duplicates.
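If you’d rather script the trimming than hand-edit the XML, something like the sketch below works. Note the element name `lsServer` and the `name` attribute are assumptions about the backup schema on my part – inspect your own exported XML and adjust before relying on this:

```python
# Rough sketch: trim a UCS Manager logical backup down to a single
# service profile before importing it into another cluster.
# ASSUMPTION: service profiles appear as <lsServer name="..."> elements;
# verify against your actual backup file.
import xml.etree.ElementTree as ET

def keep_only_profile(backup_path, profile_name, out_path):
    tree = ET.parse(backup_path)
    root = tree.getroot()
    # Walk every parent element and drop service-profile elements
    # whose name doesn't match the one we want to move.
    for parent in root.iter():
        for child in list(parent):
            if child.tag == "lsServer" and child.get("name") != profile_name:
                parent.remove(child)
    tree.write(out_path)
```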

If you’re using BMC Bladelogic for Cisco UCS, it *does* have the capability to move service profiles between clusters.

Cisco Kicks HP to the Curb

Well, it’s official.   HP is not going to be a Cisco gold partner any longer.

Given HP and Cisco’s very public competition, I can’t say this is any surprise.  While HP certainly has contributed significant sales to Cisco in the past in the form of routing and switching equipment, HP has aggressively moved to position their own products in front of Cisco’s recently (and why wouldn’t they?).

Does anyone think this actually means much for the sales of either company, or is it more of a “yawn” type move?

Full Story Here