Brocade’s Flawed FCoE “Study”

Disclosures
I do not work for Cisco, Brocade, or any of the companies mentioned here. I do work for a reseller that sells some of these products, but this post (as are all posts on this site) is my opinion only, and does not necessarily reflect the views of my employer or any of the manufacturers listed here. Evaluator Group Inc. did invite me to have a call with them to discuss the study via a tweet sent from the @evaluator_group Twitter account. Mr. Fellows also emailed me to offer a call. After my analysis, I deemed such a call unnecessary.

Ok, with that out of the way…

The “Study”

There were quite a few incredulous tweets floating around this week after Brocade publicized an “independent” study performed by Russ Fellows of Evaluator Group Inc. It was also reviewed by Chris Mellor of The Register, which is how I came to know about it. In the review, Mr. Mellor states that “Brocade’s money was well spent,” though I beg to differ.

As of this posting, the study is still available from the Evaluator Group Inc. website, though I would hope that after some measure of peer review, it will be removed given how deeply flawed it is. As I do not have permission to redistribute the study, I will instead suggest that you get a copy at the above link and follow along.

The stated purpose of the study was to compare traditional Fibre Channel (hereafter FC) against Fibre Channel over Ethernet (hereafter FCoE), specifically as a SCSI transport between blade servers and solid state storage. To reduce equipment requirements, only a single path was designed into the test, unlike a production environment, which would have a minimum of two. The report further stated that an attempt would be made to keep the amount of bandwidth available to each scenario equal.

The Tech

The storage vendor was not disclosed, though that should be fairly irrelevant (with one exception, noted below). The storage was connected via two 16Gb FC links to a Brocade 6510 switch. The Brocade 6510 is a “top of rack” style traditional FC switch that is not capable of FCoE.

The chosen architecture for the FC test was an HP c7000 blade enclosure containing two blades, using a Brocade FC switch. The embedded Brocade switch is connected to the Brocade 6510 via a single 16Gb FC link.

The FCoE test was performed using a Cisco UCS architecture, consisting of a single Fabric Interconnect connected via four 10Gb converged Ethernet links to a single blade chassis containing two blades. The Fabric Interconnect is connected to the Brocade 6510 via two 8Gb FC links. As of this writing, the only FC connectivity options supported by Cisco UCS are 10Gb FCoE or 1/2/4/8Gb native FC.

So what’s the problem?

There are many, many fundamental flaws with the study. I eventually ran out of patience to catalog them individually, so I’m instead going to call out some of the most egregious transgressions.

To start, let’s consider the testing methodology. The stated purpose of this test was to evaluate storage connectivity options, narrowed to FC and FCoE. It was not presented as a comparison of server vendors. As such, as many variables as possible should have been eliminated to isolate the effects of the protocol and transport. This is the first place the study breaks down. Why was Cisco UCS chosen at all? If the effects of protocol and transport are truly the goal of the test, why wouldn’t the HP c7000 have been the best choice for both scenarios? There are several ways to achieve FCoE in a c7000, both externally and internally.

The storage in use is connected via two 16Gb FC links. The stated reason for this is that the majority of storage deployments still use FC instead of FCoE, which is certainly true. The selection of the Brocade 6510 is interesting, however, in that Brocade offers other switches that are capable of supporting FC and FCoE simultaneously. It’s clear that the choice of an FC-only switch was designed to force the FCoE traffic to be de-encapsulated before reaching the storage. Already we can see that we are not testing FC vs. FCoE, but rather native end-to-end FC vs. a single hop of FCoE. Even so, the latency and performance impact caused by encapsulating the FC protocol into Ethernet is negligible. The storage vendor was not disclosed, so I do not know whether it could also have supported FCoE, which would have made for a true end-to-end FCoE test. Despite the study’s claim, end-to-end FCoE is not immature and has been successfully deployed by many customers.
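
To put a rough number on that encapsulation cost, here’s a back-of-the-envelope sketch (mine, not the study’s) using the commonly cited maximum frame sizes for FC and FCoE; the byte counts are approximations:

```python
# Approximate FCoE encapsulation overhead, using commonly cited maximum
# frame sizes -- these figures are not from the study itself.

FC_PAYLOAD_MAX = 2112    # max FC data field, bytes
FC_FRAME_MAX = 2148      # SOF + 24B header + payload + CRC + EOF
FCOE_FRAME_MAX = 2180    # Ethernet/VLAN headers + FCoE header + FC frame + EOF + FCS

fc_overhead = (FC_FRAME_MAX - FC_PAYLOAD_MAX) / FC_PAYLOAD_MAX
fcoe_overhead = (FCOE_FRAME_MAX - FC_PAYLOAD_MAX) / FC_PAYLOAD_MAX
added_bytes = FCOE_FRAME_MAX - FC_FRAME_MAX

print(f"Native FC framing overhead: {fc_overhead:.1%}")   # ~1.7%
print(f"FCoE framing overhead:      {fcoe_overhead:.1%}")  # ~3.2%
print(f"Extra bytes per full frame: {added_bytes}")        # ~32 bytes
```

Roughly 32 extra bytes on a roughly 2.1KB frame; the latency added by a single hardware encapsulation/de-encapsulation hop is similarly negligible.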

In the UCS architecture, all traffic is converged between the blade chassis and the Fabric Interconnect. All switching, management, configuration, etc., occurs within the Fabric Interconnect. The use of four 10Gb Ethernet links between the chassis and the Fabric Interconnect is significant overkill given the stated goal of maintaining similar bandwidth between the tests. At most, two links would have been required to provide each blade with a dedicated 10Gb of bandwidth. Presumably, four were chosen so the claim could be made that more bandwidth was available per blade than was available to the 16Gb-capable blades in the HP solution.

The study did not disclose the logical configuration of the UCS blades, but the performance data suggests a single vHBA per blade. In that configuration, the vHBA would follow a single 10Gb path from the blade to the Fabric Interconnect (via the IO Module) and would in turn be pinned to a single 8Gb FC uplink. Already you can see that regardless of the number of links provided from chassis to Fabric Interconnect, the bottleneck will be the 8Gb FC uplink. The second blade’s vHBA would be pinned (automatically, mind you) to the second 8Gb FC uplink. Essentially, in this configuration, each blade has 8Gb of FC bandwidth to the Brocade 6510 switch. The VIC 1240 converged network adapter (CNA) on each blade is capable of 20Gb of bandwidth to each fabric; creating a second vHBA and allowing the operating system to load balance across them would have provided more bandwidth. The study also mentions the use of a software FCoE initiator as part of the reason for increased CPU utilization; more on that below.
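
To illustrate that pinning behavior, here is a minimal sketch of the path described above (my own simplified model, assuming a single vHBA per blade as the performance data suggests; the study did not disclose the actual logical configuration):

```python
# Simplified model of the single-vHBA storage path: each vHBA takes one 10Gb
# path to the Fabric Interconnect and is pinned to one 8Gb FC uplink, so a
# blade's storage bandwidth is the slowest link along that single path.

CHASSIS_LINK_GBPS = 10.0   # one 10Gb converged link per vHBA path
FC_UPLINK_GBPS = 6.8       # usable data rate of one 8Gb FC uplink (8.5 Gbaud, 8b/10b)

def per_blade_storage_bw(num_chassis_links: int, vhbas_per_blade: int = 1) -> float:
    """Effective storage bandwidth for one blade, in Gb/s (simplified model)."""
    # Chassis links beyond one per vHBA do nothing for a pinned flow.
    usable_paths = min(vhbas_per_blade, num_chassis_links)
    return usable_paths * min(CHASSIS_LINK_GBPS, FC_UPLINK_GBPS)

print(per_blade_storage_bw(num_chassis_links=2))   # 6.8  -- two links already suffice
print(per_blade_storage_bw(num_chassis_links=4))   # 6.8  -- four links change nothing
print(per_blade_storage_bw(4, vhbas_per_blade=2))  # 13.6 -- a second vHBA lets one blade
                                                   #         burst across both uplinks
                                                   #         (ignoring contention)
```

In other words, once a vHBA is pinned to a single 8Gb uplink, adding chassis-to-Fabric-Interconnect links changes nothing; only a second vHBA (or more or faster FC uplinks) would raise the ceiling.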

We didn’t understand the technology, but…

In the “ease of use” comparison, it was noted that the HP environment was configured in three hours, whereas it took eight hours to configure UCS. The study makes it clear that the testers did not have the requisite skill to configure UCS and required the support of an outside VAR (which was not named) to complete the configuration. The study also states that the HP environment was configured without assistance. Clearly the engineering team involved was skilled in HP and not in UCS. How this reflects poorly on the product (and especially on FC vs. FCoE – that’s the point, right?) is beyond me. I can personally configure (and have configured) a UCS environment like this in well under an hour. It would probably take me eight hours to perform a similar configuration on an HP system, given my lack of hands-on experience with them. That is not a flaw of the HP product, and I wouldn’t penalize it as such. (There are lots of reasons I like UCS over the HP c7000, but that’s significantly beyond the scope of this post.)

Many of the “ease of use” characteristics cited reflected an all-Brocade environment – similar efficiencies would have existed in an all-Cisco environment as well, which the study neglected to test.

A software what?

The study observes a spike in CPU utilization with increased link utilization, which it (incorrectly) attributes to the use of a software FCoE initiator. This one point threw me (and others) off quite a bit, as it is extremely rare to use a software FCoE initiator, and non-existent when FCoE-capable hardware is present (such as the VIC 1240 in use here). After a number of confusing tweets from the @evaluator_group Twitter account, it became clear that the claim of a software initiator stemmed from a misunderstanding of the Cisco VIC 1240 – again pointing to a lack of skill and experience with the product. My suspicion is that the spike in CPU utilization (and latency, and the corresponding increase in response times) occurred not because of the FCoE protocol, but because of the queuing required when the two 8Gb FC links (13.6Gb/s of total bandwidth, though not aggregated – each vHBA will be pinned to one uplink) became saturated. This is entirely consistent with observed application/storage performance when links are saturated. It is, however, speculation on my part, as the logical configuration of the UCS was not provided. Despite there being similar total bandwidth available, neither server would have been able to burst above 6.8Gb/s, leading to queuing (and the accompanying latency/response-time impact).
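
For reference, here is where the 6.8Gb/s and 13.6Gb/s figures come from; this is standard FC line-rate arithmetic, not data from the study:

```python
# Usable data rates derived from FC line rates and encodings (standard,
# rounded figures; not numbers taken from the study).

def usable_gbps(line_rate_gbaud: float, encoding_efficiency: float) -> float:
    """Data rate remaining after line-encoding overhead, in Gb/s."""
    return line_rate_gbaud * encoding_efficiency

fc8 = usable_gbps(8.5, 8 / 10)       # 8GFC:  8.5 Gbaud, 8b/10b  -> 6.8 Gb/s
fc16 = usable_gbps(14.025, 64 / 66)  # 16GFC: 14.025 Gbaud, 64b/66b -> ~13.6 Gb/s

print(f"One 8G FC uplink:  {fc8:.1f} Gb/s")
print(f"Two 8G FC uplinks: {2 * fc8:.1f} Gb/s (total, not aggregated)")
print(f"One 16G FC link:   {fc16:.1f} Gb/s")
```

Two 8Gb uplinks and one 16Gb link work out to roughly the same total, which is presumably the basis for the “similar bandwidth” claim, even though each pinned flow can only ever use one 8Gb uplink.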

Is that all?

I could go on and on with individual points that were wrong, misleading, or poorly designed, but I don’t think it’s necessary. Once the real purpose of the test (Brocade vs. Cisco) becomes clear, every conclusion reached in the FC vs. FCoE discussion (however incorrect) is moot.

If Brocade really wants to fund an FC vs. FCoE study that will stand up to scrutiny, it needs to use the same servers (no details were provided on the specific CPUs in use – they could have been wildly different for all we know), the same chassis, and truly isolate the protocol, as the study claimed to do. Here’s the really sad part – Brocade could have proven what it wanted to (that 16Gb FC is faster than 10Gb FCoE) in a fair fight. Take the same HP chassis used for the FC test and put in an FCoE module (with CNAs on the servers) instead. Connect via FCoE to an FCoE-capable Brocade switch, and use FCoE-capable storage. Despite the study’s claim, there’s a lot of FCoE storage out there in production – just ask NetApp and EMC. At comparable cable counts, 16Gb FC will be faster than 10Gb FCoE. What a shock, huh? Instead, this extraordinarily flawed “study” has cost Brocade, and unfortunately Evaluator Group Inc., a lot of credibility.

I’m not anti-Brocade (though I do prefer MDS for FC switching, which is not news to anyone who knows me), I’m not anti-FC (I still like it a lot, though I think pure FC networks’ days are numbered), I’m just really, really anti-FUD. Compete on tech, compete on features, compete on value, compete on price, compete on whatever it is that makes you different. Just don’t do it in a misleading, dishonest way. Respect your customers enough to know they’ll see through blatant misrepresentations, and respect your products enough to let them compete fairly.

Updated: Check out Tony Bourke’s great response here.