The Unified Computing Blog – Discussions around unified computing and networking

Webinar: How to use the Fibre Channel Speedmap

Or alternatively titled, “Shameless Self Promotion – Dave does a webinar!”

Tony Bourke and I did a short webinar on the Fibre Channel Speedmap, available here:
https://www.brighttalk.com/webcast/14967/246353

If you’re interested in a bit of the history of why storage administrators typically talk about BYTES/second while network administrators talk about BITS/second, you might want to give it a look.

Brocade’s Flawed FCoE “Study”

Disclosures
I do not work for Cisco, Brocade, or any of the companies mentioned here. I do work for a reseller that sells some of these products, but this post (as are all posts on this site) is my opinion only, and does not necessarily reflect the views of my employer or any of the manufacturers listed here. Evaluator Group Inc. did invite me to have a call with them to discuss the study via a tweet sent from the @evaluator_group Twitter account. Mr. Fellows also emailed me to offer a call. After my analysis, I deemed such a call unnecessary.

Ok, with that out of the way…

The “Study”

There were quite a few incredulous tweets floating around this week after Brocade publicized an “independent” study performed by Russ Fellows of Evaluator Group Inc. It was also reviewed by Chris Mellor of The Register, which is how I came to know about it. In the review, Mr. Mellor states that “Brocade’s money was well spent,” though I beg to differ.

As of this posting, the study is still available from the Evaluator Group Inc. website, though I would hope that after some measure of peer review, it will be removed given how deeply flawed it is. As I do not have permission to redistribute the study, I will instead suggest that you get a copy at the above link and follow along.

The stated purpose of the study was to compare traditional Fibre Channel (hereafter FC) against Fibre Channel over Ethernet (hereafter FCoE), specifically as a SCSI transport between blade servers and solid state storage. To reduce equipment requirements, only a single path was designed into the test, unlike a production environment that would have at a minimum two. The report further stated that an attempt would be made to keep the amount of bandwidth available to each scenario equal.

The Tech

The vendor of storage was not disclosed, though it should be fairly irrelevant (with one exception to be noted below). The storage was connected via two 16Gb FC links to a Brocade 6510 switch. The Brocade 6510 is a “top of rack” style traditional FC switch that is not capable of FCoE.

The chosen architecture for the FC test was an HP c7000 blade enclosure containing two blades, using a Brocade FC switch. The embedded Brocade switch is connected to the Brocade 6510 via a single 16Gb FC link.

The FCoE test was performed using a Cisco UCS architecture, consisting of a single Fabric Interconnect, connected via four 10Gb converged Ethernet links to a single blade chassis containing two blades. The Fabric Interconnect is connected to the Brocade 6510 via two 8Gb FC links. As of this writing, the only FC connectivity supported by Cisco UCS is 10Gb FCoE or 1/2/4/8Gb FC.

So what’s the problem?

There are many, many fundamental flaws with the study. I eventually ran out of patience to catalog them individually, so I’m instead going to call out some of the most egregious transgressions.

To start, let’s consider testing methodology. The stated purpose of this test was to evaluate storage connectivity options, narrowed to FC and FCoE. It was not presented as a comparison of server vendors. As such, as many variables as possible should be eliminated to isolate the effects of the protocol and transport. This is the first place that this study breaks down. Why was Cisco UCS chosen? If the effects of protocol and transport are truly the goal of the test, why would the HP c7000 not also be the best choice? There are several ways to achieve FCoE in a c7000, both externally and internally.

The storage in use is connected via two 16Gb FC links. The stated reason for this is that the majority of storage deployments still use FC instead of FCoE, which is certainly true. The selection of the Brocade 6510 is interesting, however, in that Brocade has other switches that would have been capable of supporting FCoE and FC simultaneously. It’s clear that the choice of an FC only switch was designed to force the FCoE traffic to be de-encapsulated before going to the storage. Already we can see that we are not testing FC vs. FCoE, but rather FC natively end to end vs. one hop of FCoE. Even so, the latency and performance impact caused by the encapsulation of the FC protocol into Ethernet is negligible. The storage vendor was not disclosed, and as such, I do not know if it could have also supported FCoE, making for a true end-to-end FCoE test. Despite the study’s claim, end-to-end FCoE is not immature and has been successfully deployed by many customers.

In UCS architecture, all traffic is converged between the blade chassis and the Fabric Interconnect. All switching, management, configuration, etc., occurs within the Fabric Interconnect. The use of four 10Gb Ethernet links between the chassis and Fabric Interconnect is significant overkill given the stated goal of maintaining similar bandwidth between the tests. At worst, two links would have been required to provide each blade with a dedicated 10Gb of bandwidth. Presumably, the decision to go with four was so that the claim could be made that more bandwidth was made available per blade than was available to the 16Gb-capable blades in the HP solution. The study did not disclose the logical configuration of the UCS blades, but the performance data suggests a configuration of a single vHBA per blade. In this configuration, the vHBA would follow a single 10Gb path from the blade to the Fabric Interconnect (via the IO Module), and would in turn be pinned to a single 8Gb FC uplink. Already you can see that regardless of the number of links provided from chassis to Fabric Interconnect, the bottleneck will be the 8Gb FC uplink. The second blade’s vHBA would be pinned (automatically, mind you) to the second 8Gb FC uplink. Essentially in this configuration, each blade has 8Gb of FC bandwidth to the Brocade 6510 switch. The VIC 1240 converged network adapter (CNA) on the blade is capable of 20Gb of bandwidth to each fabric. Creating a second vHBA and allowing the operating system to load balance across the two would have provided more bandwidth. The study mentions the use of a software FCoE initiator as being part of the reason for increased CPU utilization.
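To make the pinning argument concrete, here is a minimal sketch of the single-vHBA path as I suspect it was configured. The hop labels are my own illustrative names (not UCS object names), the speeds are the nominal figures quoted above, and the one-vHBA-per-blade layout is my assumption, since the study did not publish its logical configuration.

```python
# Nominal link speeds in Gb, taken from the figures quoted in this post.
# Hop labels are illustrative only; the single-vHBA layout is an assumption.

def path_bottleneck(hops):
    """Return (label, speed) of the slowest hop on one pinned path."""
    return min(hops, key=lambda hop: hop[1])

# A single vHBA follows exactly one path; it does not spread across links.
single_vhba_path = [
    ("Blade to IO Module (10Gb FCoE)", 10.0),
    ("IO Module to Fabric Interconnect (10Gb FCoE)", 10.0),
    ("Fabric Interconnect to Brocade 6510 (8Gb FC uplink)", 8.0),
]

slowest_hop, ceiling = path_bottleneck(single_vhba_path)

# Extra chassis links raise the aggregate, not the ceiling of a pinned vHBA.
chassis_links = 4
aggregate_chassis_bandwidth = chassis_links * 10.0

print(f"Slowest hop: {slowest_hop}")
print(f"Per-blade ceiling: {ceiling} Gb nominal")
print(f"Chassis aggregate: {aggregate_chassis_bandwidth} Gb nominal")
```

Changing `chassis_links` from 4 to 2 (or to 8) leaves the per-blade ceiling untouched, which is why the number of chassis links says nothing about what each blade could actually push to the storage.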

We didn’t understand the technology, but…

In the “ease of use” comparison, it was noted that the HP environment was configured in three hours, whereas it took eight hours to configure UCS. The study makes it clear that they did not have the requisite skill to configure UCS and required the support of an outside VAR (who was not named) to complete the configuration. The study also states that the HP was configured without assistance. Clearly the engineering team involved here was skilled in HP and not UCS. How this reflects poorly on the product (and especially FC vs. FCoE – that’s the point, right?) is beyond me. I personally can configure (and have configured) a UCS environment like this in well under an hour. It would probably take me eight hours to perform similar configuration on an HP system, given my lack of hands-on experience in configuring them. This is not a flaw of the HP product, and I wouldn’t penalize it as such. (There are lots of reasons I like UCS over HP c7000, but that’s significantly beyond the scope of this post.)

Many of the “ease of use” characteristics cited reflected an all Brocade environment – similar efficiencies would have existed in an all Cisco environment as well, which the study neglected to test.

A software what?

The study observes a spike in CPU utilization with increased link utilization, which is (incorrectly) attributed to the use of a software FCoE initiator. This one point threw me (and others) off quite a bit, as it is extremely rare to use a software FCoE initiator, and non-existent when FCoE capable hardware is present (such as the VIC 1240 in use here). After a number of confusing tweets from the @evaluator_group Twitter account, it became clear that while they say they were using a software initiator, it was a misunderstanding of the Cisco VIC 1240 – again pointing to a lack of skill and experience with the product. My suspicion is that the spike in CPU utilization (and latency, and corresponding increase in response times) occurred not due to the FCoE protocol, but rather the queuing that was required when the two 8Gb FC links (13.6Gb/s of total bandwidth available, though not aggregated – each vHBA will be pinned to one uplink) became saturated. This is entirely consistent with observed application/storage performance when the links are saturated. This is entirely speculation, however, as the logical configuration of the UCS was not provided. Despite there being similar total bandwidth available, neither server would have been able to burst above 6.8Gb/s, leading to queuing (and the accompanying latency/response impact).
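For anyone wondering where 6.8Gb/s and 13.6Gb/s come from, here is a rough sketch of the arithmetic. The line rates and encodings below are the published figures as I understand them (8Gb FC at 8.5 GBaud with 8b/10b, 16Gb FC at 14.025 GBaud with 64b/66b, 10Gb Ethernet at 10.3125 GBaud with 64b/66b). The function and variable names are mine, and frame-level overhead is ignored.

```python
# Usable data rate = line rate * (payload bits / coded bits).
# Line rates and encodings are the published values as I understand them;
# frame headers and other protocol overhead are deliberately ignored.

def data_rate_gbps(line_rate_gbaud, payload_bits, coded_bits):
    """Data rate in Gb/s after removing line-encoding overhead."""
    return line_rate_gbaud * payload_bits / coded_bits

LINKS = {
    "8Gb FC": (8.5, 8, 10),                   # 8b/10b encoding
    "16Gb FC": (14.025, 64, 66),              # 64b/66b encoding
    "10Gb Ethernet (FCoE)": (10.3125, 64, 66),
}

rates = {name: data_rate_gbps(*spec) for name, spec in LINKS.items()}

# Two pinned 8Gb uplinks match one 16Gb link in total, but not per server.
two_uplinks_total = 2 * rates["8Gb FC"]
per_server_ceiling = rates["8Gb FC"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} Gb/s")
print(f"Two 8Gb FC uplinks, total: {two_uplinks_total:.1f} Gb/s")
print(f"Burst ceiling per pinned vHBA: {per_server_ceiling:.1f} Gb/s")
```

The totals line up (two 8Gb uplinks carry about the same as one 16Gb link), but with each vHBA pinned to one uplink, a single server tops out at half of what the HP blade could burst to.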

Is that all?

I could go on and on with individual points that were wrong, misleading, or poorly designed, but I don’t actually think it’s necessary. Once the real purpose of the test (Brocade vs. Cisco) became clear, every conclusion reached in the FC vs. FCoE discussion (however incorrect) is moot.

If Brocade really wants to fund an FC vs. FCoE study that will stand up to scrutiny, it needs to use the same servers (no details were provided on specific CPUs in use – they could have been wildly different for all we know), the same chassis, and really isolate the protocol as they claimed to do. Here’s the really sad part – Brocade could have proven what they wanted to (that 16Gb FC is faster than 10Gb FCoE) in a fair fight. Take the same HP chassis used for the FC test, and put in an FCoE module (with CNAs on the servers) instead. Connect via FCoE to a Brocade FCoE capable switch, and use FCoE capable storage. Despite the test’s claim, there’s a lot of FCoE storage out there in production – just ask NetApp and EMC. At comparable cable counts, 16Gb FC will be faster than 10Gb FCoE. What a shock, huh? Instead, this extraordinarily flawed “study” has cost Brocade and unfortunately Evaluator Group Inc. a lot of credibility.

I’m not anti-Brocade (though I do prefer MDS for FC switching, which is not news to anyone who knows me), I’m not anti-FC (I still like it a lot, though I think pure FC networks’ days are numbered), I’m just really, really anti-FUD. Compete on tech, compete on features, compete on value, compete on price, compete on whatever it is that makes you different. Just don’t do it in a misleading, dishonest way. Respect your customers enough to know they’ll see through blatant misrepresentations, and respect your products enough to let them compete fairly.
—
Updated: Check out Tony Bourke’s great response here.

When your only tool is a hammer…

…everything looks like a nail.

Surely everyone has heard this saying, as a way of suggesting that one’s sphere of knowledge is perhaps limited. Indulge me for a moment in an example from my own past.

It was the early nineties and I was working on my truck, a 1982 Datsun 4×4 pickup.

Datsun 720 Pickup (not mine, but similar)

I had been commuting from my house in Sacramento, CA to my job in San Jose, CA, when the pitman arm decided to snap as I was pulling out of a Wendy’s in Vacaville, smashing my driver’s side fender into the concrete barrier outside the drive-through. Lucky that I’d stopped for food and the failure occurred at 1mph instead of screaming down the freeway.

Now, for those of you unfamiliar with a pitman arm, here’s where it fits into a typical steering setup:

The pitman arm connects the steering box (manual or power) to the rest of the steering links. In other words, once broken, you have absolutely zero control over the direction of your vehicle. Neat, huh?

A friend with a trailer came to rescue me and dragged my truck back to my Dad’s garage, where he had a large selection of tools that I could use to repair my truck. Now, first step was to get a new pitman arm, which was easily obtained (if expensive) from the local Nissan dealer. After that, it was just a matter of yanking the old arm off, bolting in the new one, and connecting it to the linkage. Right?

Not so fast. Now, the pitman arm is pressed onto a splined shaft coming out of the steering box and then a large nut secures it further. Removing the nut took a bit of doing, including a fair bit of penetrating oil, etc. Once that was off, I began trying to pull the arm from the shaft. No amount of pounding, pulling, penetrating oil, or other mechanical motivation could get it to budge. Someone suggested using a propane torch to heat up the arm to loosen it – that didn’t help either (and made things worse, as I’ll get to later). I got the brilliant idea to use a gear puller:

It was perfect! The screw would press against the steering box shaft, and pull the pitman arm off. Uh no. The “fingers” of the relatively soft metal gear puller broke about three turns in.

Desperate (it was now about 18 hours into the removal process), I went to a local mom-and-pop tool store. I asked the grizzled man behind the counter for the biggest, meanest gear puller he had. The conversation went something like this:

Tool Guy: What kind of gear are you pulling?
Me: I just need a really strong gear puller.
Tool Guy: What kind of gear are you pulling?
Me: It’s not a gear. But I need a strong gear puller to do what I’m trying to do.
Tool Guy: What are you pulling?
Me: *rolls eyes* Well, if you MUST know, it’s a pitman arm.
Me (to myself): Pshh, like you even know what a pitman arm is. Sheesh.
Tool Guy: Well, why don’t you use a pitman arm puller like this one?

Me: Uh. Yeah. I’ll take one.

No joke, once I had the right tool, the pitman arm was off in 5 minutes. I installed the new arm, aligned my steering wheel, and drove home.

In the morning, I went out to drive to work, and there’s a puddle of power steering fluid beneath my truck. Uh oh. The heat from the propane torch must have melted the seals inside the steering box. Easiest fix? Get a new steering box from a junkyard and install it. Guess what the new steering box had already installed? A perfectly functional pitman arm that I didn’t have to pull.

So what lessons did I learn here?

Just because you have a large number of tools at your disposal, don’t assume you have them all – or even that you know what tools exist. Simply knowing that there was a tool called a “pitman arm puller” would have saved me days of effort, and ultimately, a lot of money in replacing parts that I damaged by using the wrong tools.

Always be looking for new or better tools. “I’ve always done it this way” is not a valid explanation for why you used the tool you did. “This is the best tool I have found so far, what other ideas do you have?” is a much better approach.

Don’t be so arrogant as to assume you know more than everyone else – even if you happen to know a lot. No one person can contain the whole of human knowledge – reach out to experts in their respective fields (like my Tool Guy) to see if they have ways to make your problems easier to solve. Approach this research with an open mind and humility – it will serve you well.

In technology, it’s very easy to get comfortable with the tools/products/vendors that you’ve always used, especially if you can solve the problems you encounter with them. Even if you can solve the problem in front of you with the “way I’ve always done it”, why not be open to new options that might be easier, or solve the problem in a better way?

If you get locked into solving a particular technical problem with a particular product, you may miss opportunities to discover that you’re solving the wrong problem in the first place. Taking a step back from the problem and realizing that you could avoid it in the first place by solving a problem further up the line may ultimately be more efficient. Engage experts. Be humble. Be open to new ideas, technologies, and approaches.

I remind myself of these lessons every day.

Dave’s Pet Peeves

This post could also be named “Why market share doesn’t matter”, “Why I don’t care what now-standard ideas you invented”, or “What have you done for me lately?”

In my career, like most of you, I have sat through too many product presentations, marketing pitches, and technical demos to count. I have talked to countless engineers, account managers, architects, gurus, and charlatans. Some folks fit more than one category.

It struck me recently that I tune out almost immediately when I hear, in the context of explaining why I should choose a particular product, that “we have the largest market share and ship more units of this tech than any of our competitors!” Why? Because if that’s your lead story, and not the quality or innovativeness of your tech, you’re riding on past success and inertia. I’m not interested in inertia. I want to see what you’re doing that’s INTERESTING.

A certain large OEM told me that they invented a particular technology class (when in fact it was invented – more or less – by a company they acquired), as a basis for why their technology was superior. Now, don’t get me wrong – “we’ve been doing this longer than anyone else, and therefore have had more time to refine our solution” is a perfectly valid argument – but not as your lead story. Likewise, telling me (this was another OEM) that you invented a particular idea (even though you didn’t) and that everyone else is copying you now should NOT be your marketing pitch. In fact, if you tell me that you came up with an idea first and then everyone else jumped on the bandwagon later, it actually makes me want to look to your competitors. Since they have the advantage of having seen what you did right and wrong, and were able to craft their solutions afterwards – what’s to say that theirs aren’t better? First to market does NOT necessarily mean best.

Just because you were first to market doesn’t mean you’re not the best either – I’m just saying that to me, that fact is irrelevant.

I’ve been accused of being biased to particular technology companies, but that’s not actually the truth. I’m biased to technologies that make sense to me and solve real problems that I have or see in my customers’ environments. If my “A Game” technology (see link for Joe Onisick’s explanation in the context of Cisco UCS) competes with your product, it isn’t because I dislike your technology, it’s because for the problems I’m trying to solve, I prefer this one. Come out with something better, and I’ll look at it.

In my new role, I have the advantage of partnering with many different OEMs and selecting the right products to meet my customers’ needs. These needs are not always purely technical. But as a technical guy, I’m going to start with my “A Game” solution unless a customer requirement dictates something else, or something better technically comes along.

So this is my message to AMs, PMs, and anyone else that wants to convince me (and I’m very open to being convinced, I just have very high standards) to look at their technology: Do not lead with market share, time in the market, or that you invented a particular class of technology. Tell me what you do that’s innovative, solves my customers’ problems, and does it better/faster/cheaper than your competitors. THAT is what I care about.

Moving on…

And so today begins the next chapter of my professional career.

For the last five years, I have held various roles within Firefly Communications, the premier datacenter education partner within the Cisco ecosystem. I started out as a storage instructor and consultant, teaching primarily Cisco MDS courses, before moving into a little known (for some pretty good reasons) product called VFrame Data Center. It was part of an acquisition Cisco made (TopSpin), and provided an interesting mix of server deployment, automation, network configuration, and policy enforcement across Cisco and other products. As a product, it had massive potential, but a couple of significant flaws – most notably, its reliance on the APIs or command lines of third party products that Cisco could neither control nor predict. When that product died, I went back to teaching MDS, and a little bit of a new thing called Fibre Channel over Ethernet on this newfangled line of switches called “Nexus”.

The combination of skill in those three products (MDS, Nexus, and VFrame Data Center) would turn out to be very fortuitous, as I was invited (along with Joe Onisick and Fabricio Grimaldi, both rockstars of the datacenter) to be among the first to learn “Project California”, a mostly-secret project to produce Cisco’s first foray into compute – what eventually would be released as the Cisco Unified Computing System. UCS became my primary technology focus for the rest of my time at Firefly.

Over the subsequent years, I moved through several roles in Firefly, including Product Line Director for the UCS platform, Chief Technology Officer for the Americas, and finally Vice President of Engineering, overseeing all technical instructors, platforms, and internal IT for the company worldwide. It was a challenging but rewarding position, where I was able to use my love of technology and mentoring, and develop my managerial and leadership (not the same things!) skills through interaction with some great mentors.

Eventually, though, it came time to make a change. I had reached the end of what I felt I could accomplish professionally at Firefly, so I decided that it was time to move on.

In deciding what I wanted to move into, I consulted some great peers, associates, and legends for assistance (thanks all of you who provided guidance – especially @drjmetz @bradhedlund @omarsultan @jonisick) and boiled my interests down into a few key areas:

Technology Evangelism

I love taking a piece of technology that I’m passionate about, and getting others just as excited about it or helping them see how to solve key operational or business problems with it. I’ve done that with MDS, Nexus, and UCS over the last five years and been very successful and fulfilled by it.

Staff Mentoring

One of the more rewarding parts of working for Firefly was the opportunity to work with some of the best and brightest people in the datacenter space. Everyone brings their own unique talents and experiences, and I was able to learn just as much from working with them as any class could ever teach. At the same time, I enjoyed being able to help others develop their skills – whether technical, presentation, instruction, or just general business experiences.

Independence

Firefly afforded me a great deal of independence in how I went about mentoring my team and accomplishing the strategy as set forth by the rest of the senior leadership. Not being tied to a desk job with an hour commute each way was very important to me. Being out in the field, in front of my team and customers was always one of the best parts of my job.

After evaluating a number of different roles and companies, I have selected World Wide Technology as my new professional home. I will be working as a Technical Architect in the Federal sales team. I’m excited – and just a little bit nervous – about stepping out of my comfort zone in a familiar company and striking out on a new adventure. Without challenge there is no growth, and without growth there is only decay. So let’s go see what’s out there.

One of my new year’s resolutions will be to blog more and get more ideas and discussions flowing. In my new role, I will have a much more broad set of technologies to focus on beyond just UCS, so I hope to do the same with my blog here.

Thanks everyone for reading and for your support!

Another “Why doesn’t Cisco…?” post?

On Twitter, @timmylevad said he’d like to see another “Why doesn’t Cisco?” post like I did.

So this is a request for your ideas on what you’d like to see in such a post. I’ll try to answer as many as I can. Please add your suggestions as a comment to this post.

Request for help! NetApp gurus, please chime in!

Haven’t used my blog for this purpose before, but figured it couldn’t hurt to give it a shot!

I’m not going to have an opportunity to test this until next week, so I’m hoping to get some feedback about whether or not I’m attempting the impossible. I’m trying to see if I can create multiple sub-interfaces on a NetApp VIF all in the same subnet and with the same IP address. This is for a lab environment where each VLAN represents an isolated lab, but I want all of the labs to be able to connect to the same NetApp device using the same IP address so as to simplify addressing, etc.

This diagram describes basically what I’m trying to accomplish. Can anyone confirm if this will or will not work?

Firefly Communications named Cisco 2012 Global Learning Partner of the Year!

While this is my personal blog, I make no secret of who I work for. I’ve thus far refrained from posting anything remotely sales-y or promoting my company, but I’m very proud of this group accomplishment and wanted to share. Every year at Cisco’s partner conference, Cisco awards a small number of partners (out of the almost 70,000 worldwide!) with awards in various categories for excellence in the prior year.

I’m proud to announce that the company I work for, Firefly Communications, has been awarded the Global Learning Partner of the Year for 2012! I’m extremely proud of the team I work with and the recognition from Cisco on our dedication to education and the adoption of new technologies.

And now back to our regularly scheduled programming.

FCoE vs. iSCSI vs. NFS

The following was just a short note I wrote in an internal discussion about FCoE vs. iSCSI vs. NFS – and spurred by Tony Bourke’s discussion about methods for implementing FCoE.

This wasn’t intended to be a detailed analysis, just a couple of random musings. Comments as always are welcome.

—–

While NFS and iSCSI are completely different approaches to accessing
storage, they both “suffer” from the same ailment – TCP. Remember folks,
TCP was developed in the 70’s for the express purpose of connecting
disparate networks over long, latent, and likely unreliable links. The
overheads placed onto communication solely to address these criteria
simply aren’t appropriate in the datacenter. We’re talking about a
protocol written to support links slower than your Bluetooth headset. 🙂

iSCSI is a hack, plain and simple. It solves a cost problem, not a
technology one. Even its name is misleading – iSCSI. It isn’t SCSI over
IP – it’s SCSI over TCP over IP. So call it tSCSI or tiSCSI.

I’m not saying they’re not “good enough”, but why do “good enough” now
that “better” is getting much closer in price? On the array side, I
expect more and more vendors to go the NetApp route – all protocols in one
box – just turn on which ones you want to use (via appropriate licensing,
of course). 10G DCB makes this even easier and more attractive – one
port, you pick the protocol you’re comfortable with.

As one of my coworkers points out, FCoE is a bit of a cannon – and for many customers,
their storage challenges are more in mosquito scale.

Fibre Channel was developed with storage in mind as a datacenter protocol,
and I haven’t seen one yet I like better for moving SCSI commands around *in
the datacenter*. I’m sure someone will develop a new protocol at some
point that utilizes DCB-specific architectures to replace iSCSI and
FCoE… but why? If you want a high performance, low latency,
made-for-storage protocol, run FC over whatever wire you feel like. If
you want a low-cost solution utilizing commodity
hardware/switching/routing, use iSCSI and/or NFS. I don’t know that
there’s a new problem to solve here.

For customers that already have and know FC, FCoE is a no-brainer.
Nothing new to learn about how to control access, you’re just replacing
the wires. iSCSI and NFS introduce whole new mechanisms and mindsets into
accessing storage if you’re not used to them.

I saw a quote the other day that said that Fibre Channel is like smoking –
if you’re not already doing it, there’s no reason to start now. I get
the sentiment, but I don’t agree. FC as a protocol is the right tool for
a lot of jobs – but it’s not the right tool for every job.
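The layering point in the note (iSCSI is really SCSI over TCP over IP, and NFS rides on TCP as well) can be sketched as a simple comparison of protocol stacks. The stacks below are deliberately simplified and the layer names are just labels I chose for illustration, so treat this as a picture of the argument and not as a protocol reference.

```python
# Simplified protocol stacks, top layer first. Labels are illustrative only.
STACKS = {
    "FC": ["SCSI", "FCP", "Fibre Channel"],
    "FCoE": ["SCSI", "FCP", "Fibre Channel", "FCoE", "Ethernet (DCB)"],
    "iSCSI": ["SCSI", "iSCSI", "TCP", "IP", "Ethernet"],
    "NFS": ["NFS", "RPC", "TCP", "IP", "Ethernet"],
}

# Which options carry the TCP overhead the note complains about?
uses_tcp = {name: "TCP" in layers for name, layers in STACKS.items()}

# Which options keep the Fibre Channel layer, so zoning and access
# control stay familiar to an existing FC shop?
keeps_fc = {name: "Fibre Channel" in layers for name, layers in STACKS.items()}

for name, layers in STACKS.items():
    print(f"{name:6} {' / '.join(layers)}")
print("Rides on TCP:", [n for n, flag in uses_tcp.items() if flag])
print("Keeps FC layer:", [n for n, flag in keeps_fc.items() if flag])
```

Seen this way, FCoE changes only the bottom of the stack ("you’re just replacing the wires"), while iSCSI and NFS both put TCP/IP between the storage protocol and the wire.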

Why do I need an active uplink to use an appliance port?

Reader Peter sent the following question as a comment to my Direct Attach Appliance Ports post.

When I connected my NAS to use the appliance port, in order for the vnic of the blade server to communicate with the NAS, i found that there should be a connected uplink port in the 6100 even though I created a private VLAN for the appliance port and the vnic. Why?

I thought the topic would make for a nice brief post explaining the Network Control policy and its effect on Service Profile vNIC objects.

By default, all vNIC configurations use a Network Control Policy called “default” which is created automatically by UCS Manager.

The policy specifies that CDP frames are not delivered to any vNIC using this policy, and that if no uplinks are available, the vNIC should be brought down.

In Peter’s case, since there were no uplinks available, his vNIC is kept down, keeping him from using the appliance connected to the Fabric Interconnect.

If we instead change the policy to Warning, the vNIC will be kept up (though a system warning will be generated) even when there are no available uplinks. Note that this effectively disables the Fabric Failover feature on any vNIC using this policy.

If you have some interfaces that you want to stay up even when there are no available uplinks, create a policy with this setting and then specify it in the vNIC configuration. Alternatively, if you want the default behavior of all vNICs (unless specifically configured) to be that they stay up even when no uplinks are available, you can modify the “default” policy as shown here.
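The behavior described above boils down to a small decision, sketched here as a toy model. This is not the UCS Manager API. The enum, the function name, and the state strings are all hypothetical names I made up to illustrate the logic.

```python
from enum import Enum

# Toy model of the uplink-failure setting in a Network Control Policy.
# All names here are hypothetical; this is not UCS Manager code.
class UplinkFailAction(Enum):
    LINK_DOWN = "link-down"  # default behavior described in this post
    WARNING = "warning"      # keep the vNIC up and raise a warning


def vnic_link_state(action, uplinks_up):
    """Return the vNIC state for a given policy action and uplink count."""
    if uplinks_up > 0:
        return "up"
    if action is UplinkFailAction.WARNING:
        return "up (warning raised)"
    return "down"


# Peter's case: default policy, appliance port only, no active uplinks.
print(vnic_link_state(UplinkFailAction.LINK_DOWN, uplinks_up=0))
# Same topology with the policy changed to Warning.
print(vnic_link_state(UplinkFailAction.WARNING, uplinks_up=0))
```

The trade-off shows up in the second case. Because the vNIC never goes down when the uplinks disappear, nothing triggers a move to the other fabric, which is why Fabric Failover is effectively disabled for vNICs using the Warning setting.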