Defining VN-Link

Misunderstanding of Cisco's enhanced network products for VMware environments has hit critical mass. At this point very few people know what does what, how it works, and when to use it. Hopefully this post will demystify some of it.

VN-Link:

VN-Link is the name for a family of products; it does not refer to any one specific product, so forget the idea of hardware vs. software implementations, etc. Think of the Nexus family of switches (1000v, 2000, 4000, 5000, 7000): all different products solving different design goals, but all components of the Data Center 3.0 portfolio. The separate products that fall under VN-Link are described below:

Nexus 1000v:

The Nexus 1000v is a Cisco software switch for VMware environments. It is made up of two components: a Virtual Supervisor Module (VSM), which acts as the control plane, and a Virtual Ethernet Module (VEM), which acts as the data plane. Two VSMs operate in an active/standby fashion for HA, and each VMware host gets a VEM. The switch is managed through the Cisco NX-OS CLI and looks/smells/feels like a physical switch from a management perspective… that's the whole point:

‘Network teams, here’s your network back, thanks for letting us borrow it.’  – The Server Team

The Nexus 1000v does not rely on any mystical magic such as VN-Tag (discussed shortly) to forward frames. Standard Ethernet rules apply and MAC-based forwarding stays the same. The software switch itself is proprietary (just like any hardware/software you buy from anyone), but the protocol used is standards-based Ethernet.
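To make that concrete, here is what the network team's side of it looks like. On the VSM you define port profiles in ordinary NX-OS syntax, and they show up in vCenter as port groups for the server team to attach VMs to. A minimal sketch (the profile name and VLAN are made up for illustration):

port-profile type vethernet VM-Data
  vmware port-group
  switchport mode access
  switchport access vlan 100
  no shutdown
  state enabled

Anyone who has configured a physical Catalyst or Nexus switchport will feel right at home, which is exactly the point.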

Hypervisor Bypass/Direct-Path I/O:

Hypervisor bypass is the ability for a VM to access PCIe adapter hardware directly in order to reduce the overhead on the physical VMware host's CPU. This can be done with most any PCIe device using VMware's Direct-Path I/O. The advantage is less host CPU/memory overhead for I/O virtualization. The disadvantages are that there is currently no support for vMotion and there are limits on the number of Direct-Path I/O devices per host. This doesn't require Cisco hardware or software, but Cisco does have a device that makes it more appealing in blade servers with limited PCIe devices (the VIC, discussed later.)

Pass Through Switching (PTS):

PTS is a capability of the Cisco UCS blade system. It relies on management intelligence in UCS Manager and switching intelligence on each host to pull management of the virtual network into UCS Manager. This allows a single point of management for the entire access layer, including the virtual switching environment. Hooray: less management overhead, more time spent doing something that matters!

PTS directly maps a virtual machine's virtual NIC to an individual physical NIC port across a virtualized pass-through switch. No internal switching is done in the VMware environment; instead, switching and policy enforcement are handled by the upstream Fabric Interconnect. What makes this usable is the flexibility in the number of interfaces provided by the VIC, discussed next.

Virtual Interface Card (VIC), the card formerly known as Palo:

The Virtual Interface Card is a DCB- and FCoE-capable I/O card that is able to virtualize the PCIe bus to create multiple interfaces and present them to the operating system. Theoretically the card can create a mix of 128 virtual Ethernet and Fibre Channel interfaces, but the real usable number is 58. Don't get upset about the numbers; your operating system can't even support 58 PCIe devices today ;-). Each virtual interface is known as a VIF and is presented to the operating system (any OS) as an individual PCIe device. The operating system can then do anything it chooses, and is capable of, with those interfaces.

In the example of VMware, the VMware OS (yes, there is an actual OS installed on the bare metal underneath the VMs) can assign those virtual interfaces (VIFs) to vSwitches, VMkernel ports, or Service Console ports, as it could with any other physical NIC. It can also assign them to the 1000v, use them for Direct-Path I/O, or use them with Pass-Through Switching. Even more important is the flexibility to use separate VIFs for each of these purposes on the same host (read: none of these is mutually exclusive.) The VIC relies on VN-Tag for identification of individual VIFs; this is the only technology discussed in this post that uses VN-Tag (although there are others.)
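To show where the carving actually happens: in UCS, the VIFs are defined as vNIC/vHBA objects in the service profile. From the UCS Manager CLI it looks roughly like the following; the org, profile, and vNIC names here are invented, and exact keywords vary between UCS Manager releases, so treat this as an illustrative sketch rather than a recipe:

ucsm-A# scope org /
ucsm-A /org # scope service-profile esx-host-1
ucsm-A /org/service-profile # create vnic vmnic-data fabric a
ucsm-A /org/service-profile/vnic* # exit
ucsm-A /org/service-profile* # commit-buffer

Each vNIC created this way shows up to the host as its own PCIe device, regardless of whether the host then hands it to a vSwitch, the 1000v, Direct-Path I/O, or Pass-Through Switching.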

VN-Tag:

VN-Tag is a frame tagging method that Cisco has proposed to the IEEE and is used in several Cisco hardware products.  VN-Tag serves two major purposes:

1) It provides individual identification for virtual interfaces (VIF.)

2) It allows a VN-Tag capable Ethernet switch to switch and forward frames for several VIFs sharing a set of uplinks. For example, if VIF 1 and VIF 2 both use port 1 as an uplink to a VN-Tag capable switch, the tag allows the switch to forward a frame from VIF 1 back down the same physical link to VIF 2, because the destination VIF is different from the source VIF.

VN-Tag has been successfully used in production environments for over a year. If you're using a Nexus 2000, you're already using VN-Tag. VN-Tag is used by the Nexus 2000 series fabric extenders, the UCS I/O Module (IOM), and the Cisco Virtual Interface Card (VIC.) The switching for these devices is handled by one of the two VN-Tag capable switches: the Nexus 5000 or the UCS 6100 Fabric Interconnect. Currently all implementations of VN-Tag use hardware to write the tags.
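The Nexus 2000 case is a good illustration of how transparent VN-Tag is in practice: on the parent Nexus 5000 you enable the FEX feature and mark the uplinks facing the fabric extender, and the tagging is handled in hardware on those links. A minimal sketch (the FEX ID and port are arbitrary):

feature fex
interface Ethernet1/1
  switchport mode fex-fabric
  fex associate 100

Once the 2000 comes online, its host ports appear on the parent switch as ordinary interfaces (Ethernet100/1/1 and so on), and you never have to touch the tags themselves.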

– Joe Onisick (http://www.definethecloud.net)

Fantastic post on statelessness: HP VirtualConnect vs. Cisco UCS

M. Sean McGee posted this great comparison of VirtualConnect and UCS.   I’ve often struggled to give students a clear picture of the differences – HP will tell you that “VirtualConnect is just as good, and we’ve been doing it for years!”.  Well, yes… it does some things similarly, and you can’t argue the timeframe.   UCS does a lot more – and until now, I didn’t have a great source that directly compared them.   From now on, all I have to do is send them to M. Sean McGee’s post!

The State of Statelessness

UCS Manager 1.2(1) Released

As a full UCS bundle (including all code – from the lowliest baseboard management controller to the UCS Manager in all its process-preserving glory), Cisco has released version 1.2(1).

Full release notes are available here.

To summarize, this release adds support for the soon-to-be-shipping “M2” versions of the UCS blades, which support the Intel Xeon 5600-series (Westmere) processors, the 6-core follow-on to the Nehalem lineage. There are also numerous bug fixes (expected in this generation of product), including many on my list of “slightly annoying but still ought to be fixed” bugs.

MAC forwarding table aging on UCS 6100 Fabric Interconnects

I was recently forwarded some information on the MAC table aging process in the UCS 6100 Fabric Interconnects that I thought was very valuable to share.

Prior to this information, I was under the impression (and various documentation had confirmed) that the Fabric Interconnect never ages MAC addresses – in other words, it understands where all the MAC addresses are within the chassis/blades, and therefore has no need to age-out addresses.   In the preferred Ethernet Host Virtualizer mode, it also doesn’t learn any addresses from the uplinks, again, so no need to age a MAC address.

So what about VMware and the virtual MAC addresses that live behind the physical NICs on the blades?

Well, as it turns out, the Fabric Interconnects do age addresses, just not those assigned by UCS Manager to a physical NIC (or a vNIC on a Virtual Interface Card – aka Palo).

On UCS code releases prior to 1.1, learned addresses age out in 7200 seconds (120 minutes), and this is not configurable.

On UCS code releases of 1.1 and later, learned addresses age out in 7200 seconds (120 minutes) by default, but can be adjusted in the LAN Uplinks Manager within UCS Manager.

Why do we care?   Well, it's possible that if a VM (from which we've learned an address) has gone silent for whatever reason, we may end up purging its address from the forwarding table after 120 minutes… which means it becomes unreachable from the outside world, since we'll drop any frame that arrives on an uplink destined for an unknown unicast MAC address. Only if the VM generates some outbound traffic will we re-learn the address and be able to accept traffic on the uplinks for it.

So if you have silent VMs and have trouble reaching them from the outside world, you’ll want to upgrade to the latest UCS code release and adjust the MAC aging timeout to something very high (or possibly never).
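For reference, on releases where the timer is adjustable, the setting lives with the Ethernet uplink configuration (the same place the LAN Uplinks Manager edits). From the UCS Manager CLI it looks roughly like this; the exact keywords can differ by release, so check the command reference for your version before relying on it:

ucsm-A# scope eth-uplink
ucsm-A /eth-uplink # set mac-aging never
ucsm-A /eth-uplink* # commit-buffer
ucsm-A /eth-uplink #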

Moving UCS Service Profile between UCS Clusters

@SlackerUK on Twitter asked about moving Service Profiles between UCS clusters.

In short, it’s not currently possible with UCS Manager without a bit of manual work.

First, create a “logical” backup from UCS Manager.  This will create an XML file containing all of the logical configuration of UCS Manager, including your service profiles.   Find the service profile you want, and remove everything else from the backup.  You can then import that XML file into another UCS Manager instance.  Be aware that everything comes with that XML, including identifiers – so make sure you’re OK with that or remove the original service profile to eliminate duplicates.
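If you'd rather script the export step than click through the GUI, the logical backup can also be generated from the UCS Manager CLI along these lines; the SCP destination here is made up, and the exact argument order varies slightly between releases, so take it as a sketch:

ucsm-A# scope system
ucsm-A /system # create backup scp://admin@10.1.1.10/backups/ucs-logical.xml logical-configuration enabled
ucsm-A /system* # commit-buffer

The resulting XML file is the one you'll prune down to the single service profile before importing it on the destination cluster.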

If you’re using BMC Bladelogic for Cisco UCS, it *does* have the capability to move service profiles between clusters.

Recovering from disabled HTTP and HTTPS services in UCS Manager

In order to allow for granular security control, UCS Manager allows you to disable individual services and protocols, in accordance with your policies or security goals.

Since HTTP/HTTPS are the primary methods of administering the system, and the command line is *not* a standard IOS or NXOS, how do you re-enable these protocols if they are inadvertently disabled?

Luckily for me, I got to test this out in a recent class.  🙂

Gain access to the UCS Manager CLI, either through SSH/Telnet (if still enabled) or through the serial console.   Because all configuration options are stored as objects in the UCS Manager database, we need to change our scope to the system services level.

ucsm-A # scope system
ucsm-A /system # scope services
ucsm-A /system/services # enable ?
  cimxml               Cimxml service
  http                 HTTP service
  https                HTTPS service
  telnet-server        Telnet service
ucsm-A /system/services # enable http
ucsm-A /system/services* # enable https
ucsm-A /system/services* # commit-buffer
ucsm-A /system/services #

That’s it… welcome back.  🙂

A real deployment’s experience with adding hardware to UCS

One of the things we talk about in Cisco UCS is how easy it is to expand the architecture.   Each time you add a chassis in competing solutions, you have lots of management points to configure – the chassis itself, the Ethernet and potentially Fibre Channel switching/connectivity, etc.   With UCS, it's simply a matter of “telling” the Fabric Interconnects that the chassis is there and the rest happens auto-magically.  At HealthITGuy's Blog, Michael Heil posts his experience in adding a new chassis to a running UCS deployment.  The rest of his posts on UCS, from a customer's perspective, are also excellent.
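For the curious, “telling” the Fabric Interconnects really is about that simple: cable the new chassis's IOMs to the interconnects and acknowledge the chassis. Assuming the new hardware is discovered as chassis 2, from the UCS Manager CLI that's roughly the following (exact behavior also depends on your chassis discovery policy):

ucsm-A# acknowledge chassis 2
ucsm-A* # commit-buffer

From there discovery runs on its own and the blades show up ready for service profiles.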

An interesting UCS use case…

This will be the first in a number of posts I make to catch up on the great content I've collected and bookmarked since I got involved with UCS.

Steve Chambers over at ViewYonder posted a use case for Cisco UCS that involves using the same physical UCS infrastructure to support (one at a time) multiple customers in a disaster recovery service provider type scenario.

While I don’t necessarily agree that you’d need to have a dedicated infrastructure for each customer without UCS, I think this use case is an excellent example of how easy it is to repurpose hardware quickly and efficiently – regardless of the reason.   I’ll have another post soon discussing this very same idea when it comes to burst and redundant capacity for your various applications.

Check out Steve’s post here:

http://viewyonder.com/2009/11/19/over-commiting-your-infrastructure-for-multi-tenant-dr-with-cisco-ucs/

While you’re there, check out the rest of Steve’s great UCS content:

http://viewyonder.com/cisco-ucs/