Why do I need an active uplink to use an appliance port?

Reader Peter sent the following question as a comment to my Direct Attach Appliance Ports post.

When I connected my NAS to use the appliance port, I found that in order for the vNIC of the blade server to communicate with the NAS, there had to be a connected uplink port on the 6100, even though I created a private VLAN for the appliance port and the vNIC. Why?

I thought the topic would make for a nice brief post explaining the Network Control policy and its effect on Service Profile vNIC objects.

By default, all vNIC configurations use a Network Control Policy called “default” which is created automatically by UCS Manager.

The policy specifies that CDP frames are not delivered to any vNIC using the policy, and that if no uplinks are available, the vNIC should be brought down.

In Peter’s case, since there were no uplinks available, his vNIC was kept down, preventing him from using the appliance connected to the Fabric Interconnect.

If we instead change the policy to Warning, the vNIC will be kept up (though a system warning will be generated) even when there are no available uplinks. Note that this effectively disables the Fabric Failover feature on any vNIC using this policy.

If you have some interfaces that you want to stay up even when there are no available uplinks, create a policy with this setting and then specify it in the vNIC configuration.  Alternatively, if you want the default behavior of all vNICs (unless specifically configured) to be that they stay up even when no uplinks are available, you can modify the “default” policy as shown here.
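
For anyone who would rather script this change than click through the GUI, here is a rough sketch using the ucsmsdk Python SDK (which post-dates the UCSM releases discussed here). The class and property names (NwctrlDefinition, uplink_fail_action) and the policy name are my assumptions about the SDK's object model, so verify them against your SDK version:

```python
# Sketch only: assumes the ucsmsdk SDK and that the NwctrlDefinition class
# and its uplink_fail_action property exist as named.
from ucsmsdk.ucshandle import UcsHandle
from ucsmsdk.mometa.nwctrl.NwctrlDefinition import NwctrlDefinition

handle = UcsHandle("ucsm.example.com", "admin", "password")  # hypothetical host/credentials
handle.login()

# Create a Network Control Policy that keeps vNICs up when no uplinks are
# available ("warning") instead of taking them down ("link-down").
policy = NwctrlDefinition(
    parent_mo_or_dn="org-root",
    name="keep-up-no-uplink",      # hypothetical policy name
    cdp="disabled",                # same CDP behavior as the default policy
    uplink_fail_action="warning",  # keep the vNIC up and raise a warning instead
)
handle.add_mo(policy, modify_present=True)
handle.commit()
handle.logout()
```

To change the default behavior for all vNICs instead, you would modify the existing “default” policy in the same way rather than creating a new one.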

UCSM 1.4: Maintenance Policies and Schedules

Strange as it may seem with all of the great new features in UCSM 1.4, this is one of my favorites.

To understand the impact, first look at the way disruptive changes were handled prior to this release. When changing a configuration setting on a service profile, an updating service profile template, or many policies, if the change would cause a disruption to running service profiles (i.e. require a reboot), you had two options: yes or no. When modifying a single profile, this wasn’t a big issue. You could simply make the change when you were also ready to accommodate a reboot of that particular profile. Where it became troublesome was when you wanted to modify an updating service profile template or a policy that affected many service profiles – your only real choices were to reboot them all simultaneously or to modify each profile individually. Obviously for large deployments using templates and policies (the real strength of UCS), this wasn’t ideal.

With UCSM 1.4, we now have the concept of a Maintenance Policy.   The screenshot below is taken from the Servers tab:

Creating a Maintenance Policy allows the administrator to define the manner in which a service profile (or template) should behave when disruptive changes are applied.   First, there’s the old way:

A policy of “Immediate” means that when a disruptive change is made, the affected service profiles are immediately rebooted without confirmation. A normal “soft” reboot occurs, whereby a standard ACPI power-button press is sent to the physical compute node – assuming the operating system handles this event, the OS should gracefully shut down and the node will reboot.

A much safer option is to use the “user-ack” policy option:

When this option is selected, disruptive changes are staged to each affected service profile, but the profile is not immediately rebooted.   Instead, each profile will show the pending changes in its status field, and will wait for the administrator to manually acknowledge the changes when it is acceptable to reboot the node.
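
If you prefer scripting, a maintenance policy like this could presumably be created with the ucsmsdk Python SDK as well. The sketch below assumes the LsmaintMaintPolicy class and its uptime_disr property (“immediate”, “user-ack”, or “timer-automatic”); the policy name is made up for illustration:

```python
# Sketch only: assumes ucsmsdk's LsmaintMaintPolicy class and its
# uptime_disr property; check both against your SDK version.
from ucsmsdk.ucshandle import UcsHandle
from ucsmsdk.mometa.lsmaint.LsmaintMaintPolicy import LsmaintMaintPolicy

handle = UcsHandle("ucsm.example.com", "admin", "password")  # hypothetical
handle.login()

# A maintenance policy that stages disruptive changes and waits for a
# manual acknowledgement before rebooting the affected servers.
maint = LsmaintMaintPolicy(
    parent_mo_or_dn="org-root",
    name="wait-for-ack",     # hypothetical policy name
    uptime_disr="user-ack",  # "immediate" and "timer-automatic" are the other options
)
handle.add_mo(maint, modify_present=True)
handle.commit()
handle.logout()
```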

The most interesting new option is the “timer-automatic” setting. This setting allows the maintenance policy to reference another new object, the Schedule.

Schedules allow you to define one-time or recurring time periods during which one or more of the affected nodes may be rebooted without administrator intervention. Note that the Schedules top-level object is located within the Servers tab:

The only schedule created automatically by UCSM is the “default” schedule, which contains a single recurring entry that, each day at midnight, reboots all service profiles referencing a “timer-automatic” maintenance policy associated with the “default” schedule. This “default” schedule can, of course, be modified.

Creating a user-defined schedule provides the ability to control when – and how many – profiles are rebooted to apply disruptive changes.

The One Time Occurrence option sets a single date and time when this schedule will be in effect.  For example, if you wanted all affected profiles to be rebooted on January 18th at midnight, you could create an entry such as the following.

Once the date and time have been selected, the other options for the occurrence can be selected.

Max Duration specifies how long this occurrence can run. Depending on the other options selected below it, it is possible that not all service profiles will be able to be rebooted in the time allotted. If this is the case, changes to those profiles will not take effect.

Max Number Of Tasks specifies how many total profiles could be rebooted by this occurrence.

Max Number Of Concurrent Tasks controls how many profiles can be rebooted simultaneously.   If, for example, this schedule will be used on a large cluster of service profiles where workload can be sustained even while 5 nodes are unavailable, set this value to 5 and the reboots will occur in groups of that size.

Minimum Interval Between Tasks allows you to set a delay between each reboot. This can be used to ensure that each rebooted node is given time to fully boot before the next node is taken down.

The Recurring Occurrence option provides for the creation of a schedule that will run every day, or on a specific day, to apply disruptive changes.

This option has the same per-task options as the previous example.
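
As a rough sanity check (this is my own back-of-envelope arithmetic, not anything UCSM calculates for you), you can estimate whether a window defined by these options is long enough for the number of profiles you need to reboot:

```python
# Back-of-envelope estimate (my own arithmetic, not UCSM's scheduler logic):
# roughly how many profiles fit in a window, assuming reboots proceed in
# waves of "Max Number Of Concurrent Tasks" nodes.
def estimated_capacity(max_duration_min, min_interval_min,
                       max_concurrent, assumed_reboot_min):
    """Rough count of profiles that could be rebooted within the window."""
    per_wave = assumed_reboot_min + min_interval_min   # one wave plus the delay
    waves = int(max_duration_min // per_wave)
    return waves * max_concurrent

# Hypothetical numbers: a 4-hour window, 10-minute interval, 5 concurrent
# reboots, and a guess of 15 minutes for a node to reboot completely.
print(estimated_capacity(240, 10, 5, 15))   # -> 45 profiles, give or take
```

If the result comes out smaller than the number of affected profiles (or than Max Number Of Tasks), some profiles will be left waiting, as noted above.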

Once you have created your maintenance policy and schedule (if necessary), the service profile or service profile template must reference the maintenance policy in order for it to have any effect.  After selecting your service profile or template, the Actions window has an option to Change Maintenance Policy.

You may then select the Maintenance Policy you wish to use, or create a new one.

The service profile properties will now show that a maintenance policy has been selected.

In this example, a policy requiring user acknowledgement has been chosen. Now if any disruptive changes are made, the service profile will not reset until manually acknowledged by an administrator. Any time profiles are awaiting acknowledgement, a “Pending Activities” warning will be shown in the UCSM status bar.

Within the profile properties, a description of the pending changes will be displayed along with the “Reboot Now” option.
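
If you have many existing service profiles to point at the same maintenance policy, scripting the association may save some clicking. Here is a rough ucsmsdk sketch; the profile DN and the maint_policy_name property are assumptions on my part, so verify them against your environment and SDK:

```python
# Sketch only: assumes ucsmsdk's query_dn / set_mo and that the service
# profile object exposes a maint_policy_name property.
from ucsmsdk.ucshandle import UcsHandle

handle = UcsHandle("ucsm.example.com", "admin", "password")  # hypothetical
handle.login()

# Point an existing service profile at the maintenance policy by name.
sp = handle.query_dn("org-root/ls-ESX-host-01")  # hypothetical profile DN
sp.maint_policy_name = "wait-for-ack"            # the policy created earlier
handle.set_mo(sp)
handle.commit()
handle.logout()
```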

I hope this description of the new maintenance policies and schedules options was helpful.  I’m very excited by all the new features rolling into UCS – it was a great system before, and it’s only getting better!

Chassis Discovery Policies in UCS

Cisco UCS provides a configurable “Chassis Discovery Policy” that affects how chassis links (called “Server Ports”) are activated when discovering a new chassis.  See this page on cisco.com.

After a recent discussion on Twitter, I decided to test out a few scenarios.

My configuration is a pair of UCS 6120 Fabric Interconnects, two UCS 5108 Chassis, with 2 links per IO Module.

Additionally, this particular lab is on a reasonably old version of UCSM code, 1.0(1f).    I’m not in a position to upgrade it at the moment – I can re-test at a later date with more current code.  I don’t expect the behavior to change, however.

I started with the default discovery policy of 1-link.   I enabled one link per fabric interconnect to a chassis, and checked the status of the ports.   The ports were in an “Up” Overall Status, which is proper – and means that the link is available for use.   I then enabled a second link per fabric interconnect to the same chassis.   The second activated link went into a “Link Up” state, with “Additional info” of “FEX not configured”.   This means that the physical link has been activated, but is not in use – the IO Module (FEX) is not using the link.   A re-acknowledgement of the chassis activates the second link.

I then changed the discovery policy to 4-links. I enabled one link per fabric interconnect to a different chassis, and checked the status of the ports. The ports went into the “Link Up” state, “FEX Not Configured” – in other words, the links are not in use. While we can detect that the chassis is present, no traffic will flow as the FEX has not yet been configured to pass traffic. Furthermore, the chassis is in an error state, inasmuch as the cabling doesn’t match the chassis discovery policy.

Reacknowledging the chassis activates the links, and removes the error condition, even without changing the discovery policy.

Finally, I tested the scenario of setting the chassis discovery policy to 2 links and activating only one link per Fabric Interconnect.   As expected, the link enters the “link-up”, “FEX not configured” state.   I then activated a second link per Fabric Interconnect.   After a brief period, both ports per Fabric Interconnect enter the “Up” status.

In short, the Chassis Discovery Policy setting determines how many links must be present per Fabric Interconnect to an IO Module before the ports are activated and the chassis is put into service. If the policy is set at 1 link and more than one link is activated, a simple re-acknowledgement of the chassis will activate the additional links. If the policy is set to a higher number – and that number of links is not present – the chassis will not be activated unless a manual re-acknowledgement is done.
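
For completeness, here is a hedged ucsmsdk sketch of changing the discovery policy and re-acknowledging a chassis. The policy DN (“org-root/chassis-discovery”), the action values, and the use of the chassis admin_state for re-acknowledgement are my assumptions, so double-check them before relying on this:

```python
# Sketch only: the DNs, the "2-link" action value, and the admin_state
# re-acknowledge mechanism are assumptions; verify against your SDK/release.
from ucsmsdk.ucshandle import UcsHandle

handle = UcsHandle("ucsm.example.com", "admin", "password")  # hypothetical
handle.login()

# Require 2 links per IO Module before a new chassis is put into service.
disc = handle.query_dn("org-root/chassis-discovery")
disc.action = "2-link"
handle.set_mo(disc)

# Re-acknowledge chassis 1 so any additional links are activated.
chassis = handle.query_dn("sys/chassis-1")
chassis.admin_state = "re-acknowledge"
handle.set_mo(chassis)

handle.commit()
handle.logout()
```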

Frankly, I don’t see a lot of value in this feature – unless you’re adding a large number of chassis that will all have identical numbers of uplinks.  Even then, you’re saving yourself at most a few clicks of the mouse to re-ack the chassis that don’t match the policy.   Why not simply leave the policy at default and re-ack any chassis that has more than 1 link per IOM?   Presumably you’re going to do this before actually deploying any service profiles, so there’s no potential for disruption with the re-ack.

Thoughts or comments?