Cisco UCS provides a configurable “Chassis Discovery Policy” that affects how chassis links (called “Server Ports”) are activated when discovering a new chassis. See this page on cisco.com.
After a recent discussion on Twitter, I decided to test out a few scenarios.
My configuration is a pair of UCS 6120 Fabric Interconnects, two UCS 5108 Chassis, with 2 links per IO Module.
Additionally, this particular lab is on a reasonably old version of UCSM code, 1.0(1f). I’m not in a position to upgrade it at the moment – I can re-test at a later date with more current code. I don’t expect the behavior to change, however.
I started with the default discovery policy of 1-link. I enabled one link per fabric interconnect to a chassis, and checked the status of the ports. The ports were in an “Up” Overall Status, which is proper – and means that the link is available for use. I then enabled a second link per fabric interconnect to the same chassis. The second activated link went into a “Link Up” state, with “Additional info” of “FEX not configured”. This means that the physical link has been activated, but is not in use – the IO Module (FEX) is not using the link. A re-acknowledgement of the chassis activates the second link.
I then changed the discovery policy to 4-links. I enabled one link per fabric interconnect to a different chassis, and checked the status of the ports. The ports went into the “Link Up” state, “FEX Not Configured” – in other words, the links are not in use. While we can detect that the chassis is present, no traffic will flow as the FEX has not yet been configured to pass traffic. Furthermore, the chassis is in an error state, inasmuch as the cabling doesn’t match the chassis discovery policy.
Reacknowledging the chassis activates the links, and removes the error condition, even without changing the discovery policy.
Finally, I tested the scenario of setting the chassis discovery policy to 2 links and activating only one link per Fabric Interconnect. As expected, the link enters the “link-up”, “FEX not configured” state. I then activated a second link per Fabric Interconnect. After a brief period, both ports per Fabric Interconnect enter the “Up” status.
In short, setting the Chassis Discovery Policy determines how many links must be present per Fabric Interconnect to an IO Module before the ports are activated and the chassis is put into service. If the policy is set at 1 link, and more than 1 link is activated, a simple re-acknowledgement of the chassis will activate the additional links. If the policy is set to a higher number – and that number of links is not present – the chassis will not be activated unless a manual re-acknowledgement is done.
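The behavior observed in the three tests above can be written down as a small Python model. This is only a sketch of what was seen in this lab on 1.0(1f), not a description of how UCSM is implemented, and other code versions may behave differently. The function name `port_states` and its parameters are hypothetical, made up for illustration.

```python
UP = "Up"
LINK_UP = "Link Up (FEX not configured)"


def port_states(policy_links, active_links, reacknowledged=False):
    """Model the Overall Status of each enabled server port on one
    Fabric Interconnect, as observed in the lab tests.

    policy_links   -- link count set in the Chassis Discovery Policy
    active_links   -- server ports actually enabled to the IO Module
    reacknowledged -- True once the chassis has been manually re-acked
    """
    if active_links < 1:
        return []
    # A re-acknowledgement activated every enabled link, whatever the policy.
    if reacknowledged:
        return [UP] * active_links
    # Fewer links than the policy asks for: nothing is put into service.
    if active_links < policy_links:
        return [LINK_UP] * active_links
    # Policy satisfied: that many links are used, extras wait for a re-ack.
    extras = active_links - policy_links
    return [UP] * policy_links + [LINK_UP] * extras


if __name__ == "__main__":
    print(port_states(1, 2))        # default policy, second link idle
    print(port_states(4, 1))        # policy not met, link idle
    print(port_states(4, 1, True))  # re-ack brings it up anyway
    print(port_states(2, 2))        # policy met, both links in use
```

The `reacknowledged` branch comes first on purpose: in every test, a re-ack overrode the policy, which is the core of the argument that follows.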
Frankly, I don’t see a lot of value in this feature – unless you’re adding a large number of chassis that will all have identical numbers of uplinks. Even then, you’re saving yourself at most a few clicks of the mouse to re-ack the chassis that don’t match the policy. Why not simply leave the policy at default and re-ack any chassis that has more than 1 link per IOM? Presumably you’re going to do this before actually deploying any service profiles, so there’s no potential for disruption with the re-ack.
Thoughts or comments?
13 thoughts on “Chassis Discovery Policies in UCS”
I’m not sure this assessment is correct. My experience in testing and customer deployments has been that if the chassis link policy is set to the highest common denominator (i.e. equal or greater number of links), then the IOM will utilize all links I activate.
For instance, upon initial setup, set the max links to 4 and then activate the 2 links available in your lab. You should see both links come up in an UP rather than Link-Up status, meaning they are both being utilized by blades. If, however, in your environment you leave the default links set to 1 and bring both up, you’ll receive one in an UP status and one in a Link-Up status, meaning one is up and being used and the other is just a port that has been brought up.
In either event my current ‘personal’ best practice is to set every new UCS policy to max 4-links as a policy and reacknowledge each chassis AFTER bringing all links up. This avoids any confusion and ensures proper configuration.
In either case I see this as a non-issue because with any hardware environment it’s best practice to ensure that all equipment/links are operational as expected before deploying production workloads. This means that with UCS I ensure that every available link is in an UP status prior to deploying service profiles.
In the testing I just performed, setting the policy to 4-links and activating 2 caused both links to be in a “link-up” and “FEX not configured” state. It was not until I re-acknowledged the chassis that the links moved to an “Up” state.
If you left the policy at 1 link as you suggest, a chassis could come online with 1 link when you really needed 2 as the minimum. You might have wired up 2 links but only 1 link was functional. The chassis discovery policy assures a functional baseline at the onset.
I did a BUNCH of testing of this setting it to 4, 2, and 1. I then went up and down with the number of uplinks. My conclusion was it was useless as well. You can absolutely bring up a chassis with 1 link with the policy set to 2 and the pinning will be for one uplink. I’ll dig up my notes (and screenshots that I took along the way) tomorrow if needed.
BTW – Brad and Joe, I’m not doubting the intention of it and I think it would be a good idea. But, when I tested it (1.1 code if I remember right) it just didn’t seem to actually work. Has anybody gotten it to work as desired?
It appears to me Dave got this to work as desired. His description of the behavior in this article sounds right. If the policy didn’t match the number of links, the FEX did not come online until he “Re-acknowledged” the chassis, a manual intervention that basically acknowledges the chassis is out of policy. However if the policy matched the uplinks, the FEX comes online without a “Re-acknowledge”. Right?
I guess my point is, like Joe, I’m not going to put a chassis into production without checking the status of the links. I’m not seeing any value in setting the discovery policy when, no matter what it’s set to, a 1-click re-ack makes all links active. My process for adding a chassis is “enable ports, re-ack, check link status”. The chassis discovery policy isn’t valuable to me.
Your “process” relies upon someone correctly following it every time to avoid a chassis going into production with only 1 link. Simply setting the policy to 2 links would prevent the risk of human error. With a chassis discovery policy I can be assured a chassis never comes online with fewer than 2 links, for example, without relying on someone clicking “re-ack”. It’s a simple insurance policy for the 5 seconds it takes to get it configured.
Brad – Whether it’s Joe’s example of setting the policy to four as a personal best practice and then re-acking, or what I have done in the past (setting it to one or two and then re-acking), everyone here will follow Dave’s “enable ports, re-ack, check link status”. It’s the responsible thing to do. I just don’t believe many customers will blindly add chassis and trust the policy. I get your point here and I see the advantages to setting the policy in large environments, but I’m in the same boat as Dave on this one.
If you can’t trust the system’s ability to enforce a policy what are you doing with UCS in the first place? The entire system is policy driven.
It’s not about blindly adding chassis. It’s about reducing the potential for human error, and obtaining proper logging of the human error, should it occur.
The policy makes the assumption that all chassis will be configured equally, or at least that some minimum number of links is acceptable. If I have a mixed configuration of 2 and 4 link systems, then I’m right back to the “risk” of putting a system into production with only 2 links when perhaps I expected 4. Or if I set my policy at 4 links, I have to re-ack the 2 link systems to make them functional. In either example, one of the chassis is going to require a re-ack to make them function as expected.
My whole point is not that the chassis discovery policy doesn’t do anything, it’s just that it seems to be providing a very small benefit and creates more confusion (as evidenced here) than it’s worth.
In either case, by setting policy to 2 or 4 links, you’ll never have a chassis online with just 1 link because someone forgot to click “Re-ack”. Worth a 5 second investment of your time? I think so.