Planning A Physical Data Center Rack Cleanup

I’m part of a project that’s going to do some physical rack cleanup. As in, the cables are a mess, labeling isn’t consistent, power distribution isn’t quite what it should be, and it’s gotten to the point where doing maintenance on any of the hardware is tough. So, it’s time to tidy everything up. I’ve been a part of several of these kinds of projects before, and I’ve collected a bunch of data on how to approach it. I thought I’d share here for anyone facing the same challenge. Hopefully, others of you will respond with your own experiences.

As you read this, put yourself in the shoes of the project manager responsible for making this cleanup effort a success.

Preparation

  • Prep is the most important factor in success. Don’t leave anything to the maintenance window that can be done ahead of time, even little things that seem like no big deal. Your maintenance window isn’t very long, no matter how many hours you are given.
  • Build a process so that everyone knows what’s going to happen when as things move along. More detail is better, because it makes you think through the specifics, and you might recognize dependencies you hadn’t thought of before.
  • During the maintenance window, you will want to have on hand a server-to-switch network port spreadsheet, and probably a server-to-PDU power port mapping as well. You might also want a spreadsheet that you can use as a physical rack layout reference. A column can represent a rack, with each row an RU. This is very helpful when moving things around. (Server X is moving to RU 17-19 in rack 2. Why? Because the spreadsheet says so.)
  • If there’s any possible way to make rack layouts at multiple sites look the same, do that.
  • The last thing to do before starting to tear things apart is take notes on the status of monitoring systems. What’s up? What’s down? That way, you’ll know what was already broken before you started, and you won’t be chasing pre-existing problems afterward as if you’d caused them during the maintenance.
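
The rack layout spreadsheet described above can also be generated from a list of placements, which makes it easy to catch two devices claiming the same RU before the window starts. Here is a minimal sketch of that idea. The device names, rack names, and the 42U rack height are all made up for illustration, so substitute whatever your own inventory says.

```python
# Hypothetical layout data: (device, rack, lowest RU, highest RU).
PLACEMENTS = [
    ("server-x", "rack2", 17, 19),
    ("switch-a", "rack1", 42, 42),
    ("server-y", "rack1", 17, 19),
]


def build_layout(placements, rack_height=42):
    """Return {rack: {ru: device}}, raising if two devices claim the same RU."""
    layout = {}
    for device, rack, low, high in placements:
        if not 1 <= low <= high <= rack_height:
            raise ValueError(
                f"{device}: RU range {low}-{high} does not fit a {rack_height}U rack"
            )
        slots = layout.setdefault(rack, {})
        for ru in range(low, high + 1):
            if ru in slots:
                raise ValueError(f"{rack} RU {ru}: {device} collides with {slots[ru]}")
            slots[ru] = device
    return layout


def render(layout, rack_height=42):
    """Render one column per rack and one row per RU, top of the rack first."""
    racks = sorted(layout)
    lines = ["RU  " + "  ".join(f"{rack:<10}" for rack in racks)]
    for ru in range(rack_height, 0, -1):
        cells = "  ".join(f"{layout[rack].get(ru, '-'):<10}" for rack in racks)
        lines.append(f"{ru:<4}" + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(build_layout(PLACEMENTS)))
```

Printing the result gives a text grid you can paste into a spreadsheet or tape to the cabinet door. The collision check is the useful part: it forces the "where does this actually go?" conversation to happen before the maintenance, not during it.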

Cabling

  • Get Velcro rolls. Cut strips ahead of time to bundle cables together. “Too much Velcro” is not a viable phrase in the English language.
  • Route cables along the edges of the cabinet to be sure you can always get a server/router/whatever in and out of the rack. I have had the best luck running network cables along one side of the cabinet.
  • Keep fiber optic cables and copper cables in separate bundles. Fiber optic cables are sensitive. It’s wise to minimize physical strain on them.
  • Be careful of cable bend radius. Don’t fold network cables over on themselves trying to neaten them up. You will make the packets cry. If the cable is too long and you can’t replace it with a proper length cable, coil the excess in loops instead.
  • Put power cables and network cables in separate bundles, even if they are right next to each other. That reduces the risk that running a new network cable will impact power and vice-versa.
  • A project like this is a good time to get all-new network cables. If you want to color code network cables (green for storage, blue for app data, etc.), do that. Personally, I don’t care to color code because the scheme is often impossible to stick with. My personal preference is well-labeled cables that are all one color.
  • Get network and power cable lengths as close as possible to the runs they need to cover. Gross excess length makes for loops which take up space and block airflow.
  • If you do get new network cables, plan to un-bag and un-twist-tie them before the maintenance. It takes a surprising amount of time to do that.

Labeling

  • Every server, router, firewall, PDU, etc. should be labeled with hostname and ideally the management IP address.
  • If it’s a rack-mounted device, the device should be labeled front AND back so that you can stand on either side of the cabinet and know what you’re looking at. This is beneficial for new staff members, and also helpful to keep things straight as more equipment is added to the cabinet.
  • Label each power or network cable on both ends with the same label – what the device is, and what it’s going to. This should all be done ahead of time. It won’t be possible to label the cables as you’re installing them due to time constraints, and it’s unlikely you’ll ever go back to label them.
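
If you already have the port mapping spreadsheets, the cable labels can be produced from them ahead of time, so that both ends of a cable are guaranteed to carry the same text. The sketch below assumes a made-up label format and made-up device and port names. It only builds the label text, and how you get that text onto your label printer is up to you.

```python
# Hypothetical rows from the port mapping spreadsheets:
# (device A, port A, device B, port B)
CONNECTIONS = [
    ("server1", "eth0", "switch-a", "Gi1/0/1"),
    ("server1", "psu1", "pdu-a", "outlet-4"),
]


def cable_label(dev_a, port_a, dev_b, port_b):
    """Build one label text that names both ends of the cable."""
    return f"{dev_a} {port_a} <-> {dev_b} {port_b}"


def label_sheet(connections):
    """Return the labels to print: two identical copies per cable, one per end."""
    sheet = []
    for connection in connections:
        label = cable_label(*connection)
        sheet.extend([label, label])
    return sheet


if __name__ == "__main__":
    for line in label_sheet(CONNECTIONS):
        print(line)
```

Because the labels come straight from the mapping, a typo only has to be fixed in one place, and the count of labels tells you how many cables you should have on the table before the window opens.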

Power

  • Evenly load balance amp draw across PDUs and phases.
  • Redundant power supplies in a device need to go to different PDUs, assuming the PDUs are fed by different sources. Not sure what PDUs are fed by what power sources? Find out.
  • Remember that amp loads are at peak when everything is powering up. So, don’t push a phase to 80+% utilization for a baseline amp draw. For example, assuming a 20 amp phase, a 16 amp “normal” draw is most likely too much. In a total power failure situation, the breaker will trip when the power comes on and everything on the phase spins up, as it will probably exceed 20 amps. So, if there’s any overloaded phase, a cleanup project is a great time to re-balance.
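
The power checks above are easy to run against a server-to-PDU mapping before anyone touches a power cord. This is a rough sketch under some assumptions: the feed data, PDU names, and phase names are invented, the 20 amp breaker matches the example above, and the 80% line is simply the point the example calls too much, so pick a lower limit if you want real headroom for power-up.

```python
from collections import defaultdict

# Hypothetical power feeds: (device, power supply, PDU, phase, baseline amps)
FEEDS = [
    ("server1", "psu1", "pdu-a", "L1", 3.0),
    ("server1", "psu2", "pdu-b", "L1", 3.0),
    ("server2", "psu1", "pdu-a", "L1", 7.0),
    ("server2", "psu2", "pdu-a", "L2", 6.5),
    ("storage1", "psu1", "pdu-a", "L1", 6.5),
]


def overloaded_phases(feeds, breaker_amps=20.0, limit_fraction=0.8):
    """Return {(pdu, phase): amps} for phases at or above the baseline limit."""
    loads = defaultdict(float)
    for _device, _psu, pdu, phase, amps in feeds:
        loads[(pdu, phase)] += amps
    limit = breaker_amps * limit_fraction
    return {key: total for key, total in loads.items() if total >= limit}


def single_pdu_devices(feeds):
    """Return devices whose multiple power supplies all land on one PDU."""
    pdus = defaultdict(set)
    supply_count = defaultdict(int)
    for device, _psu, pdu, _phase, _amps in feeds:
        pdus[device].add(pdu)
        supply_count[device] += 1
    return sorted(
        device
        for device in pdus
        if supply_count[device] > 1 and len(pdus[device]) == 1
    )


if __name__ == "__main__":
    print("Phases to re-balance:", overloaded_phases(FEEDS))
    print("Redundant supplies on one PDU:", single_pdu_devices(FEEDS))
```

Note that this only tells you two supplies share a PDU. It can’t tell you whether two different PDUs are fed by the same upstream source, which is still something you have to find out on the floor.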

General Rack

  • Assuming redundant systems or cluster members, ideally odd-numbered units are in one cabinet, and even-numbered units are in the same RU in the cabinet next to them (or down the row or across the data center if the systems physically permit that). This scheme is useful for power distribution, and can be helpful in worst-case environmental scenarios such as a water leak or fire. (I have yet to see a fire in a DC, but have dealt with at least one astonishing water leak.)
  • Keep cabinets same-same when possible. For example, Server1 in Rack1 plugs into the same PDU & outlet as Server2 in Rack2. This doesn’t always work out, but is nice when it can. Predictability is great when you can get it.
  • Consider reserving a specific set of RUs for new systems in a way that preserves the overall rack layout. That might be tough if RUs are tight, but it’s worth a thought.
  • Along those lines, consider what gear you are planning to retire over time. Maybe it makes sense to stick a device scheduled for death at the bottom of the rack, something like that.
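
The odd/even scheme above is mechanical enough to compute from hostnames. Here is a small sketch, assuming hostnames end in the unit number (web1, web2, and so on) and that there are exactly two cabinets. The cabinet names, starting RU, and device height are placeholders.

```python
import re


def unit_number(hostname):
    """Extract the trailing unit number from a hostname such as 'web3'."""
    match = re.search(r"(\d+)$", hostname)
    if match is None:
        raise ValueError(f"{hostname}: no trailing unit number")
    return int(match.group(1))


def place_members(hostnames, start_ru, height_ru=1,
                  odd_cabinet="rack1", even_cabinet="rack2"):
    """Map each hostname to (cabinet, lowest RU, highest RU).

    Odd units go to one cabinet and even units to the other, with each
    odd/even pair sharing the same RU range.
    """
    placements = {}
    for name in hostnames:
        number = unit_number(name)
        pair_index = (number - 1) // 2  # units 1 and 2 -> 0, units 3 and 4 -> 1
        low = start_ru + pair_index * height_ru
        cabinet = odd_cabinet if number % 2 else even_cabinet
        placements[name] = (cabinet, low, low + height_ru - 1)
    return placements


if __name__ == "__main__":
    members = ["web1", "web2", "web3", "web4"]
    for name, spot in place_members(members, start_ru=10, height_ru=2).items():
        print(name, spot)
```

The output can feed straight into whatever you use as the rack layout reference, which keeps the same-same goal honest: if the two cabinets don’t mirror each other, it shows up on paper first.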

The Actual Window

  • Someone needs to tell the rest of the crew what’s happening next, and what their roles are. Brains get foggy during middle-of-the-night maintenance windows, and someone needs to know the process exactly to keep the project on time and the tasks moving ahead. Everyone involved needs to know who that person is.
  • Again, remember that no matter how much time you’ve been given for the window, it will go by quickly. So do everything you can ahead of time.

Thoughts to share? Things I missed? Share your tips for success when cleaning up a rack with the community in the comments below.