The UPS’s job is to power the DC during a power outage just long enough (about 10 seconds) for the on-site generator to kick in and power the DC indefinitely…as long as the client has enough fuel and the generator doesn’t break down.
For this UPS, the batteries were last changed in 2010. The battery maintenance company was starting to register fluctuations in battery levels that could lead to a failure. Since the batteries are in a string and the UPS is in circuit with the Data Center (it also filters the dirty city power before sending it to our servers), a battery failure could crash the DC.
So given how long it had been since the last battery change and the possibility of a battery failure causing a Data Center outage, it was time to change all the batteries in the UPS. It’s also good practice to put in fresh batteries every 3-4 years.
Here’s how we approached our UPS battery replacement and how it worked. Hopefully, there are some lessons here that you can use the next time you have to change your DC UPS batteries.
- It wasn’t necessary to power down the entire Data Center during the battery swap, but we had to mitigate our risks. Before the battery swap could occur, the plan was to bypass the UPS and switch the Data Center to straight city power, taking the UPS out of the DC power supply. This meant that DC power wouldn’t be filtered through the UPS and the generator could not take over if there was a power failure. So we devised a risk avoidance and mitigation plan during the battery swap, in case something occurred with city power during that time.
- We determined which DC systems would hurt the business most if they failed due to a power outage. Management decided to take down, for the duration of the swap, the systems that a) would have a significant impact in the event of a sudden outage; b) had no backup system if a power surge fried their circuits; and c) would be difficult to rebuild and replace. Several systems were identified.
- We negotiated with management as to the best time to take the critical systems listed above down. We settled on Sunday morning. The life of an IT manager is that you have to work while others are sleeping or relaxing.
- The IT Infrastructure team arrived on site in advance of the battery change to shut down the key systems. This can always be a little bit of an adventure because you’re shutting down systems that usually run without interruption for months at a time. We had to shut down two IBM i Capacity BackUp systems (CBUs) and both systems took longer than expected to come down. My advice here is to plan for delays.
- Flipping the switch to go to city power. This can be a moment of truth in that you don’t know what will happen when your power source changes. Even a brief flicker could be enough to take down sensitive equipment. In our case, there was no issue.
- Keeping management informed. It’s always a good idea to send out emails to other management team members telling them how the project is going. This is a key agenda item for us whenever we make a big switch.
- Flipping back to UPS-filtered power. This can be another moment of truth as we are starting to run on new batteries that have never been used before. In our case, we had two sets of installers on site: the battery installer and the manufacturer of our UPS system. After the batteries were installed, the manufacturer tested them and ensured they were all good and ready to go.
- Bringing up the downed servers and equipment. The trick here is to bring up the equipment in the proper sequence. In many cases, a companion server has to be up before another server can start. We inventoried the servers that had been taken down to avoid risk and restarted them in dependency order, so that the entire network came up correctly.
- Next-day checks. Because the batteries were provided locally and didn’t need to be shipped, they contained enough of a charge to transition the DC to generator power in the event of a power outage after the UPS came back online. But to ensure the batteries were working correctly for the first 24 hours, the installers flipped on a battery equalizer mode in the UPS. This functions as a fast charging system to make sure everything is up and running. Equalizing is also designed to ensure the batteries form correctly inside your UPS. The downside of equalizing the batteries this way is that you need to turn off equalization within a short amount of time or it could damage the batteries. So the installers will come back on Monday to double-check the new install and to turn off equalizer mode.
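For the restart step above, it can help to write the server inventory down as a dependency map and let a script work out the order, rather than relying on memory on a Sunday morning. Here is a minimal sketch in Python, not what we actually ran: the system names and their dependencies are made up for illustration, and it assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical inventory (names are illustrative, not our real systems).
# Each system maps to the list of systems that must already be up.
DEPENDS_ON = {
    "storage": [],
    "database": ["storage"],
    "app-server": ["database"],
    "web-frontend": ["app-server"],
    "backup-server": ["storage"],
}


def startup_order(depends_on):
    """Return a start order in which every system follows its prerequisites."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # err.args[1] holds the systems that form the circular dependency.
        raise ValueError(f"circular dependency in inventory: {err.args[1]}") from err


def shutdown_order(depends_on):
    """Shut down in the reverse of the start order, dependents first."""
    return list(reversed(startup_order(depends_on)))


if __name__ == "__main__":
    for step, name in enumerate(startup_order(DEPENDS_ON), start=1):
        print(f"{step}. start {name}")
```

The same map gives you the shutdown sequence for free by reversing it, and a circular dependency in the inventory shows up as an error before the maintenance window instead of during it.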
So that’s my Sunday morning in the Data Center: the third of five major project weekends we’ve scheduled to kick off 2014 with a bang. Last weekend, my tech staff installed IBM i 7.1 on a development partition. Next week, we’re performing an IBM i high availability on a newly installed CBU machine in a new DC. More dispatches from the field coming later.