How to make sure your servers come back up after an extended power outage


If an extended power outage drains your UPS, and your servers are forced to shut down, will they automatically start up again when the power is eventually restored? It’s a good question, especially if your servers are in some distant, unattended server room. Unless you’ve tested your servers, don’t assume that the answer is Yes.

Many servers offer a BIOS configuration option that forces them to automatically power on when they receive line voltage. If your servers have this option, just set it and you’re done.

Unfortunately, some servers, including a Dell PowerEdge 1600SC that I’m using, lack this configuration option. When these servers turn themselves off as the final step of a UPS-controlled shutdown, they don’t start up again when the power is restored. Because they were shut down before the power was cut off, they think they are supposed to remain off when the power is restored. That is, they remember their on/off status across power outages.

Fortunately, there is a way to make sure these servers automatically power on: shut them down without powering them off; halt them instead. That way, when the UPS finally cuts off the supply voltage, the servers will still be in their “on” state, and they will remember this state across the outage. Later, when the power is restored, the servers will automatically restore their pre-outage state and power up.

With Fedora Core Linux and Network UPS Tools, it’s not difficult to make sure the servers are halted instead of powered off, but the implementation isn’t obvious. To spare you the digging, here are the important bits.

  1. When the power fails and the UPS-monitoring software decides that the batteries are almost depleted, it will initiate a server shutdown using the command defined in the /etc/ups/upsmon.conf file. The default command is this:

    SHUTDOWNCMD "/sbin/shutdown -h +0"
  2. The shutdown command will tell the init process to enter runlevel 0, which is the prepare-to-halt-the-system runlevel.

  3. The init process will stop all of the running services in an orderly fashion, and then, as the last step, invoke the final script in the shutdown process: /etc/rc.d/rc0.d/S01halt.

  4. The final lines of the S01halt script will power off the server. Unless, that is, the file /halt is present, in which case the script will halt the server instead.
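The decision in step 4 can be modeled roughly as follows. This is a simplified, hypothetical sketch, not the actual contents of the Fedora Core halt script, which does more work and may check other conditions. The function name final_action and its optional directory argument are assumptions added so that the logic can be tried in a scratch directory without touching the real /halt file.

```sh
#!/bin/sh
# Simplified, hypothetical model of the last decision made by
# /etc/rc.d/rc0.d/S01halt: halt if the flag file exists, otherwise power off.
# This is NOT the real Fedora Core script.
# The optional first argument (a hypothetical addition) is the directory
# in which to look for the flag file; it defaults to "/".

final_action() {
    flag_root="${1:-/}"
    # Strip one trailing slash so "/" yields "/halt", not "//halt".
    if [ -f "${flag_root%/}/halt" ]; then
        echo "halt"        # stop the OS but leave the machine in its "on" state
    else
        echo "poweroff"    # switch the machine off (the default behavior)
    fi
}

# Demonstration in a scratch directory instead of the real root directory.
scratch=$(mktemp -d)
final_action "$scratch"    # prints "poweroff"
touch "$scratch/halt"
final_action "$scratch"    # prints "halt"
rm -rf "$scratch"
```

The point of the model is only that the presence of a single empty file flips the outcome, which is what the change to upsmon.conf relies on.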

Thus the trick is to make sure that the /halt file exists before the shutdown begins. That turns out to be easy: just redefine the shutdown command in /etc/ups/upsmon.conf so that it creates the file first:

    SHUTDOWNCMD "/bin/touch /halt; /sbin/shutdown -h +0"
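Before trusting the change to an unattended server room, it may help to confirm that the edited line is actually in place. The following is a minimal sketch, assuming the default path /etc/ups/upsmon.conf; the function name shutdowncmd_creates_halt is hypothetical, and the check is only a text match on uncommented SHUTDOWNCMD lines, not a test of what upsmon will actually do. Reading the file may require root privileges.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the given upsmon.conf contains an
# uncommented SHUTDOWNCMD line that runs "touch /halt".
# It only pattern-matches the text; it does not run or validate the command.

shutdowncmd_creates_halt() {
    conf="${1:-/etc/ups/upsmon.conf}"
    grep -Eq '^[[:space:]]*SHUTDOWNCMD[[:space:]].*touch[[:space:]]+/halt([;"[:space:]]|$)' "$conf"
}

# Check the real file only if it exists and is readable by this user.
conf=/etc/ups/upsmon.conf
if [ -r "$conf" ]; then
    if shutdowncmd_creates_halt "$conf"; then
        echo "$conf: SHUTDOWNCMD creates /halt"
    else
        echo "$conf: SHUTDOWNCMD does not create /halt" >&2
    fi
fi
```

A text check like this catches typos and forgotten edits, but only an actual test with the UPS, as suggested at the top of this article, shows that the server really comes back up.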

And that’s all there is to it!