UPS management on Linux

The following requirements must be met for every scenario to work properly, together with some advice:

  • Systems powered by the UPS must have their BIOS configured to boot again after a power failure.
  • Linux must halt instead of powering off when it shuts down. You can configure this with the HALT variable in /etc/default/halt.
  • It is good practice to complete and test the whole configuration before actually powering the systems from the UPS, so you can shut it down safely.
  • Measure how long each system needs to shut down to a safe state.

Slave

The system which is powered by the UPS but has no control over the device. It runs the upsmon daemon, which queries the Master for the power status and events and acts upon them.

Master

The system which has direct access to and control over the UPS device (it is attached to it).

It runs upsdrvctl to manage the device and the upsd daemon to provide UPS status information to connected systems (slaves). As it probably also depends on the UPS for power, it also runs upsmon, like a slave but with a slightly different, and significant, configuration.

The Master is responsible for notifying all slaves in case of a power outage and waiting for them to react accordingly (by shutting down). Finally, it is responsible for cutting the power completely, so that all systems boot again when the power is restored.

You should know the role of each daemon and program involved in the architecture:

Program    Configuration         System            Role
upsdrvctl  ups.conf              Master            Device driver
upsd       upsd.conf,upsd.users  Master            Communicating with the driver and the slaves
upsmon     upsmon.conf           Master and Slave  The NUT client which queries the master to be aware of the power status and events
upssched   upssched.conf         Master            Power event management scheduler

Remember that the system which has the UPS device connected (via serial, USB, …) takes the master role.

Check the compatibility list for the best driver for your device. Some devices are supported by more than one driver; check each one, as the supported features may differ (like bug#266 of the UPS driver).

To let upsdrvctl manage the UPS we must first configure /etc/nut/ups.conf. This file contains one section for each UPS device. Example:

[Ellipse]
    driver = usbhid-ups
    port = auto
    vendorid = 0463
    desc = "Eaton Ellipse 1000"

We can check that the device is correctly detected using upsdrvctl:

 # upsdrvctl start

This program calls the needed driver. If you need to debug it, it is better to take a look at the backend drivers directly in /lib/nut/.

On the master server, set the running MODE in /etc/nut/nut.conf to netserver.

Configure upsd.conf and upsd.users so that upsd accepts connections for UPS state and control, then start the upsd daemon and check that it works.

In /etc/nut/upsd.users you configure the users and the command privileges they have over the UPS. It is recommended to specify upsmon master or upsmon slave for every user, as this sets the minimum privileges required for that role.

  [admin]
      password = SeCuReD9
      allowfrom = localhost
      actions = set
      instcmds = ALL
      upsmon master

  [node]
      password = client
      allowfrom = 172.16.0.0/24
      actions = get
      instcmds = ALL
      upsmon slave

To list the instant commands the UPS supports (the values that can be granted with instcmds), run:

# upscmd -l ups
Instant commands supported on UPS [ups]:
beeper.disable - Disable the UPS beeper
beeper.enable - Enable the UPS beeper
beeper.mute - Temporarily mute the UPS beeper
beeper.off - Obsolete (use beeper.disable or beeper.mute)
beeper.on - Obsolete (use beeper.enable)
load.off - Turn off the load immediately
load.off.delay - Turn off the load with a delay (seconds)
load.on - Turn on the load immediately
load.on.delay - Turn on the load with a delay (seconds)
shutdown.return - Turn off the load and return when power is back
shutdown.stayoff - Turn off the load and remain off
shutdown.stop - Stop a shutdown in progress

Start the upsd daemon when finished configuring:

# /etc/init.d/nut-server start

Once the device is configured and upsd is providing access to it, we configure upsmon so that the clients (and the master itself) can query and monitor the power status.

Remember to set the running MODE in /etc/nut/nut.conf to netclient on the slave systems and keep it as netserver on the master.

upsmon is the client daemon which monitors the power status and lets you configure actions that react to power events.

As the behaviour must differ between a slave (shutting down safely) and the master (waiting for the rest of the systems and then cutting the power), there are two configurations:

Slave

Slave systems only need to do one thing: shut down safely when needed. Optionally we may configure other settings, such as a delay after being notified or commands to execute on different power events (better done on the master), and last but not least they obviously need to monitor the UPS.

The basic configuration needed:

MONITOR ups@server 1 user pass slave       # UPS and upsd server, user and password, and
                                           # our role (slave). The 1 means that we depend on only 1
                                           # power supply
SHUTDOWNCMD "/sbin/shutdown -h now"        # Command to do the shutdown when needed

# Polling and timing
POLLFREQ 10                                # Polling interval (seconds) when online
POLLFREQALERT 4                            # Polling interval (seconds) when on battery
FINALDELAY 5                               # Delay (seconds) before running SHUTDOWNCMD

Master

The Master has the same configuration, with some differences:

NOTIFYCMD /sbin/upssched                   # Command used to notify of power events
MONITOR ups@server 1 user pass master      # UPS and server, upsd user/pass and node role
POWERDOWNFLAG /KillPower                   # Flag file telling the system that this is an emergency shutdown
HOSTSYNC 60                                # Seconds to wait for the slaves to finish

# Notify events
NOTIFYFLAG ONLINE   SYSLOG+EXEC            # Alerts
NOTIFYFLAG ONBATT   SYSLOG+WALL+EXEC       # By power status (ONLINE, ONBATT, LOWBATT, ...)
NOTIFYFLAG LOWBATT  SYSLOG+WALL+EXEC       # action to do:
NOTIFYFLAG FSD      SYSLOG+WALL+EXEC       #   SYSLOG - log event
NOTIFYFLAG COMMOK   SYSLOG+EXEC            #   WALL   - warn all
NOTIFYFLAG COMMBAD  SYSLOG+EXEC            #   EXEC   - run NOTIFYCMD
NOTIFYFLAG SHUTDOWN SYSLOG+WALL+EXEC 
NOTIFYFLAG REPLBATT SYSLOG+EXEC
NOTIFYFLAG NOCOMM   SYSLOG+EXEC

  • The MONITOR line specifies master and uses a user with the proper credentials
  • NOTIFYCMD is the command used to alert about events
  • The NOTIFYFLAG lines set the actions to take on each event
  • HOSTSYNC sets how long to wait for the slaves to shut down
  • POWERDOWNFLAG tells the init process that it MUST kill the power on shutdown
  • The EXEC flag runs the NOTIFYCMD on that event
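
As a rough illustration of what a NOTIFYCMD receives, the sketch below is a minimal handler written as a shell function. upsmon exports the event type in NOTIFYTYPE and the UPS name in UPSNAME, and passes the message text as the first argument. The function name handle_notify, the severity levels and the output format are assumptions made for this example only; the configuration above uses upssched instead.

```shell
# Hypothetical NOTIFYCMD handler (handle_notify is an illustrative name).
# upsmon sets NOTIFYTYPE and UPSNAME in the environment and passes the
# human-readable message as the first argument.
handle_notify() {
    msg=$1
    case "$NOTIFYTYPE" in
        ONBATT|LOWBATT|FSD|SHUTDOWN)
            level=critical ;;
        ONLINE|COMMOK)
            level=info ;;
        *)
            level=warning ;;
    esac
    # A real script might call logger or send mail here instead of echo.
    echo "[$level] $UPSNAME $NOTIFYTYPE: $msg"
}
```

Only events whose NOTIFYFLAG includes EXEC would reach such a handler.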

It is important that HOSTSYNC is greater than the slaves' FINALDELAY value:

HOSTSYNC > FINALDELAY + time needed by slave to properly shutdown
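
The rule above is plain arithmetic, and a small helper can make it explicit when several slaves have been timed. This is only a sketch: the function name min_hostsync and the optional safety margin are assumptions, not part of NUT.

```shell
# Hypothetical helper: smallest HOSTSYNC (seconds) that satisfies
#   HOSTSYNC > FINALDELAY + slave shutdown time
# with an optional safety margin added on top.
min_hostsync() {
    finaldelay=$1
    shutdown_time=$2
    margin=${3:-0}
    # +1 because HOSTSYNC must be strictly greater than the sum
    echo $(( finaldelay + shutdown_time + margin + 1 ))
}

# FINALDELAY 5, slowest slave measured at 40 s, 10 s of margin
min_hostsync 5 40 10    # prints 56
```

Use the slowest slave's measured shutdown time, since the master waits for all of them.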

Before starting to tune and test, it is important to clearly understand the sequence of events and process interactions during an FSD (Forced Shutdown) caused by a power outage:

  • The power goes off and the UPS device changes its status from OL (OnLine) to OB (OnBattery)
  • upsmon on the master issues its notification events
  • As time passes, the battery drains until battery.charge is lower than battery.charge.low
  • The UPS device changes its status to LB (LowBattery)
  • The master upsmon issues its notification events for the new status and sends FSD to the slaves. It waits up to HOSTSYNC seconds before starting its own shutdown
  • The slave upsmon sees the FSD and acts as follows:
    • Generates the SHUTDOWN notification event
    • Waits FINALDELAY seconds
    • Calls SHUTDOWNCMD
    • During the shutdown, upsmon disconnects (so the master can see the slave is shutting down); once the system is down it halts the CPU and waits for the power cut-off
  • When all slaves have disconnected or the HOSTSYNC time has passed, the master upsmon continues with its own process:
    • Generates the SHUTDOWN notification event
    • Waits FINALDELAY seconds
    • Raises the POWERDOWNFLAG
    • Calls SHUTDOWNCMD
    • When init has finished stopping services it checks for the POWERDOWNFLAG; if it is set, it sends the power-off command to the UPS device to fully cut the power
  • The UPS device cuts off the power output
  • All systems are powered off
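
The last step on the master (checking the flag and then killing the power) can be sketched as below. In NUT itself the check is done with upsmon -K, which also validates the content of the flag file, and the power-off is requested with upsdrvctl shutdown. The function name kill_power_if_flagged and the simple file-existence test are assumptions made to keep the sketch self-contained.

```shell
# Hypothetical helper: run the power-off command only when the flag file exists.
# $1 = path of the POWERDOWNFLAG file, remaining arguments = command to run.
kill_power_if_flagged() {
    flag=$1
    shift
    if [ -f "$flag" ]; then
        "$@"
    else
        echo "no power-down flag, leaving the UPS alone"
    fi
}

# On a real master this would run late in the halt sequence, for example:
#   kill_power_if_flagged /KillPower upsdrvctl shutdown
```

On a normal shutdown the flag does not exist, so the UPS keeps feeding the other systems.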

Please note that the critical steps are mainly those around the UPS device cutting the power.

Master or slave powers off

If the system powers off (instead of halting and staying on), its last power state will be OFF, and nothing will happen when the power is restored: the system will not boot again and you will have to physically press the power button.

To solve this, make sure Linux is configured to halt and not to power off the system. Additionally, your BIOS should be configured to restore the previous power state.

In Debian, make sure /etc/default/halt sets the HALT variable to halt.

Power is restored while master is shutting down

Once the master has started its own FSD, the die is cast: init will carry out the process whether the power returns or not, and NUT will kill the power.

Here you depend on your UPS device, but normally it will cut the power when asked to, whether it is online or not, so you get the same behaviour: a power cut-off. If it is online, it will restore the power after a delay, booting your systems again.

Failure killing the power

If for some strange reason (bad communication, driver bug) the master cannot kill the power, the init script may end in a shutdown (poweroff), or the power may be restored while the system sits there waiting for a power-off that will never come.

As a small help (small because it only applies to the master), you can set POWEROFFWAIT to a timeout to wait for the power-off; if that timeout has passed and the system is still alive, it will reboot instead of halting.

You can change some UPS settings (depending on the driver) with the upsrw command. Some of them:

Setting             Description
battery.charge.low  Battery level below which the LowBattery state (LB) is set
ups.delay.shutdown  Delay in seconds before shutting down the power
ups.delay.start     Delay in seconds before restoring the power when in OnLine status
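
As an illustration, the wrapper below builds an upsrw call that sets one variable. upsrw takes the assignment with -s, the upsd.users credentials with -u and -p, and the UPS as name@host. The function name set_ups_var, the NUT_USER, NUT_PASS and DRY_RUN variables, and the value 30 are assumptions made for this sketch. With DRY_RUN=1 it only prints the command, which is a cautious way to review it first.

```shell
# Hypothetical wrapper around upsrw (set_ups_var is an illustrative name).
# Expects NUT_USER and NUT_PASS to hold credentials from upsd.users.
# With DRY_RUN=1 the command is printed instead of executed.
set_ups_var() {
    var=$1
    value=$2
    ups=$3
    set -- upsrw -s "$var=$value" -u "$NUT_USER" -p "$NUT_PASS" "$ups"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Example (credentials and UPS name are placeholders):
#   NUT_USER=admin NUT_PASS=secret DRY_RUN=1
#   set_ups_var battery.charge.low 30 Ellipse@localhost
```

Whether a given variable is writable depends on the driver and the device.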

To test the whole sequence without a real outage, force a shutdown from the master:

upsmon -c fsd

This emulates exactly the normal process of a UPS power failure (warning the slaves and a controlled shutdown).

Once configured, it is advisable to run a series of tests. These tests show how much time is available in each situation and what load the UPS supports, and verify that the driver receives the right information from the UPS and that the other machines do too.

For each test, record at least these values:

  • battery.charge
  • battery.runtime
  • ups.load
  • ups.status

These can be obtained with the command upsc <upsname>.
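
To keep those readings comparable between tests, the upsc output can be reduced to one record per run. The sketch below assumes the usual upsc output of one "name: value" pair per line; the function name upsc_record, the CSV layout and the file name in the usage comment are assumptions for this example.

```shell
# Hypothetical helper: read "name: value" lines, as printed by upsc, on stdin
# and print the four values worth recording as one CSV line.
upsc_record() {
    awk -F': ' '
        $1 == "battery.charge"  { charge   = $2 }
        $1 == "battery.runtime" { runtime  = $2 }
        $1 == "ups.load"        { upsload  = $2 }
        $1 == "ups.status"      { upsstate = $2 }
        END { print charge "," runtime "," upsload "," upsstate }
    '
}

# Usage on a system with NUT running (the UPS name is a placeholder):
#   upsc Ellipse@localhost | upsc_record >> ups-tests.csv
```

A value the driver does not report is simply left empty in the record.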

This verifies that, if the power is restored while all systems are shutting down, the platform is correctly power-cycled:

  • Unplug the power cord and wait for the battery to run down.
  • Confirm all slaves shut down correctly to halt
  • Confirm the Master does not start its own shutdown until the slaves have finished
  • When you see the master has started its shutdown process, restore the power
  • The Master should continue with the process and power off the UPS
  • The UPS should cut off the power feed and restore it after a while
  • All systems should boot again normally

This verifies the complete shutdown when the power does not return:

  • Unplug the power cord and wait for the battery to run down.
  • Confirm all slaves shut down correctly
  • Confirm the Master does not start its own shutdown until the slaves have finished
  • Confirm that the UPS is powered off when the master is halted.

With the systems halted and powered off from the previous test, restore the power:

  • The UPS will change its status; it may not feed power to the systems until battery.charge has reached a sufficient level
  • When the UPS restores its power output, all systems, master and slaves, should boot again automatically

Last modified: 2021/06/10 21:44