UPS management on Linux
Introduction
Requirements and checklist
The following requirements must be met for the whole setup to work properly, together with some advice:
- Systems powered by the UPS must have their BIOS configured to boot again after a power failure.
- Linux must do a halt instead of a poweroff when it shuts down. You can configure this with the HALT variable in /etc/default/halt.
- It is good practice to do all the configuration first without actually powering the system from the UPS, so you can shut it down safely.
- Time how long each system needs to shut down to a safe state.
Important system roles
Slave
The system that is powered by the UPS but has no control over the device. It runs the upsmon daemon, which queries the Master for the power status and events and acts upon them.
Master
The system that has direct access to and control over the UPS device (it is attached to it).
It runs upsdrvctl to manage the device and the upsd daemon to provide UPS status information to the connected systems (slaves). As it probably also depends on the UPS for power, it also runs upsmon, like a slave but with a slightly different, and significant, configuration.
The Master is responsible for notifying all slaves of a power outage and for waiting while they react accordingly (shutting down). Finally, it is responsible for cutting the power completely, so that when the power is restored all systems boot again.
Programs
Several daemons and programs are involved in the architecture, and you should know the role of each:
Program | Configuration | System | Role |
---|---|---|---|
upsdrvctl | ups.conf | Master | Device driver |
upsd | upsd.conf, upsd.users | Master | Communicates with the driver and the slaves |
upsmon | upsmon.conf | Master and Slave | The NUT client that queries the master to stay aware of the power status and events |
upssched | upssched.conf | Master | Power event management scheduler |
Configuring
The device
Remember that the system that has the UPS device connected (via serial, USB, …) has the master role.
Check the compatibility list to find the best driver for your device. Some devices are supported by more than one driver; check each one, as the supported features may differ (see, for example, bug #266 in the UPS driver).
To let upsdrvctl manage the UPS, first configure /etc/nut/ups.conf. This file contains a section for each UPS device. Example:
    [Ellipse]
        driver = usbhid-ups
        port = auto
        vendorid = 0463
        desc = "Eaton Ellipse 1000"
We can check that the device is correctly detected using upsdrvctl:
# upsdrvctl start
This program calls the appropriate driver. If you need to debug it, it is better to look at the backend drivers directly in /lib/nut/.
Master UPS server
On the master server, set the running MODE in /etc/nut/nut.conf to netserver.
Configure upsd.conf and upsd.users so that upsd accepts connections for UPS status and control, then start the upsd daemon and check that it works.
In /etc/nut/upsd.users you configure the users and the command privileges they have over the UPS. It is recommended to specify upsmon master or upsmon slave for every user, as this sets the minimum privileges required for that role.
    [admin]
        password = SeCuReD9
        allowfrom = localhost
        actions = set
        instcmds = ALL
        upsmon master

    [node]
        password = client
        allowfrom = 172.16.0.0/24
        actions = get
        instcmds = ALL
        upsmon slave
To list the instant commands that the UPS supports you can run:
    # upscmd -l ups
    Instant commands supported on UPS [ups]:

    beeper.disable - Disable the UPS beeper
    beeper.enable - Enable the UPS beeper
    beeper.mute - Temporarily mute the UPS beeper
    beeper.off - Obsolete (use beeper.disable or beeper.mute)
    beeper.on - Obsolete (use beeper.enable)
    load.off - Turn off the load immediately
    load.off.delay - Turn off the load with a delay (seconds)
    load.on - Turn on the load immediately
    load.on.delay - Turn on the load with a delay (seconds)
    shutdown.return - Turn off the load and return when power is back
    shutdown.stayoff - Turn off the load and remain off
    shutdown.stop - Stop a shutdown in progress
Start the upsd daemon when finished configuring:
# /etc/init.d/nut-server start
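Once upsd is running, a quick sanity check is to ask it for ups.status and decode the flags. The sketch below is only an illustration: describe_status is a hypothetical helper, it covers only the status flags mentioned in this document (OL, OB, LB, FSD), and the usage line assumes the UPS section is called Ellipse and upsd runs on localhost.

```shell
#!/bin/sh
# describe_status: hypothetical helper that prints one line per ups.status flag.
# $1 = the value of ups.status, for example "OB LB"
describe_status() {
    for flag in $1; do   # unquoted on purpose: split the flags on whitespace
        case "$flag" in
            OL)  echo "OL: on line power" ;;
            OB)  echo "OB: on battery" ;;
            LB)  echo "LB: low battery" ;;
            FSD) echo "FSD: forced shutdown in progress" ;;
            *)   echo "$flag: not described here" ;;
        esac
    done
}

# Usage on the master (assumes a [Ellipse] section in ups.conf):
#   describe_status "$(upsc Ellipse@localhost ups.status)"
```

If the command fails to connect, check the upsd configuration before going on to the clients.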
UPS clients
Once the device is configured and upsd is providing access to it, we configure upsmon so that the clients (and the master itself) can query and monitor the power status.
Remember to set the running MODE in /etc/nut/nut.conf to netclient on the slave systems and keep it as netserver on the master.
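Since the role of each machine hangs on this one setting, it can be worth checking it before starting the daemons. A minimal sketch, assuming nut.conf uses plain MODE=value lines with no spaces around the equals sign; nut_mode is a hypothetical helper name, not part of NUT.

```shell
#!/bin/sh
# nut_mode: hypothetical helper that prints the last uncommented MODE value.
# $1 = path to a nut.conf file
nut_mode() {
    awk -F= '
        $1 == "MODE" { gsub(/[" \t]/, "", $2); mode = $2 }   # strip quotes and blanks
        END          { print mode }
    ' "$1"
}

# Usage:
#   nut_mode /etc/nut/nut.conf     # netserver on the master, netclient on the slaves
```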
upsmon is the client daemon that monitors the power status and lets you configure actions to react to power events.
As the behaviour must differ between a slave (safely shutting down) and the master (waiting for the rest of the systems and then cutting the power), there are two different configurations:
Slave
Slave systems only need to do one thing: shut down safely when needed. Optionally we can configure other settings, such as a delay after being notified or commands to execute on different power events (better done on the master) and, last but not least, they obviously must monitor the UPS.
The basic configuration needed:
    MONITOR ups@server 1 user pass slave   # UPS and upsd server, user and password, and
                                           # the slave role. The 1 means that we depend
                                           # on only 1 power supply
    SHUTDOWNCMD "/sbin/shutdown -h now"    # Command that does the shutdown when needed

    # Polling and timing (seconds)
    POLLFREQ 10                            # Poll frequency when online
    POLLFREQALERT 4                        # Poll frequency when on battery
    FINALDELAY 5                           # Delay before running SHUTDOWNCMD
Master
The Master has the same configuration, with a few differences:
    NOTIFYCMD /sbin/upssched                # Command used to notify of power events
    MONITOR ups@server 1 user pass master   # UPS and server, upsd user/pass and node role
    POWERDOWNFLAG /KillPower                # Flag file that tells the system the shutdown is an emergency
    HOSTSYNC 60                             # Seconds to wait for the slaves to finish

    # Notify events
    # Alerts: for each power status (ONLINE, ONBATT, LOWBATT, ...) the action to take:
    #   SYSLOG - log the event
    #   WALL   - warn all users
    #   EXEC   - run NOTIFYCMD
    NOTIFYFLAG ONLINE   SYSLOG+EXEC
    NOTIFYFLAG ONBATT   SYSLOG+WALL+EXEC
    NOTIFYFLAG LOWBATT  SYSLOG+WALL+EXEC
    NOTIFYFLAG FSD      SYSLOG+WALL+EXEC
    NOTIFYFLAG COMMOK   SYSLOG+EXEC
    NOTIFYFLAG COMMBAD  SYSLOG+EXEC
    NOTIFYFLAG SHUTDOWN SYSLOG+WALL+EXEC
    NOTIFYFLAG REPLBATT SYSLOG+EXEC
    NOTIFYFLAG NOCOMM   SYSLOG+EXEC
- The MONITOR line specifies master and uses a user with the proper credentials
- NOTIFYCMD is the command used to alert about events
- NOTIFYFLAG sets the actions to take on each event
- HOSTSYNC sets how long to wait for the slaves to shut down
- POWERDOWNFLAG tells the init process that it MUST kill the power on shutdown
- The EXEC flag runs NOTIFYCMD on events
It is important that HOSTSYNC is greater than the slaves' FINALDELAY value.
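The HOSTSYNC and FINALDELAY relationship can be checked mechanically. The following is a minimal sketch, assuming you have a copy of the master's and a slave's upsmon.conf at hand and that both directives are set explicitly in them; conf_value and check_hostsync are hypothetical helper names, not NUT commands.

```shell
#!/bin/sh
# conf_value: print the value of a single-valued upsmon.conf directive.
# $1 = directive name (for example HOSTSYNC), $2 = path to an upsmon.conf file
conf_value() {
    # Commented-out lines never match, because their first field is not the bare name.
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# check_hostsync: succeed only when the master's HOSTSYNC is greater than
# the slave's FINALDELAY.
# $1 = master upsmon.conf, $2 = slave upsmon.conf
check_hostsync() {
    hostsync=$(conf_value HOSTSYNC "$1")
    finaldelay=$(conf_value FINALDELAY "$2")
    if [ -z "$hostsync" ] || [ -z "$finaldelay" ]; then
        echo "HOSTSYNC or FINALDELAY is not set explicitly" >&2
        return 2
    fi
    if [ "$hostsync" -gt "$finaldelay" ]; then
        echo "OK: HOSTSYNC=$hostsync > FINALDELAY=$finaldelay"
    else
        echo "WARNING: HOSTSYNC=$hostsync <= FINALDELAY=$finaldelay"
        return 1
    fi
}

# Usage (file names are examples only):
#   check_hostsync master-upsmon.conf slave-upsmon.conf
```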
Tuning
Shutdown sequence
Before tuning and testing, it is important to clearly understand the sequence of events and process interactions during an FSD (Forced Shutdown) caused by a power outage:
- The power goes off and the UPS device changes its status from OL (OnLine) to OB (OnBattery)
- upsmon on the master sends its event notifications
- As time passes, the battery drains until battery.charge is lower than battery.charge.low
- The UPS device changes its status to LB (LowBattery)
- The master upsmon sends its event notifications for the new status and sends FSD to the slaves. It waits up to HOSTSYNC seconds before starting its own shutdown
- The slave upsmon sees the FSD and acts as follows:
  - Generates the SHUTDOWN notification event
  - Waits FINALDELAY seconds
  - Calls SHUTDOWNCMD
  - During the shutdown, upsmon disconnects (so the master can see that the slave is shutting down), and once the system is down it halts the CPU and waits for the power cut-off
- When all slaves have disconnected or the HOSTSYNC time has passed, the master upsmon continues with its own process:
  - Generates the SHUTDOWN notification event
  - Waits FINALDELAY seconds
  - Raises the POWERDOWNFLAG
  - Calls SHUTDOWNCMD
- When init has finished stopping services it checks for POWERDOWNFLAG; if it is set, it sends the power-off command to the UPS device to cut the power completely
- The UPS device cuts off the power output
- All systems are powered off
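The last steps of the sequence are normally handled by the shutdown scripts of the distribution. Purely as an illustration, a late-shutdown hook on the master could look like the sketch below; kill_ups_power is a hypothetical name and the real scripts on your system may differ. It relies on upsmon -K, which exits successfully when the POWERDOWNFLAG file is present and valid, and on upsdrvctl shutdown, which tells the driver to power off the UPS.

```shell
#!/bin/sh
# kill_ups_power: hypothetical late-shutdown hook for the master.
# It should run only after services are stopped and filesystems are safe.
kill_ups_power() {
    # upsmon -K tests the POWERDOWNFLAG written by upsmon before SHUTDOWNCMD.
    if upsmon -K >/dev/null 2>&1; then
        echo "POWERDOWNFLAG is set, asking the UPS to cut the power"
        upsdrvctl shutdown
    fi
}

# Usage, from the script that runs last during shutdown:
#   kill_ups_power
```

On a normal (non-emergency) shutdown the flag is absent, so the hook does nothing and the UPS keeps feeding the other systems.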
Critical steps
Please note that the critical steps mostly concern the moment when the UPS device cuts the power.
Master or slave powers off
If the system powers off (instead of halting and staying on), its last power state will be OFF, and when the power is restored nothing will happen: the system will not boot again and you will have to physically press the power button.
To solve this, make sure Linux is configured to halt rather than power off the system. Additionally, your BIOS should be configured to recover the previous power state.
In /etc/default/halt, set the HALT variable to halt.
Power is restored while the master is shutting down
Once the master has started its own FSD, the die is cast: init will carry out the process whether the power returns or not, and NUT will kill the power.
Here you depend on your UPS device, but normally it will cut the power when asked to, whether it is online or not, so you get the same behaviour: a power cut-off. If it is online, it will restore the power after a delay, booting your systems again.
Failure killing the power
If for some reason (bad communication, a driver bug) the master cannot kill the power, the init script may end in a shutdown (poweroff), or the power may be restored while the system sits there waiting for a power-off that will never come.
As a small help (it only applies to the master), you can set POWEROFFWAIT to a timeout to wait for the power-off; if that timeout passes and the system is still alive, it will reboot instead of halting.
UPS settings
You can change some UPS settings (depending on the driver) with the upsrw command. Some of them:
Setting | Description |
---|---|
battery.charge.low | Battery level below which the LowBattery (LB) state is set |
ups.delay.shutdown | Delay in seconds before the power is shut down |
ups.delay.start | Delay in seconds before the power is restored when in OnLine status |
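As an illustration of how such a setting might be changed, here is a small wrapper around upsrw. It is a sketch under assumptions: set_ups_var is a hypothetical helper, the UPS name and the value 30 are examples, and the credentials must belong to an upsd.users entry that is allowed to set variables (such as the admin user with actions = set shown earlier).

```shell
#!/bin/sh
# set_ups_var: hypothetical wrapper around upsrw.
# $1 = variable name, $2 = new value, $3 = UPS as name[@host]
# UPS_USER and UPS_PASS must hold the credentials of an upsd.users entry.
set_ups_var() {
    upsrw -s "$1=$2" -u "$UPS_USER" -p "$UPS_PASS" "$3"
}

# Usage (names and values are examples only):
#   UPS_USER=admin
#   UPS_PASS=SeCuReD9
#   set_ups_var battery.charge.low 30 Ellipse@localhost
#   upsrw Ellipse@localhost     # show the writable variables and their current values
```

Not every driver exposes every variable as writable, so check the list printed by upsrw before relying on a setting.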
Shutdown test
upsmon -c fsd
This emulates exactly the normal UPS power-failure process (notifying the slaves and performing a controlled shutdown).
Testing
Once everything is configured, it is advisable to run a series of tests. These tests show how much time is available in each situation and how much load the UPS supports, and they verify that the driver receives the right information from the UPS and that the other machines receive it too.
For each test, record at least these values:
- battery.charge
- battery.runtime
- ups.load
- ups.status
These can be obtained with the command upsc <upsname>.
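To keep a record during the tests, the upsc output can be reduced to those four values. A minimal sketch, assuming upsc prints one "name: value" pair per line; ups_record is a hypothetical helper and the UPS name in the usage line is an example.

```shell
#!/bin/sh
# ups_record: read upsc output on stdin and print the four values on one line.
ups_record() {
    awk -F': ' '
        $1 == "battery.charge"  { chg = $2 }
        $1 == "battery.runtime" { rt  = $2 }
        $1 == "ups.load"        { ld  = $2 }
        $1 == "ups.status"      { st  = $2 }
        END { printf "charge=%s runtime=%s load=%s status=%s\n", chg, rt, ld, st }
    '
}

# Usage: append one timestamped record per minute while a test is running
#   while sleep 60; do
#       echo "$(date '+%F %T') $(upsc Ellipse@localhost | ups_record)" >> ups-test.log
#   done
```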
Test power recovery during shutdown
This verifies that if the power is restored while all systems are shutting down, the platform is correctly power-cycled.
- Unplug the power cord and wait for the battery to run down.
- Confirm that all slaves shut down correctly to halt
- Confirm that the master does not start its own shutdown until the slaves have finished
- When you see that the master has started its shutdown process, restore the power
- The master should continue with the process and power off the UPS
- The UPS should cut off the power feed and restore it after a while
- All systems should boot again normally
Test power outage
- Unplug the power cord and wait for the battery to run down.
- Confirm that all slaves shut down correctly
- Confirm that the master does not start its own shutdown until the slaves have finished
- Confirm that the UPS is powered off once the master is halted.
Power restore
With the systems halted and powered off after the previous test, restore the power
- The UPS will change its status; it may not feed power to the systems until battery.charge reaches a sufficient level
- When the UPS restores its power output, all systems, master and slaves, should boot again automatically