Sometimes the Raspberry Pi hangs for all sorts of unforeseen reasons. When I have set up a Pi a long way from home, it can be very annoying to have to physically power cycle it. This is where the hardware watchdog comes in.
A watchdog is a piece of hardware (or software in some cases) that waits for a set amount of time; if nobody has “stroked” the watchdog in that time, it will bark and cause the system to restart.
This post assumes that you are using Raspbian on your Pi.
How To
Load the kernel module now so we can play around with it, by running sudo modprobe bcm2708_wdog.
This will not load the module on boot; for that we need to modify the /etc/modules file by adding “bcm2708_wdog” on a line of its own.
At this point the watchdog is loaded but will not be doing anything; it is sleeping. The first stroke will wake the watchdog, and from then on it will need frequent stroking.
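To confirm that the module really is loaded, and that it is listed for loading at boot, a small check can help. This is only a minimal sketch: the helper names module_listed and read_text are my own, and it assumes the usual layouts of /proc/modules (module name as the first field of each line) and /etc/modules (one module name per line, with # comments).

```python
import os


def module_listed(text, name="bcm2708_wdog"):
    """Return True if `name` is the first field of any non-comment line."""
    for line in text.splitlines():
        # Drop anything after a '#', then split the rest on whitespace.
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == name:
            return True
    return False


def read_text(path):
    """Read a file, returning an empty string if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print("loaded now:    ", module_listed(read_text("/proc/modules")))
    print("loaded on boot:", module_listed(read_text("/etc/modules")))
    print("device present:", os.path.exists("/dev/watchdog"))
```

The script only reads files, so it is safe to run without waking the watchdog.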
Stroking using a Daemon
This is useful if all you are concerned about is Linux locking up. Maybe you monitor your scripts by other means.
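Install the watchdog daemon, for example with sudo apt-get install watchdog.
Load the daemon on boot, for example with sudo update-rc.d watchdog defaults.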
Now configure the watchdog daemon by editing /etc/watchdog.conf.
In here we will enable the watchdog and configure the load level at which to reboot the Pi.
Uncomment the line that starts with #watchdog-device by removing the hash (#); this allows the watchdog daemon to start stroking the watchdog.
Another interesting setting is #max-load-1 = 24. This is a threshold on the 1-minute load average, which is roughly the average number of processes that were running or waiting to run over the last minute. Put another way, on a single-core Pi a load of 24 means there is about 24 Pis' worth of work queued up. If the load average exceeds this value, the Pi will reboot. This should prevent high workloads dragging your Pi to its knees.
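To get a feel for what a sensible threshold might be on your own Pi, you can look at the load average from Python. The sketch below is my own illustration and not the daemon's code: the function name load_too_high is made up, and I am assuming a strict greater-than comparison and that a threshold of 0 switches the check off.

```python
import os


def load_too_high(max_load_1=24, load_1=None):
    """Compare the 1-minute load average with a max-load-1 style threshold.

    A threshold of 0 is treated as "check disabled" (an assumption that
    mirrors the behaviour described for the config file).
    """
    if load_1 is None:
        # os.getloadavg() returns the 1, 5 and 15 minute load averages.
        load_1 = os.getloadavg()[0]
    return max_load_1 > 0 and load_1 > max_load_1


if __name__ == "__main__":
    print("1-minute load:", os.getloadavg()[0])
    print("over threshold:", load_too_high())
```

Running this a few times while your usual workload is active gives a rough idea of how far below 24 your Pi normally sits.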
Uncomment #max-load-1 = 24 by removing the hash; by default this is set to 24. This setting can also be disabled by setting the value to 0. Be warned: setting this value too low will cause lots of rebooting!
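If you set up more than one Pi, the two edits above can be scripted rather than made by hand. This is a rough sketch only: the uncomment helper is hypothetical, the sample text is an illustrative stand-in for the real file, and the code works on a string rather than touching /etc/watchdog.conf directly.

```python
import re


def uncomment(conf_text, key):
    """Remove the leading '#' from lines of the form '#key = value'."""
    pattern = re.compile(
        r"^#[ \t]*(" + re.escape(key) + r"[ \t]*=)", re.MULTILINE
    )
    return pattern.sub(r"\1", conf_text)


# Illustrative sample, not a copy of the real configuration file.
sample = (
    "#watchdog-device = /dev/watchdog\n"
    "#max-load-1 = 24\n"
    "#max-load-5 = 18\n"
)

for key in ("watchdog-device", "max-load-1"):
    sample = uncomment(sample, key)

print(sample)
```

Only the two named settings lose their hash; any other commented-out line is left alone. To apply it for real you would read the file, pass its contents through uncomment, and write the result back as root, ideally after taking a backup.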
To start the daemon running, use a command such as sudo /etc/init.d/watchdog start.
Stroking from Python
If you want to watch a script and reboot if things go bad you could stroke from within your application. I use this method for long running scripts that I never want to lock up.
Here are some example Python functions:
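As a minimal sketch of what such functions might look like, assuming the watchdog device lives at /dev/watchdog and that your script runs with permission to write to it: the function names open_watchdog, stroke and release are my own. On Linux watchdog drivers, opening the device starts the countdown and any write resets it. Writing the character V just before closing asks the driver to stand down (the so-called magic close), which only works on drivers that support it and when the kernel has not been told to disallow it.

```python
import os

WATCHDOG_DEVICE = "/dev/watchdog"  # assumed device path


def open_watchdog(device=WATCHDOG_DEVICE):
    """Open the watchdog device for writing and return a file descriptor.

    Be careful: on a real device this starts the countdown.
    """
    return os.open(device, os.O_WRONLY)


def stroke(fd):
    """Reset the countdown; any write to the device counts as a stroke."""
    os.write(fd, b"1")


def release(fd):
    """Ask the driver to disable the watchdog, then close the device."""
    os.write(fd, b"V")  # magic close character
    os.close(fd)
```

In a long running script you would call open_watchdog() once at start-up and stroke(fd) somewhere in the main loop, at an interval comfortably shorter than the watchdog timeout. Call release(fd) only on a clean, intentional shutdown; if the script hangs or dies without it, the strokes stop and the Pi reboots, which is the whole point.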
This combined with a start script on boot should make your application pretty resilient.