#385 Supervise your Home Server with a Watchdog and Heartbeats (Raspberry Pi, ESP8266, Docker)

Video Statistics and Information

Captions
Home automation systems have to run all the time. Today we will make sure you get an alarm on your smartphone if something goes wrong with your system. But how can we create an alarm if the server is down? Let's have a closer look.

Grüezi YouTubers. Here is the guy with the Swiss accent, with a new episode and fresh ideas around sensors and microcontrollers. Remember: If you subscribe, you will always sit in the first row.

Critical systems often use heartbeats or watchdogs for monitoring. We will use both to supervise our private servers like the Raspberry Pi, connections, sensors, and actuators. The principle is simple: If something does not work as expected, we create an alarm via the Telegram app on our smartphone. But, as said before, we cannot create an alarm on a server that no longer works. So we have to use a watchdog as a supervisor. This watchdog alarms us if something is wrong with the server.

So, in this video, we will:
- Understand how we can build end-to-end monitoring
- Learn what a single point of failure is and how we have to deal with it
- Create Node-Red workflows to create alarms if a sensor no longer delivers values or an actuator died or lost connection
- Create an expensive watchdog using a Raspberry Pi
- Create a cheapo watchdog
- And as usual, you will learn some tricks from my lab

Let's start with the overview of many of our systems at home: We have sensors that connect, often via Wi-Fi, to a server. Then the messages go to either Node-Red, Home Assistant, or another software to display results and act upon the sensor values. Messages are sent to actuators, and they create physical actions like switching a light off or on.

Let's use my awning controller as an example. It has to extend the awning if there is a lot of sun, the temperature is above 22 °C, and there is no wind. In all other cases, the awning retracts. Frequent viewers know that I made this controller for my wife.
So you can imagine: This is the most critical system in our home. If my Harley does not work, or if my lab lighting does not go on when I enter, who cares? But if the awning does not extend and the house gets hot... Well.

For this controller, end-to-end is from physical parameters like sunlight, temperature, or wind speed to the physical extension of the awning. If we want to supervise this system from end to end, we have to compare these three parameters with the awning position, which would mean that we need a second system in parallel as a supervisor. Or we build the same system in parallel, which takes over if the first fails, like in airplanes. This is called redundancy. NAS systems, for example, also often use RAID controllers for that purpose. These systems use additional disks, which are only needed in case of a failure of the primary disk. But what happens if the two systems report different signals? Which one is right? To decide, we have to include a third system. Now they can decide two-to-one. And still, bad accidents can happen, as the example of the 737 MAX showed.

In my case, Wi-Fi, for example, clearly is a single point of failure. If it fails, nothing goes. Or my Raspberry Pi. If it crashes, my controllers no longer work. Or, of course, mains power is a single point of failure, too.

As a first step, we have to decide what happens if a system fails and how much effort we want to invest in preventing it. For airplanes, fortunately, they invest a lot. For other systems, we trust that they do not fail, or if so, that the effect is minimal. This is the case with mains in Switzerland. The chance of a power outage is minimal. If it happens, we meet with the neighbors and drink a beer, because the fridge is still cold, and power will usually be restored before we have finished our beers. In the worst case, we have to open a second or third bottle. Wi-Fi is already less stable in my case, especially with sensors. They sometimes lose connection.
None of my home automation systems are life-critical. So I decided to have no redundancy, just an alarm. In most cases, the system can be manually fixed in minutes. Therefore, I want to get an alarm if Ethernet and the internet still work, but a sensor, a gateway, the server, or an actuator stops.

How can we build such a system? If devices deliver sensor readings in a defined interval, we have no problem. As soon as a sensor stops delivering, we know something is wrong, and we have to create an alarm. This can quickly be done in Node-Red, for example. We use the timeout node, connect it to the sensor reading we want to supervise, and set its countdown to 2.1 times the expected interval of the sensor. Then, after two lost sensor readings, this node creates an alarm message and sends it via Telegram to my smartphone. Each sensor in my home has such a timeout node and an alarm. If you want to know how to enable Telegram with Node-Red, please watch video #270.

This system supervises not only the sensor but the whole chain behind it. For example, this sensor transmits its values to a satellite, and I get them via the internet. So you can imagine how much can go wrong. All of it is covered by this simple timeout node. Here we would set the timeout to 2.1 days, for example, because we expect at least one value per day.

But what if our device has no sensor or does not regularly deliver readings, like my awning, which works on 433 MHz? Frequent viewers remember that I hacked it in video #209 and video #242. Now it is controlled by an ESP, which does not create regular sensor readings. So I had to implement a heartbeat that sends "I am ok" every minute.

But what happens if this Raspberry crashes? If it has crashed, it cannot send an alarm. So we need a second Raspberry that supervises the main Pi. We call it the watchdog. It listens to regular events created by the main Pi. For example, the MQTT broker on this Raspberry creates a lot of messages.
If no message is sent for a minute, we know that the system is dead. Without such MQTT messages, we could create a HeartbeatHUB that is sent every minute. On this watchdog Pi, I install only Node-Red. Of course, I use IOTstack to save time. You can also install it on bare metal if you prefer. This flow supervises the events of the first Pi using our standard procedure. After one minute without events, a Telegram message is created. Cool. Now I get an alarm if my main Raspberry stops working.

But what happens if the watchdog Pi crashes? I would not discover the fact until too late. So we need a third Raspberry? And to supervise the third, a fourth? Maybe this is why they invented the Raspberry clusters! Fortunately, this is not needed. We can also create a heartbeat on the second Pi and use the first for supervision. Now the only cases in which I do not get an alarm are when both go down simultaneously or when the network or mains is down. Enough safety for me. Of course, you find a link to the Node-Red flows in the video description.

But, as usual on this channel, we want more. I do not want to use a Raspberry Pi running the whole year just producing a heartbeat and, very rarely, an alarm. What is the cheapest possibility to get the same effect? An ESP8266-01. The supervisor does not need any display or outside connection. So I wrote a small sketch consisting of merged example files from two libraries: PubSubClient to create a heartbeat and listen to the Raspberry's incoming MQTT messages, and the Universal Telegram Bot Library to create the alarm messages. The logic is elementary. The loop creates a heartbeat MQTT message and listens to the heartbeat from the main Pi. As soon as no heartbeat is heard, an alarm is created. All we need. For a few cents.

Let's put it into production. I could use a 5 V USB power brick, but I use a simpler method: a Tenstar 3.3 V power supply and a 1000 µF solid polymer capacitor to ensure we have no problem with the current peaks produced by Wi-Fi.
Solid polymer capacitors are used in quality power supplies because of their low equivalent series resistance, or ESR. Just a tiny tip: When comparing different capacitors in my lab, I discovered that these Sanyo caps are fake. These tantalum caps, as well as the solid polymer caps, have the rated capacitance, while these fake ones only have 50% or less. So pay attention and check your caps with a simple transistor tester when you get them.

I soldered the power supply as well as a header for the ESP-01 onto a standard experimental board. Because the ESP-01 has no pull-up resistor for the enable pin, I added one as well as a small capacitor. That way, it boots when Vcc is already applied. If you use an ESP-12 board, these components are already there.

But how do I program my ESP-01s? I use this small board where I added a button switch between GPIO0 and GND. If I want to program the chip, I press the button while I insert it into the USB connector. If I want to run it, I just insert it without pressing the button. Simple and cheap. Today, you even get boards with a built-in button for no additional charge.

As the last thing, I adapt the dimensions of the configurable box presented in video #258, create a hole for the cable, and print it. My watchdog is ready. You can place it wherever you want. It will work as long as it has Wi-Fi and power.

What do we have to remember?
- Ideally, we create end-to-end monitoring for all our essential systems
- After assessing the effects of a failure, we decide which single points of failure we can accept. If we cannot accept them, we have to invest in redundancy. If we can accept a failure, we only have to invest in a supervision system. Or we just accept it, as I did with the mains power
- We make sure that our sensors regularly transmit values, or we create regular heartbeats for all other systems
- Using Node-Red, we can create simple workflows to create alarms for missing values using a timeout and a Telegram node.
If a sensor no longer delivers values, or an actuator died or lost connection, an alarm is sent to our smartphone
- Using a second Raspberry, we can create an expensive watchdog to supervise our productive server
- To save space, energy, and money, we can create a similar watchdog using an ESP-01
- The ESP-01 can be programmed using a simple and cheap board
- Test your electrolytic capacitors when you get them. They may be fake

One last thing: You could use an internet service like AWS or Google instead of the ESP-01. This would allow us to monitor our mains power and our internet connection in addition to the rest. Maybe somebody shows us how this can be done? Or even creates a service for that?

As always, you find all the relevant links in the description. I hope this video was useful or at least interesting for you. If true, please consider supporting the channel to secure its future existence. Thank you! Bye
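
The timeout rule from the captions (raise an alarm once a source has been silent for more than 2.1 times its expected interval, so that two readings in a row must be missed) can be sketched in a few lines of Python. This is only an illustration of the logic, not the Node-Red flow or the ESP sketch from the video. The class `TimeoutSupervisor`, its method names, and the source names `main_pi` and `awning_esp` are hypothetical, and time is passed in explicitly as seconds so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List

# Factor taken from the captions: 2.1 x expected interval,
# i.e. an alarm only after two consecutive missed messages.
TIMEOUT_FACTOR = 2.1


@dataclass
class Source:
    interval_s: float      # expected seconds between messages
    last_seen_s: float     # time of the most recent message
    alarmed: bool = False  # True once an alarm has been raised


class TimeoutSupervisor:
    """Hypothetical helper: raises one alarm per source that stays silent."""

    def __init__(self) -> None:
        self.sources: Dict[str, Source] = {}

    def register(self, name: str, interval_s: float, now_s: float) -> None:
        # Start the countdown at registration time.
        self.sources[name] = Source(interval_s, now_s)

    def message_received(self, name: str, now_s: float) -> None:
        # Any reading or heartbeat restarts the countdown and re-arms the alarm.
        src = self.sources[name]
        src.last_seen_s = now_s
        src.alarmed = False

    def check(self, now_s: float) -> List[str]:
        """Return the names of sources that have newly timed out."""
        alarms = []
        for name, src in self.sources.items():
            silent_for = now_s - src.last_seen_s
            if silent_for > TIMEOUT_FACTOR * src.interval_s and not src.alarmed:
                src.alarmed = True  # report each outage only once
                alarms.append(name)
        return alarms


if __name__ == "__main__":
    sup = TimeoutSupervisor()
    sup.register("main_pi", interval_s=60, now_s=0)
    sup.register("awning_esp", interval_s=60, now_s=0)

    sup.message_received("main_pi", now_s=60)  # only the Pi keeps talking
    print(sup.check(now_s=120))  # nothing silent for more than 126 s yet
    print(sup.check(now_s=130))  # awning_esp has now missed two heartbeats
```

In a real setup, the returned names would be handed to whatever sends the Telegram message. Running one such supervisor on each of two machines, with each one registering the other's heartbeat, gives the mutual watchdog arrangement described in the captions.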
Info
Channel: Andreas Spiess
Views: 50,652
Id: IGB2eRvhvB0
Length: 13min 45sec (825 seconds)
Published: Sun Jun 06 2021