If your VPS has ever stopped working abruptly, then you know how frustrating it can be at times. I have had days where my service crashed, and I thought to myself, “Why can’t this just fix itself?” This is where auto-healing scripts started to make my life a lot easier. These scripts keep an eye on your server and correct simple failures automatically.
Why Auto-Healing Scripts Matter
I use auto-healing scripts because they save so much time. Have you ever thought about how much downtime happens just because one service failed to start for no obvious reason? I once thought to myself, "I need a better system," and that is when I set up auto-healing.
Auto-healing allows your server to:
- Restart services when they crash
- Minimize downtime
- Remedy failures before your users see them
How Auto-Healing Works
Service Checking
An auto-healing script checks your service every few seconds to make sure it is still running, and starts it again if it has stopped.
An auto-healing script generally does the following:
- Checks that the service is running
- Restarts the service if it stops running
- Logs the error
- Sends an alert (if you want)
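The four steps above can be sketched in Bash roughly as follows. This is a minimal illustration, not a finished tool: the service name, the log path, and the function names (log_msg, send_alert, is_running, heal_once) are all assumptions made for the example, and send_alert is only a placeholder that writes to the log.

```bash
#!/usr/bin/env bash
# Minimal auto-healing sketch: check, restart, log, and optionally alert.
# SERVICE, LOG_FILE, and all function names are illustrative assumptions.

SERVICE="${SERVICE:-nginx}"
LOG_FILE="${LOG_FILE:-/var/log/autoheal.log}"

# Append a timestamped message to the log file.
log_msg() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$LOG_FILE"
}

# Placeholder alert hook: swap in email, a chat webhook, or similar.
send_alert() {
    log_msg "ALERT: $1"
}

# Return 0 if systemd reports the service as active.
is_running() {
    systemctl is-active --quiet "$1"
}

# One healing pass: do nothing if healthy, otherwise restart and record it.
heal_once() {
    if is_running "$1"; then
        return 0
    fi
    log_msg "$1 is not running, restarting"
    if systemctl restart "$1"; then
        log_msg "$1 restarted"
    else
        send_alert "$1 failed to restart"
        return 1
    fi
}

# Example: run a pass every 10 seconds (uncomment to use).
# while true; do heal_once "$SERVICE"; sleep 10; done
```

Keeping each step in its own small function makes it easy to replace one piece, for example the alert, without touching the rest.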
Using Systemd to Build an Auto-Healing Solution
Building an auto-healing solution with Systemd is very easy and only requires a couple of lines in your service file:
[Service]
Restart=always
RestartSec=5
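One way to apply these settings without editing the packaged unit file is a systemd drop-in. The sketch below assumes an nginx.service unit and the usual drop-in location under /etc/systemd/system; the function name write_restart_dropin and the file name restart.conf are arbitrary choices for the example.

```bash
# Write a drop-in that adds the restart policy to an existing unit.
# The function name and the file name "restart.conf" are illustrative choices.
write_restart_dropin() {
    unit="$1"                               # e.g. nginx.service
    base_dir="${2:-/etc/systemd/system}"    # usual location for local overrides
    dropin_dir="$base_dir/$unit.d"

    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF
}

# Usage (as root), then reload systemd so it picks up the change:
#   write_restart_dropin nginx.service
#   systemctl daemon-reload
#   systemctl restart nginx
```

If you would rather restart only after a crash and not after a clean exit, Restart=on-failure is the usual alternative to Restart=always.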
Building a Healing Script by Yourself
In other cases, you may want more control than systemd gives you, and a healing script of your own is the answer. I use Bash because it is easy to write and easy to extend.
A sample script could check:
- CPU load
- Memory load
- Response time
- Whether the service is running
# Restart nginx if no process named exactly "nginx" is running
if ! pgrep -x nginx > /dev/null; then
    systemctl restart nginx
fi
Nice and clean.
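The other checks from the list (CPU load, memory, response time) could be added along these lines. Treat it as a rough sketch for a Linux server: the thresholds, the URL, and every function name are assumptions for the example, and mem_used_pct relies on the MemAvailable field in /proc/meminfo, which older kernels do not provide.

```bash
# Health-check helpers. Thresholds, URL, and function names are illustrative.

# Return 0 if the first number is greater than the second (decimals allowed).
exceeds() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !((v + 0) > (t + 0)) }'
}

# 1-minute load average (Linux: first field of /proc/loadavg).
cpu_load() {
    cut -d ' ' -f 1 /proc/loadavg
}

# Percentage of memory in use, from MemTotal and MemAvailable.
mem_used_pct() {
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
         END { printf "%d\n", (t - a) * 100 / t }' /proc/meminfo
}

# Total response time in seconds for a URL, as reported by curl.
response_time() {
    curl -s -o /dev/null --max-time 10 -w '%{time_total}' "$1"
}

# Return 0 only when every check is within its limit.
is_healthy() {
    url="$1"
    pgrep -x nginx > /dev/null || return 1        # service must be running
    exceeds "$(cpu_load)" 4.0 && return 1         # load average too high
    exceeds "$(mem_used_pct)" 90 && return 1      # memory nearly exhausted
    rt="$(response_time "$url")" || return 1      # request failed or timed out
    exceeds "$rt" 2.0 && return 1                 # response too slow
    return 0
}

# Usage:
#   is_healthy "http://localhost/" || systemctl restart nginx
```

Restarting on high load or slow responses is a judgment call, so start with generous limits and tighten them once you have seen what normal looks like on your server.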
Useful Tips for Auto-Healing
- Keep a log of everything - Helpful when you need to step in and see what happened.
- Test your script - Small slips can cause big headaches when the server is under load.
- Keep it light - Heavy scripts can bog down the server.
- Set up alerts - A simple email or message can warn you before a small problem becomes a big one.
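For the testing tip, one low-risk approach is a dry-run switch that prints what the script would do and does not actually do it. The names DRY_RUN and run_cmd below are hypothetical, chosen only for this sketch.

```bash
# Dry-run wrapper: with DRY_RUN=1, print the command instead of running it.
# DRY_RUN and run_cmd are hypothetical names for this sketch.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'DRY-RUN: %s\n' "$*"
    else
        "$@"
    fi
}

# Usage inside a healing script:
#   run_cmd systemctl restart nginx
# Then test it safely with:
#   DRY_RUN=1 ./heal.sh
```

Routing every restart through a wrapper like this lets you rehearse the script on a live server without touching the service.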
Has a single alert ever saved you from a major outage? It happens more often than you might think.