N8n crashed silently last night, lost data. need advice on monitoring

hey guys, had a bit of a nightmare situation yesterday.
running n8n on a small vps (digitalocean) and it just… froze? no idea why yet, maybe memory leak.
the problem is i didn’t know about it for like 8 hours. my “Error Trigger” nodes inside n8n obviously didn’t work cause the whole thing was dead lol.
so i missed a bunch of webhooks from clients. not fun.
question: how do you guys monitor this?
uptime robot only checks if the domain is up (which it was, sort of, just timeouts).
i need something that alerts me if a workflow DOESNT finish. like a dead man switch?
tried searching but everything looks super enterprise-y or complicated to set up.
just want a simple “if i dont ping this url in 10 mins → send me a tg message”.
any simple tools for this? or do i have to script something myself?
thx

hello @AndreGle

n8n provides a couple of useful endpoints for monitoring:
/healthz - get basic status of the instance (without checking the db status)
/healthz/readiness - get db status also
Monitoring | n8n Docs
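
A rough sketch of how an external script could poll those two endpoints, in case it helps. It assumes Node 18+ (for the global `fetch`) and runs on a machine other than the VPS; `checkHealth`, `timeoutMs` and the example URL are made-up names for illustration, not part of n8n.

```javascript
// Sketch: probe n8n's health endpoints from outside the VPS.
// checkHealth and its parameters are hypothetical helpers, not an n8n API.
async function checkHealth(baseUrl, fetchImpl = fetch, timeoutMs = 5000) {
  const checks = {};
  for (const path of ['/healthz', '/healthz/readiness']) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetchImpl(baseUrl + path, { signal: controller.signal });
      checks[path] = res.ok ? 'ok' : `http ${res.status}`;
    } catch (err) {
      // A frozen instance typically ends up here as an aborted (timed out) request.
      checks[path] = err.name === 'AbortError' ? 'timeout' : 'unreachable';
    } finally {
      clearTimeout(timer);
    }
  }
  const healthy = Object.values(checks).every((status) => status === 'ok');
  return { healthy, checks };
}
```

Calling `checkHealth('https://n8n.example.com')` from a cron job and alerting when `healthy` is false would catch the "domain is up but everything times out" case, since a timeout counts as a failure here.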


thanks! yeah i know about /healthz, but i still need something external to ping it, right? uptime robot is fine but i want something that integrates with n8n workflow logic too (like monitoring specific critical workflows, not just the whole instance). maybe i’ll just write a script for it.

You can use something light, like Uptime Kuma
Uptime Kuma - A Fancy Self-Hosted Monitoring Tool

you can also send test requests to your webhooks to make sure they are alive, but that requires changing the workflows so the healthcheck requests are filtered out before the real processing

for internal monitoring in n8n you should use the n8n workflows 🙂


Hello @AndreGle ,

The simplest free tool for this is Healthchecks.io.

  1. Create a Check: It gives you a unique URL (e.g., hc-ping.com/your-uuid) and you set a timer (e.g., “Expect a ping every 15 minutes”).
  2. Add to Workflow: Add an HTTP Request node at the very end of your critical workflow to ping that URL on success.
  3. The Safety Net: If your VPS freezes, the memory leaks, or the database locks up, the ping never happens. Healthchecks notices the silence and sends you a Telegram/Email alert once the expected period (plus any grace time you set) has passed.

This works even if your entire server dies because the monitoring logic lives outside your infrastructure.
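
To make the idea concrete, here is a small sketch of the rule a dead man's switch applies, plus a helper for building the ping URL. The function and parameter names (`isOverdue`, `pingUrl`, `periodMinutes`, `graceMinutes`) are illustrative only, not the service's own code; the `/fail` suffix is the documented way to report a failure explicitly, but double-check it against the Healthchecks.io docs for your setup.

```javascript
// Sketch of the dead man's switch rule: a check is overdue once the last ping
// is older than the expected period plus the grace time.
function isOverdue(lastPingAt, now, periodMinutes, graceMinutes = 0) {
  const deadlineMs = lastPingAt.getTime() + (periodMinutes + graceMinutes) * 60000;
  return now.getTime() > deadlineMs;
}

// Build the URL the HTTP Request node would call at the end of a workflow.
// Passing failed = true appends /fail to signal an explicit failure.
function pingUrl(uuid, failed = false) {
  const base = `https://hc-ping.com/${uuid}`;
  return failed ? `${base}/fail` : base;
}
```

With a 15 minute period and no grace time, a last ping at 00:00 is still fine at 00:10 and overdue at 00:16, which is when the alert would go out.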

best of luck!


thx guys.
looked at kuma but setting up another server just to watch my first server feels like too much work lol.
healthchecks is prob the way to go for now, but man… pasting those UUID links into 50 different workflows is gonna be a pain.

kinda surprised there isn’t a native n8n node for this yet? just wanna drop a ‘Monitor’ node at the end and be done with it.
guess i’m stuck with copy-pasting for now unless i missed some plugin

you don’t need a heartbeat in all 50 workflows. Only one, to make sure the n8n instance is alive.
in each workflow’s settings you can set an Error workflow, which is triggered whenever an execution of that workflow fails.
Define one error workflow to handle errors (send them to telegram / email / etc.)

and then configure all your workflows with webhooks to use that error workflow
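
As a sketch of what that shared error workflow could do: a Function node right after the Error Trigger can turn the error data into one alert text for the Telegram node. The field names used here (`workflow.name`, `execution.lastNodeExecuted`, `execution.error.message`, `execution.url`) are my understanding of the Error Trigger output, so treat them as assumptions and check them against a real failed execution; `formatErrorAlert` is a made-up helper name.

```javascript
// Sketch: build a readable alert from Error Trigger data.
// Field names are assumed; missing fields fall back to placeholders.
function formatErrorAlert(data) {
  const workflowName = data.workflow?.name ?? 'unknown workflow';
  const nodeName = data.execution?.lastNodeExecuted ?? 'unknown node';
  const message = data.execution?.error?.message ?? 'no error message';
  const lines = [
    `Workflow failed: ${workflowName}`,
    `Node: ${nodeName}`,
    `Error: ${message}`,
  ];
  // Link to the failed execution when the trigger provides one.
  if (data.execution?.url) lines.push(`Execution: ${data.execution.url}`);
  return lines.join('\n');
}
```

Inside the Function node this would end with something like `return [{ json: { text: formatErrorAlert(items[0].json) } }];`, and the Telegram node then sends `text`. Keep in mind this only covers failed executions; a frozen instance still needs the external heartbeat.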


You need an external solution, so you can use Postman for this, if you are familiar with it. You need to:
* Set up a Postman monitor to act as the watchdog.
* At the end of each workflow in n8n, create a “heartbeat”: a simple HTTP request that pings a URL, or a Telegram call (it’s a node).
* In Postman, write a test script that checks when the last workflow ran, and add a rule so Postman sends a Telegram message to alert you.
* Schedule the monitor to run every few minutes. That way, if your n8n workflow freezes, the monitor notices the missing heartbeat on its next run and sends a notification.

It’s a lightweight solution, but you should be familiar with Postman and the automation in it…

in your n8n workflow:

// n8n Function Node: Heartbeat
return [
  {
    json: {
      last_run_time: new Date().toISOString()
    }
  }
];

Then add an HTTP Request node… if you like.

In Postman, put a script like this in the test tab of the monitored request (there is an AI assistant there that can help write it):

// Parse the heartbeat timestamp from the n8n response
const lastRun = pm.response.json().last_run_time;
const lastRunDate = new Date(lastRun);
const minutesSinceLastRun = (Date.now() - lastRunDate.getTime()) / 60000;

// Define how long without a heartbeat is considered "problematic"
const thresholdMinutes = 10;

// A missing or unparseable timestamp gives NaN, which should alert too
if (Number.isNaN(minutesSinceLastRun) || minutesSinceLastRun > thresholdMinutes) {
    // Workflow didn't finish on time, send a Telegram alert
    pm.sendRequest({
        url: 'https://api.telegram.org/bot<YOUR_BOT_TOKEN>/sendMessage',
        method: 'POST',
        header: { 'Content-Type': 'application/json' },
        body: {
            mode: 'raw',
            raw: JSON.stringify({
                chat_id: '<YOUR_CHAT_ID>',
                text: `⚠️ Workflow has NOT run in the last ${thresholdMinutes} minutes!`
            })
        }
    }, (err) => {
        // Log delivery failures so they show up in the monitor run
        if (err) console.error('Telegram alert failed:', err);
    });
}


The last step is:
Schedule the Postman Monitor, it’s easy…
