A few weeks ago, a colleague told me about a restaurant client whose n8n appointment scheduling workflow had been silently failing for over a month. Customers were booking appointments that didn’t exist. By the time anyone noticed, their Google reviews had dropped to 2 stars.
That story stuck with me. n8n doesn’t tell you when something breaks silently — it just shows green.
So I built Pulse.
What it does:
• Connects directly to your n8n instance via API
• Shows all your workflows in real-time with live status
• Visualizes the exact node where execution failed
• Detects silent failures — workflows that run successfully but return 0 results
• Detects stopped workflows — active workflows that missed their scheduled execution
• Sends WhatsApp alerts when something fails, with AI-powered explanations in plain language (not raw error codes)
• Keeps an incident history so you know what failed, when, and how many times
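For anyone wondering what the two detection rules above could look like, here is a minimal Python sketch. It is not Pulse's actual code. The `Execution` record and its fields (especially `item_count`), the helper names, and the 5-minute grace period are all assumptions made for illustration, not the n8n API's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Execution:
    # Hypothetical, simplified record; not the n8n API's execution schema.
    workflow_id: str
    status: str           # "success" or "error"
    started_at: datetime
    item_count: int       # items returned by the last node (assumed field)


def is_silent_failure(execution: Execution) -> bool:
    """A run that reports success but returned no items."""
    return execution.status == "success" and execution.item_count == 0


def is_stopped(
    last_run: Optional[datetime],
    interval: timedelta,
    now: datetime,
    grace: timedelta = timedelta(minutes=5),  # assumed tolerance
) -> bool:
    """An active scheduled workflow that has missed its expected run."""
    if last_run is None:
        # Never ran at all: treat as stopped.
        return True
    return now - last_run > interval + grace
```

Whether 0 results really means a failure depends on the workflow (some legitimately return nothing), so in practice this check would probably need a per-workflow opt-in.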
Example alert you receive on WhatsApp:
CRITICAL ALERT — Pulse
Flow: Newsletter AI
Node: Google Sheets2
The workflow couldn’t access Google Sheets because credentials expired
Cause: Google OAuth token expired or revoked
Action: Reconnect the Google account in the credentials of node Google Sheets2
Still early stage but working in production against my real n8n instance. Would love feedback from people who’ve felt this pain.
The silent failure detection is the most valuable part here - that’s the category of failure that’s hardest to catch and usually the most costly in production. The restaurant example is a perfect illustration.
One thing worth adding down the road: execution duration drift. If a workflow that normally runs in 2 seconds starts averaging 15 seconds, that’s usually a leading indicator before it hits errors or timeouts. Tracking p95 execution time per workflow and alerting when it deviates significantly would make Pulse even more useful as a proactive tool, not just a reactive one.
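A rough sketch of that drift check in Python, assuming you already have per-workflow execution durations in seconds. The function names, the nearest-rank percentile method, and the 3x factor are illustrative choices, not anything Pulse or n8n provides.

```python
from typing import Sequence


def p95(durations: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not durations:
        raise ValueError("need at least one duration")
    ordered = sorted(durations)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def duration_drifted(
    baseline: Sequence[float],
    recent: Sequence[float],
    factor: float = 3.0,  # assumed threshold; tune per workflow
) -> bool:
    """True when the recent p95 exceeds the baseline p95 by `factor`."""
    return p95(recent) > factor * p95(baseline)
```

With a baseline around 2 seconds and recent runs around 15 seconds, `duration_drifted` returns `True`. A fixed multiplier is the simplest option; a deviation-based rule would need enough history per workflow to be meaningful.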
That’s a great point: execution duration drift is exactly the kind of leading indicator that turns monitoring from reactive to proactive.
I have basic slow execution detection already (flagging workflows that exceed a fixed threshold), but tracking p95 per workflow and alerting on statistical deviation is the natural next step.
Adding it to the roadmap. Thanks for the input, this is exactly the kind of feedback I was looking for.
One blind spot worth flagging that might not be visible from the n8n execution side: trigger health.
Pulse can see when a workflow executes and fails. That’s the hard part, and you’ve solved it well. But if the inbound trigger goes quiet (the webhook endpoint changes, an OAuth refresh dies, the scheduler is misconfigured), n8n has 0 executions to report and everything looks green. Your restaurant example shows the risk: if the booking system stopped firing the webhook, Pulse would show “all workflows healthy” while zero appointments were actually going through.
The fix is monitoring the trigger rate independently, from outside the workflow. Simplest form: expect at least N trigger events per day from each source, and alert when the count drops to 0. For appointment scheduling, “zero new bookings in 12 business hours” is an anomalous signal worth a notification even if n8n itself is perfectly healthy.
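In its simplest form, that check could look like the Python sketch below. It assumes the trigger timestamps are recorded somewhere outside n8n (for example, logged by the booking system itself). The function name and defaults are hypothetical, and the window is plain wall-clock hours rather than business hours, to keep the example short.

```python
from datetime import datetime, timedelta
from typing import Iterable


def trigger_is_quiet(
    event_times: Iterable[datetime],
    now: datetime,
    window: timedelta = timedelta(hours=12),  # assumed window
    min_events: int = 1,                      # assumed minimum
) -> bool:
    """True when fewer than `min_events` triggers arrived in the window."""
    cutoff = now - window
    recent = sum(1 for t in event_times if cutoff <= t <= now)
    return recent < min_events
```

Running this on a schedule from a separate process matters: if it lived inside the same workflow it is watching, it would go quiet together with the trigger.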
For phone/SMS integrations especially, a synthetic test event (a daily test call or a test webhook ping from the provider side) that must complete end-to-end catches dead connections before real traffic misses them.
Complementary to what you’ve built – I’d treat trigger-health as the outer shell and execution-health (what Pulse does) as the inner shell. Both gaps need to be closed for a service business to trust the automation fully.