Built Pulse — a real-time monitoring dashboard for n8n workflows (with AI-powered error explanations)

A few weeks ago, a colleague told me about a restaurant client whose n8n appointment scheduling workflow had been silently failing for over a month. Customers were booking appointments that didn’t exist. By the time anyone noticed, their Google reviews had dropped to 2 stars.
That story stuck with me. n8n doesn’t tell you when something breaks silently — it just shows green.
So I built Pulse.
What it does:
• Connects directly to your n8n instance via API
• Shows all your workflows in real-time with live status
• Visualizes the exact node where execution failed
• Detects silent failures — workflows that run successfully but return 0 results
• Detects stopped workflows — active workflows that missed their scheduled execution
• Sends WhatsApp alerts when something fails, with AI-powered explanations in plain language (not raw error codes)
• Keeps an incident history so you know what failed, when, and how many times
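
The two detection rules in the list above (silent failures and stopped workflows) could be sketched roughly as follows. This is a minimal illustration, not Pulse's actual implementation: the `Execution` record, its `items_out` field, and the `grace` period are assumptions, and n8n's API does not return data in this exact shape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Execution:
    """Hypothetical, simplified view of one workflow run."""
    status: str            # "success" or "error"
    finished_at: datetime
    items_out: int         # items produced by the final node (assumed field)


def is_silent_failure(execution: Execution) -> bool:
    """A run that reports success but produced no output items."""
    return execution.status == "success" and execution.items_out == 0


def is_stopped(
    active: bool,
    last_run: Optional[datetime],
    expected_interval: timedelta,
    now: datetime,
    grace: timedelta = timedelta(minutes=5),
) -> bool:
    """An active workflow whose last run is older than its schedule allows."""
    if not active:
        return False
    if last_run is None:
        return True  # active but never ran; may need tuning for new workflows
    return now - last_run > expected_interval + grace
```

Whether zero output items is really a failure depends on the workflow, so in practice this rule would probably need a per-workflow opt-in.
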
Example alert you receive on WhatsApp:
🚨 CRITICAL ALERT — Pulse
Flow: Newsletter AI
Node: Google Sheets2
❌ The workflow couldn’t access Google Sheets because credentials expired
🔍 Cause: Google OAuth token expired or revoked
🔧 Action: Reconnect the Google account in the credentials of node Google Sheets2

Still early stage but working in production against my real n8n instance. Would love feedback from people who’ve felt this pain.

The silent failure detection is the most valuable part here - that’s the category of failure that’s hardest to catch and usually the most costly in production. The restaurant example is a perfect illustration.

One thing worth adding down the road: execution duration drift. If a workflow that normally runs in 2 seconds starts averaging 15 seconds, that’s usually a leading indicator before it hits errors or timeouts. Tracking p95 execution time per workflow and alerting when it deviates significantly would make Pulse even more useful as a proactive tool, not just a reactive one.
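
A rough sketch of the p95 drift check described above, assuming per-workflow execution durations (in seconds) are already collected. The function names and the `factor` threshold are hypothetical, and nearest-rank is only one of several ways to compute a percentile.

```python
from typing import Sequence


def p95(durations: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not durations:
        raise ValueError("need at least one duration")
    ordered = sorted(durations)
    # Integer form of ceil(0.95 * n), avoiding float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def has_duration_drift(
    baseline: Sequence[float],
    recent: Sequence[float],
    factor: float = 3.0,
) -> bool:
    """True when the recent p95 exceeds the baseline p95 by `factor`."""
    return p95(recent) > factor * p95(baseline)
```

With a baseline of runs around 2 seconds and recent runs around 15 seconds, `has_duration_drift` returns `True` at the default factor of 3. A fixed multiplier is the simplest choice; a deviation measured in standard deviations would be an alternative.
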

That’s a great point: execution duration drift is exactly the kind of leading indicator that turns monitoring from reactive to proactive.

I have basic slow execution detection already (flagging workflows that exceed a fixed threshold), but tracking p95 per workflow and alerting on statistical deviation is the natural next step.

Adding it to the roadmap. Thanks for the input; this is exactly the kind of feedback I was looking for.

One blind spot worth flagging that might not be visible from the n8n execution side: trigger health.

Pulse can see when a workflow executes and fails. That’s the hard part, and you’ve solved it well. But if the inbound trigger goes quiet (the webhook endpoint changes, the OAuth refresh dies, the scheduler is misconfigured), n8n has 0 executions to report. Everything looks green. That is the risk in a case like your restaurant example: if the booking system stopped firing the webhook, Pulse would show “all workflows healthy” while zero appointments were actually going through.

The fix is monitoring the trigger rate independently, from outside the workflow. Simplest form: expect at least N trigger events per day from each source, and alert when the count drops to 0. For appointment scheduling, “zero new bookings in 12 business hours” is an anomalous signal worth a notification even if n8n itself is perfectly healthy.
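
As a minimal sketch of that outer check, assuming trigger timestamps are recorded somewhere outside n8n: the `quiet_sources` function and the source names in the usage note are hypothetical, and business-hours handling is left out for brevity.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def quiet_sources(
    events: Dict[str, List[datetime]],
    min_events: Dict[str, int],
    window: timedelta,
    now: datetime,
) -> List[str]:
    """Return sources that sent fewer events than expected within the window."""
    cutoff = now - window
    quiet = []
    for source, minimum in min_events.items():
        # A source with no recorded events at all counts as zero.
        recent = [t for t in events.get(source, []) if t >= cutoff]
        if len(recent) < minimum:
            quiet.append(source)
    return quiet
```

For example, with a 12-hour window and a minimum of 1 event per source, a `booking_webhook` source whose last event arrived 20 hours ago would be returned as quiet, while a source that fired an hour ago would not.
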

For phone/SMS integrations especially, a synthetic test event (a daily test call or a test webhook ping from the provider side) that must complete end-to-end catches dead connections before real traffic misses them.

Complementary to what you’ve built – I’d treat trigger-health as the outer shell and execution-health (what Pulse does) as the inner shell. Both gaps need to be closed for a service business to trust the automation fully.

Thanks for the thoughtful feedback — I completely agree that trigger health is another critical layer, especially when no execution is created and everything appears healthy from n8n’s perspective.

My goal with Cuissle is to go beyond simple execution monitoring by helping teams detect different failure types and understand what happened before their clients notice.

I’m currently looking for a small group of early users to test the MVP and share honest feedback. If you’re interested, I’d love to have you on board.

👉 Early access: https://cuissle.netlify.app/

Note for English speakers: The landing page is currently in Spanish, but I’m actively working on a full English version. Feel free to join the waitlist anyway, and I’ll personally let you know as soon as the English version is ready.

This is the right pain. The extra thing I’d want in the incident view is a recovery receipt: expected trigger/window, last good run, failed node, credential/account involved, action attempted, alert target, and whether downstream actions paused. The AI explanation helps, but the receipt is what lets a client trust the fix.

Hi Ahmad! 👋
Thanks again for your comment on my post. Your idea about the recovery receipt really stood out to me.

That’s exactly the kind of feedback I’m looking for while building Cuissle. I’m trying to learn from people who have real experience operating n8n in production.

I’d genuinely love to hear more about how you handle incidents today and what information you think is essential for building trust after a failure.
If you’re open to it, I’d love to chat sometime or just continue the conversation here. I think your perspective would be incredibly valuable.

Happy to. The trust-building bits I’d look for after a failure are: what was supposed to happen, last good run, failed node/tool, credential/account used, whether anything downstream already changed, who got alerted, and what got paused or retried.

If you want, send one real Cuissle incident shape here or DM me and I’ll sanity-check what I’d expect in the receipt.
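
One possible shape for such a receipt, covering the fields listed above. This is only a sketch: the `RecoveryReceipt` class and its field names are hypothetical and do not reflect an actual Cuissle schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class RecoveryReceipt:
    """Hypothetical incident receipt; field names are illustrative."""
    workflow: str
    expected: str                          # what was supposed to happen
    failed_node: str                       # failed node or tool
    last_good_run: Optional[datetime] = None
    credential: Optional[str] = None       # credential or account used
    downstream_changes: List[str] = field(default_factory=list)
    alerted: List[str] = field(default_factory=list)
    paused_or_retried: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the receipt as plain text for an incident view."""
        def joined(values: List[str]) -> str:
            return ", ".join(values) if values else "none"

        last_good = self.last_good_run.isoformat() if self.last_good_run else "unknown"
        lines = [
            f"Workflow: {self.workflow}",
            f"Expected: {self.expected}",
            f"Last good run: {last_good}",
            f"Failed node: {self.failed_node}",
            f"Credential: {self.credential or 'unknown'}",
            f"Downstream changes: {joined(self.downstream_changes)}",
            f"Alerted: {joined(self.alerted)}",
            f"Paused or retried: {joined(self.paused_or_retried)}",
        ]
        return "\n".join(lines)
```

Unknown values are rendered explicitly as "unknown" or "none" rather than omitted, so a reader can tell a missing fact apart from a field that was never checked.
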

@Samueljesus

This solves a problem I’ve been thinking about a lot lately. The silent failure issue is the one that worries me most when building for clients — n8n showing green while nothing is actually happening is exactly the scenario that damages trust before you even know there’s a problem.

The AI-powered plain English explanation is the bit I find most useful. Clients don’t want raw error logs, they want to know what broke and why in one sentence.

Really solid build. Will be watching this one closely as I scale up client work.

The recovery receipt concept is the right framing. One field worth adding: which outbound actions already fired before the failure.

For service-business workflows, a run often touches external state partway through: a CRM record created, an invoice draft submitted, an SMS sent. If the workflow fails at step 4 of 7, the receipt needs to answer “what is already out in the world” so the retry knows what to skip and the manual fix knows where to resume.

A lightweight pattern: write an audit row BEFORE each outbound call, then update the row status to “complete” after it succeeds. If the workflow dies mid-execution, the sheet shows exactly which steps committed and which did not. This pairs with the trigger-health layer naturally: the outer monitor catches “nothing fired at all,” the execution receipt catches “something fired but failed internally,” and this audit log catches “failed partway through with external side effects already committed.”

That third category is the hardest to recover from cleanly, so it is worth making it explicit in the receipt schema.
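
A minimal sketch of the audit-row pattern described above, using an in-memory list in place of the sheet. The `run_step` and `committed_steps` helpers are hypothetical names; in n8n the same idea would be built from nodes rather than Python functions.

```python
from typing import Callable, Dict, List


def run_step(
    audit: List[Dict[str, str]],
    name: str,
    action: Callable[[], None],
) -> None:
    """Record the step as pending, run it, then mark it complete."""
    row = {"step": name, "status": "pending"}
    audit.append(row)            # audit row written BEFORE the outbound call
    action()                     # if this raises, the row stays "pending"
    row["status"] = "complete"   # updated only after the call succeeds


def committed_steps(audit: List[Dict[str, str]]) -> List[str]:
    """Steps known to have completed their outbound call."""
    return [row["step"] for row in audit if row["status"] == "complete"]
```

A row left in "pending" is ambiguous by design: the outbound call may or may not have reached the external system, so those are the steps to verify by hand before retrying.
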