[HELP NEEDED] Webhooks Randomly Stop - Require Workflow Toggle to Resume (Not Sustainable)

Problem Description

I'm running into an n8n webhook reliability issue: webhooks randomly stop being processed and only resume after manually toggling the affected workflows off and on. This happens repeatedly and is not manageable in a production environment.

Important Context: I implemented a webhook forwarding architecture (Cloudflare + Upstash) specifically to try to solve n8n’s webhook reliability issues, but the core problem persists.

My Setup

  • Current Architecture: Zoom → Cloudflare Worker → Upstash Redis → n8n
  • Why This Architecture: Originally had direct webhooks (Zoom → n8n) but those were unreliable
  • Hosting: Self-hosted n8n via Coolify
  • Temporary Fix: Disabling and re-enabling workflows restores webhook flow
  • Problem: This workaround is not sustainable and doesn’t address the core n8n issue

The Pattern

What Happens:

  1. :white_check_mark: Webhooks work normally for hours/days
  2. :cross_mark: Suddenly stop coming through entirely (no new executions)
  3. :cross_mark: Complete silence - not even failed attempts in logs
  4. :white_check_mark: Toggling workflows off→on immediately fixes it
  5. :counterclockwise_arrows_button: Cycle repeats unpredictably

Evidence:

  • Upstash logs show: Successful executions, then complete silence
  • Cloudflare + Upstash are healthy: Forwarding infrastructure working correctly
  • n8n is responsive: UI works, other workflows function
  • Root cause confirmed: n8n stops processing webhooks (this happened before the forwarder too)

Current Workaround (Unsustainable)

When webhooks stop:
1. Disable affected workflows
2. Re-enable workflows  
3. Webhooks immediately resume

This works but requires constant monitoring and manual intervention!
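
In principle the toggle could probably be scripted against n8n's public REST API. Below is only a rough sketch of that idea, assuming the public API is enabled and an API key has been created; the env var names and the workflow id are placeholders for my setup:

```typescript
// Rough sketch: replicate the manual off→on toggle via n8n's public REST API.
// Assumes the public API is enabled and an API key exists; N8N_URL, N8N_API_KEY
// and WORKFLOW_ID are placeholders, not values from my actual deployment.
const N8N_URL = process.env.N8N_URL ?? "https://n8n.example.com";
const API_KEY = process.env.N8N_API_KEY ?? "";
const WORKFLOW_ID = process.env.WORKFLOW_ID ?? "";

async function call(path: string): Promise<void> {
  const res = await fetch(`${N8N_URL}/api/v1${path}`, {
    method: "POST",
    headers: { "X-N8N-API-KEY": API_KEY },
  });
  if (!res.ok) throw new Error(`POST ${path} failed with ${res.status}`);
}

async function toggleWorkflow(): Promise<void> {
  // Same effect as flipping the switch in the UI: deactivate, then reactivate.
  await call(`/workflows/${WORKFLOW_ID}/deactivate`);
  await call(`/workflows/${WORKFLOW_ID}/activate`);
  console.log(`Toggled workflow ${WORKFLOW_ID} at ${new Date().toISOString()}`);
}

toggleWorkflow().catch((err) => {
  console.error("Toggle failed:", err);
  process.exit(1);
});
```

Even if that works, it only automates the band-aid; it doesn't explain why n8n stops processing webhooks in the first place.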

Architecture Details

Current Flow:

Zoom Webhook → Cloudflare Worker → Upstash Redis Queue → n8n Polling

Why This Setup: Originally tried direct webhooks (Zoom → n8n) but n8n kept losing webhooks. Implemented the forwarder as a reliability buffer, but n8n is still the weak link.

Key Issue: The problem is specifically with n8n’s webhook processing - it stops consuming from the queue and requires workflow toggling to resume.
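
For reference, the Worker itself is a very thin forwarder. This is a simplified sketch of the idea rather than my exact code; the binding names (UPSTASH_URL, UPSTASH_TOKEN) and the zoom:webhooks list key are placeholders:

```typescript
// Simplified sketch of the Cloudflare Worker forwarder (not the exact production code).
// It pushes the raw Zoom payload onto an Upstash Redis list via Upstash's REST API;
// an n8n workflow then pops items off that list on a schedule.
export interface Env {
  UPSTASH_URL: string;   // e.g. https://<db>.upstash.io (placeholder binding)
  UPSTASH_TOKEN: string; // Upstash REST token (placeholder binding)
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("Method not allowed", { status: 405 });
    }

    const payload = await request.text();

    // LPUSH the payload onto the queue key; the request body is used as the
    // final command argument by Upstash's REST API.
    const res = await fetch(`${env.UPSTASH_URL}/lpush/zoom:webhooks`, {
      method: "POST",
      headers: { Authorization: `Bearer ${env.UPSTASH_TOKEN}` },
      body: payload,
    });

    if (!res.ok) {
      return new Response("Queue write failed", { status: 502 });
    }
    // Acknowledge quickly so Zoom doesn't retry or disable the endpoint.
    return new Response("ok", { status: 200 });
  },
};
```

Because the queue buffers everything, nothing should be lost even if n8n is briefly down; the failure mode I'm seeing is that n8n simply stops consuming.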

Theories on Root Cause (n8n-Specific)

1. n8n Webhook/Polling Engine Issues

  • n8n stops polling Upstash Redis queue after time/load?
  • Webhook processing engine getting stuck internally?
  • Workflow execution threads dying but not restarting?

2. n8n Database Issues (SQLite)

  • SQLite locking causing webhook processing to halt?
  • Database connection issues preventing queue consumption?
  • Should I switch to PostgreSQL for better reliability?

3. n8n Memory/Resource Issues

  • Memory leaks causing webhook engine to fail?
  • Resource exhaustion stopping polling threads?
  • Multiple workflows causing internal conflicts?

4. n8n Internal State Problems

  • Workflow registration state getting corrupted?
  • Internal queues/buffers filling up and not clearing?
  • Thread pool exhaustion in webhook processor?

Note: Cloudflare and Upstash are confirmed working - this is specifically an n8n reliability issue.

What I Need Help With

  1. n8n root cause identification: Why does n8n’s webhook processing randomly stop?

  2. n8n monitoring strategies: How to detect when n8n stops processing webhooks?

  3. n8n configuration fixes: Settings/tweaks to prevent this issue?

  4. Database recommendations: Will PostgreSQL solve this vs SQLite?

  5. Automated n8n recovery: Can I safely script the workflow toggle workaround (along the lines of the sketch above)?

Environment Details

  • n8n: Self-hosted via Coolify, default SQLite database
  • Cloudflare: Worker with webhook forwarding logic
  • Upstash: Redis as webhook queue/buffer
  • Multiple workflows: Some sharing webhook endpoints
  • Load: ~200 webhooks per day

Questions for the Community

  • Has anyone seen n8n webhook processing randomly stop, requiring workflow restarts to recover?
  • Are there known n8n reliability issues with webhook/polling architectures?
  • Is SQLite the culprit? Should I switch to PostgreSQL for webhook reliability?
  • Any n8n configuration tweaks to prevent webhook processing from dying?
  • Can I monitor n8n’s internal state to detect when webhook processing stops?
  • Any way to auto-restart workflows when n8n stops processing webhooks?

Context: I already tried working around this with external infrastructure (Cloudflare + Upstash), but the issue is clearly within n8n itself.

This is becoming a critical reliability issue for production use - any insights would be hugely appreciated!

Tags: #webhooks #reliability #zoom #cloudflare #upstash #production #debugging


Okay, so nice implementation on the Cloudflare and Upstash side, but I'm wondering what's causing this too; I've not faced it. You could set up some alerts in Grafana: N8n + Grafana Full Node.js Metrics Dashboard (JSON Example Included!)

Say, if you notice idleness, I'm sure you can implement some other checking methods too, but hopefully the dashboard helps.
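
Something along those lines could double as a simple idleness watchdog next to the dashboard. Just a rough sketch, assuming the public API is enabled and you've created an API key; the env var names, the workflow id and the one-hour threshold are placeholders:

```typescript
// Rough idleness-watchdog sketch (placeholder names; assumes n8n's public API + an API key).
// If no execution has started for the workflow within MAX_SILENCE_MS, raise an alert
// (or chain into the toggle script from the original post).
const N8N_URL = process.env.N8N_URL ?? "https://n8n.example.com";
const API_KEY = process.env.N8N_API_KEY ?? "";
const WORKFLOW_ID = process.env.WORKFLOW_ID ?? "";
const MAX_SILENCE_MS = 60 * 60 * 1000; // 1 hour with zero executions = suspicious

async function lastExecutionAgeMs(): Promise<number | null> {
  const res = await fetch(
    `${N8N_URL}/api/v1/executions?workflowId=${WORKFLOW_ID}&limit=1`,
    { headers: { "X-N8N-API-KEY": API_KEY } },
  );
  if (!res.ok) throw new Error(`executions query failed: ${res.status}`);
  const body = (await res.json()) as { data: Array<{ startedAt: string }> };
  if (body.data.length === 0) return null; // no executions recorded at all
  return Date.now() - new Date(body.data[0].startedAt).getTime();
}

async function check(): Promise<void> {
  const age = await lastExecutionAgeMs();
  if (age === null || age > MAX_SILENCE_MS) {
    // Hook your alerting in here (Grafana, Slack, etc.).
    console.warn(`Workflow ${WORKFLOW_ID} looks idle (last run: ${age ?? "never"} ms ago)`);
  } else {
    console.log(`OK: last execution ${Math.round(age / 1000)}s ago`);
  }
}

check().catch((err) => console.error("Watchdog check failed:", err));
```

Run it on a cron every few minutes and you'd at least know when it goes quiet, instead of finding out from missed Zoom events.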

I'm wondering about your setup. I see you mentioned SQLite as the DB; it could be a bottleneck in the system, and yes, I would recommend switching to Postgres. My next few questions:

Do you have webhook nodes and worker nodes, or just a single-instance setup at the moment?

If your system is being overloaded it could bug out webhook processing, and separating the main node from the webhook load is possible.

Webhook link above

If you still see issues after sending the traffic to the webhook nodes, that would suggest it's not just a bottleneck from a single instance. But tbh, ~200 webhook calls a day suggests the error is elsewhere. Do you see any errors in the logs around the time it stops? That would help us dig deeper, as we may see stack traces or some error that points at the cause.

You could also try enabling debug logs.

I don't see this as a common issue on the forum; it could also be a network-side issue with the host. But hopefully the above helps you dig deeper into the issue.

Hope this helps,

Hey @Declup, this :point_up: line made me pause; can you expand on the use case here? Do multiple workflows share the exact same webhook path, i.e. do you expect multiple workflows to trigger from just one request?

If so, that might be the source of the issue, as webhook paths must be unique per workflow; otherwise only the last activated workflow will trigger. This was enforced with a fix in 1.91.0 here.

Hello my friend, the same is happening with me: it suddenly stops after working, without any error logging (see Webhook triggers not reliable after n8n restart).
If I try to deactivate and activate the workflow, is it going to work?

@AI_Blueprint can you please share more about your n8n setup? Are you self-hosting or using n8n cloud? What n8n version? Are you talking about a webhook-based trigger for an app or the n8n webhook trigger?

Have you tried to set up a workflow specifically for errors? And are you sure the server has enough memory and CPU to handle all your automations? Are you sure the entire workflow doesn't go into some strange loop behavior?

Yes, I'm sure it's not related to my setup.

Here’s my complete setup:

  • VPS: Hostinger cloud VPS
  • Processor: 2 vCPUs
  • OS: Ubuntu 22.04
  • Docker Compose: Running all services
  • n8n: Queue mode
      • 1 main (n8n-main)
      • 5 workers (n8n-worker)
      • 2 webhook workers (n8n-webhook-worker)
  • Redis + PostgreSQL
  • Reverse proxy: Traefik (Let’s Encrypt SSL)
  • Secrets: Managed via .env files
  • No external gateway/queue yet (considering for future scale)

I mean, have you set up a workflow specifically for errors, to capture errors from the workflow?

How much RAM do you have?

Have you activated task runners? Task runners | n8n Docs