[HELP NEEDED] Webhooks Randomly Stop - Require Workflow Toggle to Resume (Not Sustainable)
Problem Description
I'm hitting an n8n webhook reliability issue: webhooks randomly stop being processed and only resume after I manually toggle the workflow off and on. This happens repeatedly and isn't manageable in a production environment.
Important Context: I implemented a webhook forwarding architecture (Cloudflare + Upstash) specifically to try to solve n8n’s webhook reliability issues, but the core problem persists.
My Setup
- Current Architecture: Zoom → Cloudflare Worker → Upstash Redis → n8n
- Why This Architecture: Originally had direct webhooks (Zoom → n8n) but those were unreliable
- Hosting: Self-hosted n8n via Coolify
- Temporary Fix: Disabling and re-enabling workflows restores webhook flow
- Problem: This workaround is not sustainable and doesn’t address the core n8n issue
The Pattern
What Happens:
1. Webhooks work normally for hours/days
2. Suddenly stop coming through entirely (no new executions)
3. Complete silence - not even failed attempts in logs
4. Toggling workflows off→on immediately fixes it
5. Cycle repeats unpredictably
Evidence:
- Upstash logs show: Successful executions, then complete silence
- Cloudflare + Upstash are healthy: Forwarding infrastructure working correctly
- n8n is responsive: UI works, other workflows function
- Root cause is in n8n: it stops processing webhooks (this also happened before I added the forwarder)
Current Workaround (Unsustainable)
When webhooks stop:
1. Disable affected workflows
2. Re-enable workflows
3. Webhooks immediately resume
This works but requires constant monitoring and manual intervention!
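For anyone who wants to suggest automating this: here's roughly what I imagine a scripted version of the toggle looking like. Untested sketch only - it assumes the n8n Public API's activate/deactivate endpoints and an API key created in the n8n settings; the env var names and workflow ID are placeholders, and it needs Node 18+ for the global fetch.

```typescript
// Untested sketch: bounce a workflow through the n8n Public API instead of the UI.
// N8N_BASE_URL, N8N_API_KEY and WORKFLOW_ID are placeholders.
const N8N_BASE_URL = process.env.N8N_BASE_URL!; // e.g. https://n8n.example.com
const N8N_API_KEY = process.env.N8N_API_KEY!;
const WORKFLOW_ID = process.env.WORKFLOW_ID!;

async function callN8n(path: string): Promise<void> {
  const res = await fetch(`${N8N_BASE_URL}/api/v1${path}`, {
    method: "POST",
    headers: { "X-N8N-API-KEY": N8N_API_KEY },
  });
  if (!res.ok) {
    throw new Error(`${path} failed: ${res.status} ${await res.text()}`);
  }
}

// Same thing I do by hand: deactivate, wait a moment, reactivate.
async function bounceWorkflow(): Promise<void> {
  await callN8n(`/workflows/${WORKFLOW_ID}/deactivate`);
  await new Promise((resolve) => setTimeout(resolve, 2000));
  await callN8n(`/workflows/${WORKFLOW_ID}/activate`);
}

bounceWorkflow().catch((err) => {
  console.error("Workflow bounce failed:", err);
  process.exit(1);
});
```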
Architecture Details
Current Flow:
Zoom Webhook → Cloudflare Worker → Upstash Redis Queue → n8n Polling
Why This Setup: Originally tried direct webhooks (Zoom → n8n) but n8n kept losing webhooks. Implemented the forwarder as a reliability buffer, but n8n is still the weak link.
Key Issue: The problem is specifically with n8n’s webhook processing - it stops consuming from the queue and requires workflow toggling to resume.
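For context, the Worker is conceptually nothing more than a forwarder along these lines (simplified sketch, not my exact code; the secret names and the `zoom:webhooks` list key are placeholders, and it assumes Upstash's single-command REST endpoint):

```typescript
// Simplified sketch of the forwarding Worker (Cloudflare Workers, module syntax).
// UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN are Worker secrets.
// Zoom's endpoint URL validation handshake and signature check are omitted for brevity.
export interface Env {
  UPSTASH_REDIS_REST_URL: string;
  UPSTASH_REDIS_REST_TOKEN: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("Method not allowed", { status: 405 });
    }
    const payload = await request.text();

    // Push the raw Zoom payload onto a Redis list via Upstash's REST API
    // (single command sent as a JSON array to the base URL).
    const res = await fetch(env.UPSTASH_REDIS_REST_URL, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${env.UPSTASH_REDIS_REST_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(["LPUSH", "zoom:webhooks", payload]),
    });

    // Acknowledge to Zoom even while n8n is stalled - the queue buffers the event.
    return new Response(res.ok ? "queued" : "queue error", {
      status: res.ok ? 200 : 502,
    });
  },
};
```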
Theories on Root Cause (n8n-Specific)
1. n8n Webhook/Polling Engine Issues
- n8n stops polling Upstash Redis queue after time/load?
- Webhook processing engine getting stuck internally?
- Workflow execution threads dying but not restarting?
2. n8n Database Issues (SQLite)
- SQLite locking causing webhook processing to halt?
- Database connection issues preventing queue consumption?
- Should I switch to PostgreSQL for better reliability?
3. n8n Memory/Resource Issues
- Memory leaks causing webhook engine to fail?
- Resource exhaustion stopping polling threads?
- Multiple workflows causing internal conflicts?
4. n8n Internal State Problems
- Workflow registration state getting corrupted?
- Internal queues/buffers filling up and not clearing?
- Thread pool exhaustion in webhook processor?
Note: Cloudflare and Upstash are confirmed working - this is specifically an n8n reliability issue.
What I Need Help With
- n8n root cause identification: Why does n8n’s webhook processing randomly stop?
- n8n monitoring strategies: How to detect when n8n stops processing webhooks?
- n8n configuration fixes: Settings/tweaks to prevent this issue?
- Database recommendations: Will PostgreSQL solve this vs SQLite?
- Automated n8n recovery: Can I script the workflow toggle workaround? (watchdog sketch below)
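To make the monitoring and automated-recovery asks concrete, this is the kind of external watchdog I have in mind: watch the Upstash queue depth and bounce the workflow when the backlog stops draining. Untested sketch - the queue key, interval, and thresholds are placeholders, and `bounceWorkflow()` refers to the toggle sketch earlier in this post.

```typescript
// Untested sketch of an external watchdog: if the Upstash queue depth keeps
// growing (or never drains), assume n8n has stopped consuming and bounce the workflow.
const UPSTASH_URL = process.env.UPSTASH_REDIS_REST_URL!;
const UPSTASH_TOKEN = process.env.UPSTASH_REDIS_REST_TOKEN!;
const QUEUE_KEY = "zoom:webhooks";  // same placeholder key as the Worker sketch
const CHECK_INTERVAL_MS = 60_000;   // check once a minute
const STALL_CHECKS = 5;             // ~5 minutes without the backlog draining

async function queueDepth(): Promise<number> {
  const res = await fetch(UPSTASH_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${UPSTASH_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(["LLEN", QUEUE_KEY]),
  });
  const body = (await res.json()) as { result: number };
  return body.result;
}

async function watchdog(bounce: () => Promise<void>): Promise<void> {
  let stuckChecks = 0;
  let lastDepth = await queueDepth();

  while (true) {
    await new Promise((resolve) => setTimeout(resolve, CHECK_INTERVAL_MS));
    const depth = await queueDepth();

    // A backlog that never shrinks means n8n has probably stopped polling.
    stuckChecks = depth > 0 && depth >= lastDepth ? stuckChecks + 1 : 0;
    lastDepth = depth;

    if (stuckChecks >= STALL_CHECKS) {
      console.warn(`Queue stuck at ${depth} items - bouncing workflow`);
      await bounce(); // e.g. bounceWorkflow() from the toggle sketch
      stuckChecks = 0;
    }
  }
}

// Usage (assuming the toggle sketch is importable):
// watchdog(bounceWorkflow);
```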
Environment Details
- n8n: Self-hosted via Coolify, default SQLite database
- Cloudflare: Worker with webhook forwarding logic
- Upstash: Redis as webhook queue/buffer
- Multiple workflows: Some sharing webhook endpoints
- Load: ~200 webhooks per day
Questions for the Community
- Has anyone seen n8n webhook processing randomly stop, requiring workflow restarts to recover?
- Are there known n8n reliability issues with webhook/polling architectures?
- Is SQLite the culprit? Should I switch to PostgreSQL for webhook reliability?
- Any n8n configuration tweaks to prevent webhook processing from dying?
- Can I monitor n8n’s internal state to detect when webhook processing stops? (rough sketch after this list)
- Any way to auto-restart workflows when n8n stops processing webhooks?
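On the "monitor n8n's internal state" question, the other angle I can think of is asking n8n itself when it last executed the workflow via the Public API. Rough, untested sketch below; it assumes a GET /api/v1/executions endpoint whose items carry a startedAt timestamp, and the workflow ID, API key, and 30-minute threshold are placeholders (with only ~200 webhooks/day the threshold has to be generous).

```typescript
// Untested sketch: ask n8n when it last executed the workflow and warn on silence.
const BASE = process.env.N8N_BASE_URL!;
const KEY = process.env.N8N_API_KEY!;
const WORKFLOW_ID = process.env.WORKFLOW_ID!;
const MAX_SILENCE_MIN = 30; // generous, since I only see ~200 webhooks/day

async function minutesSinceLastExecution(): Promise<number | null> {
  const res = await fetch(
    `${BASE}/api/v1/executions?workflowId=${WORKFLOW_ID}&limit=1`,
    { headers: { "X-N8N-API-KEY": KEY } },
  );
  if (!res.ok) throw new Error(`executions query failed: ${res.status}`);
  const body = (await res.json()) as { data: Array<{ startedAt: string }> };
  if (body.data.length === 0) return null; // nothing recorded yet
  return (Date.now() - Date.parse(body.data[0].startedAt)) / 60_000;
}

minutesSinceLastExecution().then((mins) => {
  if (mins !== null && mins > MAX_SILENCE_MIN) {
    console.warn(`No executions for ${mins.toFixed(0)} min - n8n may be stalled`);
    // hook in an alert or the workflow bounce here
  }
});
```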
Context: I already tried working around this with external infrastructure (Cloudflare + Upstash), but the issue is clearly within n8n itself.
This is becoming a critical reliability issue for production use - any insights would be hugely appreciated!
Tags: #webhooks #reliability #zoom #cloudflare #upstash #production #debugging