Most n8n workflows I’ve reviewed in the last few months pass the “happy path” check fine. They break when the unusual happens: a 504 retry, a credential rotation, a third-party API drift.
So I’ve been refining a 6-dimension checklist that I use when I audit a workflow for production-readiness. Sharing it here because I’ve found it useful in conversations. Feel free to copy, fork, or steal:
1. Idempotency
Does the workflow handle duplicate inbound events safely?
- Stripe / Shopify / custom webhooks WILL be retried on 504s
- If you don’t dedupe by event ID, you ship duplicate orders / duplicate charges
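- Cheapest fix: at the very start, write the inbound event ID to a dedupe table (or to n8n’s workflow static data if low volume; the Set node does not persist anything between executions) and stop the run if the ID is already there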
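
Here is a minimal sketch of the dedupe-table idea, assuming a SQL store with a unique key on the event ID. SQLite stands in for whatever database you actually use, and the table and function names (`seen_events`, `claim_event`) are made up for illustration.

```python
import sqlite3

def open_dedupe_store(path=":memory:"):
    """Create the dedupe table if needed and return a connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_events ("
        "event_id TEXT PRIMARY KEY, "
        "first_seen TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn

def claim_event(conn, event_id):
    """Return True the first time an event ID is seen, False on any repeat.

    The PRIMARY KEY makes the insert the single point of truth, so a
    retried delivery of the same event cannot be claimed twice.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_events (event_id) VALUES (?)",
            (event_id,),
        )
    return cur.rowcount == 1

conn = open_dedupe_store()
# The third delivery repeats evt_1, as a webhook retry would.
results = [claim_event(conn, eid) for eid in ["evt_1", "evt_2", "evt_1"]]
```

The point of doing the insert first and branching on its result is that the check and the write are one step. A separate "select, then insert" leaves a window where two retries both pass the check.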
2. Retry strategy
When an external API fails, what happens?
- n8n’s per-node Retry On Fail setting is OK for transient failures, but configure it explicitly: it is off by default, and when enabled it waits a fixed interval between tries (several tries in quick succession with no jitter = thundering herd against a recovering API)
- For longer backoffs, route failures into a delayed re-trigger workflow
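
To show what "backoff with jitter" means in practice, here is a rough sketch of exponential backoff with full jitter. This is generic Python, not an n8n API: `backoff_delays` and `call_with_retry` are hypothetical names, and the base, cap, and retry count are placeholder values you would tune per API.

```python
import random
import time

def backoff_delays(max_retries, base=1.0, cap=60.0, rng=random.random):
    """Yield one 'full jitter' delay per retry.

    Each delay is uniform in [0, min(cap, base * 2**attempt)).
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling

def call_with_retry(fn, max_retries=4, sleep=time.sleep, **backoff):
    """Call fn once, then retry up to max_retries times on any exception.

    Re-raises the last error once the retry budget is spent.
    """
    delays = backoff_delays(max_retries, **backoff)
    while True:
        try:
            return fn()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # budget exhausted: let the caller route this to a DLQ
            sleep(delay)
```

The random factor is what spreads retries out. Without it, every run that failed at the same moment retries at the same moment too.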
3. Audit trail
Can you reconstruct what happened on any given run, after the fact?
- n8n’s execution log is pruned after a retention window (roughly 14-30 days by default, depending on hosting and configuration). If you need longer, write to your own sink (Postgres, S3, or a logging service)
- Structured records with timestamp + payload hash + outcome let you answer “what happened on order X at 02:14 UTC last Tuesday” in under a minute
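
A structured record along those lines might look like the sketch below. The field names (`ts`, `payload_sha256`, and so on) and the example workflow name are my own assumptions, not an n8n format. Hashing a canonical JSON encoding means the same payload always produces the same hash regardless of key order.

```python
import hashlib
import json
from datetime import datetime, timezone

def payload_hash(payload):
    """SHA-256 over a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def audit_record(workflow, execution_id, payload, outcome, now=None):
    """Build one JSON-serialisable audit entry for a single run."""
    now = now or datetime.now(timezone.utc)
    return {
        "ts": now.isoformat(),
        "workflow": workflow,
        "execution_id": execution_id,
        "payload_sha256": payload_hash(payload),
        "outcome": outcome,
    }

record = audit_record("order-sync", "1234", {"order_id": "X", "total": 42}, "success")
line = json.dumps(record)  # one line per run, ready for Postgres, S3, or a log shipper
```

Storing the hash rather than the full payload keeps the audit sink small and free of customer data, at the cost of needing the original payload elsewhere if you want to inspect it.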
4. Secrets handling
Are credentials stored cleanly or pasted in plain text?
- Credentials should live in n8n’s credential vault, not hardcoded in node parameters
- Document the rotation procedure for each credential (when it expires, what to update, what to test)
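
One way to check the "not hardcoded in node parameters" point is to scan an exported workflow JSON. The sketch below is a rough heuristic, not a real secret scanner. It assumes the export has a `nodes` list where each node carries `name` and `parameters`, that expressions are stored as strings starting with `=`, and that header/query pairs are stored as `{"name": ..., "value": ...}`. The key pattern and function name are hypothetical.

```python
import re

# Parameter names that suggest a secret; extend for your own stack.
SUSPECT_KEY = re.compile(r"(api[_-]?key|secret|token|password|authorization)", re.I)

def is_literal(value):
    """True for a non-empty string that is not an n8n expression ('=...')."""
    return isinstance(value, str) and value != "" and not value.startswith("=")

def find_hardcoded_secrets(workflow):
    """Return (node name, parameter path) pairs that look like pasted secrets."""
    hits = []

    def walk(node_name, value, path):
        if isinstance(value, dict):
            # Header/query pairs: the secret-looking word is in "name".
            name = value.get("name")
            if isinstance(name, str) and SUSPECT_KEY.search(name) and is_literal(value.get("value")):
                hits.append((node_name, ".".join(path + ["value"])))
            for key, child in value.items():
                walk(node_name, child, path + [str(key)])
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(node_name, child, path + [str(index)])
        elif is_literal(value) and path and SUSPECT_KEY.search(path[-1]):
            hits.append((node_name, ".".join(path)))

    for node in workflow.get("nodes", []):
        walk(node.get("name", "?"), node.get("parameters", {}), [])
    return hits
```

Expect both false positives and misses from something this simple. It is only meant to surface the obvious cases before a review.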
5. Dead-letter queue (DLQ)
When retry exhausts, where does the payload go?
- Failed payloads should land in a DLQ sub-workflow with the original input preserved
- Replay should be a one-click operation (a manual trigger workflow that re-fires the payload)
- Bonus: alert on DLQ entry — anything landing here means something broke past your retry budget
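
The DLQ pattern above can be sketched as a table plus two functions: one to park a failed payload untouched, one to replay it. SQLite is a stand-in here, and `dead_letters`, `dead_letter`, and `replay` are hypothetical names. In n8n the same roles would be played by an error-handling sub-workflow and a manually triggered replay workflow.

```python
import json
import sqlite3

def open_dlq(path=":memory:"):
    """Create the dead-letter table if needed and return a connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dead_letters ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "workflow TEXT NOT NULL, "
        "payload TEXT NOT NULL, "
        "error TEXT NOT NULL, "
        "failed_at TEXT DEFAULT CURRENT_TIMESTAMP, "
        "replayed_at TEXT)"
    )
    return conn

def dead_letter(conn, workflow, payload, error, alert=print):
    """Store the original input unchanged and raise an alert; return the entry ID."""
    with conn:
        cur = conn.execute(
            "INSERT INTO dead_letters (workflow, payload, error) VALUES (?, ?, ?)",
            (workflow, json.dumps(payload), str(error)),
        )
    alert(f"DLQ entry {cur.lastrowid} for {workflow}: {error}")
    return cur.lastrowid

def replay(conn, entry_id, handler):
    """Re-run handler on the stored payload; mark the entry only if it succeeds."""
    row = conn.execute(
        "SELECT payload FROM dead_letters WHERE id = ?", (entry_id,)
    ).fetchone()
    if row is None:
        raise KeyError(entry_id)
    result = handler(json.loads(row[0]))  # raises again if the cause is not fixed
    with conn:
        conn.execute(
            "UPDATE dead_letters SET replayed_at = CURRENT_TIMESTAMP WHERE id = ?",
            (entry_id,),
        )
    return result
```

Keeping `replayed_at` separate from deletion means the entry stays around as evidence after a successful replay.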
6. Monitoring hooks
Does anything ping you when the workflow stops working?
- Healthcheck pings on success (Uptime Kuma / healthchecks.io / Better Stack)
- Failure-state alerts to your channel of choice (Slack / email / Telegram)
- Don’t rely on n8n’s UI alerts alone — they go silent when n8n itself is down
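
For the healthcheck ping, a minimal sketch follows. It assumes a monitor that treats a GET to the check URL as "success" and a GET to the same URL plus `/fail` as "failure", which is the healthchecks.io convention; other services use different URL shapes. The function name and the injectable `send` parameter are my own, added so the logic can be exercised without a network call.

```python
import urllib.request

def report_run(ping_url, succeeded, send=None, timeout=10):
    """Ping the monitor after a run.

    Uses the base URL on success and appends '/fail' on failure.
    Returns True if the ping went out, False if the monitor was unreachable.
    """
    url = ping_url if succeeded else ping_url.rstrip("/") + "/fail"
    if send is None:
        # Default transport: a plain GET, closed immediately.
        send = lambda u: urllib.request.urlopen(u, timeout=timeout).close()
    try:
        send(url)
        return True
    except OSError:
        # A monitoring outage should never fail the workflow itself.
        return False
```

Because the monitor alerts when pings stop arriving, this also covers the case where n8n itself is down and cannot send anything.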
If your workflow runs anything that matters — checkout, fulfillment, billing, customer-facing webhooks — and you want to grade yourself against this list, drop the workflow name + your biggest worry in a reply. I’ll point out the first place I’d look.
(Background: I run noorflows, productized n8n consulting. But the checklist above stands on its own; you don’t need me to use it.)