I’m working on setting up point-in-time restore for my n8n instance and wanted to ask if anyone here has done something similar — or has tips to share.
Here’s what I’m doing so far:
PostgreSQL is the backend DB.
I’m planning to enable WAL (Write-Ahead Logging) in Postgres.
I’m using Restic for backing up the ~/.n8n folder.
I’m currently taking frequent snapshots:
Every 1 minute for the past hour
Then every 30 minutes for 1 day
Then gradually spaced out over time
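For context, enabling WAL-based point-in-time recovery in Postgres mostly comes down to a few settings plus a base backup. A minimal sketch of what I have in mind (the archive path is just a placeholder for wherever the WAL segments get shipped):

```ini
# postgresql.conf — minimal WAL archiving for PITR
wal_level = replica        # enough detail for archive recovery
archive_mode = on          # ship each completed WAL segment
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
```

Plus a base backup to restore on top of, e.g. `pg_basebackup -D /mnt/base -Fp -X stream`.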
Anything I should be doing on the Redis side?
My question is:
Has anyone successfully implemented a point-in-time restore setup for n8n (or similar)?
Anything else I should be considering, especially around consistency between the DB, Redis, and the .n8n folder?
Additional note on recovery strategy:
My current recovery plan is to:
Use the timestamp from the Restic snapshot as the reference point.
Restore PostgreSQL to that same timestamp using WAL + base backup.
Rebuild the Redis queue/memory by:
Checking logs for the last execution ID or timestamp around that snapshot.
Replaying or re-queuing any external calls (webhooks, messages, etc.) that hit n8n after that point but were lost during the rollback.
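To make the first two steps concrete, here's a rough Python sketch of turning a Restic snapshot's timestamp into Postgres recovery settings. The function name and matching logic are just illustrative; it only assumes the JSON shape of `restic snapshots --json` (each entry has an `id` and an RFC 3339 `time`, which Postgres accepts directly):

```python
import json

def recovery_settings_for_snapshot(restic_json: str, snapshot_id: str) -> str:
    """Given the output of `restic snapshots --json`, return the
    postgresql.conf recovery settings targeting that snapshot's time.
    (On PG 12+ these lines go into postgresql.conf, plus an empty
    recovery.signal file in the data directory.)"""
    snapshots = json.loads(restic_json)
    snap = next(s for s in snapshots if s["id"].startswith(snapshot_id))
    ts = snap["time"]  # RFC 3339 timestamp, valid as a Postgres timestamptz
    return (
        f"recovery_target_time = '{ts}'\n"
        "recovery_target_action = 'promote'\n"
    )
```

Then Postgres replays WAL from the base backup up to that timestamp and stops, which should line up the DB with the ~/.n8n snapshot.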
The idea is to have a clean, coordinated state across .n8n, Postgres, and Redis. But I'm curious whether anyone has done this in production or has automation tips to streamline the process.
Would love to hear how others have approached this — tools, strategies, pitfalls, etc.
You don’t need to back up the .n8n folder frequently. In fact, you need to back it up only once, as it contains only the secret key. There are no other important files there.
What is the point of restoring the whole n8n to some point in time?
@barn4k Hey, thanks for replying. Yes, I’m noticing that too; it only matters if you store something else in there, and in my case I don’t at the moment. I also have my encryption key in my YAML file, which makes life much simpler. Thanks for confirming, I wasn’t 100 percent sure.
As for point-in-time restore: I’m planning for pre-disaster recovery. The scenario I’m worried about is n8n breaking while processing transactions / KYC checks. Since it may be high throughput with a lot of executions, I need to be sure that in an emergency I can recover easily.
That means reprocessing any data that came in via webhooks etc. while things were down. It has to be very reliable, with downtime recoverable in most circumstances. This is a last resort: a quick restore with data intact, as opposed to simply redeploying and losing data consistency.
I hope it never happens, but I’m trying to plan ahead and prevent it.
Anyway, if you have important stuff, like transactions, it’s better to place the pending ones in a message broker (like RabbitMQ, AmazonMQ, Redis, etc.) and remove them only when the execution is finished.
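Rough idea of the pattern, as a toy in-memory sketch (in Redis this maps to `LMOVE` between a pending and a processing list; in RabbitMQ it’s just manual acks — the class and method names here are illustrative):

```python
from collections import deque

class ReliableQueue:
    """Toy in-memory version of the reliable-queue pattern: items move
    from `pending` to `processing` when picked up, and are deleted only
    after the worker acknowledges completion. After a crash, anything
    still in `processing` can be re-queued instead of being lost."""

    def __init__(self):
        self.pending = deque()
        self.processing = []

    def push(self, item):
        self.pending.appendleft(item)

    def pick_up(self):
        item = self.pending.pop()      # an atomic LMOVE in real Redis
        self.processing.append(item)
        return item

    def ack(self, item):
        self.processing.remove(item)   # execution finished: safe to drop

    def recover(self):
        # after a crash, move unacknowledged work back to pending
        while self.processing:
            self.push(self.processing.pop())
```

The point is that a transaction only leaves the broker once the n8n execution has actually finished, so a crash mid-execution can’t silently drop it.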
@barn4k Thanks bro, I’ll likely be using Kafka and you’re right, I probably don’t need to worry much about backing up n8n itself, since it’s mainly handling orchestration. If something fails, I can redeploy n8n (I’ll probably use Kubernetes for orchestration anyway) and have it pick up where it left off, as long as the event stream (Kafka) is intact. That’s got me rethinking the whole setup, so thanks for the nudge!
My main concern is point-in-time restore and ensuring data integrity. If the system goes down, I need a reliable way to restore to a consistent state, ideally by replaying events from Kafka or restoring the DB to a specific point. The issue is, if n8n is offline for too long, Kafka could build up a massive backlog — and if the consumer can’t catch up fast enough, it might overwhelm Kafka or lead to retention problems.
Also, since I’ll be processing high-value transactions (like trades or KYC events), I need to make sure all database updates happen in a fully ACID-compliant way (Atomicity, Consistency, Isolation, Durability), so I never miss or duplicate a transaction. A single missed trade could mess up everything and potentially cost thousands, so I want the workflow to be bulletproof. Maybe n8n can handle that with careful design, but I’m still figuring out the best approach.
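One way I’m thinking of guarding against duplicates on replay is to key every DB write on the event ID, so inserts are idempotent. A toy sketch using sqlite3 so it’s self-contained (in Postgres the equivalent would be `INSERT ... ON CONFLICT DO NOTHING`; the table and function names are just for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE trades (event_id TEXT PRIMARY KEY, amount REAL)")

def record_trade(event_id: str, amount: float) -> bool:
    """Apply one trade event idempotently.
    Returns True if the event was applied, False if it was a duplicate."""
    with db:  # one ACID transaction per event
        cur = db.execute(
            "INSERT OR IGNORE INTO trades (event_id, amount) VALUES (?, ?)",
            (event_id, amount),
        )
    return cur.rowcount == 1
```

With that in place, replaying the Kafka stream from before the restore point is safe: duplicates are simply no-ops, so catching up after downtime can’t double-apply a trade.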
I’m still in the planning phase, but want to cover these failure and recovery scenarios from the start.
One more question: is it worth processing transactions strictly one by one, or is that overkill? I get that n8n’s real value is in transforming data within the workflow; otherwise, if I’m just listening for events and dumping them into a DB, I might as well write a script for that. Maybe it’s better to land the data in the DB from Kafka first and then transform it in batches. Hmm, this really has me thinking.
Well, you can process more than one transaction at a time, but keep an eye on the CPU, as n8n utilizes only one core…
As for the design, I don’t have much experience building high-load systems, but it seems you’d want a DB cluster (maybe geo-replicated?) holding transaction state and storage, n8n in queue mode (k8s or docker-compose), and ALBs in between.