n8n Raspberry Pi Cluster - Howto

Luke_Austin · November 3, 2023, 10:12am

The idea is:

My business workflows have become VERY elaborate and, even running on two servers, I’m getting clashing workflows that give me memory errors.

I saw this:

And would like to build a cluster of, say, 8 Pi 4Bs (8GB?) to distribute the flows across multiple boards. Has anyone done this, and would they be willing to share a howto / tips & tricks?

My use case:

I have written a PIM (Product Information Management) routine in n8n that gathers disparate product information for 400k SKUs on a daily basis (some via email, some via FTP, some via POST), updates my local Postgres server with the new information after tidying and normalising it, then pushes it to my CRM (Odoo), creating new products, updating existing ones, and archiving obsolete ones.

Clashing workflows, running on triggers simultaneously, are beginning to make it a bit unreliable!

I think it would be beneficial to add this because:

It’s cool!
It would be great for the community who may want to try something similar…

Any resources to support this?

Are you willing to work on this?

I’m happy to write it up with my experience - I just need a hand to get going!

Jon · November 3, 2023, 10:22am

Hey @Luke_Austin,

In theory, a basic version of this would be to set up Docker on all the Pis, then follow the normal queue mode setup steps to get it working.

If you took the “Bramble” route, things would be a bit more complicated. It sounds like a fun community project for someone to pick up, though.
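
As a rough illustration of the "Docker on every Pi plus queue mode" idea, here is a minimal Python sketch that builds the `docker run` arguments for a main instance and for a worker. The Redis host name and the container names are placeholders, not anything from this thread. The environment variables (`EXECUTIONS_MODE`, `QUEUE_BULL_REDIS_HOST`, `QUEUE_BULL_REDIS_PORT`) and the `worker` command come from the n8n queue mode docs; treat everything else as an assumption to adapt.

```python
# Sketch only: host names and container names are placeholders.

def queue_env(redis_host, redis_port=6379):
    """Environment that switches an n8n container into queue mode."""
    return {
        "EXECUTIONS_MODE": "queue",
        "QUEUE_BULL_REDIS_HOST": redis_host,
        "QUEUE_BULL_REDIS_PORT": str(redis_port),
    }


def docker_run_args(env, role):
    """Build a `docker run` argument list for role 'main' or 'worker'."""
    if role not in ("main", "worker"):
        raise ValueError(f"unknown role: {role}")
    args = ["docker", "run", "-d", "--name", f"n8n-{role}"]
    for key, value in sorted(env.items()):
        args += ["-e", f"{key}={value}"]
    if role == "main":
        # Only the main instance needs the editor/API port published.
        args += ["-p", "5678:5678"]
    args.append("n8nio/n8n")
    if role == "worker":
        # Workers run the same image with the `worker` command.
        args.append("worker")
    return args


if __name__ == "__main__":
    env = queue_env("pi-redis.local")  # hypothetical host name
    print(" ".join(docker_run_args(env, "main")))
    print(" ".join(docker_run_args(env, "worker")))
```

Running it just prints the two commands, so they can be checked by eye before anything is started on a Pi.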

Luke_Austin · November 3, 2023, 10:34am

Ace - well, that is a starting point. n8n / Docker / Caddy (confirmed) / and apparently Redis all work on ARM, so that should be good.

https://doc.n8n.io/hosting/scaling/queue-mode/#start-redis

Is there anything else I need to be aware of? A couple of things off the top of my head…

Do I need to store the flows/credentials on a separate database (Postgres?) or does the main instance send the JSON / credentials to each worker?
I’ve already switched my sub flows to use the Webhook node for calls, to isolate them and minimise memory issues (it also means I can send whatever JSON I want to the receiver, whereas the Execute Workflow node doesn’t have an “expression” type field to control the outgoing data - Execute Workflow: Incorporate "Set" Functionality ). Would I need to switch them all back to Execute Workflow in this scenario?
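
To make the webhook-style sub flow call concrete, here is a small Python sketch of what "send whatever JSON I want to the receiver" looks like from the caller's side. The URL, the webhook path, and the payload fields are all made up for illustration. The function only builds the request, so it can be inspected without a running n8n instance.

```python
import json
import urllib.request


def build_subflow_call(webhook_url, payload):
    """Build a JSON POST request for a sub flow exposed via a Webhook node."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical URL and payload; replace with a real webhook path.
    req = build_subflow_call(
        "http://pi-main.local:5678/webhook/normalise-products",
        {"supplier": "example", "skus": ["A1", "B2"]},
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=30)
```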

Jon · November 3, 2023, 11:17am

Hey @Luke_Austin,

You would need to use an external database; Postgres would be the best option for that. Using the Webhook node is one option, but what I do in my workflows is add an Edit Fields (formerly known as Set) node before the Execute Workflow node and set my parameters that way.
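
For the external database part, a minimal sketch of the settings every instance (main and workers) would share, written as a Python helper that renders an env file. The host, database name, and user are placeholders. `N8N_ENCRYPTION_KEY` is included on the assumption that every worker needs the same key as the main instance to decrypt stored credentials, which is what the n8n queue mode docs describe.

```python
# Sketch only: host, database name, user, and secrets are placeholders.

def shared_db_env(pg_host, pg_password, encryption_key,
                  database="n8n", user="n8n", port=5432):
    """Database and credential settings shared by all n8n instances."""
    if not encryption_key:
        raise ValueError("all instances must share one encryption key")
    return {
        "DB_TYPE": "postgresdb",
        "DB_POSTGRESDB_HOST": pg_host,
        "DB_POSTGRESDB_PORT": str(port),
        "DB_POSTGRESDB_DATABASE": database,
        "DB_POSTGRESDB_USER": user,
        "DB_POSTGRESDB_PASSWORD": pg_password,
        "N8N_ENCRYPTION_KEY": encryption_key,
    }


def to_env_file(env):
    """Render the settings as KEY=value lines, e.g. for `docker run --env-file`."""
    return "\n".join(f"{k}={v}" for k, v in sorted(env.items())) + "\n"


if __name__ == "__main__":
    env = shared_db_env("pi-db.local", "change-me", "change-me-too")
    print(to_env_file(env), end="")
```

Copying the same rendered file to each Pi keeps the workers pointed at one database instead of each board keeping its own local copy.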

Luke_Austin · November 3, 2023, 11:31am

Yeah, I did that, but it made my flows a bit messy!

If I stuck with webhook, I suppose I would have one node receiving trigger information / initiating routines, then sending webhooks, and a master node receiving the calls and distributing to workers from there.

1 - 1 - 6, with 8 Pis

Whereas with Execute Workflow, I could maybe do

1 - 7

Not sure which would be better, especially as I have very heavily nested flows. My inclination would be 1 - 1 - 6.
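
The two layouts can be written down as a small Python sketch that assigns eight boards to roles. The host names and role names ("trigger", "main", "worker") are my own labels for the 1 - 1 - 6 and 1 - 7 splits described above, not n8n terminology.

```python
def plan_roles(hosts, layout):
    """Assign hosts to roles, in order, for a layout like [("main", 1), ("worker", 7)]."""
    needed = sum(count for _, count in layout)
    if needed != len(hosts):
        raise ValueError(f"layout needs {needed} hosts, got {len(hosts)}")
    remaining = iter(hosts)
    return {role: [next(remaining) for _ in range(count)]
            for role, count in layout}


if __name__ == "__main__":
    pis = [f"pi{n}" for n in range(1, 9)]  # hypothetical host names

    # 1 - 1 - 6: trigger node, master node, six workers (webhook approach)
    print(plan_roles(pis, [("trigger", 1), ("main", 1), ("worker", 6)]))

    # 1 - 7: one main instance, seven workers (Execute Workflow approach)
    print(plan_roles(pis, [("main", 1), ("worker", 7)]))
```

The only difference between the two plans is whether one board is given up to trigger handling in exchange for isolating the webhook calls.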

Jon · November 3, 2023, 11:41am

I would maybe test both approaches. Set up a small-scale test with some mock workflows and see which works best for you. It could be that a mix of both approaches is the way forward.
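
One way to run such a small-scale comparison is a tiny harness that fires mock calls concurrently and records failures and timings. This is a sketch under assumptions: `send` is whatever function triggers one mock workflow (for example an HTTP POST to a test webhook), and the concurrency level and payloads are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_trial(send, payloads, concurrency=4):
    """Call send(payload) for each payload in parallel and summarise the run."""

    def timed(payload):
        start = time.perf_counter()
        try:
            send(payload)
            ok = True
        except Exception:
            ok = False  # count any error as a failed call
        return ok, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, payloads))
    wall = time.perf_counter() - wall_start

    ok_times = [elapsed for ok, elapsed in results if ok]
    return {
        "sent": len(results),
        "failed": sum(1 for ok, _ in results if not ok),
        "wall_seconds": wall,
        "slowest_ok_seconds": max(ok_times, default=0.0),
    }


if __name__ == "__main__":
    def fake_send(payload):
        # Stand-in for a real call; sleeps briefly to mimic work.
        time.sleep(0.01)

    print(run_trial(fake_send, [{"sku": n} for n in range(20)]))
```

Running the same payloads against each layout and comparing the failure counts and wall time should give a first indication of which one copes better with simultaneous triggers.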