Optimize runtime with 100k+ object count

Hi Community,

Working on a recursive crawler. It's working correctly now, but on medium-to-large e-commerce sites it can freeze.
Could you help me identify the optimization points?

I understand that if I move parts of the workflow into sub-workflows, I can free up memory. However, with around 100k items in the flow, it can still freeze at some point in the process. For example, with the test site, warhammer.com, I get 50 XMLs, and it freezes somewhere around XML 35-37.

I tried to build in a batcher, expecting that objects are released at the end of each cycle, so I could keep the workflow's footprint down by controlling the incoming data in the task loop. However, it still clogs up at around 100k items.
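
For clarity, the batching idea I'm describing looks roughly like this as a standalone TypeScript sketch (not my actual n8n nodes; `fetchItem`, `BATCH_SIZE` and the types are made up for illustration):

```typescript
// Conceptual sketch of the batcher idea, outside of n8n.
// fetchItem() and BATCH_SIZE are illustrative placeholders, not the real flow.
type CrawlSummary = { url: string; ok: boolean };

const BATCH_SIZE = 500;

async function fetchItem(url: string): Promise<CrawlSummary> {
  // Stand-in for the real HTTP request / XML parsing step.
  return { url, ok: true };
}

async function crawlInBatches(urls: string[]): Promise<CrawlSummary[]> {
  const summaries: CrawlSummary[] = [];
  for (let i = 0; i < urls.length; i += BATCH_SIZE) {
    const batch = urls.slice(i, i + BATCH_SIZE);
    // The heavy per-item payloads only live inside this iteration...
    const results = await Promise.all(batch.map(fetchItem));
    // ...and only small summaries are kept across iterations, so the
    // batch can be garbage-collected before the next one starts.
    summaries.push(...results);
  }
  return summaries;
}
```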

Is this a hardware issue? I'm on a small DigitalOcean droplet. Or is it a frontend issue where my mediocre laptop runs out of resources? I don't think so, though, even if the 'test workflow' approach does use local resources.

Looking forward to some optimization tips.

BR and :two_hearts:

Flow:

  • n8n version: 1.80.4

  • Database (default: SQLite): supabase/studio 20231123-64a766a

  • n8n EXECUTIONS_PROCESS setting (default: own, main): own

  • Running n8n via (Docker, npm, n8n cloud, desktop app): Docker

  • Operating system: Ubuntu

What are the server specs?

You could add some wait time using a Wait node.
Then the server would have a bit more time between each execution.
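
If you'd rather do it in a Code step instead of a dedicated Wait node, the same idea is just a small delay between items, something like this generic TypeScript sketch (the 2-second pause is an arbitrary example value, not a recommendation):

```typescript
// Generic "give the server a breather" sketch; not n8n-specific code.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function processWithPauses<T>(
  items: T[],
  handle: (item: T) => Promise<void>,
): Promise<void> {
  for (const item of items) {
    await handle(item);
    await sleep(2000); // roughly what a Wait node does between iterations
  }
}
```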

By the way… If you’re running that in test mode, it is more likely to hang, because data gets stored in your browser.

Try running that in execution mode to truly test its performance.

Hey bro, thanks for stopping by.
1 GB Memory / 1 Intel vCPU / 35 GB Disk / FRA1
Pretty crap currently, no need to scale till I get things running.

However, I tried this approach and it went through pretty smoothly:

I guess that's because it clears the memory used by the bigger part of the task after each run.
I will also give execution mode a try.

This is a very interesting case.
Would you mind explaining the chain of thought that led you to this solution?

I mean… what made you think this would be better-optimized logic?

I’m eager to understand how you made this work :muscle:

EDIT: Oh! I actually see it now. It's almost like you divided the required memory in half: before, you were waiting until everything finished for each item, and now you process items in parallel. Nice!

Yeah, I could have been a bit more precise and highlighted the sub-workflow better, but all I did was move the active task completely into a sub-workflow.

An interesting lesson: I was unable to make it work when there were two layers of sub-workflows, i.e. when I left the database-insert sub-workflow inside the sub-workflow of the task itself.
But simply flattening it to one layer, and boom, it's fine.

This way, the batcher lets me control the object count pretty well, and because the task itself runs as a sub-workflow, the majority of the runtime objects are released at the end of each sub-workflow run.
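
In pseudo-code terms, the final shape is roughly this (a TypeScript sketch of the pattern, not the real workflow; `runTaskSubWorkflow` just stands in for the Execute Workflow call):

```typescript
// Sketch of the final layout: a single-layer sub-workflow does the heavy
// task (fetch, parse, database insert), and the parent keeps only counts.
// runTaskSubWorkflow() is an illustrative stand-in, not a real n8n API.
type BatchSummary = { processed: number; failed: number };

async function runTaskSubWorkflow(batch: string[]): Promise<BatchSummary> {
  // Heavy lifting happens here; everything allocated in this call
  // can be released as soon as it returns.
  return { processed: batch.length, failed: 0 };
}

async function parentFlow(urls: string[], batchSize = 250): Promise<BatchSummary> {
  const total: BatchSummary = { processed: 0, failed: 0 };
  for (let i = 0; i < urls.length; i += batchSize) {
    const summary = await runTaskSubWorkflow(urls.slice(i, i + batchSize));
    // The parent only accumulates the tiny summary, not the crawled payloads.
    total.processed += summary.processed;
    total.failed += summary.failed;
  }
  return total;
}
```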