Crazy lag when scraping a lot of data in two different flows at the same time. I keep running into n8n crashing entirely, or n8n sending emails saying my workflows have been deactivated. This only happens for the flows that run Apify and scrape data in the thousands, or even 10k records.
Is this a limitation of n8n?
I’ve been using ChatGPT to understand this problem, and it seems to be telling me that the issue is that n8n always runs everything in memory.
For example, I have one workflow processing 5,000 records, and then another two doing the same, all running at the same time.
If that’s the case, will a higher plan even help, like Enterprise, or even jumping over to the highest Hostinger VPS plan?
For example:
8 vCPU cores
32 GB RAM
400 GB NVMe disk space
32 TB bandwidth
What is the error message (if any)?
You’ve run out of memory, or your workflow has crashed repeatedly.
Please share your workflow
(Select the nodes on your canvas and use the keyboard shortcuts CMD+C/CTRL+C and CMD+V/CTRL+V to copy and paste the workflow.)
The short answer is yes, there is definitely a limitation when processing large amounts of data with n8n. This is mostly due to n8n being built on Node.js and JavaScript, which are not technologies known for the fastest data handling or the best memory efficiency out there.
n8n also has a limit on the number of workflows that can run concurrently. See the documentation below.
Having said all that, there are some architectural design decisions one can make to help process large datasets more efficiently. Remember also that the size of each item in your dataset will have an impact.
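One knob worth knowing about: n8n runs inside Node.js, and the Node heap has a default cap, so even a 32 GB VPS won't help until you raise it. A rough sketch for a self-hosted instance (the 8192 value is just an example — size it to your machine, and check the n8n docs for your deployment style):

```shell
# Raise the Node.js heap limit before starting n8n.
# 8192 MB is a placeholder value, not a recommendation.
export NODE_OPTIONS="--max-old-space-size=8192"
n8n start
```

On Docker-based setups you'd pass the same variable through the container environment instead.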
You didn't share your current setup, but enabling queue mode could have a positive impact on performance. n8n also provides a benchmarking tool for you to test your setup.
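For reference, queue mode is configured through environment variables and needs a Redis instance plus one or more worker processes. A minimal sketch (hostnames and ports are placeholders — see the n8n queue mode docs for the full set of options for your version):

```shell
# Switch executions to a Redis-backed queue (queue mode).
export EXECUTIONS_MODE=queue
export QUEUE_BULL_REDIS_HOST=localhost   # placeholder Redis host
export QUEUE_BULL_REDIS_PORT=6379

# Start the main instance, then one or more workers
# (ideally on separate processes or machines):
n8n start
n8n worker
</imports>
```

The point of queue mode is that heavy executions run on workers, so a big scrape can't take down the main instance serving the UI and webhooks.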
Have a look at the video below to start with, and if you can share your workflow and your current setup, it will help give a better idea of what's going on.
Remember, throwing more CPU and memory at the problem doesn't automatically make anything faster if the solution wasn't built with efficiency in mind.
Oh man, that's above my pay grade, as I'm not sure how the n8n team hosts the cloud instances on the back end. In that case it's probably already set up in queue mode and optimised properly.
So, the next step would be to dive deeper into your workflows and the data you're trying to process so we can see how to enhance them. Are you able to elaborate more on one of the workflows giving you issues?
So for now I've kinda fixed it by lowering the amount of data each flow processes, and by having other flows control how many of the memory-heavy flows run at once. So I'm not really running into any more issues.
The thing is, what if we want to build more large-scale scrapers? Because right now we're just chunking down workflows and making sure automations hold fewer than 100 items. It works, but it's not always super efficient. But yeah, I'll just have to think about this.
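The chunking approach described above can be sketched as a plain JavaScript helper, roughly what you'd put in an n8n Code node before handing each batch to a sub-workflow (the 100-item limit and the simulated records are assumptions for illustration; in a real Code node the `items` array comes from the runtime):

```javascript
// Split incoming items into chunks of at most CHUNK_SIZE, so each
// downstream sub-workflow call only holds a small batch in memory.
const CHUNK_SIZE = 100; // assumed limit, matching the ~100-item rule above

function chunkItems(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Simulate 250 scraped records shaped like n8n items ({ json: {...} }).
const items = Array.from({ length: 250 }, (_, i) => ({ json: { id: i } }));
const chunks = chunkItems(items, CHUNK_SIZE);
console.log(chunks.length);    // 3 batches
console.log(chunks[2].length); // last batch holds the remaining 50 items
```

n8n's built-in Loop Over Items (Split In Batches) node does much the same thing without custom code, if you'd rather stay node-based.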
It's difficult to say, as it all depends on the complexity, volume and size of the data you need to process, and where the potential bottlenecks are. I'm open to a consult on one of these workflows if you need it, as there are a number of ways to make this kind of processing more efficient. That goes for your current flows too: you're not immune yet if your volumes increase. It could take just one record being unexpectedly larger than the rest to throw you back into memory issues.