Error to import 500K rows to database. How to do?

vdeapps · May 22, 2021, 7:33am

Hello to all,

I discovered n8n a short time ago, and I immediately saw my interest in this product.
I regularly have automatic tasks to perform like syncing data from various sources to multiple databases.
So far I have built quite complex workflows (with code) to do this work.

But now I’m stuck! Not because of the logic of the workflow to implement but because of the memory used.

Let me explain, I have to read (every day) a CSV file of 500 000 lines that I have to integrate in a database.
Doing it in one pass was not very successful.
My plan “B” was to split this file into several files with fewer lines. This list of files is then injected into the SplitInBatches component which is responsible for sending the file name to another workflow (containing the data import).
In the logic everything works fine except that as the loops go on the memory load keeps increasing. I thought that the RAM would be freed up as we go along, but no.

Have you done a similar case, which would be my plan “C”?

Thank you all for reading

jan · May 22, 2021, 12:52pm

Sorry to hear that you have problems! To get that fixed two steps are important:

Understand how n8n works
Data in an execution does never get “freed up”. Once it is loaded it will stay there until the workflow execution is finished. That said, we have planned to change that in the future. It will get a mode where data does get freed up after every node if gets enabled. This will then obviously come with a big performance hit but for such use cases, it should however not matter if an execution is slower, as long as it runs successfully. After that, your plan “A” should also work. Sadly no ETA for it.
Using that understanding to fix the workflow
You have to make sure that the data does never end up in an execution. Your plan “B” should theoretically work totally fine. But I assume that the data of each of the iterations ends up again in the main worflow. Most likely because the last node of the second workflow (which does the import) still contains the whole data. If that is the case it will then send this whole data back to the main workflow. So the solution would be to make sure that the last node sends back as little data as possible. Best a single empty JSON. The easiest and most efficient way to do that, is adding a Set-Node two the second workflow. You have to make sure to activate the option “Keep Only Set” and under “Settings” additionally “Execute Once”.

Hope that helps!

Example Set-Node:

vdeapps_vdeapps · May 22, 2021, 2:32pm

Thanks to answer.
currently, I use plan “B” with a workflow to split and use the SplitInBatches and a workflow to insert data to my table. See pictures.
In the workflow I already return a empty function to clear output data but I have a same problem.

And the sub workflow

After a insert/update, n8n must implements an option that return a custom function againts return all rows to output.

jan · May 22, 2021, 2:36pm

Sorry, I sadly do not understand. What do you mean with “After a insert/update, n8n must implements an option that return a custom function againts return all rows to output.”

Miquel_Colomer · May 22, 2021, 3:15pm

Hi @vdeapps_vdeapps

I did several tests with n8n sometime ago with huge amounts of data.

This is not recommended because n8n was created with another goal. For small iterations (say less than 5000 rows) n8n could achieve what you are looking for.

But if you want to work with those amounts of data, I recommend using another option like etl/elt software or other projects like https://meltano.com/

Hope this helps

vdeapps · May 22, 2021, 3:18pm

Sorry, It was an idea that n8n can add

vdeapps · May 22, 2021, 3:25pm

Ok 500k rows are big but I splitted this Big file in many files with 10k rows.
I think n8n can add an option to clear memory After a workflow to fix this memory leak. I don’t know if it is possible

Miquel_Colomer · May 22, 2021, 3:44pm

Yes. That could be an option.

You need to discard saving history data (purging it with an environment variable when running n8n).

Additionally, you can try to execute the workflow outside docker by terminal to improve performance.

This is not tested by myself, but it makes sense to reduce memory usage and increase computing ratio (a host is always faster).

jan · May 22, 2021, 4:41pm

Memory does already get cleared after a workflow did run. Not aware of any memory leaks.

vdeapps · May 22, 2021, 5:38pm

I have some other workflow which does the same thinks but with less rows and it’s ok.

I’ll make another algo to do my Big import. A cron which import 3 files by 3 files and delete previous

vdeapps · May 22, 2021, 5:53pm

Ok I’ll try it

vdeapps_vdeapps · May 22, 2021, 6:35pm

This this the error message:
<— Last few GCs —>

[27:0x55c591dc1f80]   339627 ms: Mark-sweep (reduce) 2042.7 (2056.5) -> 2042.2 (2057.0) MB, 1300.6 / 0.1 ms  (average mu = 0.189, current mu = 0.004) allocation failure scavenge might not succeed
[27:0x55c591dc1f80]   340965 ms: Mark-sweep (reduce) 2043.2 (2054.0) -> 2042.8 (2055.5) MB, 1334.6 / 0.1 ms  (average mu = 0.102, current mu = 0.003) allocation failure scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: MarkCompactCollector: young object promotion failed Allocation failed - JavaScript heap out of memory

georgejeffcock · May 26, 2021, 12:09pm

often and frequently as the saying goes in integration. most ipaas struggle under volume because they are not designed to be database migration tools…they are work optimally using deltas