Large (CSV) files, streaming, looping, batching - none of it works

Describe the problem/error/question

I need to process data from a larger CSV file (80 MB, ~100k entries).
After processing the whole file shut down my account, and since there is no streaming function, I decided to process the data iteratively with an IF loop and the “start line” feature.
The issues are:
a: after 21 loops the flow still stops, and it slows down before that.
b: done this way, I have to download the file in the flow on every iteration.

So all of the options are highly frustrating.
I am in the process of moving from Synesty to n8n, and I am ready to do whatever is needed to make this work.

Thank you for your help
Seb

Please share your workflow

Information on your n8n setup

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app): CLOUD
  • Operating system:

Since you are able to read the entire file (into memory), I’m guessing it’s actually the processing of each batch/item that is adding too much extra in-memory demand.

Have you tried moving the processing of each batch into a sub-workflow? That may work to limit how much data is in memory at one time. In other words, because a batch is processed in a separate workflow, it can finish/unload before the next one starts, and only one batch’s worth of data will be in memory at a time.

You can even define everything in one workflow item that “calls itself” (sorta).
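Not taken from your workflow, but as a rough sketch of that idea: a Code node in the parent workflow could emit one item per batch, and an “Execute Workflow” node set to run once per item would then call the sub-workflow with a start line, so only one batch is in memory at a time. The batch size, row count, and field names below are illustrative assumptions.

```ts
// Sketch of a Code node ("Run Once for All Items") in the parent workflow.
// It emits one item per batch; a following "Execute Workflow" node running
// once per item would call the sub-workflow with these values.
// batchSize, totalRows and the field names are assumptions, not your values.
const batchSize = 1000;
const totalRows = 100000;

const batches = [];
for (let start = 1; start <= totalRows; start += batchSize) {
  batches.push({
    json: {
      startLine: start,   // candidate value for "Start Line" in Extract from File
      maxRows: batchSize, // candidate value for "Max Number of Rows"
    },
  });
}

return batches;
```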

Thank you for your suggestion @hubschrauber
The issue happens in the “Extract from File” node; that is where the connection gets lost. That is why I cannot extract the full file but rather need to extract only a subset, e.g. rows 1-1000.
In order to do that I need to include the “Extract from File” node inside the loop, and also the download (because the data needs to be in the node before it).
It is a bit problematic running into these issues, but of course I still consider myself a beginner.
I am a bit lost here.
If you have another idea I would appreciate it.
Seb

So it sounds like the problems start when you read far enough into the CSV file that n8n can’t/won’t pull it all into memory.

If you were self-hosted, you could split the file up first, but that wouldn’t be supported in n8n-cloud. If I think of a way you could do this within the resource limitations in cloud, I’ll let you know. However, I think you might need to use some kind of external storage like S3 to pursue a similar approach.
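For the self-hosted case, here is a minimal sketch (plain Node.js/TypeScript run next to n8n, not n8n itself) of pre-splitting a large CSV with a line-by-line stream so the whole file is never held in memory. File names and chunk size are made up, and it assumes no quoted field contains an embedded newline.

```ts
// split-csv.ts - sketch of splitting a big CSV into smaller files before
// handing them to n8n; file names and chunk size are assumptions.
// Simple line splitting: assumes no CSV field contains an embedded newline.
import { createReadStream, createWriteStream, WriteStream } from "node:fs";
import { createInterface } from "node:readline";

const INPUT = "big.csv";       // hypothetical input file
const ROWS_PER_CHUNK = 10000;  // hypothetical chunk size

async function splitCsv(): Promise<void> {
  const rl = createInterface({ input: createReadStream(INPUT), crlfDelay: Infinity });

  let header: string | undefined;
  let out: WriteStream | undefined;
  let chunkIndex = 0;
  let rowsInChunk = 0;

  for await (const line of rl) {
    if (header === undefined) {
      header = line;            // repeat the header row in every chunk
      continue;
    }
    if (!out || rowsInChunk >= ROWS_PER_CHUNK) {
      out?.end();
      out = createWriteStream(`chunk-${chunkIndex++}.csv`);
      out.write(header + "\n");
      rowsInChunk = 0;
    }
    out.write(line + "\n");     // backpressure ignored; fine for a sketch
    rowsInChunk++;
  }
  out?.end();
}

splitCsv().catch(console.error);
```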

Dear @hubschrauber, dear community,

I have not seen a solution for the above-mentioned issue, but I have now solved it as follows:

  1. The MAIN workflow triggers a sub-workflow repeatedly until the file is exhausted (i.e. the sub-workflow returns fewer items than the batch size)
  2. The sub-workflow has to download the file every time it runs

In my case the file is 81 MB and has 55k rows. I have to use a batch size of 500, which means the sub-workflow runs 111 times, consuming workflow executions and wasting bandwidth.
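For anyone copying this pattern, one way (just a sketch, not my exact nodes) to drive the stop condition is a small Code node at the end of the sub-workflow that reports how many rows the run processed; the parent workflow’s IF node then keeps looping only while hasMore is true. The batch size and field names are assumptions.

```ts
// Sketch of a Code node ("Run Once for All Items") at the end of the sub-workflow.
// It counts the items that reached this node so the parent workflow's IF node
// can stop looping once a batch comes back smaller than the batch size.
// batchSize and the field names are assumptions, not taken from the thread.
const batchSize = 500;
const rowsProcessed = $input.all().length;

return [
  {
    json: {
      rowsProcessed,
      hasMore: rowsProcessed === batchSize, // false on the last, partial batch
    },
  },
];
```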

So I am still looking for a better solution. Maybe n8n could work on streaming CSV files to avoid both of these issues?

Looking forward to hearing everyone's thoughts.