Reading big CSV on self hosted business plan: Invalid string length

I am trying to read a CSV with ~2.9 million lines using this approach: Convert Binary file To JSON or CSV - #5 by MutedJam (but in a single workflow, as I don’t want to spawn 1000s of new workflows).

I added a console.log() for the current lines; the full file is read, only for the workflow to crash at the very end.

I first thought it was about saving the execution data, but I have already turned off execution data saving.

We are on Version 1.109.2, docker based, with 5 workers.

What could be causing this?

Regards,

Alex

docker compose logs output

n8n-worker-2        | [Workflow "lUrk0WXKm4Htw02A"][Node "Code1"] '2911002 - 2912001'
n8n-worker-2        | [Workflow "lUrk0WXKm4Htw02A"][Node "Code1"] '2912002 - 2913001'
n8n-worker-2        | [Workflow "lUrk0WXKm4Htw02A"][Node "Code1"] '2913002 - 2914001'
n8n-worker-2        | [Workflow "lUrk0WXKm4Htw02A"][Node "Code1"] '2914002 - 2915001'
n8n-worker-2        | [Workflow "lUrk0WXKm4Htw02A"][Node "Code1"] '2915002 - 2916001'
n8n-worker-2        | [Workflow "lUrk0WXKm4Htw02A"][Node "Code1"] '2916002 - 2917001'
n8n-worker-2        | [Workflow "lUrk0WXKm4Htw02A"][Node "Code1"] '2917002 - 2918001'
n8n-worker-2        | [Workflow "lUrk0WXKm4Htw02A"][Node "Code1"] '2918002 - 2919001'
n8n-worker-2        | [Workflow "lUrk0WXKm4Htw02A"][Node "Code1"] '2919002 - 2920001'
n8n-worker-2        | [Workflow "lUrk0WXKm4Htw02A"][Node "Code1"] '2920002 - 2921001'
n8n_stage-webhook   | Problem with execution 2160: This execution failed to be processed too many times and will no longer retry. To allow this execution to complete, please break down your workflow or scale up your workers or adjust your worker settings.. Aborting.
n8n_stage-webhook   | This execution failed to be processed too many times and will no longer retry. To allow this execution to complete, please break down your workflow or scale up your workers or adjust your worker settings. (execution 2160)
n8n_stage-webhook   | Error: job stalled more than maxStalledCount (execution 2160)
n8n-worker-2        | There was a problem running hook "workflowExecuteAfter" RangeError: Invalid string length
n8n-worker-2        |     at Array.join (<anonymous>)
n8n-worker-2        |     at stringify (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/[email protected]/node_modules/flatted/cjs/index.js:78:23)
n8n-worker-2        |     at ExecutionRepository.updateExistingExecution (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+db@file+packages+@n8n+db_@[email protected]_@opentelemetry+sdk-trace-base@1._5b802e9bb4ce6a4d8b16db2ab27576e1/node_modules/@n8n/db/src/repositories/execution.repository.ts:429:43)
n8n-worker-2        |     at updateExistingExecution (/usr/local/lib/node_modules/n8n/src/execution-lifecycle/shared/shared-hook-functions.ts:90:43)
n8n-worker-2        |     at ExecutionLifecycleHooks.<anonymous> (/usr/local/lib/node_modules/n8n/src/execution-lifecycle/execution-lifecycle-hooks.ts:436:33)
n8n-worker-2        |     at ExecutionLifecycleHooks.runHook (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/n8n-core@file+packages+core_@[email protected]_@[email protected]_5aee33ef851c7de341eb325c6a25e0ff/node_modules/n8n-core/src/execution-engine/execution-lifecycle-hooks.ts:120:28)
n8n-worker-2        |     at processTicksAndRejections (node:internal/process/task_queues:105:5)
n8n-worker-2        |     at /usr/local/lib/node_modules/n8n/node_modules/.pnpm/n8n-core@file+packages+core_@[email protected]_@[email protected]_5aee33ef851c7de341eb325c6a25e0ff/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:2281:6
n8n-worker-2        | Worker finished execution 2160 (job 1007)
n8n-worker-2        | Worker errored while running execution 2160 (job 1007)
n8n-worker-2        | Queue errored
n8n-worker-2        | Queue errored
n8n-worker-2        | Error: Missing key for job 1007 failed
n8n-worker-2        |     at Object.finishedErrors (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/[email protected]_patch_hash=a4b6d56db16fe5870646929938466d6a5c668435fd1551bed6a93fffb597ba42/node_modules/bull/lib/scripts.js:287:16)
n8n-worker-2        |     at Job.moveToFailed (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/[email protected]_patch_hash=a4b6d56db16fe5870646929938466d6a5c668435fd1551bed6a93fffb597ba42/node_modules/bull/lib/job.js:345:19)
n8n-worker-2        |     at processTicksAndRejections (node:internal/process/task_queues:105:5)
n8n_stage-postgres  | 2025-09-26 17:02:12.811 CEST [27] LOG:  checkpoint starting: time
n8n_stage-postgres  | 2025-09-26 17:02:14.626 CEST [27] LOG:  checkpoint complete: wrote 19 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.806 s, sync=0.005 s, total=1.815 s; sync files=17, longest=0.002 s, average=0.001 s; distance=74 kB, estimate=18447 kB; lsn=13/ABF5EE40, redo lsn=13/ABF5EE08

If I were to propose a way to handle this, I’d say: load the file into a proper database before you do any other processing (filtering, sorting, ordering, grouping, etc.). Having the data in a database will make the rest of the process so much smoother.


Hi Jabbson,

thanks for your response.

Sure, I was thinking of alternative solutions.

But my use case is pretty much “read the CSV and forward it to an API after creating md5 hashes for the IDs”, so there is no random access or filtering that would justify loading it into a database first.

I think the approach I am using, a combination of sed and head to read only chunks of the CSV into memory, should be efficient enough.

Even if I read the data in chunks from a different source (database), n8n still would keep them in memory.

The interesting part is that the full file is read and n8n crashes only at the very end. So I believe this is some kind of bug?

The use case is pretty simple and we could of course do this outside of n8n. But we need to integrate it with other file watchers for other imports, so that would not be very elegant (and we would have trouble justifying the business plan payments if we cannot use n8n for such a seemingly simple use case).

It sounds like you might be hitting memory limits with such a large file. I’d recommend loading the data into a database first, which will make processing way smoother. You could also try splitting the file into smaller chunks to prevent overloading memory. If you’re using workers, make sure they’re efficiently handling manageable portions of the data.

That’s exactly what I am doing: I split it into small chunks via sed and load only those chunks. Also, according to my logs, the full file is processed and the workflow crashes right before the end.

I already switched off the save options in the workflow settings, so I am wondering how this can be investigated further.

Regards Alex

The issue you’re dealing with is how n8n handles data. It doesn’t have the ability to read a file line by line with a file pointer, the way a developer normally would in code.

The recommendations to put it in a database first are smart, yet they overlook the issue that you can’t load the file in n8n because of memory limitations.

If you’re limited to only n8n, my recommendation would be to set up a Code node (or potentially a separate workflow) that processes the external file using a pointer system (here’s a reference for doing it in Node JS), and that way you send it to a database (or data table).

From there you can query your data from within n8n.

If you have more freedom from a coding perspective, I would set this all up in Python. Just host it on something like Sevalla, and you’re golden.


Thanks for your response!

I think there is some misunderstanding here - maybe because I did not share my workflow. I am posting below now.

Based on what I have analyzed so far, I assume it’s a problem with finalizing the workflow, not a memory leak along the way. I am already reading in chunks.

As written in the original question, I use the linked approach Convert Binary file To JSON or CSV - #5 by MutedJam, which uses sed -n 'START,ENDp' in the Execute Command node to read the file in chunks.

I am pretty sure this technique doesn’t use much memory in the n8n node process, only enough for the actual chunk of data.

I tried reading chunks of 10,000 and 1,000 lines, and a single chunk fits comfortably in n8n’s memory. I read such relatively large chunks because I also want to pass the data to the Shopware 6 sync API in chunks rather than line by line. The lines are pretty short (one line is ~64 bytes, 2,920,269 lines in total, 189 MB full size; a 1,000-line chunk is only ~65 KB).

As you can see in the workflow, I also added progress logging which prints the current progress to the Docker console (when executed via the webhook for testing, instead of a manual test run).

There I can see that the batching runs fine until the full file is read (up to line 2921001), as shown in the original log (I renamed the node from the earlier “Code1” to “Log current status”).

So I think reading the file actually works. But after everything is read, I get the above error message and I don’t know why.

I think it has to do something with the finalization of the workflow, because if it were a memory problem during the reading process, it would be a big coincidence that it happens only after everything was read.

The first thing I would highly recommend is to update the nodes to the latest version.

  • Code node (first, outdated)
  • Split in Batches (outdated)
  • Move to file (deprecated)
  • Spreadsheet file (deprecated)
  • Set node (last in the wf, outdated)

I’d recommend you create a new canvas and rebuild the workflow from scratch, using up-to-date node versions plus the nodes that replaced some of the others from over two years ago.


Oh wow, I did not know that. Thanks a lot. That was copied from the forum and I rebuilt it now.

:crossed_fingers:

It looked quite promising at first, but then, after the last chunk was read, it took a minute or so and the error appeared again.

I guess I need to move the reading into a subworkflow, as n8n keeps all previous nodes’ data in memory because it can still be accessed … I kind of overlooked that in the initial posting I got the approach from.

So it seems to work very well when I put the loop into a subworkflow and read 10,000 lines at once.

I was first afraid I would spam the execution log with 100s of workflow runs, but I just set the Chunk Workflow to “no save” and it’s all good.

Thanks everybody!


Great job, @alexm, running the subflow was a very good call here.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.