Help on using Loop to process multiple files (up to 200) by batch

As the snapshot below shows, I'm uploading CSV files to Postgres and need to process many files (up to 200). I'm trying to leverage Loop and Split nodes, and I'm looking for a node that can:

  1. Read only the files' names (so I can filter by a naming rule, such as a timestamp), without loading file content.
  2. Continue with Split and Loop to process the CSVs into Postgres in batches (and, if possible, an inner loop processing 10,000 records at a time within each file?).

Thanks

Hey @snxnj hope all is well.

Let’s try to unwrap this.

If you need to filter by file name, the filtering needs to happen before the read-from-disk node. Put a Filter node just before it and pass along only the files whose names satisfy the filter.

It's not clear what the question is.

Could you tell us about what you are trying to automate? I assume you need to read large csv files and upload them to postgres? Are you running cloud n8n? If not, and you are running local n8n, do you also run a local postgres? Please expand on your setup a little bit.

Thanks for the response, Jabbson.
I'm using local n8n and Postgres. Below is the CSV-to-Postgres automation I've built, which works fine for a limited number of files. But it doesn't work with 100 CSV files: the "Read/Write Files from Disk" node takes hours to read all 100 files (with content), which brings the service down due to memory issues.
The problem I'm trying to resolve: I'm looking for nodes that can:

  1. Read all the files' names, not their content.
  2. Based on that file list, use Loop/Split to process the files one by one.

You could use an Execute Command node to list the files with their respective paths, then split the output and loop over the split values to process your CSVs.

Something like this:
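As a rough sketch of that listing step, the Execute Command node could run something like the following (the directory and the timestamp naming pattern are assumptions for illustration; the `mktemp`/`touch` lines just stand in for your real CSV folder):

```shell
# Stand-in for your CSV folder; replace with the real path.
dir=$(mktemp -d)
touch "$dir/20240101_a.csv" "$dir/20240102_b.csv" "$dir/notes.txt"

# Emit one matching file path per line; file contents are never read.
# The '2024*' pattern is an assumed timestamp naming rule.
find "$dir" -maxdepth 1 -type f -name '2024*.csv' | sort
```

Each output line can then be split into items and looped over, so only one file's content is ever in memory at a time.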

Thanks, will try. Any idea how to restrict the Postgres node to processing up to 10,000 records at a time? Thanks

How many are in one file?

With each run of the batch loop, each CSV has about 0.1M records.

0.1M records… do you mean 0.00001e10 records? :slight_smile:

Anyhow, reading the records into memory in n8n doesn't appear to be the best idea to begin with. You may want to re-think the ingestion pipeline, with something like
mounting the shared folder into the Postgres server and then loading the file directly in the database with COPY target_table(col1,col2,...) FROM '/data/file.csv', or a similar solution. This will use next to no memory and will be MUCH faster.
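As a hedged sketch, a server-side bulk load might look like this (the table, columns, and file path are assumptions; the path must be visible to the Postgres server process, e.g. via the mounted shared folder):

```sql
-- Assumed table/columns and path; adjust to your schema and mount point.
COPY target_table (col1, col2)
FROM '/data/file.csv'
WITH (FORMAT csv, HEADER true);
```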

lol. The COPY approach is surely more efficient, but I still need to resolve the conflicting rows (compared with insert/update).

This can be done within the same transaction: COPY from the CSV into a temp table, then populate your intended table with the data from the temp table while resolving conflicts (ON CONFLICT (x) DO UPDATE). All of this can be done from an Execute Query action with SQL, instead of per-line writes with a memory buffer in n8n.
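A minimal sketch of that transaction, assuming a target table `target_table(col1, col2)` with a unique key on `col1` (all names are placeholders):

```sql
BEGIN;

-- Temp table with the target's shape; dropped automatically at commit.
CREATE TEMP TABLE staging (LIKE target_table INCLUDING DEFAULTS)
  ON COMMIT DROP;

-- Server-side bulk load; the path must be visible to the Postgres server.
COPY staging (col1, col2)
FROM '/data/file.csv'
WITH (FORMAT csv, HEADER true);

-- Upsert into the real table, resolving conflicts on the key column.
INSERT INTO target_table (col1, col2)
SELECT col1, col2 FROM staging
ON CONFLICT (col1) DO UPDATE
  SET col2 = EXCLUDED.col2;

COMMIT;
```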


Thanks, Jabbson. Will try it out.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.