Working with larger files (2MB - 20MB+)

messi198310 · February 2, 2022, 12:29pm

We are looking for advice on best practices for working with larger files. We have scenarios where we would like to download multiple files from S3 buckets and upload them to other endpoints, e.g. SharePoint or SFTP. We are noticing latency and also failures when larger files go through this process. This could potentially be down to resource constraints, but this is the only workflow running at that point in time.

The files will be a mixture of sizes, ranging from 2MB to potentially 40MB.

We have tried downloading the files locally to the AWS instance we are running N8N on and reading them in, but we then encounter 413 errors.

Any advice would be appreciated.

Our set-up is N8N (1 node) running in Docker on a t4g.small instance with the default container resource limits.

Database is Postgres running in Docker

Thanks

MutedJam · February 2, 2022, 2:02pm

Hi @messi198310, as mentioned in the past (e.g. Upload all files from FTP to Nextcloud - #16 by MutedJam), I suggest using a more specialized tool such as rclone (which can still be controlled through n8n) when working with larger binary files.
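To give a rough idea of what handing the transfer to rclone could look like, here is a minimal Python sketch that builds and runs an `rclone copy` command. The remote names `s3remote` and `sftpremote`, the paths, and the helper names are hypothetical; the remotes would need to be set up with `rclone config` first. From n8n, a command like this could be triggered with the Execute Command node.

```python
import subprocess
from typing import List


def rclone_copy_command(source: str, dest: str, transfers: int = 1) -> List[str]:
    """Build an `rclone copy` argument list.

    `--transfers` limits how many files rclone copies in parallel;
    1 keeps resource usage low on a small instance.
    """
    return ["rclone", "copy", source, dest, "--transfers", str(transfers)]


def run_copy(source: str, dest: str) -> None:
    """Run the copy and raise CalledProcessError if rclone exits non-zero."""
    subprocess.run(rclone_copy_command(source, dest), check=True)


if __name__ == "__main__":
    # Hypothetical remotes and paths, for illustration only.
    print(" ".join(rclone_copy_command("s3remote:my-bucket/outgoing", "sftpremote:/incoming")))
```

With this approach the file contents never pass through the n8n process, so n8n only has to handle the command's exit status.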

That said, if you want to do this with n8n the below might help:

Address the 413 issue by setting the N8N_PAYLOAD_SIZE_MAX variable to a higher value (you can find more background information on this here). When using a reverse proxy, you might also need to raise its request size limit.
When running the latest version of n8n, you can also set the N8N_DEFAULT_BINARY_DATA_MODE variable to filesystem so that n8n stores binary data on the filesystem instead of loading it into memory. This can help if, in addition to the 413 errors, you also see your entire instance crashing with out-of-memory errors. Note, though, that data is still held entirely in memory if you convert binary data into JSON data.
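As a rough sketch of the two settings above, the Python snippet below assembles a `docker run` command that passes both variables to the container. The value 64, the port mapping, and the helper name `build_n8n_command` are illustrative assumptions rather than recommended values; N8N_PAYLOAD_SIZE_MAX is a size in MiB and defaults to 16.

```python
import shlex
from typing import List


def build_n8n_command(payload_size_max_mib: int = 64,
                      binary_data_mode: str = "filesystem") -> List[str]:
    """Return a `docker run` argument list with the two n8n settings applied."""
    env = {
        # Maximum accepted payload size in MiB (assumed value, pick your own).
        "N8N_PAYLOAD_SIZE_MAX": str(payload_size_max_mib),
        # Keep binary data on disk instead of in memory.
        "N8N_DEFAULT_BINARY_DATA_MODE": binary_data_mode,
    }
    cmd = ["docker", "run", "--rm", "-p", "5678:5678"]
    for name, value in env.items():
        cmd += ["-e", f"{name}={value}"]
    cmd.append("n8nio/n8n")
    return cmd


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it.
    print(shlex.join(build_n8n_command()))
```

The same name/value pairs would go into the `environment` section of a Docker Compose file or whatever mechanism your deployment uses to set container environment variables.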

You should also consider reducing the size of the data being processed for example by:

Splitting the data processed into smaller chunks (e.g. instead of fetching multiple files with each execution, process only one file per execution)
Splitting the workflow up into sub-workflows (e.g. instead of having your data pass 50 nodes in one workflow, have it pass 10 nodes in 5 workflows each)
Avoiding the Function node
Avoiding manual workflow executions (as this means another copy of the data is kept for the UI)
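The "one file per execution" idea from the list above can be sketched in a few lines of Python. This only illustrates the batching logic; the file keys and the helper name `batches` are made up for the example. Inside n8n, the Split In Batches node plays a similar role.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int = 1) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    # Hypothetical S3 object keys, for illustration only.
    keys = ["reports/a.pdf", "reports/b.pdf", "reports/c.pdf"]
    for batch in batches(keys, size=1):
        # Each batch would be handed to its own execution or sub-workflow,
        # so only one file's data is held at a time.
        print(batch)
```

Keeping the batch size at 1 trades throughput for a lower peak memory footprint, which tends to matter more on a small instance.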

messi198310 · February 2, 2022, 3:49pm

Thanks @MutedJam. We will update N8N, test this in our Dev environment, and report back.

kp-kun-dip · March 28, 2023, 2:19am

@MutedJam
How do I update the N8N_PAYLOAD_SIZE_MAX variable when I am running n8n via Docker on AWS?
I tried changing another variable (EXECUTIONS_PROCESS to main) by adding it to the task definition on AWS ECS, where n8n is deployed.

When I did the same with N8N_PAYLOAD_SIZE_MAX, it did not work.
I tested with file sizes from 2KB to 30MB.
It crashes exactly when the file size exceeds 16MB (the default value for N8N_PAYLOAD_SIZE_MAX).

Is the task definition not the right place to add the N8N_PAYLOAD_SIZE_MAX variable?

Jon · March 28, 2023, 4:00am

Hey @kp-kun-dip,

Looks like you opened a new thread for this one, which is handy. We will work from that thread instead of this one.
