Working with larger files (2MB - 20MB+)

We are looking for advice on best practices for working with larger files. We have scenarios where we would like to download multiple files from S3 buckets and upload them to other endpoints, e.g. SharePoint or SFTP. We are seeing latency and also failures with larger files in this process. This could be down to resource constraints, but this is the only workflow running at that point in time.

The files will be a mixture of sizes, ranging from 2MB to potentially 40MB.

We have tried downloading the files locally to the AWS instance we are running N8N on and reading them in, but we then encounter 413 errors.

Any advice would be appreciated.

Our set-up is N8N (1 node) running in Docker on a t4g.small instance with the default container resource limits.

Database is Postgres running in Docker

Thanks

Hi @messi198310, as mentioned in the past (e.g. Upload all files from FTP to Nextcloud - #16 by MutedJam), when working with larger binary files I suggest using a more specialized tool such as rclone (which can still be controlled through n8n).
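
To give a rough idea of the rclone route, here is a minimal sketch. The remote names `s3src` and `sftpdst` and the paths are hypothetical and would first need to be set up with `rclone config`; the `--transfers 2` value is only a guess for a small instance. One way to drive this from n8n could be the Execute Command node, assuming rclone is installed where n8n runs.

```shell
# Sketch only: copy everything under a source path to a destination path.
# $1 = source, $2 = destination, both in rclone's "remote:path" form.
copy_with_rclone() {
  # --transfers caps how many files are transferred in parallel
  rclone copy "$1" "$2" --transfers 2
}

# Example call (hypothetical remotes and paths):
# copy_with_rclone "s3src:my-bucket/exports" "sftpdst:/incoming"
```

With this approach the file contents never pass through n8n itself; the workflow only triggers the copy.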

That said, if you want to do this with n8n the below might help:

  • Address the 413 issue by setting the N8N_PAYLOAD_SIZE_MAX variable to a higher value (you can find more background information on this here). If you are using a reverse proxy, you might also need to raise its request size limit.
  • When running the latest version of n8n you can also set the N8N_DEFAULT_BINARY_DATA_MODE variable to filesystem, so that n8n keeps binary data on the filesystem instead of loading it into memory. This can help if, in addition to the 413 errors, you also see your entire instance crashing with out-of-memory problems. Note that data will still be kept in memory completely if you convert binary data into JSON data.
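
As a minimal sketch of the two settings above for a Docker setup: the value 64 is only an illustrative choice (the variable is given in MiB) and should be sized to your largest file plus some headroom. The commented `docker run` line is an assumption about how your container is started, not a drop-in command.

```shell
# Raise the maximum request payload size (in MiB) to get past the 413 errors.
# 64 is an illustrative value, not a recommendation.
export N8N_PAYLOAD_SIZE_MAX=64

# Keep binary data on disk instead of in memory.
export N8N_DEFAULT_BINARY_DATA_MODE=filesystem

# Pass both variables through to the container (-e NAME forwards the host value):
# docker run -e N8N_PAYLOAD_SIZE_MAX -e N8N_DEFAULT_BINARY_DATA_MODE \
#   -p 5678:5678 docker.n8n.io/n8nio/n8n
```

If the reverse proxy happens to be nginx (an assumption, the thread does not say), the corresponding limit there is the `client_max_body_size` directive.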

You should also consider reducing the size of the data being processed for example by:

  1. Splitting the data processed into smaller chunks (e.g. instead of fetching multiple files with each execution, process only one file per execution)
  2. Splitting the workflow up into sub-workflows (e.g. instead of having your data pass 50 nodes in one workflow, have it pass 10 nodes in 5 workflows each)
  3. Avoiding the Function node
  4. Avoiding manual executions of the workflow (as this means another copy of the data is kept for the UI)
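
The "one file per execution" idea from point 1 could be sketched like this, assuming the workflow starts with a Webhook node and accepts a `key` form field naming the file to process. The webhook URL, the `key` field and the `file-keys.txt` list are all hypothetical.

```shell
# Sketch only: read file keys from stdin (one per line) and start one
# workflow execution per key, so each execution handles a single file.
# $1 = webhook URL of the workflow (hypothetical).
trigger_per_file() {
  url="$1"
  while IFS= read -r key; do
    # skip empty lines
    [ -n "$key" ] || continue
    # --data-urlencode sends a POST with the key safely encoded;
    # </dev/null keeps curl from touching the list on stdin
    curl -fsS --data-urlencode "key=$key" "$url" </dev/null
  done
}

# Example call (hypothetical URL and key list):
# trigger_per_file "http://localhost:5678/webhook/process-file" < file-keys.txt
```

The requests here are sent one after another, which keeps the number of files in flight low on a small instance.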

Thanks @MutedJam Will update N8N and test this in our Dev environment and feedback.
