I have integrated n8n with Retool, I am fetching data from retool using webhook i.e, user is uploading CSV(with 4000-15000 records) on retool and we are fetching all rows in n8n for processing.
Quick summary of what each node is doing for better understanding of the issue:
Webhook is fetching records from file uploaded on retool.
1st node, code is reading webhook output and creating a json(contain sku and quantity) for each row.
2nd node, supabase node is taking each json(with sku) as input and returning only those sku and it’s quantity which are present in supabase .
3rd node, code2 is taking values from 1st code and supabase and doing inventory addition.
4th node, supabase1 is used for updating latest inventory quantity for each json
6th node, code 4 is returning all those items which are not present in supabase.
7th node, spreadsheet is creating a spreadsheet for those skus
8th node is sending mail of that spreadsheet.
Each execution is taking atleast 30 minutes. Workflow is running file when user is uploading file with 4000 records but it gets break when user is trying to upload file with 13000 records. It is taking almost 70-80 minutes to execute and on last node it gets failed.
If allocating additional resources isn’t an option you might want to break the data down in smaller chunks. Don’t fetch 13,000 records at once but maybe 100 at a time. How to best approach this would depend on your exact set up and data structure. Perhaps you can store the CSV file dataset on an SSH/SFTP server, then fetch only 100 lines at once using something like sed -n '1,100p' filename.csv in n8n’s ssh node?
Lastly, there is a good chance the slowness you have reported is coming from the database. Perhaps you want to disable storing execution data for large datasets like this?
Couldn’t a split in batches node in combination with a subworkflow also fix/help with the issue?
(as those release memory as soon as they finish)
The code nodes are very memory hungry, as they duplicate each item in the backend, maybe you can replace those with other nodes?
I don’t know what you are doing exactly in your code nodes, but hot candidates for replacing them are probably the “List” node, the “compare Datasets” node and the “merge” node.
Only forwarding needed data and removing not needed data with the “Set” node can also help quite a lot.
@MutedJam and other folks,
I tried splitting the data into batches and execute sub-workflow. However I am facing some issue.
Here I am calling another workflow after batch node, this workflow has binary files as output, so if main workflow will run 5 times so it will be having 5 binary files. In next node(named FTP1) I am uploading these binary files on FTP server. What I want is firstly sub-workflow for all batches should run, it should collect all files and then I will upload that files to FTP and then create spreadsheet out of all files once. But what actually is happening for each execution, everytime ftp node is also executed so it is uploading file again.
How to handle this issue? i.e execute subflow for all batches first and once every batch executed then FTP node and further nodes till spreadsheet node should execute once.