I am trying to download a CSV file and then load it into a MongoDB collection. The file is large and can easily contain 100k-200k records, with a file size of approximately 20-30 MB. I tried processing it directly, but it gives me a memory warning; the likely reason is that there are more than 100k JSON records. That's okay and understandable. Now I am trying to process it in batches using the Extract From File node with a batch size of 1000, but I am confused about how to iterate to the next 1000 records. I tried a simple loop with the Starting Line option in Extract From File to dynamically read the next 1000 records based on runIndex, but that node expects another binary file as input instead.
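For what it's worth, the batching arithmetic I was aiming for is roughly this (just a sketch of the idea; the names are my own, not actual node settings):

```typescript
// Sketch of the per-iteration window I wanted the loop to read.
// BATCH_SIZE and batchWindow are illustrative names, not node settings.
const BATCH_SIZE = 1000;

function batchWindow(runIndex: number): { startLine: number; endLine: number } {
  const startLine = runIndex * BATCH_SIZE + 1; // +1 to skip the header row
  return { startLine, endLine: startLine + BATCH_SIZE - 1 };
}

// runIndex 0 -> lines 1..1000, runIndex 1 -> lines 1001..2000, and so on
console.log(batchWindow(0), batchWindow(1));
```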
Can anyone please guide me on how to loop over and successfully process 1000 records at a time until there are no more records left in the CSV?
There are a number of great examples of that node in use in our forum if you search for it. You may also look into decreasing the batch size, or splitting the CSV file into many smaller files, storing them in a cloud storage tool like S3, and then iterating through all of the CSV files while breaking the batches down even further in size.
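If you go the splitting route, a rough Node.js/TypeScript sketch of that pre-processing step could look something like this (file names and chunk size are placeholders, and you would still need to upload the resulting chunks to S3 or wherever you keep them):

```typescript
// Rough sketch: split a large CSV into ~1000-row chunk files that can then be
// uploaded to cloud storage and iterated over one at a time.
// Paths and chunk size are placeholders.
import { createReadStream, createWriteStream } from "node:fs";
import { createInterface } from "node:readline";

async function splitCsv(inputPath: string, rowsPerChunk = 1000): Promise<void> {
  const lines = createInterface({ input: createReadStream(inputPath) });
  let header = "";
  let rows: string[] = [];
  let chunkIndex = 0;

  const flush = () => {
    if (rows.length === 0) return;
    const out = createWriteStream(`chunk-${chunkIndex++}.csv`);
    out.write([header, ...rows].join("\n") + "\n"); // repeat the header in every chunk
    out.end();
    rows = [];
  };

  for await (const line of lines) {
    if (!header) {
      header = line; // first line is the column header
      continue;
    }
    rows.push(line);
    if (rows.length >= rowsPerChunk) flush();
  }
  flush(); // write any leftover rows
}

splitCsv("big-export.csv").catch(console.error);
```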
But that’s just a recommendation. Happy building!
Perhaps I was not very clear on my requirement, or maybe I am missing something. If I understand correctly, Loop Over Items will split a large number of items into batches and then loop until the last item. But in my case, that only applies after all the records have been extracted from the CSV file and fed into the Loop Over Items node. That's where the problem is: I am unable to extract the full set of records from the CSV file; I get an out-of-memory (or similar) error there. When I checked the documentation, it said I can't have a very large number of JSON records. So I am extracting the records in chunks of 100k, but I am having a hard time looping back to the Extract node to extract the next 100k records.
In this case, if I can't get the full set of records out, how can I pass them to the Loop Over Items node?
When I loop back, it says it is expecting object-type input. Please confirm whether my understanding is correct and advise on a possible solution. Also, can the Loop Over Items node extract records from a CSV directly and loop over them?
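Just to be explicit about what I mean: as far as I can tell, Loop Over Items wants a list of already-parsed JSON items (roughly the shape below), not the raw binary CSV, and building that full list up front is exactly what fails for me:

```typescript
// Simplified illustration of the item shape I understand Loop Over Items expects:
// one { json: ... } entry per record, already extracted into memory.
type N8nItem = { json: Record<string, unknown> };

const items: N8nItem[] = [
  { json: { id: 1, name: "first record" } },
  { json: { id: 2, name: "second record" } },
  // ...one entry per CSV row, which is what I can't build for 150k+ rows
];
```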
Hi Vishnu, yeah, due to the memory limitations your options are limited here. Your best (and fastest) one right now is to upgrade to a higher-tier cloud plan or to self-host so you can manage memory yourself.
What are the actual memory limits for each cloud tier? For self-hosting, what memory/CPU is recommended for processing up to 200k JSON records? I tried it on a server with 1 GB of RAM and 1 CPU running just this simple workflow with no other load, and it hung the entire server, which seems inefficient. Before running this workflow, CPU and memory utilization were under 20%. If processing a 21 MB file consumes a whole 750 MB of memory, I have serious doubts about the performance and efficiency of n8n.
Is there any way to restrict such workflows from using all the available memory or CPU on a self-hosted instance?
One thing I wanted to highlight is that I said upgrading or self-hosting was your fastest (not your only) solution. I was able to run your workflow with just a minor tweak on a Pro-tier cloud plan. That being said, there are alternative paths you could go down to preprocess your file in such a way that you would never encounter a memory warning even on our lower-tier paid plans, if that's your desired approach.
You can find a number of these mentioned by community members here if you search for CSV-related issues. Example:
Regarding your memory-related questions, I encourage you to read through these articles.
Can you please share the exact minor tweak you made to get my workflow working on the Pro plan? Meanwhile, I will go through the shared links, thank you!
Thanks, but that was not required anyway, and I was unable to reach that point in the loop in the first place. It was failing every time on the Extract node, unable to extract 150k or more JSON records. The loop was meant to process them in batches of 1k or 10k.
How did it work on yours but not on mine? I am on the Pro plan, I believe. Did you mean a higher Pro plan?
I have taken a look, and at the moment we can't loop over CSV files like that, so it would be a case of making smaller files with chunks of data, which should work. If that fails, I would look again at the API to see if there is a better way to handle it until we can support looping over CSV files.
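In the meantime, if a small script outside n8n is an option for you, something along these lines would stream the file and insert it into MongoDB in batches of 1000 without ever holding the whole thing in memory. This is only a rough, untested sketch using the official MongoDB Node.js driver; the connection string, database/collection names, and the naive comma splitting are all placeholders (use a real CSV parser for production data):

```typescript
// Sketch: stream a CSV line by line and insert into MongoDB in batches of 1000,
// so the full file never has to be parsed into memory at once.
// Connection details are placeholders; the split(",") ignores quoted fields.
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";
import { MongoClient } from "mongodb";

async function loadCsv(path: string, uri: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  const collection = client.db("mydb").collection("records");

  const lines = createInterface({ input: createReadStream(path) });
  let columns: string[] = [];
  let batch: Record<string, string>[] = [];

  for await (const line of lines) {
    const values = line.split(","); // naive: no quoted-comma handling
    if (columns.length === 0) {
      columns = values; // first line is the header
      continue;
    }
    batch.push(Object.fromEntries(columns.map((c, i) => [c, values[i]])));
    if (batch.length >= 1000) {
      await collection.insertMany(batch);
      batch = [];
    }
  }
  if (batch.length > 0) await collection.insertMany(batch);
  await client.close();
}

loadCsv("big-export.csv", "mongodb://localhost:27017").catch(console.error);
```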
I've looked at part of the document and see that you may not need to load this information very often; if you used a third-party viewer for MongoDB with a one-time manual CSV upload, that would make it easier.
Directus, Supabase, and phpMyAdmin have such functionality, so you could do it manually.
Make two databases: the first one active, the second one for synchronisation. Then compare them with each other, adding new records and discarding unnecessary ones.
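As a very rough sketch of that compare step (the collection and field names "active", "staging", and "externalId" are just examples):

```typescript
// Rough sketch of the two-collection idea: import the fresh CSV into a staging
// collection first, then copy over only the records the active collection is
// missing. "active", "staging", and "externalId" are placeholder names.
import { MongoClient } from "mongodb";

async function syncCollections(uri: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db("mydb");
  const active = db.collection("active");
  const staging = db.collection("staging");

  // Keys already present in the active collection
  const existing = new Set(await active.distinct("externalId"));

  // Walk the staging collection with a cursor so nothing large sits in memory
  for await (const doc of staging.find()) {
    if (!existing.has(doc.externalId)) {
      await active.insertOne(doc);
    }
  }
  await client.close();
}

syncCollections("mongodb://localhost:27017").catch(console.error);
```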
Or find a service that accepts exactly the parameters you need and supports filtering via its API.