I have a project involving a large CSV file (30-100 GB) containing leads. The file does not update frequently.
Goal:
When a user submits a form with dynamic filters (e.g., Location = United States, Position = CEO/Owner, Email Address = Non-empty), the request should be sent to N8N.
N8N should filter the file based on the user’s criteria.
The filtered results should be emailed to the user.
Challenges:
The file is extremely large, and I’m not sure how to perform the filtering efficiently within N8N or if it’s even feasible.
Some delay in processing is acceptable, but the entire process should preferably happen within N8N. However, I’m open to external tools if absolutely necessary.
Questions for Guidance:
How can I efficiently filter such a large CSV file in N8N?
If N8N alone cannot handle this, what external tools can be integrated while maintaining simplicity?
What’s the best way to process large files incrementally (e.g., chunking, as in the sketch after this list) to avoid memory issues in N8N?
Are there specific N8N nodes or workflows tailored for handling such scenarios?
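For context, outside of n8n I could do the chunked filtering with something like the rough Python sketch below (the file path and column names are just placeholders for my data), but I don’t know how to replicate this inside a workflow:

```python
import pandas as pd

INPUT = "leads.csv"      # placeholder path to the large leads file
OUTPUT = "filtered.csv"  # filtered results to email to the user

# Example of the user's dynamic filters (placeholder column names)
filters = {"location": "United States", "positions": ["CEO", "Owner"]}

first_chunk = True
# Read the CSV in chunks so only ~100k rows are in memory at a time
for chunk in pd.read_csv(INPUT, chunksize=100_000):
    mask = (
        (chunk["location"] == filters["location"])
        & (chunk["position"].isin(filters["positions"]))
        & (chunk["email"].notna())                # Email Address = non-empty
        & (chunk["email"].str.strip() != "")
    )
    # Append each filtered chunk to the output file
    chunk[mask].to_csv(OUTPUT, mode="w" if first_chunk else "a",
                       header=first_chunk, index=False)
    first_chunk = False
```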
Update: I’ve more or less found the answer using Supabase.
Could you share the solution you arrived at so that other community members can benefit from it?
A CSV file that large is indeed a challenge to process due to the limited resources available, in particular on n8n Cloud workspaces.
The solution depends on where the file is located and whether there is an API to access it. You could, for example, use Snowflake to load such a file if, say, it is stored in an AWS S3 bucket. Having the data hosted on Snowflake makes it much easier to retrieve in batches with n8n.
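As a rough, untested sketch of the batching idea (the account details, table, and column names below are placeholders), once the file has been loaded from S3 into a Snowflake table, the filtered rows can be pulled back a few thousand at a time instead of all at once, whether from a script like this or from a loop in n8n:

```python
import snowflake.connector

# Placeholder credentials and identifiers - adjust to your account
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="my_wh", database="leads_db", schema="public",
)

# Filter query built from the user's form input (placeholder columns)
query = """
    SELECT *
    FROM leads
    WHERE location = %s
      AND position IN (%s, %s)
      AND email IS NOT NULL AND email <> ''
"""

cur = conn.cursor()
cur.execute(query, ("United States", "CEO", "Owner"))

# Fetch in batches so the full result set never sits in memory at once
while True:
    rows = cur.fetchmany(10_000)
    if not rows:
        break
    # ... append rows to the output file / hand them to the next step ...

cur.close()
conn.close()
```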
My solution was to use Supabase. I haven’t had the chance to test it with a really huge database yet, because I wanted to figure out the filters first.
So far I’ve only managed to filter on criteria using one OR variable and multiple AND variables.
Because the input will be inconsistent, I run it through an AI Agent to create the string that is needed for the Supabase search. It is still a work in progress.
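To illustrate the kind of filter I mean (a rough, untested sketch; the project URL, key, table, and column names are placeholders), the AND conditions are separate filters and the single OR group goes into one PostgREST-style `or=` string, which is the string I have the AI Agent build:

```python
from supabase import create_client

# Placeholder project URL and key
supabase = create_client("https://xyz.supabase.co", "service_role_key")

query = (
    supabase.table("leads")
    .select("*")
    # AND conditions: each extra filter narrows the result further
    .eq("location", "United States")
    .not_.is_("email", "null")   # Email Address = non-empty...
    .neq("email", "")            # ...i.e. neither null nor empty string
    # Single OR group, passed as one filter string - this is the part
    # the AI Agent generates from the user's form input
    .or_("position.eq.CEO,position.eq.Owner")
)

response = query.execute()
rows = response.data
```

The same `or=` string should also work when calling the Supabase REST endpoint directly from an HTTP Request node, since it is plain PostgREST filter syntax.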