Large Supabase table

I want to fill a Supabase database with information from API requests.

The source is the Scoro CRM system, with about 60,000 rows that I want to put in the database.

The first issue I encountered is that n8n keeps everything in memory, so running the workflow that reads from the API and updates Supabase makes n8n slower and slower and slower… Maybe this is not an n8n problem and I need to do it with code instead?

Secondly, I now have about 8,000 rows in my database and would like to chat with it using an AI agent, but that is using up my API credits fast!

How should I go about working with large datasets like this?

Information on your n8n setup

  • **n8n version:** 1.84.3

Hi @Sebastiaan

  1. n8n has memory limitations. You are trying to insert all entries in bulk, which will exhaust n8n's memory. One option is to use a SplitInBatches node with the batch size set to 1, add the Postgres node after SplitInBatches, add a Wait node, and loop back to SplitInBatches. This reduces the memory footprint by inserting only one row at a time (see the sketch after this list).

  2. Using a local LLM could also save on API costs. By running the agent locally, you process the data without hitting external APIs, so you don't pay for each query; you only pay for the infrastructure that runs the agent. You can also filter your full database to reduce costs when querying Gemini.
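If you do decide to move the load out of n8n and into code, here is a minimal sketch of the same "small batches + wait" idea using `@supabase/supabase-js`. The table name `scoro_rows` and the `fetchScoroPage()` helper are placeholders, not Scoro's real API; only the batching and throttling pattern is the point.

```typescript
// Minimal sketch: fetch small pages and insert them one batch at a time,
// mirroring the SplitInBatches + Wait loop described above.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

// Hypothetical helper: returns one page of rows from the Scoro API.
async function fetchScoroPage(page: number, perPage: number): Promise<Record<string, unknown>[]> {
  // Replace with a real Scoro API request for your account.
  return [];
}

const PER_PAGE = 100; // small pages keep memory usage low
const WAIT_MS = 500;  // plays the same role as the Wait node in n8n

async function loadAll(): Promise<void> {
  for (let page = 1; ; page++) {
    const rows = await fetchScoroPage(page, PER_PAGE);
    if (rows.length === 0) break; // no more pages

    // Insert the batch; nothing accumulates in memory between iterations.
    const { error } = await supabase.from('scoro_rows').insert(rows);
    if (error) throw error;

    await new Promise((resolve) => setTimeout(resolve, WAIT_MS)); // throttle between batches
  }
}

loadAll().catch(console.error);
```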

Hi Miquel

Thanks for your reply.
I don't really understand how this works in my version of n8n (1.84.3). I don't have a dedicated SplitInBatches node, only a Loop Over Items node with a batch-size option and a reset.

My question remains: how does this limit memory use when the output of each loop is still added to the data pool? Can I reset the data within n8n somehow?

There is no option to free memory during the process.
Try reducing the per-page value in the first Set node to lower memory consumption.

By the way, your process works with a very large number of rows, which is a known limitation of n8n.
Another option is to insert only a subset of rows each time the workflow triggers.

But you need to save the last page visited somewhere (for example in Google Sheets or Postgres).
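A small sketch of that checkpoint pattern, assuming a `sync_state` table in Supabase/Postgres that stores the last page loaded; the table, its columns, and the `fetchScoroPage()` helper are illustrative, not taken from your setup:

```typescript
// Sketch: load a few pages per run and remember where to resume next time.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);
const PAGES_PER_RUN = 10; // how many pages one trigger of the workflow handles
const PER_PAGE = 100;

// Hypothetical helper: returns one page of rows from the Scoro API.
async function fetchScoroPage(page: number, perPage: number): Promise<Record<string, unknown>[]> {
  return []; // replace with the real Scoro API call
}

async function runOnce(): Promise<void> {
  // Read the checkpoint (defaults to page 0 on the first run).
  const { data } = await supabase.from('sync_state').select('last_page').eq('id', 'scoro').single();
  let page = (data?.last_page ?? 0) as number;

  for (let i = 0; i < PAGES_PER_RUN; i++) {
    const rows = await fetchScoroPage(page + 1, PER_PAGE);
    if (rows.length === 0) return; // initial load finished

    const { error } = await supabase.from('scoro_rows').insert(rows);
    if (error) throw error;

    page += 1;
    // Persist the checkpoint so the next run continues where this one stopped.
    await supabase.from('sync_state').upsert({ id: 'scoro', last_page: page });
  }
}

runOnce().catch(console.error);
```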

Thanks, saving the last page and running the workflow a few times is probably the best option. This is only for my initial database load; after that, only the modified fields will be synced.

Then I come to the second part of my question: once the database is complete, it will have more than 60,000 rows. What is the correct setup for an AI agent to work with such a database? The desired result is that I can chat with this database and ask questions.


You need to create a RAG setup with all your rows. This lets you query by semantic search and get answers to your question ranked by similarity.

You have multiple options out there; Qdrant is the reference open-source vector store.

After retrieving the results, you can post-process them with OpenAI or a local LLM to prepare the response for the end user.
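A rough sketch of such a RAG query in code, assuming a Qdrant collection named `scoro_rows` that already holds embedded rows; the model names and payload shape are examples, and you could swap OpenAI for a local LLM:

```typescript
// Sketch: embed the question, search Qdrant for similar rows, then let an LLM
// phrase the answer using only the retrieved rows.
import { QdrantClient } from '@qdrant/js-client-rest';
import OpenAI from 'openai';

const qdrant = new QdrantClient({ url: process.env.QDRANT_URL });
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function ask(question: string): Promise<string> {
  // 1. Embed the user question.
  const emb = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  });

  // 2. Retrieve the most similar rows from the vector store.
  const hits = await qdrant.search('scoro_rows', {
    vector: emb.data[0].embedding,
    limit: 5,
    with_payload: true,
  });
  const context = hits.map((hit) => JSON.stringify(hit.payload)).join('\n');

  // 3. Post-process: answer strictly from the retrieved rows.
  const chat = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: 'Answer using only the provided CRM rows.' },
      { role: 'user', content: `Rows:\n${context}\n\nQuestion: ${question}` },
    ],
  });
  return chat.choices[0].message.content ?? '';
}

ask('Which deals were updated last week?').then(console.log);
```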

Many thanks, that helps me a lot :pray:
