Conversational RAG Agent retrieving vectorized data

Hi everyone,

I’m designing a new workflow to enable conversations with an expert agent for delivery notes.
This agent uses a tool called retrieve-delivery-notes, which accesses a Supabase vector database.
Each delivery note is vectorized and stored in a table with the following structure: id, content, metadata, and embedding.

Example:

*content*
DeliveryNote 0200AL05/247238  client GARGA TRUCKS,O.E.:
Product: 0.5 LT DOT-4 LIQAD (ref: 34500)
Quant: 1.0000
Price: 11.8000€
Discount: 40.00%
Brand: AD
Subfamily: LIQUID F
Group: Chemical Prod
Category: Chemical 
Date: 2025-01-02T14:39:13.301Z

*metadata*
{"loc":{"lines":{"to":10,"from":1}},"date":"2025-01-02T14:39:13.301Z","group":"LIFR","brand":"AD", ...}

*embedding*
[-0.0026369363,-0.01097608,-0.036765408, ... ]
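
For reference, what the retrieve-delivery-notes tool does is conceptually something like this (a minimal sketch, assuming the OpenAI embeddings API, the supabase-js client, and the `match_documents` helper from Supabase's standard vector store template; the model name and parameters are illustrative):

```typescript
import OpenAI from "openai";
import { createClient } from "@supabase/supabase-js";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Embed the user's question with the same model used at ingestion time,
// then run a similarity search over the embedding column.
async function retrieveDeliveryNotes(question: string, topK = 5) {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small", // assumption: must match the ingestion model
    input: question,
  });

  // match_documents is the helper function from Supabase's vector store
  // template; it returns the closest rows by cosine similarity.
  const { data, error } = await supabase.rpc("match_documents", {
    query_embedding: res.data[0].embedding,
    match_count: topK,
  });
  if (error) throw error;
  return data; // rows of { id, content, metadata, similarity }
}
```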

Later, I plan to extend this agent to retrieve and reason over additional tables, such as purchases and invoices.

How would you recommend approaching this?
Here’s what I currently have:

The agent usually returns the information, but it sometimes doesn't respond the way I need: it truncates data or replies that it can't help.

Does everything depend on the system prompt in the AI Agent node? Do I need to add anything else?

I want it to answer questions such as:

  • Which clients show a drop in sales (in euros and units) of more than X% in a specific product line, or in Route X, or on Fridays?
  • Which clients purchase the most from a specific brand or product line?
  • Provide a ranking of sales for the “CHEMICALS” group under brand “AD”, or for “TOOLS”.
  • What product does [CLIENT] usually buy?
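
For context, here is roughly what I imagine the ranking question would need behind the scenes: an aggregation over structured columns rather than a pure similarity search (a hypothetical `rank_sales` Postgres function called through supabase-js; every name here is illustrative, not something I have built):

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Hypothetical Postgres function exposed to the agent as a second, SQL-based tool:
//   create function rank_sales(p_group text, p_brand text)
//   returns table (client text, revenue numeric, units numeric)
//   language sql as $$
//     select client,
//            sum(quantity * price * (1 - discount / 100)) as revenue,
//            sum(quantity) as units
//     from delivery_notes
//     where "group" = p_group and brand = p_brand
//     group by client
//     order by revenue desc;
//   $$;
async function rankSales(group: string, brand: string) {
  const { data, error } = await supabase.rpc("rank_sales", {
    p_group: group,
    p_brand: brand,
  });
  if (error) throw error;
  return data; // e.g. rankSales("CHEMICALS", "AD")
}
```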

Thank you!

Information on your n8n setup

  • n8n version: 1.88
  • Database (default: SQLite): Supabase
  • n8n EXECUTIONS_PROCESS setting (default: own, main): own
  • Running n8n via (Docker, npm, n8n cloud, desktop app): cloud
  • Operating system: Windows

This presents two main challenges:

  1. I need to vectorize a large volume of data, potentially thousands of records per day. I've run into issues with this, as n8n doesn't seem well suited to processing such high volumes. I call REST web services that return thousands of records and vectorize each one before inserting it into the Supabase database. It doesn't work: the workflow basically freezes at some point (see the batching sketch after this list).
  2. This data needs to be accessible to my agent, which retrieves it to answer client questions. I'm also wondering whether handling such a large dataset could become a performance issue on the retrieval side.
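
To make challenge 1 concrete, this is the kind of batching I have in mind for the ingestion side (a minimal sketch, assuming the OpenAI embeddings API and supabase-js; the batch size and the delivery_notes table name are just illustrative):

```typescript
import OpenAI from "openai";
import { createClient } from "@supabase/supabase-js";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Embed and insert in small batches instead of one record at a time,
// so thousands of records don't pile up in memory in one execution.
async function ingest(notes: { content: string; metadata: object }[]) {
  const BATCH = 100; // illustrative; tune to your rate limits
  for (let i = 0; i < notes.length; i += BATCH) {
    const batch = notes.slice(i, i + BATCH);

    // One embeddings call per batch (the API accepts an array of inputs
    // and returns embeddings in the same order).
    const res = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: batch.map((n) => n.content),
    });

    // One bulk insert per batch instead of one insert per record.
    const rows = batch.map((n, j) => ({
      content: n.content,
      metadata: n.metadata,
      embedding: res.data[j].embedding,
    }));
    const { error } = await supabase.from("delivery_notes").insert(rows);
    if (error) throw error;
  }
}
```

In n8n terms, I guess the equivalent would be using Loop Over Items (Split In Batches) so each iteration only holds one batch in memory, rather than pushing every record through in a single run.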

Any ideas?