Supabase Vector Store creates fragmented / contextless documents from CSV FAQ import

Hello Wouter_Nigrini,

Thank you for your reply. The questions and answers are no longer than two average sentences each. BUT I’VE SOLVED IT NOW! I’m getting the right answers from the chat now!

Solution (fixed fragmented / contextless rows in Supabase Vector Store)

Root cause:
The issue was caused by the Default Data Loader ingesting more than just my intended document text.
When the Data Loader was set to load the full JSON input (or effectively treated the whole item as input), it ended up creating separate documents from individual string fields (especially metadata fields like category, faq_question, and source). That’s why Supabase documents.content contained rows like:

  • Lieferung

  • faq_csv

  • the question alone

  • etc.

So the vector store was not “randomly chunking” the FAQ — it was actually embedding metadata values as their own documents.

What I changed

  1. Default Data Loader
    I changed it to load only the actual document text, via expression input:
  • jsonMode: expressionData

  • jsonData: ={{ $json.text }}

This forces the Data Loader to treat only $json.text (my combined Category + Question + Answer) as the document content.
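For anyone hitting the same problem, here's a minimal sketch of the combining step that produces $json.text (e.g. in a Code node just before the vector store). The field names category, faq_question, and faq_answer are assumptions on my part, based on the metadata values that were showing up in documents.content — adjust them to your CSV columns:

```javascript
// Hypothetical combining step: build one coherent text blob per CSV row
// so the Default Data Loader embeds the full FAQ context, not fragments.
// Field names (category, faq_question, faq_answer) are assumed, not verified.
function buildFaqText(row) {
  return [
    `Category: ${row.category}`,
    `Question: ${row.faq_question}`,
    `Answer: ${row.faq_answer}`,
  ].join("\n");
}

// Example usage (in an n8n Code node you would map this over your items):
const row = {
  category: "Lieferung",
  faq_question: "Wie lange dauert der Versand?",
  faq_answer: "In der Regel 2-3 Werktage.",
};
console.log(buildFaqText(row));
```

With this, jsonData: ={{ $json.text }} picks up exactly one complete FAQ per row and nothing else.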

Result

  • One coherent vector-store document per CSV row (full FAQ)

  • No more contextless fragments in documents.content

  • Retrieval quality improved because embeddings are created from the complete FAQ text only

Splitter:

The Recursive Character Text Splitter works as well 🙂

Now I love n8n automation again


Hi @Sneggo,

Are you able to share the CSV you’re using? I’d like to get an idea of how long the answers are.

In the meantime, to answer your question: yes, this is expected behaviour of any vector DB. Larger pieces of text are split over multiple records to keep each chunk within the embedding model’s token limit for semantic searching; if a chunk were too long, it could cut off the answer and give poor results. However, if your answers are generally short, each one should fit into a single vector. I’m curious about your choice of a character splitter over the Recursive Character Text Splitter.
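To illustrate what the recursive splitter is doing conceptually, here's a simplified sketch (not the actual LangChain implementation): try coarser separators first and only fall back to finer ones when a piece is still over the chunk size. A short FAQ answer never gets split at all:

```javascript
// Simplified idea of recursive character splitting (the real splitter also
// merges small pieces back together and supports chunk overlap, omitted here).
function splitRecursive(text, chunkSize, separators = ["\n\n", "\n", " "]) {
  if (text.length <= chunkSize) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separators left: hard-cut at chunkSize.
    const chunks = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      chunks.push(text.slice(i, i + chunkSize));
    }
    return chunks;
  }
  // Split on the current separator and recurse into oversized pieces.
  return text.split(sep).flatMap((piece) => splitRecursive(piece, chunkSize, rest));
}

console.log(splitRecursive("Short FAQ answer.", 100)); // → ["Short FAQ answer."]
```

So for FAQ-sized rows the splitter is effectively a no-op, which is why it "works as well" once each row arrives as one document.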

Oh, I think I just deleted my original error post, didn’t I? Sorry about that — it was my first post.