PGVector + n8n RAG: High Token Usage, Slow Performance & Same Results for Different Queries

Hi everyone,

I’m working on a RAG setup using n8n with PostgreSQL + PGVector and facing a few issues:

1. Performance & token usage

Even with a low Top-K (e.g. 10), response times are very high (minutes) and token usage becomes extremely large.

  • How can I reduce token usage and ensure only relevant context is passed?

2. Top-K / Limit behavior

It seems like the Top-K setting just limits the number of rows (e.g. 10, 20, 30).

  • If relevant data is outside this range, it won’t be found.

  • It feels like the system is not ranking by similarity but just limiting results.

How exactly does Top-K work in PGVector? Should it always return the most similar results?
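
For reference, a nearest-neighbour query in pgvector is typically written as `ORDER BY embedding <=> query_vector LIMIT k` (`<=>` is pgvector's cosine distance operator), so the limit is applied after rows are ranked by distance. Below is a minimal pure-Python sketch of that "rank, then limit" step. It is not the code n8n or pgvector actually runs; the `top_k` helper and the toy two-dimensional vectors are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k):
    """Rank every row by distance to the query, then keep the k closest."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:k]]


# Hypothetical stored rows: (id, embedding)
ROWS = [
    ("doc-a", [1.0, 0.0]),
    ("doc-b", [0.0, 1.0]),
    ("doc-c", [0.7, 0.7]),
    ("doc-d", [-1.0, 0.0]),
]

print(top_k([1.0, 0.1], ROWS, k=2))  # query pointing towards doc-a
print(top_k([0.1, 1.0], ROWS, k=2))  # query pointing towards doc-b
```

If retrieval behaves like this sketch, two different query vectors should produce different top results. Getting the same rows for every query would then suggest the query vector itself is not changing, or that the rows are not being ordered by distance at all.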


3. Same results for different queries

No matter what I ask, I often get the same Top-K results.

  • What could cause identical results for different queries?

  • Could this be a query embedding or workflow issue?
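
One way to test the query-embedding hypothesis is to capture the vectors the workflow produces for two clearly different questions and compare them. The sketch below assumes the vectors can be exported as plain lists of floats; `looks_identical` and its tolerance are hypothetical names and values, not part of n8n or pgvector.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero vector: the embedding step probably failed")
    return dot / (norm_a * norm_b)


def looks_identical(vec_a, vec_b, tol=1e-6):
    """True if two query embeddings point in (almost) exactly the same direction."""
    if len(vec_a) != len(vec_b):
        return False
    return abs(1.0 - cosine_similarity(vec_a, vec_b)) <= tol


# Toy stand-ins for the embeddings of two different questions
query_1 = [0.1, 0.2, 0.3]
query_2 = [0.3, 0.2, 0.1]
print(looks_identical(query_1, query_2))
```

If two unrelated questions come back as identical, the workflow is probably embedding the same fixed text each time (for example a static expression instead of the user's input) rather than the actual query.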

Thanks for any help.

Hi @Leon22, if your goal is the most accurate retrieval, the Pinecone vector store may be a better fit, or at least worth a try (easy ingestion and easy retrieval):

@Leon22 Hi :waving_hand:

I agree with Mr @Benjamin_Behrens and Mr @Anshul_Namdev.
This article may help you; I recommend it:

Thanks, this is really helpful — I’ll go through your points step by step.

Chunking setup:

  • Chunk size: 1200

  • Overlap: 200

  • Recursive Character Splitter

Embedding setup:

  • Model: nomic-embed-text-latest

  • Batch size: 200

Thanks, I’ll take a closer look at this :+1:

I adjusted the configuration:

Chunk size: 512
Overlap: 100
Batch size: 32

After testing these settings, everything is working well now. Thanks again for your support!
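
For anyone comparing the two configurations, here is a simplified sketch of how chunk size and overlap interact. It uses a plain sliding window over characters, which is an assumption for illustration only: the Recursive Character Splitter prefers to break on separators such as paragraphs and sentences, so real chunk boundaries will differ. The `chunk_text` helper is hypothetical.

```python
def chunk_text(text, chunk_size=512, overlap=100):
    """Split text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# A 2000-character dummy document
document = "".join(chr(97 + i % 26) for i in range(2000))

old_chunks = chunk_text(document, chunk_size=1200, overlap=200)
new_chunks = chunk_text(document, chunk_size=512, overlap=100)

print(len(old_chunks), max(len(c) for c in old_chunks))
print(len(new_chunks), max(len(c) for c in new_chunks))
```

Smaller chunks mean each retrieved row carries less text, so the same Top-K passes less context to the model, and each chunk covers a narrower topic, which tends to make its embedding more specific.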