Building the Ultimate RAG Setup with Contextual Summaries, Sparse Vectors and Reranking

You’ll probably need to set a different dimension size when creating the collection if you’re using OpenAI embeddings.

  • Cohere’s embedding model has a dimension size of 1024.
  • OpenAI’s text-embedding-3-small has a dimension size of 1536.
  • You can’t create a collection with one size of vector and then try to save differently sized vectors to it.
{
  "vectors": {
    "default": {
       "distance": "Cosine",
       "size": "1536"  // <-- example for text-embedding-small
    }
  },
  ...
}
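
If you're creating the collection from code instead of the UI, a minimal sketch with the Qdrant JS client looks like this (collection name is a placeholder; pick the size that matches your embedding model):

const { QdrantClient } = require('@qdrant/js-client-rest');
const client = new QdrantClient({ url: 'http://localhost:6333' });

// Dense vector size must match the embedding model used downstream.
await client.createCollection('my_collection', {
  vectors: {
    default: { size: 1536, distance: 'Cosine' } // 1536 for text-embedding-3-small, 1024 for Cohere
  }
});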

Thank you, that was it :)

Hi,

I went ahead and also tried the local version without any changes, and the Qdrant with BM25 ReRank node seems to not be properly configured.

It says "query is not defined" as the answer to the query:

{
  "query": "What is BTC?"
}


query is not defined

Error details

  • n8n version: 1.62.4 (Self Hosted)
  • Time: 10/11/2024, 3:03:31 PM
  • Error cause: {}

I tried to understand the code, but I don't get where the input is coming from, and the query is obviously there.

The issue and the fix in the retriever, for anyone else:

 const rankedDocs = await retriever.invoke(input);  // Use 'input' instead of 'query'
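
For context, here is a rough sketch of the surrounding tool definition (following the DynamicTool pattern shown later in this thread; the return handling is an assumption, not the template's exact code). The callback's parameter is named input, so referencing query there throws "query is not defined":

const retrieverTool = new DynamicTool({
  name,
  description,
  func: async (input) => {                           // <-- the parameter is 'input', not 'query'
    const rankedDocs = await retriever.invoke(input);
    return JSON.stringify(rankedDocs);               // exact return shape depends on the template
  },
});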

@Jim_Le, I'm struggling with the following error in the LangChain Code node:
Cannot read properties of undefined (reading ‘json’) [line 24]

Using the latest template (local-only version) with n8n v1.62.5.

UPDATE
Fix for the error at line 24:
Do NOT change the stock parameters of the "Recursive Character Text Splitter" node (chunk size 2000, overlap 0).

@Jim_Le I'm getting a "Problem in node 'Insert Documents with Sparse Vectors': Bad Request - null" error when using llama3.2 locally.

If I remove the line const res = await client.upsert(collectionName, { points }), the Code node completes correctly.

Update the vector dimensionality to 3072 to match llama3.2's embedding size. To ensure the collection is created before inserting documents, add the following code at the beginning of your script:

const collectionExistence = await client.collectionExists(collectionName);
if (!collectionExistence.exists) {
  console.log(`Collection "${collectionName}" does not exist. Creating...`);

  const collectionConfig = {
    vectors: {
      default: {
        size: 3072,
        distance: 'Cosine'
      },
    },
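    // 'idf' asks Qdrant to apply inverse-document-frequency weighting to these BM42-style sparse vectors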
    sparse_vectors: {
      bm42: {
        modifier: 'idf'
      }
    }
  };

  await client.createCollection(collectionName, collectionConfig);
  console.log(`Collection "${collectionName}" created successfully.`);
}

Hi @Jim_Le,

This is amazing! I learned so much about advanced use of n8n and sparse vectors from this. However, I suspect that there might be a bug.

Currently, the vocabulary for sparse vectors is dynamically generated per item using TfidfVectorizer, which results in inconsistent vector spaces across runs and may lead to misaligned vector representations. This is noticeable in that each chunk has a different number of indices.
I assume it should build a shared vocabulary first.

I am not a programmer, so I hope that makes sense :)

Cheers


Hello Jim! Thank you so much for the article! This content is not only highly interesting to me, it's fantastic! I am working on a project where I have encountered a lack of accuracy in some of my agents' responses. I started studying techniques like RAG Fusion to try to improve the quality of the model's responses, but I still need to understand how to implement this technique in n8n.


Hello

Well, I got an error (screenshot attached).

If you know what is wrong, please let me know.


Can confirm. We update about every three versions or so, and since then the workflow has been broken. We're on 1.73.1 as of this moment.
I've tried to figure out the cause but couldn't find it. I can retrieve dense vectors by themselves, and the generation of sparse vectors still works. However, retrieving a hybrid of sparse and dense no longer works for me. @Jim_Le Do you have any insight into what might be happening here?

Thanks both for the heads up.

After a quick check, I've boiled it down to the following (and this is my best guess!):

  • Qdrant’s API has stricter schema validation
  • n8n’s “custom workflow tool” has been updated.

For context, I'm currently on 1.77.0 and tested with that version, but this reply should still be relevant to 1.73.0 (I think).

@Issa2024 Unfortunately, I wasn't able to reproduce the error in your screenshot - my test document (bitcoin.pdf) was inserted into Qdrant without issue. My assumption is that it may have to do with your Qdrant version, and my best advice is to try to debug the payload.

Try capturing a sample of the points and running it as a query in the Qdrant dashboard. If there is an error, it will be clearer in the dashboard.

console.log(points.slice(0, 5)); // <-- paste a sample of this output into the dashboard
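
If it helps, the Qdrant dashboard console accepts raw REST-style requests, so you can replay the sampled points as an upsert there (collection name is just an example - use your own):

PUT collections/my_collection/points
{
  "points": [ ... ]  // <-- paste the sampled points here
}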

@Poppi It seems your issue might be related to the new "custom workflow tools" changes. If you locate this line in the retrieval code…

- const sparseVector = JSON.parse(await sparseVectorTool.invoke(query));

…could you try appending .response to the end?

+ const sparseVector = JSON.parse(await sparseVectorTool.invoke(query)).response;

I’m getting around to updating the templates but no guarantees this week.
Hope this helps!

Hi @Jim_Le, I have a quick question. I don’t know why the TF-IDF Node always crashes in my n8n, even with simple text. Does this happen to you as well?

Hi man! Thanks for sharing this amazing set-up! However, it does not work on the paid version (cloud managed plan). Any way to circumvent this? Thanks in advance!

Firstly: Thanks for the amazing contribution. I know you are busy but if you get a chance, I’m running into an error I can’t fix.

The code to store new documents works fine, my issue is when using the chat tool to attempt to retrieve a reply. The “Qdrant rerank with cohere” tool is failing:

The specific error is “Bad Request”:

{ "headers": {}, "url": "http://192.168.68.240:6333/collections/contextual_rag_work/points/query", "status": 400, "statusText": "Bad Request", "data": { "status": { "error": "Format error in JSON body: Expected some form of vector, id, or a type of query at line 1 column 12986" }, "time": 0 } }

The embedding and sparse vector portions seem to be running OK. It looks like the Qdrant endpoint is returning that error, but it makes no sense to me.

I was able to resolve the issue. The problem is actually with the sparse vector tool, specifically how it returns its results. The results are returned as:

{
    "response": {
        "indices": [],
        "values": []
    }
}

Qdrant does not tolerate the “indices” and “values” being wrapped inside “response”.

To fix this, I edited the code in this section: 5. Retrieval using Sparse Vectors and ReRanker (Chat Agent Example)
Node: Qdrant with Cohere ReRank

Edit the code inside the node and replace the “5. Tool Definition” portion of the code with this:

// 5. Tool definition
const vectorStoreTool = new DynamicTool({
  name,
  description,
  func: async (input) => {
    const denseVector = await embeddings.embedQuery(input);
    const sparseVector = JSON.parse(await sparseVectorTool.invoke(input));
    console.log(denseVector);
    console.log(sparseVector.response.indices);
    const response = await client.query(collectionName, {
      prefetch: [
        {
          query: denseVector,
          using: 'default',
          limit: 100
        },
        {
          query: {
            values: sparseVector.response.values,
            indices: sparseVector.response.indices
          },
          using: 'bm42',
          limit: 100
        }
      ],
      query: { fusion: 'rrf' },
      with_payload: true,
      limit,
    });

The specific area that was changed was:

{
  query: {
    values: sparseVector.response.values,
    indices: sparseVector.response.indices
  },
  using: 'bm42',
  limit: 100
},

All I have done is "unwrap" the values and indices and pass them to Qdrant as the API currently expects, based on this documentation:
Hybrid Queries - Qdrant

We've just published the n8n-nodes-semantic-splitter-with-context community node, inspired by @Jim_Le's workflow here. Hope it helps folks here as this workflow helped me before :)

Coming up is a custom tool node to handle the re-ranking after retrieval, with the option of applying different sub-query pre-retrieval strategies.


Just putting n8n-nodes-query-retriever-rerank out there. It introduces the reranking capability that this flow is missing. It also adds some common/useful sub-query strategies to improve results.


I ran through the workflow with the Gemini CLI because I want to convert the TF-IDF generation code from Python to JavaScript: the new n8n update blocks the os module, which makes scikit-learn unusable.

Along the way, the Gemini CLI pointed out that this part of the RAG pipeline is flawed: the vocabulary is indexed at the chunk level instead of across the whole document set, so no master vocabulary map is built. For example, in chunk 1 "apple" may be indexed to 60, while in chunk 2 "car" may also be indexed to 60. If true, this inconsistency will produce incorrect or irrelevant results. I also confirmed this by observing the results: the maximum index value assigned equals the number of words in the chunk. Please correct me if I am wrong.

Another consideration for a sparse-vector implementation in RAG is the need to re-fit the vectorizer whenever a new document is added to the system, since the master vocabulary map has to be updated.

Edit: I believe I am saying the same thing as @Koobah.
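
For anyone curious what the fix looks like, here is a minimal sketch (illustrative only, not the template's node code) of building one shared vocabulary across all chunks in JavaScript, so every chunk's sparse indices refer to the same term IDs; chunks is assumed to be an array of chunk strings:

// Minimal shared-vocabulary TF-IDF sketch (illustrative; names are hypothetical).
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];

// 1. Build ONE vocabulary over all chunks, not one per chunk.
const vocab = new Map(); // term -> index shared by every chunk
for (const chunk of chunks) {
  for (const term of new Set(tokenize(chunk))) {
    if (!vocab.has(term)) vocab.set(term, vocab.size);
  }
}

// 2. Document frequencies -> IDF weights.
const df = new Array(vocab.size).fill(0);
for (const chunk of chunks) {
  for (const term of new Set(tokenize(chunk))) df[vocab.get(term)] += 1;
}
const idf = df.map((d) => Math.log((1 + chunks.length) / (1 + d)) + 1);

// 3. One sparse vector per chunk; indices are now consistent across chunks.
const sparseVectors = chunks.map((chunk) => {
  const counts = new Map();
  for (const term of tokenize(chunk)) counts.set(term, (counts.get(term) || 0) + 1);
  const indices = [];
  const values = [];
  for (const [term, tf] of counts) {
    indices.push(vocab.get(term));
    values.push(tf * idf[vocab.get(term)]);
  }
  return { indices, values };
});

At query time the same vocab and idf must be reused, and, as noted above, adding new documents means rebuilding both.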

Hey @Eric_Ng
Thanks for the heads up - I haven’t got around to confirming but if it’s true that the latest branch has made scikit-learn unusable, that’s pretty terrible news!

For sparse vectors, I switched to Qdrant’s Fastembed quite a while ago and would totally recommend doing the same. Fastembed runs okay-ish on CPU but ideally you should run it on a GPU - check out this guide for hosting a lightweight FastEmbed server for free on lightning.ai.
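
As a rough illustration of what that swap can look like inside a Code node (the endpoint URL and the request/response shape below are assumptions about a self-hosted FastEmbed server, not a documented API):

// Hypothetical self-hosted FastEmbed endpoint returning Qdrant-ready sparse vectors.
// Adjust the URL and payload/response shape to match your own server.
const res = await fetch('http://localhost:8000/embed/sparse', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: query }),
});
const { indices, values } = await res.json();

// Plug straight into the hybrid query's sparse prefetch:
// { query: { indices, values }, using: 'bm42', limit: 100 }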