Need help building a RAG pipeline in n8n

Hi all,

I’m working on a local RAG pipeline for engineering reports. While the vector search works for the “content,” the Agent is unable to “see” the source document title.

I suspect the issue is either in how I’m passing the metadata to Qdrant during the indexing phase or how the AI Agent node is configured to view that metadata.

Has anyone successfully implemented a “Source Citation” style response using local Ollama models? I need the agent to explicitly state which file the information came from.

Welcome to the community :tada: @mitidi

This is likely a metadata handling issue, not a bug with Ollama or Qdrant.

The core problem is that when a PDF is chunked, the document title (from the filename or first page) isn’t automatically carried into each chunk. Qdrant stores each vector together with whatever payload you send alongside it and nothing more, so the title is lost unless you explicitly include it as metadata.

Here’s how to fix it:

You need to manually attach the title to each chunk as metadata before sending it to the vector database.

  1. Extract the Title: Early in your workflow, get the document title (e.g., from the filename or by extracting text from the first page).
  2. Attach Metadata: For every text chunk, add fields like document_title and a document_id to its metadata.
  3. Store in Qdrant: Ensure this metadata is saved alongside the vector embedding.
  4. Retrieve Intelligently: When querying, group the results by document_title to summarize or process them per document.
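
Steps 1 to 3 could look roughly like the sketch below, written in plain JavaScript. The helper names (`titleFromFilename`, `documentIdFromTitle`, `attachMetadata`), the slug-style `document_id`, and the `{ text, metadata }` chunk shape are assumptions for illustration, not n8n or Qdrant APIs. Adapt the shape to whatever your splitter and vector store node expect.

```javascript
// Step 1: derive a human-readable title from the file name (assumed input).
function titleFromFilename(filename) {
  const base = filename.split(/[\\/]/).pop(); // drop any directory part
  return base.replace(/\.[^.]+$/, '');         // drop the extension
}

// Hypothetical helper: a stable id so chunks of one file can be grouped later.
function documentIdFromTitle(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything non-alphanumeric
    .replace(/^-|-$/g, '');      // trim a leading or trailing dash
}

// Step 2: attach the same title and id to every chunk of the document.
function attachMetadata(chunks, filename) {
  const document_title = titleFromFilename(filename);
  const document_id = documentIdFromTitle(document_title);
  return chunks.map((text, chunk_index) => ({
    text,
    metadata: { document_title, document_id, chunk_index },
  }));
}

// Step 3 would send each enriched chunk (text + metadata) to the vector store.
const enriched = attachMetadata(
  ['Pump P-101 failed the vibration test.', 'Bearing replacement is recommended.'],
  'reports/Project_A.pdf'
);
console.log(JSON.stringify(enriched, null, 2));
```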

Important: The AI Agent node does not do this automatically. You must explicitly inject the metadata during ingestion and aggregate by document during retrieval. Embeddings only capture the semantic meaning of the chunk text; structural details like titles survive only if you store them as metadata.
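
For the retrieval side, grouping by document could be sketched as below. The `hits` array is made-up sample data shaped like search results with `payload.content` and `payload.metadata.document_title` fields. That shape and the helper names are assumptions, so check what your Qdrant node actually returns before relying on it.

```javascript
// Group retrieved chunks by the document_title stored in their metadata.
function groupByDocument(hits) {
  const groups = new Map();
  for (const hit of hits) {
    const payload = hit.payload || {};
    const title = (payload.metadata || {}).document_title || 'Unknown source';
    if (!groups.has(title)) groups.set(title, []);
    groups.get(title).push(payload.content || '');
  }
  return groups;
}

// Build a context string that labels every excerpt with its source,
// so the model can name the file it is quoting from.
function buildCitedContext(hits) {
  const sections = [];
  for (const [title, excerpts] of groupByDocument(hits)) {
    sections.push(`[Source: ${title}]\n${excerpts.join('\n')}`);
  }
  return sections.join('\n\n');
}

// Hypothetical search results, for illustration only.
const hits = [
  { score: 0.91, payload: { content: 'Pump P-101 failed the vibration test.', metadata: { document_title: 'Project_A' } } },
  { score: 0.84, payload: { content: 'Weld inspection passed on all joints.', metadata: { document_title: 'Project_B' } } },
  { score: 0.80, payload: { content: 'Bearing replacement is recommended.', metadata: { document_title: 'Project_A' } } },
];

const context = buildCitedContext(hits);
console.log(context);
```

Passing text like this to the agent, together with a system prompt instruction to name the `[Source: ...]` label it drew from, is one way to get citation-style answers.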

omg thanks bright, can I do this by adding some Code nodes?

thanks once again

You’re on the right track! Here is the workflow logic:

  1. Extract: Pull text from the PDF.
  2. Split: Chunk the text as usual.
  3. Enrich (Code Node): Loop through the chunks and add a metadata property to each, e.g., item.json.metadata.title = 'Project_A.pdf'
  4. Upsert: Send those enriched chunks to the Qdrant node.

This prevents the ‘identity loss’ you’re seeing during retrieval because the title is now stored with every chunk and comes back alongside the matched text.
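
As a rough sketch, the Enrich step could be written like this. It is plain JavaScript so it runs outside n8n; inside a Code node you would pass `$input.all()` instead of the sample array and `return` the result. The `text` and `fileName` fields and the `enrichItems` name are assumptions about your workflow, not fixed n8n names.

```javascript
// Copy each item and add metadata.title, leaving the input untouched.
function enrichItems(items, fallbackTitle) {
  return items.map((item, index) => ({
    json: {
      ...item.json,
      metadata: {
        ...(item.json.metadata || {}),
        // Prefer a per-item file name if the workflow provides one (assumed field).
        title: item.json.fileName || fallbackTitle,
        chunk_index: index,
      },
    },
  }));
}

// Sample data standing in for the splitter's output.
const sampleItems = [
  { json: { text: 'Section 1: load calculations.', fileName: 'Project_A.pdf' } },
  { json: { text: 'Section 2: material specs.' } },
];

const enrichedItems = enrichItems(sampleItems, 'Unknown.pdf');
console.log(JSON.stringify(enrichedItems, null, 2));

// In an n8n Code node (assumption: "Run Once for All Items" mode):
// return enrichItems($input.all(), 'Project_A.pdf');
```

Whether the vector store node picks up `json.metadata` on its own depends on how your data loader is configured. In some setups the metadata fields have to be mapped explicitly in the document loader's options, so verify in Qdrant that the payload actually contains the title after the first upsert.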

Kindly mark this as the solution if it helps!

really appreciate the help.

Hi mitidi,

I read your thread about the RAG Source Citation issue. Brighto’s logic is correct, but the real challenge is the exact Code Node implementation to ensure the document_title is injected into every chunk before upserting to Qdrant.

I specialize in building complex RAG pipelines in n8n. I can provide you with the exact n8n JSON for the Code Node that extracts the title and maps it to the metadata of each chunk automatically.

I’ll send you this JSON for free right now to get you moving. If it works (which it will), we can agree on a small fixed price (e.g., $100) to help you optimize the retrieval or add the final “Source Citation” logic to your AI Agent.

Should I send the JSON over?
