Complex RAG System with OCR and Image Handling in N8N?

Hey everyone,

Has anyone built a more advanced RAG system in N8N that also handles images effectively? I’m dealing with detailed manuals (like mechanical engineering documents) where text and images/drawings are intertwined. Traditional RAG setups often focus purely on text, but I need a way to split and embed data (both text and images), possibly via OCR, and then retrieve the images themselves when needed—rather than only converting everything to text or vectors.

Any tips or workflows on how to:

  1. OCR/process these manuals (often in complex PDF formats).
  2. Store/retrieve images (so I can query a manual and get an actual image back).
  3. Keep it all integrated within N8N?

Would really appreciate any insights… I added 3 images to showcase the kind of information retrieval I'm mainly talking about, but in general I mean the more complex elements of a PDF.

Thanks!


Hey @Kiremit

You might want to look into AI vision and the “grounding” approach, which has been quite popular in the legal space over the past year (see OrbitalWitness). It may be a few steps removed from what I assume you ultimately want, but it's a good starting point (a Python sketch of the flow follows the list):

  1. Split the document into pages and store each page as an image asset separately, e.g. on disk or via an object store.
  2. Create multivector embeddings for each page asset. [^1]
  3. Attach the document ref, page ref, page number, etc. to each embedding as metadata.
  4. When retrieving matching results from the vector store, use the metadata to fetch the previously stored page asset.
  5. Display the page asset to the user.
  6. Optionally, post-process as needed to extract the specific image, which may be tricky.
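
Roughly, steps 1-5 could look like this outside of n8n (a minimal Python sketch, assuming PyMuPDF for page rendering and Qdrant as the vector store; `embed_page()` is a placeholder for whatever multimodal embedding model you pick - see the Voyage snippet further down):

```python
# Sketch of the "grounding" flow: split a PDF into page images, embed each page,
# store embedding + metadata in Qdrant, then use a hit's metadata to fetch the
# original page image back for display.
# Assumes: pip install pymupdf qdrant-client
from pathlib import Path

import fitz  # PyMuPDF
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

ASSET_DIR = Path("page_assets")   # step 1: simple on-disk "object store"
COLLECTION = "manual_pages"
EMBED_DIM = 1024                  # depends on the embedding model you choose


def embed_page(image_path: Path) -> list[float]:
    """Placeholder for step 2 - call your multimodal embedding model here."""
    raise NotImplementedError


def index_manual(pdf_path: str, client: QdrantClient) -> None:
    ASSET_DIR.mkdir(exist_ok=True)
    client.recreate_collection(
        collection_name=COLLECTION,
        vectors_config=VectorParams(size=EMBED_DIM, distance=Distance.COSINE),
    )
    doc = fitz.open(pdf_path)
    points = []
    for page_number, page in enumerate(doc, start=1):
        # Step 1: render the page and store it as an image asset.
        asset_path = ASSET_DIR / f"{Path(pdf_path).stem}_p{page_number}.png"
        page.get_pixmap(dpi=150).save(str(asset_path))

        # Steps 2 + 3: embed the page, attach document/page refs as metadata.
        points.append(
            PointStruct(
                id=page_number,
                vector=embed_page(asset_path),
                payload={
                    "document": Path(pdf_path).name,
                    "page_number": page_number,
                    "asset_path": str(asset_path),
                },
            )
        )
    client.upsert(collection_name=COLLECTION, points=points)


def retrieve_page_asset(query_vector: list[float], client: QdrantClient) -> Path:
    # Step 4: search the vector store, then use the metadata to locate the asset.
    hit = client.search(collection_name=COLLECTION, query_vector=query_vector, limit=1)[0]
    # Step 5: return the stored page image so it can be shown to the user.
    return Path(hit.payload["asset_path"])
```

In n8n the same shape maps onto a workflow: a step that renders pages to images, an embeddings call, a Qdrant node for upsert/search, and a final step that reads `asset_path` from the search result's metadata.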

There was a recent n8n x Qdrant webinar where Evgeniya (DevRel at Qdrant) showed how to use Voyage AI's new Multimodal Embeddings API with Qdrant - pretty sure this could work for tech manuals as well. She shared the template here: Uploading image datasets to Qdrant [1/3 anomaly][1/2 KNN] | n8n workflow template
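
For reference, the Voyage side of that could look like this in Python outside of n8n (a sketch using Voyage's official `voyageai` client; the `voyage-multimodal-3` model name and `multimodal_embed` call are from Voyage's docs, so double-check against the current API reference). It could also stand in for the `embed_page()` placeholder in the sketch above:

```python
# Sketch: embed a page image with Voyage's multimodal embeddings, plus a helper
# to embed a text query so both land in the same vector space.
# Assumes: pip install voyageai pillow   (VOYAGE_API_KEY set in the environment)
import voyageai
from PIL import Image

vo = voyageai.Client()  # reads VOYAGE_API_KEY

def embed_page(image_path: str) -> list[float]:
    # Each input is a list mixing text and PIL images in reading order;
    # here it's just the rendered page image.
    result = vo.multimodal_embed(
        inputs=[[Image.open(image_path)]],
        model="voyage-multimodal-3",
        input_type="document",
    )
    return result.embeddings[0]

def embed_query(question: str) -> list[float]:
    result = vo.multimodal_embed(
        inputs=[[question]],
        model="voyage-multimodal-3",
        input_type="query",
    )
    return result.embeddings[0]
```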

There are also some other great data extraction examples using AI vision & n8n around the forum worth checking out.

[^1]: It might also make sense to create and manage a separate collection/index for images, i.e. if you need to rebuild your text-only vector collection, you'd want to avoid re-vectorising all the images again.
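
To illustrate that footnote, a small sketch of keeping the two collections separate in Qdrant (collection names and vector sizes are made up), so the text index can be dropped and rebuilt without re-vectorising the page images:

```python
# Separate collections: text chunks can be re-indexed independently of the
# (expensive to recompute) page-image embeddings.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Rebuilt whenever the text chunking/embedding strategy changes.
client.recreate_collection(
    collection_name="manual_text",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Page-image embeddings live in their own collection and are only created once.
if not client.collection_exists("manual_pages"):
    client.create_collection(
        collection_name="manual_pages",
        vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
    )
```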

Hope this helps.


Not really able to add anything that @Jim_Le hasn't already covered, but I just wanted to chime in and say that was an awesomely detailed answer. Kudos!


If n8n could implement LangChain's MultiVectorRetriever, which pairs a vector store with a doc store, it would be easier to build a pipeline that retrieves the original source. Is that possible?
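
For reference, this is roughly the pattern I mean in plain LangChain (a sketch with hypothetical data; class and method names are from LangChain's docs, and the summary generation / asset loading is left out):

```python
# MultiVectorRetriever: search over small text summaries in the vector store,
# but return the full source record (e.g. a page-image reference) from the
# docstore via a shared doc_id.
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma(collection_name="summaries", embedding_function=OpenAIEmbeddings())
docstore = InMemoryStore()  # could be Redis- or S3-backed in practice
id_key = "doc_id"

retriever = MultiVectorRetriever(vectorstore=vectorstore, docstore=docstore, id_key=id_key)

# Hypothetical data: one summary per page, plus the source record to hand back.
pages = [{"summary": "Exploded view of the gearbox assembly", "asset": "manual_p12.png"}]
doc_ids = [str(uuid.uuid4()) for _ in pages]

summaries = [
    Document(page_content=p["summary"], metadata={id_key: doc_ids[i]})
    for i, p in enumerate(pages)
]
retriever.vectorstore.add_documents(summaries)
retriever.docstore.mset(
    [(doc_ids[i], Document(page_content=p["asset"])) for i, p in enumerate(pages)]
)

# The retriever matches against the summaries but returns the docstore entries.
results = retriever.invoke("gearbox assembly drawing")
```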

Hi @Jim_Le,

Thanks for the great answer! I'm mainly interested in a solution where, based on the document type, I can easily determine which OCR algorithm or technique to use. Otherwise, if the document type changes slightly and a different approach would be more suitable, I'd need to completely rework the workflow. I'm looking for something that can be set up to work across multiple use cases, if that makes sense.
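
Something like this dispatch pattern is what I have in mind (purely illustrative - the document types, classifier, and OCR backends below are hypothetical placeholders), so that swapping the OCR technique for a new document type doesn't mean rebuilding the whole workflow:

```python
# Illustrative routing sketch: classify the document, then dispatch to the
# matching OCR/parsing backend; new document types just add a table entry.
from typing import Callable

def ocr_scanned_pdf(path: str) -> str: ...       # e.g. classic OCR for scans
def parse_digital_pdf(path: str) -> str: ...     # e.g. direct text-layer extraction
def vision_model_extract(path: str) -> str: ...  # e.g. an AI-vision model for drawing-heavy pages

OCR_ROUTES: dict[str, Callable[[str], str]] = {
    "scanned": ocr_scanned_pdf,
    "digital": parse_digital_pdf,
    "drawing_heavy": vision_model_extract,
}

def classify_document(path: str) -> str:
    """Placeholder: could check for a text layer, measure the image-to-text
    ratio, or ask a vision model to label the document type."""
    return "digital"

def extract_content(path: str) -> str:
    doc_type = classify_document(path)
    handler = OCR_ROUTES.get(doc_type, parse_digital_pdf)  # sensible default
    return handler(path)
```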

Hey @Jim_Le

Regarding Voyage specifically for embeddings within n8n - how do you integrate it?
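
I'm assuming it would go through the HTTP Request node, since I don't see a dedicated Voyage node - something along these lines, sketched in Python here; the endpoint and payload shape are my reading of Voyage's multimodal embeddings API reference, so worth verifying against the current docs:

```python
# Sketch of the raw HTTP call an n8n HTTP Request node would make to Voyage's
# multimodal embeddings endpoint (payload shape per Voyage's API docs - verify).
import base64
import os

import requests

with open("manual_p12.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://api.voyageai.com/v1/multimodalembeddings",
    headers={"Authorization": f"Bearer {os.environ['VOYAGE_API_KEY']}"},
    json={
        "model": "voyage-multimodal-3",
        "inputs": [
            {
                "content": [
                    {"type": "text", "text": "Page 12 of the gearbox manual"},
                    {"type": "image_base64",
                     "image_base64": f"data:image/png;base64,{image_b64}"},
                ]
            }
        ],
    },
    timeout=60,
)
embedding = response.json()["data"][0]["embedding"]
```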
