Contextual Adaptive Retrieval: A High-Dimensional Embeddings + Adaptive n8n RAG Experiment

I’m sharing the components of a retrieval experiment I’ve been working through for community feedback. If you’ve wanted to test higher-dimensional embedding models (like gemini-embedding-001 with 3072 dimensions), implement contextual retrieval, or experiment with multiple sub-query strategies within n8n, this setup might be for you.

:file_cabinet: Part 1: Vector Database Setup

Start with a pgvector database - Supabase is excellent for this (either self-hosted or their managed service).

Then run this ridiculous SQL in the SQL editor. It implements:

  • Adaptive Retrieval: Two-stage vector search that uses lower dimensions for fast candidate filtering, then the full dimensions for precise ranking (see the sketch after this list)
  • Hybrid Search: Combines keyword and semantic search with reciprocal rank fusion
  • High-dimensional support: Handles up to 3072-dimensional embeddings efficiently
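
To make that concrete, here’s a minimal sketch of the two-stage pattern (illustrative only: the `documents` table, the 512-dimension cutoff, and the `adaptiveSearch` wrapper are placeholder choices of mine, and the actual SQL does more):

```ts
// Two-stage "adaptive retrieval" sketch: stage 1 ranks on a truncated
// 512-dim sub-vector (cheap, and indexable via an expression index),
// stage 2 re-ranks the shortlist on all 3072 dimensions.
import { Client } from "pg";

export async function adaptiveSearch(queryEmbedding: number[], matchCount = 10) {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const qvec = `[${queryEmbedding.join(",")}]`; // pgvector text format
  const { rows } = await client.query(
    `WITH shortlist AS (
       -- Stage 1: coarse pass on the first 512 dimensions
       SELECT id, content, embedding
       FROM documents
       ORDER BY subvector(embedding, 1, 512)::vector(512)
                <=> subvector($1::vector(3072), 1, 512)::vector(512)
       LIMIT $2
     )
     -- Stage 2: precise re-ranking on the full 3072 dimensions
     SELECT id, content
     FROM shortlist
     ORDER BY embedding <=> $1::vector(3072)
     LIMIT $3`,
    [qvec, matchCount * 5, matchCount] // over-fetch 5x, then keep top N
  );
  await client.end();
  return rows;
}
```

The sub-vector stage is also what keeps this indexable: pgvector can’t index a 3072-dimension `vector` column directly (the index limit is 2,000 dimensions), but an expression index on the truncated sub-vector works, or you can use `halfvec`, which indexes up to 4,000 dimensions.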

Why this matters: The two-stage approach should retrieve 3-4x faster than searching the full-dimension vectors alone while maintaining accuracy, and hybrid search adds comprehensiveness by catching both exact keyword matches and conceptually related content.

:memo: Part 2: Document Ingestion

Set up a workflow to insert documents using:

This node provides two key capabilities:

  1. Semantic double-pass merge: Groups related concepts and isolates distinct ideas into coherent chunks
  2. Contextual Retrieval: Adds situational context to each chunk, describing where it sits within the overall document, which significantly improves search relevance (see the sketch below)
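
Conceptually, the contextual step per chunk looks something like this (the prompt wording and the `callLLM` helper are illustrative stand-ins, not the node’s actual internals):

```ts
// Per chunk: ask a chat model to situate the chunk within the whole
// document, then prepend that context before embedding.
async function contextualizeChunk(
  documentText: string,
  chunk: string,
  callLLM: (prompt: string) => Promise<string> // stand-in for the chat node
): Promise<string> {
  const prompt =
    `<document>\n${documentText}\n</document>\n` +
    `Here is a chunk from that document:\n<chunk>\n${chunk}\n</chunk>\n` +
    `Write a short context (50-100 tokens) that situates this chunk within ` +
    `the overall document, to improve search retrieval. Answer with the ` +
    `context only.`;
  const context = await callLLM(prompt);
  // The context + chunk together are what get embedded and stored.
  return `${context}\n\n${chunk}`;
}
```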

:magnifying_glass_tilted_left: Part 3: Retrieval Workflow

Create a retrieval workflow using:

This setup enables:

  • Multiple sub-query strategies: Multi-Query (generates diverse phrasings of the question) or Multi-Step (breaks complex queries into sequential steps); see the sketch after this list
  • Reranking: Post-processes retrieved documents for optimal relevance ordering
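
Here’s a rough sketch of how Multi-Query plus reciprocal rank fusion fits together (`generateVariations` and `vectorSearch` are hypothetical stand-ins for the node’s internals):

```ts
type Hit = { id: string; content: string };

// Reciprocal rank fusion: each result list contributes 1 / (k + rank) per
// document, so items ranked well across several sub-queries float to the top.
function reciprocalRankFusion(resultLists: Hit[][], k = 60): Hit[] {
  const scores = new Map<string, { hit: Hit; score: number }>();
  for (const list of resultLists) {
    list.forEach((hit, i) => {
      const entry = scores.get(hit.id) ?? { hit, score: 0 };
      entry.score += 1 / (k + i + 1); // i is 0-based, rank is i + 1
      scores.set(hit.id, entry);
    });
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map((e) => e.hit);
}

async function multiQueryRetrieve(
  question: string,
  generateVariations: (q: string, n: number) => Promise<string[]>, // chat call
  vectorSearch: (q: string) => Promise<Hit[]> // one embedding call + search each
): Promise<Hit[]> {
  const queries = [question, ...(await generateVariations(question, 4))];
  const lists = await Promise.all(queries.map(vectorSearch));
  return reciprocalRankFusion(lists);
}
```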

:rocket: Part 4: Enhanced Embedding Nodes

For the embedding components, try these extended nodes that unlock Google AI capabilities:

These expose critical parameters missing from the official nodes (see the sketch after this list):

  • Higher dimensions (up to 3072 for gemini-embedding-001)
  • Batch processing options
  • Task type specification for optimized embeddings
  • Support for the one-document-per-request limitation unique to gemini-embedding-001
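
For reference, here’s roughly what a raw call looks like against Google’s embedContent REST endpoint with those parameters exposed (the `embedChunk` wrapper is just for illustration):

```ts
// One request per chunk (gemini-embedding-001 embeds a single content per
// embedContent call), with taskType and outputDimensionality set explicitly.
async function embedChunk(text: string, isQuery = false): Promise<number[]> {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `gemini-embedding-001:embedContent?key=${process.env.GEMINI_API_KEY}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "models/gemini-embedding-001",
      content: { parts: [{ text }] },
      taskType: isQuery ? "RETRIEVAL_QUERY" : "RETRIEVAL_DOCUMENT",
      outputDimensionality: 3072, // the full 3072 dimensions
    }),
  });
  const data = await res.json();
  return data.embedding.values;
}
```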

:money_bag: Part 5: Cost Considerations

Fair warning: This setup significantly increases API costs compared to typical RAG implementations. Here’s what to budget for:

Ingestion Costs (One-time per document)

  • Contextual retrieval: 1 chat API call per chunk to generate context (typically 50-100 tokens of output)
  • Higher-dimensional embeddings: More expensive than standard models
  • Example: For a 50-page document split into 100 chunks, expect ~100 chat API calls plus 100 embedding calls

Query-time Costs (Per user question)

  • Multi-query strategies: 3-5x more embedding calls (generates 3-5 variations per question)
  • Multi-step queries: Additional chat API calls to decompose complex questions
  • Reranking: Extra API calls to reranking services (e.g., Cohere Rerank)
  • Example: Depending on your parameters, a single user question may trigger 5 embedding calls + 2-3 chat calls + 6 reranking calls (see the back-of-envelope sketch below)
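
As a back-of-envelope calculation with placeholder unit prices (illustrative only - check your providers’ current rates):

```ts
// Placeholder unit prices - substitute your providers' real rates.
const price = { embed: 0.0002, chat: 0.002, rerank: 0.002 }; // $ per call
// Using the example call counts above: 5 embeddings + 3 chats + 6 reranks.
const perQuestion = 5 * price.embed + 3 * price.chat + 6 * price.rerank;
console.log(`~$${perQuestion.toFixed(4)} per question`); // ~$0.0190
```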

The performance gains may justify these costs in production, but definitely factor them into your experimentation budget. Consider starting with a small document set to test effectiveness before scaling up. YMMV, but I’d love to hear about it!

:bullseye: What This Gives You

This experimental setup combines several advanced document-processing and retrieval techniques:

  • Speed: Adaptive retrieval should deliver a 3-4x performance improvement over relying on the higher dimensions alone
  • Accuracy: Hybrid search catches both keyword and semantic matches, with the precision of higher-dimensional embeddings
  • Context: Situational awareness in document chunks, which should reduce retrieval failure rates by more than 40%
  • Reranking: Should reduce retrieval failure rates by a further ~15%
  • Flexibility: Multiple query strategies to support different use cases and content types
  • Scale: Support for high-dimensional, state-of-the-art embedding models

Perfect for testing the limits of what’s possible with RAG in n8n!


Would love to hear your experiences if you try this - what works, what doesn’t, how it performs compared to your current RAG, and what improvements you discover!

@blah Wow, this is a really interesting set of components you’ve built.
Will definitely give this a go. Thanks for making them available!

Hi @blah, could you please upload a workflow example? I have all the “setup” prepared but I’m struggling to connect everything properly.
Many thanks!

Hi @blah. Thanks for your work. I recently started playing around with n8n and am trying to build a decent assistant with RAG, including reranking. I found information about the n8n-nodes-query-retriever-rerank community node and decided to use it. For larger vector stores it is a real game changer! I think this node should become an official n8n node!

One comment about its behavior when the LLM node times out. I am running an extremely slow setup on a local laptop with CPUs only, and my LLM node takes more than 60 seconds to answer. I got “Request timed out” every time the reranker node used the LLM, and I didn’t know how to solve it until I discovered the Timeout option. If there is a way to make n8n users more aware of this option, that could be an improvement.

The Debugging and LLM Debug analysis options are amazing too! Incredibly helpful for further optimization.