Semantic Document Processing with Embeddings and AI Agents in n8n

User Guide for n8n Workflow Setup


:clipboard: What This Workflow Does

This intelligent workflow combines multiple AI technologies to analyze images and extract meaningful insights:

  • OCR (Optical Character Recognition): Extracts text from images
  • RAG (Retrieval-Augmented Generation): Uses custom knowledge base for accurate analysis
  • Vector Embeddings: Converts documents into searchable semantic data
  • AI Agent: Intelligent decision-making powered by Google Gemini
  • Webhooks: Receives image data via HTTP endpoints
  • JavaScript Code: Custom data processing and formatting
  • Query Data Tool: Retrieves relevant information from your knowledge base

Use Cases: Product analysis, document processing, label scanning, receipt parsing, ingredient verification, or any image-to-insight workflow you need!


:wrench: Setup Guide

Step 1: Get Your API Keys

A) Google Gemini API (Required)

  1. Visit Google AI Studio
  2. Click “Get API Key” or “Create API Key”
  3. Copy your API key
  4. Keep it safe - you’ll need it for n8n

B) OCR.space API (Required)

  1. Go to OCR.space API Page
  2. Register for a free account
  3. Get your free API key (25,000 requests/month)
  4. Copy the API key

Step 2: Import Workflow to n8n

  1. Open your n8n instance
  2. Click the “+” button → Import from File or Import from URL
  3. Paste the workflow JSON
  4. Click Import

Step 3: Configure API Credentials

Set up Google Gemini API:

  1. Click on “:brain: Google Gemini AI Model” node
  2. Click “Credentials” dropdown
  3. Select “Create New Credential”
  4. Choose “Google PaLM API” (or Google Gemini)
  5. Paste your API key
  6. Click “Save”

Repeat for: “:1234: Text Embedding Generator” node (use the same credentials)

Set up OCR.space API:

  1. Click on “:magnifying_glass_tilted_left: Extract Text from Image (OCR)” node
  2. Find the “apikey” parameter under Headers
  3. Replace 7a2452e8b188957 with your OCR.space API key
  4. Click “Save”

Step 4: Activate Your Workflow

  1. Click the toggle switch at the top (should turn green)
  2. Your workflow is now ACTIVE and ready to receive requests!

:books: How to Use the Workflow

Option 1: Analyze Images (Main Flow)

Get your Webhook URL:

  1. Click on “:bullseye: Receive Product Image” node
  2. Copy the Production URL (looks like: https://your-n8n.com/webhook/...)

Send an image for analysis:

# Using curl
curl -X POST https://your-n8n.com/webhook/fcc2d240-... \
  -H "Content-Type: application/json" \
  -d '{
    "image": "BASE64_ENCODED_IMAGE_HERE",
    "filename": "product.jpg",
    "mimeType": "image/jpeg"
  }'

Or using Python:

import base64
import requests

# Read and encode image
with open("image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

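# Send to workflow
response = requests.post(
    "https://your-n8n.com/webhook/fcc2d240-...",
    json={
        "image": encoded,
        "filename": "image.jpg",
        "mimeType": "image/jpeg"
    },
    timeout=60,  # avoid hanging if the workflow is slow or unreachable
)
response.raise_for_status()  # fail loudly on 4xx/5xx responses

print(response.json())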

Option 2: Build Your Knowledge Base (RAG)

Upload training documents:

  1. Click on “:outbox_tray: Upload Knowledge Documents” node
  2. Click “Execute Node” → Copy the Form URL
  3. Open the URL in your browser
  4. Upload your documents (PDFs, DOCX, TXT, CSV, MD)
  5. Click Submit

Supported document types:

  • Scientific research papers (PDF)
  • Knowledge databases (CSV, TXT)
  • Reference guides (DOCX, MD)
  • Any text-based content relevant to your use case

The documents will be:

  • Automatically chunked into optimal sizes
  • Converted to vector embeddings
  • Stored in memory for fast retrieval (an in-memory store is cleared when n8n restarts, so re-upload your documents afterwards)
  • Used by the AI Agent during analysis
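
To picture the chunking step, here is a minimal sketch of fixed-size splitting with overlap. The `chunk_text` helper, the chunk size, and the overlap are all assumptions chosen for illustration. The text splitter inside the workflow may use different settings and a smarter splitting strategy.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into character windows that overlap with their neighbours.

    chunk_size and overlap are illustrative values, not the workflow's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval quality.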

:control_knobs: Customization Options

Modify the AI Prompt

  1. Click on “:robot: AI Safety Analyzer” node
  2. Edit the “text” field to change analysis behavior
  3. Customize output format, analysis focus, or add new fields

Change OCR Settings

  1. Click on “:magnifying_glass_tilted_left: Extract Text from Image (OCR)” node
  2. Modify parameters:
    • language: Change from “eng” to other languages
    • OCREngine: Switch between engine 1 and engine 2
    • scale: Set to true to upscale the image, which can improve accuracy for small text
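
If you want to try these parameters outside n8n, the sketch below builds a request for the OCR.space parse endpoint and reads the text out of the response. The helper names are hypothetical and the defaults are assumptions. The node in the workflow may send additional fields, so treat this as a way to experiment with settings rather than a copy of the node's configuration.

```python
import json
import urllib.parse
import urllib.request

OCR_URL = "https://api.ocr.space/parse/image"  # OCR.space parse endpoint


def build_ocr_form(image_b64: str, mime_type: str = "image/jpeg",
                   language: str = "eng", engine: int = 1,
                   scale: bool = True) -> dict:
    """Build the form fields for an OCR.space request."""
    if engine not in (1, 2):
        raise ValueError("engine must be 1 or 2")
    return {
        # OCR.space expects a data URI here, not a bare base64 string
        "base64Image": f"data:{mime_type};base64,{image_b64}",
        "language": language,
        "OCREngine": str(engine),
        "scale": "true" if scale else "false",
    }


def extract_text(result: dict) -> str:
    """Join the ParsedText of every page in an OCR.space response."""
    if result.get("IsErroredOnProcessing"):
        raise RuntimeError(f"OCR failed: {result.get('ErrorMessage')}")
    pages = result.get("ParsedResults") or []
    return "\n".join(page.get("ParsedText", "") for page in pages).strip()


def run_ocr(api_key: str, form: dict, timeout: float = 30.0) -> str:
    """Send the request (needs network access and a valid API key)."""
    request = urllib.request.Request(
        OCR_URL,
        data=urllib.parse.urlencode(form).encode(),
        headers={"apikey": api_key},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(json.load(response))
```

Calling `run_ocr(your_key, build_ocr_form(encoded, engine=2))` lets you compare the two engines on the same image before changing the node.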

Adjust Output Format

  1. Click on “:sparkles: Format Results for Output” node
  2. Edit JavaScript to change how results are structured
  3. Add new fields or modify existing ones

:magnifying_glass_tilted_left: Understanding the Workflow Components

1. OCR (Optical Character Recognition)

Converts images to text using OCR.space API. Supports multiple languages and auto-detects text orientation.

2. RAG (Retrieval-Augmented Generation)

Enhances AI responses with your custom knowledge base. Documents are split, embedded, and retrieved semantically.

3. Vector Embeddings

Google Gemini converts text into numerical vectors for semantic similarity matching. Powers the RAG system.
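
As a rough illustration of semantic similarity matching, the sketch below ranks stored texts by the cosine similarity between their vectors and a query vector. The function names are hypothetical. In the workflow the vectors come from the Gemini embedding model and the search runs inside the vector store, so this only shows the underlying idea.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: list[float],
          store: list[tuple[str, list[float]]],
          k: int = 2) -> list[str]:
    """Return the k stored texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vector), text) for text, vector in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

Texts with similar meaning get vectors that point in similar directions, so the highest-scoring entries are the ones handed to the AI Agent as context.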

4. AI Agent

The brain of the workflow. Makes intelligent decisions using the language model and RAG tools.

5. Webhooks

HTTP endpoints for receiving image data. Easy integration with any application or service.

6. JavaScript Code Nodes

Custom data processing: image formatting (input) and result structuring (output).

7. Query Data Tool (RAG)

Retrieves relevant information from your knowledge base during AI analysis. Connected to the AI Agent as a tool.


:hammer_and_wrench: Troubleshooting

“Authentication failed” error

  • Double-check your Google Gemini API key
  • Ensure API is enabled in Google Cloud Console
  • Check whether your Google account or usage tier requires billing to be enabled for the Gemini API

OCR returns empty results

  • Check image quality (minimum 300 DPI recommended)
  • Verify base64 encoding is correct
  • Try setting the scale parameter to true
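
One way to check the encoding before sending anything is to decode the string locally and look at the first bytes. This is a minimal sketch: `inspect_image` is a hypothetical helper, the signature table covers only three common formats, and the 1 MB limit is the free-tier figure quoted in this guide.

```python
import base64
import binascii

MAX_BYTES = 1024 * 1024  # 1 MB, the free-tier size mentioned in this guide

# Leading "magic" bytes of a few common image formats
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF8": "image/gif",
}


def inspect_image(image_b64: str) -> dict:
    """Decode a base64 string and report what it appears to contain."""
    try:
        raw = base64.b64decode(image_b64, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"not valid base64: {exc}") from exc

    detected = next(
        (mime for magic, mime in SIGNATURES.items() if raw.startswith(magic)),
        None,  # format not recognised by this small table
    )
    return {
        "bytes": len(raw),
        "mime_type": detected,
        "within_limit": len(raw) <= MAX_BYTES,
    }
```

If `mime_type` comes back as `None` for a file you know is a JPEG or PNG, the string was probably truncated or encoded twice.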

RAG not working

  • Make sure you’ve uploaded documents first
  • Check that embeddings are configured correctly
  • Verify the memory key is the same wherever the vector store is used: vector_store_key

Workflow not triggering

  • Ensure workflow is ACTIVE (toggle at top)
  • Check webhook URL is correct
  • Verify request format matches expected payload
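
A quick local check of the payload can rule out the most common format mistakes. The sketch assumes the three fields shown in the curl and Python examples in this guide are the ones the webhook expects. `payload_problems` is a hypothetical helper, not part of n8n.

```python
# Field names taken from the request examples in this guide
REQUIRED_FIELDS = ("image", "filename", "mimeType")


def payload_problems(payload: dict) -> list[str]:
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    mime = payload.get("mimeType")
    if isinstance(mime, str) and mime.strip() and not mime.startswith("image/"):
        problems.append(f"mimeType does not look like an image type: {mime}")
    return problems
```

Run it on the dictionary you pass as `json=` before posting, and only send the request when the returned list is empty.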

:light_bulb: Best Practices

  1. Test with small images first (under 1 MB for the OCR free tier)
  2. Upload high-quality reference documents for better RAG accuracy
  3. Monitor API usage to stay within free tier limits
  4. Use clear, well-lit images for optimal OCR results
  5. Customize the AI prompt to match your specific use case

:glowing_star: Next Steps

Once your workflow is running:

  • Integrate with your app: Use the webhook URL in your frontend/backend
  • Expand your knowledge base: Upload more documents regularly
  • Monitor performance: Check execution logs in n8n
  • Scale up: Consider paid API tiers for higher usage
  • Customize analysis: Modify the AI prompt for your domain

:telephone_receiver: Support & Community


:page_facing_up: License

This workflow is free to use, modify, and share. No subscription required for n8n Community Edition (self-hosted).

Happy Automating! :tada:

That's a great workflow!!