Semantic Document Processing with Embeddings and AI Agents in n8n
User Guide for n8n Workflow Setup
What This Workflow Does
This intelligent workflow combines multiple AI technologies to analyze images and extract meaningful insights:
- OCR (Optical Character Recognition): Extracts text from images
- RAG (Retrieval-Augmented Generation): Uses a custom knowledge base for accurate analysis
- Vector Embeddings: Converts documents into searchable semantic data
- AI Agent: Intelligent decision-making powered by Google Gemini
- Webhooks: Receives image data via HTTP endpoints
- JavaScript Code: Custom data processing and formatting
- Query Data Tool: Retrieves relevant information from your knowledge base
Use Cases: Product analysis, document processing, label scanning, receipt parsing, ingredient verification, or any image-to-insight workflow you need!
Setup Guide
Step 1: Get Your API Keys
A) Google Gemini API (Required)
- Visit Google AI Studio
- Click “Get API Key” or “Create API Key”
- Copy your API key
- Keep it safe - you’ll need it for n8n
B) OCR.space API (Required)
- Go to OCR.space API Page
- Register for a free account
- Get your free API key (25,000 requests/month)
- Copy the API key
Step 2: Import Workflow to n8n
- Open your n8n instance
- Click the “+” button → Import from File or Import from URL
- Paste the workflow JSON
- Click Import
Step 3: Configure API Credentials
Set up Google Gemini API:
- Click on the “Google Gemini AI Model” node
- Click the “Credentials” dropdown
- Select “Create New Credential”
- Choose “Google PaLM API” (or Google Gemini)
- Paste your API key
- Click “Save”
Repeat for the “Text Embedding Generator” node (use the same credentials).
Set up OCR.space API:
- Click on the “Extract Text from Image (OCR)” node
- Find the “apikey” parameter under Headers
- Replace 7a2452e8b188957 with your OCR.space API key
- Click “Save”
Step 4: Activate Your Workflow
- Click the toggle switch at the top (should turn green)
- Your workflow is now ACTIVE and ready to receive requests!
How to Use the Workflow
Option 1: Analyze Images (Main Flow)
Get your Webhook URL:
- Click on the “Receive Product Image” node
- Copy the Production URL (looks like: https://your-n8n.com/webhook/...)
Send an image for analysis:
# Using curl
curl -X POST https://your-n8n.com/webhook/fcc2d240-... \
  -H "Content-Type: application/json" \
  -d '{
    "image": "BASE64_ENCODED_IMAGE_HERE",
    "filename": "product.jpg",
    "mimeType": "image/jpeg"
  }'
Or using Python:
import base64
import requests
# Read and encode image
with open("image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()
# Send to workflow
response = requests.post(
    "https://your-n8n.com/webhook/fcc2d240-...",
    json={
        "image": encoded,
        "filename": "image.jpg",
        "mimeType": "image/jpeg"
    }
)
print(response.json())
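Before sending, it can help to build the request body in one place and catch obvious problems locally. The following is a minimal sketch, not part of the workflow itself: the helper name `build_payload` and the `MAX_BYTES` limit are assumptions for illustration (the limit mirrors the "under 1MB" tip in Best Practices), while the three JSON keys are the ones used in the examples above.

```python
import base64
import mimetypes
from pathlib import Path

# Assumed size limit, taken from the "under 1MB" tip in Best Practices.
MAX_BYTES = 1_000_000


def build_payload(path: str) -> dict:
    """Return the JSON body for the webhook, built from the image at `path`."""
    file = Path(path)
    data = file.read_bytes()
    if len(data) > MAX_BYTES:
        raise ValueError(f"{file.name} is {len(data)} bytes; limit is {MAX_BYTES}")
    # Guess the MIME type from the file extension (hypothetical convenience step).
    mime_type, _ = mimetypes.guess_type(file.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{file.name} does not look like an image file")
    return {
        "image": base64.b64encode(data).decode("ascii"),
        "filename": file.name,
        "mimeType": mime_type,
    }
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post` in the Python example.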
Option 2: Build Your Knowledge Base (RAG)
Upload training documents:
- Click on the “Upload Knowledge Documents” node
- Click “Execute Node” → Copy the Form URL
- Open the URL in your browser
- Upload your documents (PDFs, DOCX, TXT, CSV, MD)
- Click Submit
Supported document types:
- Scientific research papers (PDF)
- Knowledge databases (CSV, TXT)
- Reference guides (DOCX, MD)
- Any text-based content relevant to your use case
The documents will be:
- Automatically chunked into optimal sizes
- Converted to vector embeddings
- Stored in memory for fast retrieval
- Used by the AI Agent during analysis
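To make the chunking step concrete, here is a small illustrative sketch of splitting text into overlapping character windows. It is not the workflow's actual splitter: the function name `chunk_text` and the `chunk_size` and `overlap` defaults are assumptions chosen for the example, and the real sizes are whatever the workflow's text splitter node is configured to use.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split `text` into windows of `chunk_size` characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.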
Customization Options
Modify the AI Prompt
- Click on the “AI Safety Analyzer” node
- Edit the “text” field to change analysis behavior
- Customize output format, analysis focus, or add new fields
Change OCR Settings
- Click on the “Extract Text from Image (OCR)” node
- Modify parameters:
  - language: Change from “eng” to other languages
  - OCREngine: Switch between engine 1 or 2
  - scale: Improve accuracy for small text
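For reference, the sketch below shows how these three settings could look as OCR.space form fields when building a request by hand. The helper `build_ocr_fields` and its default values are assumptions for illustration, and the workflow's own node may be configured differently. The field names `base64Image`, `language`, `OCREngine`, and `scale` are OCR.space API parameters.

```python
def build_ocr_fields(
    image_b64: str,
    mime_type: str = "image/jpeg",
    language: str = "eng",
    engine: int = 1,
    scale: bool = True,
) -> dict:
    """Return OCR.space form fields for a base64-encoded image."""
    # Only the two engines mentioned in this guide are accepted here.
    if engine not in (1, 2):
        raise ValueError("engine must be 1 or 2")
    return {
        # OCR.space expects base64 input as a data URI.
        "base64Image": f"data:{mime_type};base64,{image_b64}",
        "language": language,
        "OCREngine": str(engine),
        "scale": "true" if scale else "false",
    }
```

These fields would typically be posted as form data to `https://api.ocr.space/parse/image`, with the API key sent in the `apikey` header.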
Adjust Output Format
- Click on the “Format Results for Output” node
- Edit JavaScript to change how results are structured
- Add new fields or modify existing ones
Understanding the Workflow Components
1. OCR (Optical Character Recognition)
Converts images to text using OCR.space API. Supports multiple languages and auto-detects text orientation.
2. RAG (Retrieval-Augmented Generation)
Enhances AI responses with your custom knowledge base. Documents are split, embedded, and retrieved semantically.
3. Vector Embeddings
Google Gemini converts text into numerical vectors for semantic similarity matching. Powers the RAG system.
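As a rough illustration of semantic similarity matching, the sketch below ranks stored chunks by cosine similarity to a query vector. It uses hand-written two-dimensional toy vectors, whereas real embeddings have many more dimensions. The names `cosine_similarity` and `rank_chunks` are hypothetical, and cosine similarity is assumed as the metric; the vector store in the workflow may score matches differently.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors `a` and `b`."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec: list[float], store: list[tuple[str, list[float]]], top_k: int = 2) -> list[str]:
    """Return the texts of the `top_k` stored chunks most similar to `query_vec`."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

Chunks whose vectors point in nearly the same direction as the query vector score close to 1 and are returned first.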
4. AI Agent
The brain of the workflow. Makes intelligent decisions using the language model and RAG tools.
5. Webhooks
HTTP endpoints for receiving image data. Easy integration with any application or service.
6. JavaScript Code Nodes
Custom data processing: image formatting (input) and result structuring (output).
7. Query Data Tool (RAG)
Retrieves relevant information from your knowledge base during AI analysis. Connected to the AI Agent as a tool.
Troubleshooting
“Authentication failed” error
- Double-check your Google Gemini API key
- Ensure API is enabled in Google Cloud Console
- Verify billing is set up if your usage goes beyond the Gemini API free tier
OCR returns empty results
- Check image quality (minimum 300 DPI recommended)
- Verify base64 encoding is correct
- Try setting the scale parameter to true
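One way to verify the base64 encoding locally is to decode it strictly and check that the result starts with a known image signature. This is a minimal sketch under stated assumptions: the function name `detect_image_type` is hypothetical, only JPEG and PNG signatures are checked, and the strict decode rejects line breaks, which the workflow itself may or may not tolerate.

```python
import base64
from typing import Optional

# Leading "magic" bytes for the two formats checked in this sketch.
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def detect_image_type(encoded: str) -> Optional[str]:
    """Decode `encoded` strictly and return its MIME type, or None if unrecognised.

    Raises ValueError when the string is empty or is not valid base64.
    """
    if not encoded:
        raise ValueError("empty image string")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        data = base64.b64decode(encoded, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError(f"not valid base64: {exc}") from exc
    for magic, mime_type in SIGNATURES.items():
        if data.startswith(magic):
            return mime_type
    return None
```

A `ValueError` points to an encoding problem on the client side, while a `None` result suggests the bytes decoded cleanly but are not a JPEG or PNG.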
RAG not working
- Make sure you’ve uploaded documents first
- Check that embeddings are configured correctly
- Verify the memory key matches: vector_store_key
Workflow not triggering
- Ensure workflow is ACTIVE (toggle at top)
- Check webhook URL is correct
- Verify request format matches expected payload
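A common cause of a webhook that never fires is calling the test URL instead of the Production URL. The sketch below classifies a URL by its path, assuming n8n's default prefixes (`/webhook/` for production and `/webhook-test/` for test); an instance may be configured with different prefixes, and the function name `webhook_kind` is hypothetical.

```python
from urllib.parse import urlparse


def webhook_kind(url: str) -> str:
    """Return "production", "test", or "unknown" based on the URL path."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    # Assumed default n8n path prefixes; a self-hosted instance may differ.
    if "webhook-test" in segments:
        return "test"
    if "webhook" in segments:
        return "production"
    return "unknown"
```

If this returns "test", the URL only responds while the workflow is listening for a test event in the editor, so switch to the Production URL for an active workflow.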
Best Practices
- Test with small images first (under 1MB for OCR free tier)
- Upload high-quality reference documents for better RAG accuracy
- Monitor API usage to stay within free tier limits
- Use clear, well-lit images for optimal OCR results
- Customize the AI prompt to match your specific use case
Next Steps
Once your workflow is running:
- Integrate with your app: Use the webhook URL in your frontend/backend
- Expand your knowledge base: Upload more documents regularly
- Monitor performance: Check execution logs in n8n
- Scale up: Consider paid API tiers for higher usage
- Customize analysis: Modify the AI prompt for your domain
Support & Community
- n8n Documentation: docs.n8n.io
- n8n Community Forum: community.n8n.io
- OCR.space Docs: ocr.space/ocrapi
- Google Gemini Docs: ai.google.dev
License
This workflow is free to use, modify, and share. No subscription required for n8n Community Edition (self-hosted).
Happy Automating!