Local OCR in n8n with Ollama: How to extract text from scanned PDFs without external services?

Hi everyone,

I’m building a fully local document processing workflow in n8n and I’m running into an issue with scanned PDF files.

Setup

  • n8n (self-hosted)

  • Ollama running locally (used for AI processing)

  • Documents are stored internally and processed in a workflow

  • No external APIs or cloud services are allowed (everything must stay local)

Problem

We receive scanned documents as PDFs, but:

  • The built-in n8n nodes cannot properly read or extract text from them

  • The PDFs are image-based (not text-based)

  • Therefore, normal PDF extraction nodes return empty or unusable results

Goal

We want to:

  • Extract text from scanned PDFs (OCR)

  • Keep everything 100% local (no external OCR APIs like Google Vision, AWS Textract, etc.)

  • Feed the extracted text into downstream processing (and optionally into Ollama for summarization / structuring)

Questions

  1. What is the best way to implement local OCR inside an n8n workflow?

  2. Is there a recommended approach using tools like:

    • Tesseract OCR (local)

    • Docker-based OCR services

    • CLI tools integrated via Execute Command node

  3. Has anyone successfully combined n8n + local OCR + Ollama in a production-like workflow?

What I’m looking for

Ideally:

  • Example workflows or architecture ideas

  • Best practice for handling scanned PDFs in n8n

  • Fully local OCR pipeline that can be triggered inside a workflow

Any guidance or real-world examples would be greatly appreciated.

I am using Tesseract OCR, hosted on EC2 and exposed as an external service. It can just as easily run as a local OCR service, and it’s free to use.

Hi @Leon22,
I would use a fully local OCR service, exposed over HTTP or running in a separate container, and call it from n8n before sending the text to Ollama. Scanned PDFs have no text layer, so the text must first be generated by a dedicated OCR step outside the standard PDF extraction flow.
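To make the "local OCR service over HTTP" idea a bit more concrete, here is a minimal sketch of a wrapper around the tesseract CLI. It is only an illustration under some assumptions: tesseract is installed in the container, the request body is a single page image (tesseract does not read PDFs directly), and the names build_tesseract_cmd, ocr_image_bytes, OcrHandler, serve and port 8884 are made up for this example.

```python
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_tesseract_cmd(image_path, lang="eng"):
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image_bytes(data, lang="eng", run=subprocess.run):
    # Tesseract reads from a file, so the upload goes to a temp file first.
    with tempfile.NamedTemporaryFile(suffix=".png") as tmp:
        tmp.write(data)
        tmp.flush()
        result = run(build_tesseract_cmd(tmp.name, lang),
                     capture_output=True, text=True, check=True)
    return result.stdout


class OcrHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The raw request body is treated as the image to OCR.
        length = int(self.headers.get("Content-Length", 0))
        try:
            text = ocr_image_bytes(self.rfile.read(length))
            status, body = 200, text.encode("utf-8")
        except subprocess.CalledProcessError as exc:
            status, body = 500, (exc.stderr or "OCR failed").encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="0.0.0.0", port=8884):
    # Hypothetical port; pick whatever fits your compose setup.
    HTTPServer((host, port), OcrHandler).serve_forever()
```

Calling serve() inside the OCR container starts the service; from n8n, an HTTP Request node would then POST each page image as the raw request body and get plain text back.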

Hey! Since you already have Ollama running locally, the easiest route is to convert your PDF pages to images with pdftoppm in an Execute Command node and then send those to an Ollama vision model like llama3.2-vision for the OCR. That way you skip needing any extra services.
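For reference, this is roughly what the request to Ollama looks like for a single page image. It is a sketch, not a drop-in workflow: it assumes Ollama listens on its default port 11434 and that the llama3.2-vision model is already pulled. The prompt wording and the helper names build_ocr_payload and ocr_with_ollama are just placeholders. If n8n runs in Docker, "localhost" will most likely need to be replaced with the host name under which the Ollama container is reachable.

```python
import base64
import json
import urllib.request

# Default Ollama endpoint; adjust the host when n8n and Ollama run in
# separate containers.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ocr_payload(image_bytes, model="llama3.2-vision"):
    # Ollama expects images as base64-encoded strings in the "images" list.
    return {
        "model": model,
        "prompt": "Transcribe all text in this image exactly. Output only the text.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a stream
    }


def ocr_with_ollama(image_bytes, model="llama3.2-vision", url=OLLAMA_URL):
    body = json.dumps(build_ocr_payload(image_bytes, model)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    # Vision models can be slow on CPU, hence the generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

In n8n the same JSON body can be sent from an HTTP Request node; the extracted text comes back in the "response" field.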

Hey, thanks a lot for your help — really appreciate it!

I have a few follow-up questions because I’m still pretty new to n8n and this setup:

From what I understand, I should use pdftoppm to convert PDF pages into images first. I’ve read that it’s part of the poppler-utils package — is that correct?

  • How exactly would I install that in my setup?

  • Do I need to install poppler-utils with Docker?

  • If it’s inside Docker, would I extend the n8n image or run a separate container?

Also, once it’s installed:

  • How do I actually call pdftoppm from within n8n?

    • Would I use an Execute Command node for that?

    • Or is there a better approach (e.g. Code node, external service, etc.)?

Sorry if these are basic questions :sweat_smile: I’m still learning, but I’d really appreciate any guidance or example workflows!

Thanks again for your help

@Leon22 yeah, poppler-utils is correct. If you’re using the official n8n Docker image, just extend it with a custom Dockerfile like this:

    FROM n8nio/n8n:latest
    USER root
    RUN apk add --no-cache poppler-utils
    USER node

Then rebuild (e.g. docker build -t n8n-ocr .) and point your compose/run at the new tag. The n8n image is Alpine-based, so it’s apk, not apt.

To add to @achamm’s spot-on Docker instructions, to answer your second question: Yes, you will use the Execute Command node to run pdftoppm.

The tricky part for beginners is that CLI tools expect physical files on the disk, while n8n holds your PDF in memory as binary data. The standard pattern for this is:

  1. Use a Read/Write Files from Disk node to save your PDF binary to a temporary path like /tmp/input.pdf.

  2. Use the Execute Command node to run: pdftoppm -png /tmp/input.pdf /tmp/output

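  3. Use another Read/Write Files from Disk node to read the generated /tmp/output-1.png back into n8n as binary data so you can send it to your Ollama node. Note that pdftoppm zero-pads the page number when the PDF has ten or more pages, so the first file is then /tmp/output-01.png.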

Just don’t forget to add a final Execute Command node to rm those temp files afterward, or your Docker container will eventually run out of space!
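The write, convert, read back, clean up pattern above can be sketched outside n8n like this, which may help when testing the conversion on its own. It assumes pdftoppm (poppler-utils) is on the PATH; the function names and the 200 DPI default are illustrative choices, not anything n8n prescribes. Sorting by the numeric page suffix keeps pages in order whether or not pdftoppm zero-pads the file names.

```python
import glob
import os
import re
import subprocess
import tempfile


def build_pdftoppm_cmd(pdf_path, out_prefix, dpi=200):
    # -r sets the resolution in DPI, -png selects PNG output.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def page_number(path):
    # pdftoppm writes <prefix>-<page>.png; the page part may be zero-padded.
    match = re.search(r"-(\d+)\.png$", path)
    return int(match.group(1)) if match else 0


def pdf_to_png_pages(pdf_bytes, run=subprocess.run):
    """Return the PNG bytes of each page, in page order."""
    # The temporary directory and everything in it is deleted on exit,
    # which takes the place of the final rm step.
    with tempfile.TemporaryDirectory() as workdir:
        pdf_path = os.path.join(workdir, "input.pdf")
        with open(pdf_path, "wb") as fh:
            fh.write(pdf_bytes)
        prefix = os.path.join(workdir, "page")
        run(build_pdftoppm_cmd(pdf_path, prefix), check=True)
        paths = sorted(glob.glob(prefix + "-*.png"), key=page_number)
        pages = []
        for path in paths:
            with open(path, "rb") as fh:
                pages.append(fh.read())
        return pages
```

Inside n8n the same idea applies: give each execution its own temp directory (or at least a unique file prefix) so parallel runs do not overwrite each other's /tmp/output files.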

Great approach with pdftoppm + Ollama! One thing to add: if your PDFs are multi-page, you might want to loop through each generated image file and pass them to Ollama one by one, then merge the text outputs at the end. I’ve done similar pipelines in n8n using a Split In Batches node after the Execute Command step. Also worth noting: make sure your Ollama model (like llava or minicpm-v) is actually good at OCR; some vision models are better than others for dense text extraction.

Spot on about the multi-page handling! Throwing a massive PDF at a vision model all at once is a guaranteed way to hit context limits or crash the instance.

If anyone implements this approach using the Loop node (formerly Split In Batches), I highly recommend adding a short Wait node or configuring automatic retries on the Ollama request step. If n8n fires 20 heavy image processing requests at your local Ollama container simultaneously, the container can easily choke and drop requests, leaving you with missing pages in your final merged text. Excellent call on minicpm-v as well, it’s a beast for OCR!
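The loop, retry and merge logic described above boils down to something like the following sketch. The ocr_page argument stands for whatever performs the actual Ollama request; the retry count, the delay and the failure marker text are arbitrary choices for illustration.

```python
import time


def ocr_pages_sequentially(pages, ocr_page, retries=3, delay_s=2.0, sleep=time.sleep):
    """OCR pages one at a time and merge the text in page order."""
    texts = []
    for index, page in enumerate(pages, start=1):
        for attempt in range(1, retries + 1):
            try:
                texts.append(ocr_page(page))
                break
            except Exception:
                if attempt == retries:
                    # Leave a visible marker so a failed page is not
                    # silently missing from the merged text.
                    texts.append(f"[page {index}: OCR failed after {retries} attempts]")
                else:
                    # Wait a little longer after each failed attempt.
                    sleep(delay_s * attempt)
    return "\n\n".join(texts)
```

Processing one page at a time is slower than firing everything at once, but it keeps the load on a local Ollama instance predictable, and the failure marker makes dropped pages easy to spot in the final text.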

Spot on! Batch-testing the models against the actual PDF artifacts is definitely the right move. I’ve noticed Llava can sometimes hallucinate on dense tables where minicpm-v stays a bit more strict, but it really does depend on the scan quality. Appreciate the shoutout!

For local PDF OCR in n8n, the most reliable approach I’ve found is using a vision-capable model in Ollama (like llava or llava-llama3) combined with converting PDF pages to images first, then sending each page image to Ollama for text extraction. Note that the Extract from PDF node only reads an embedded text layer, so the page-to-image conversion needs a separate tool such as pdftoppm.

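The key is a strict prompt that asks for the transcribed text only, so the model does not add commentary or reasoning artifacts to the output. (Setting raw: true in the Ollama generate API only disables prompt templating; it does not filter the model’s output.) You then collect the extracted text across pages and concatenate it.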

This keeps everything local without needing Tesseract or external OCR services. Works well for structured documents, though accuracy drops on low-quality scans.

Thanks a lot for the detailed explanation — that’s actually exactly the approach I’d like to use :+1:

The only issue I’m running into is with scanned PDFs. When I pass them into the Extract from PDF node, it doesn’t return any text at all (the output is basically empty), which I assume is expected since there’s no embedded text layer.

Right now my workaround is:

  • convert the PDF pages into images (PNG)

  • then send those images to an Ollama OCR model (I’m using qwen2.5vl:7b)

That part actually works really well for me.

However, I’d prefer to handle the PDF → image conversion directly inside the n8n workflow, instead of doing it externally beforehand.

So my questions would be:

  1. Is there a recommended way in n8n to convert PDF pages to images (PNG/JPG) within the workflow?

  2. Or is there any way to make the Extract from PDF node handle scanned PDFs that I might be missing?

Appreciate any tips — would love to keep everything fully local and inside n8n if possible :folded_hands:

For the OCR step specifically, you can skip the pdftoppm + Docker extension setup and use the SealDoc node instead. It runs ocrmypdf + Tesseract internally on a self-hosted SealDoc instance, so nothing leaves your infrastructure.

Node config in n8n:

  • Resource: Job

  • Operation: Create

  • Enable: Run OCR (toggle on)

  • OCR Languages: eng (or eng+deu, nld+fra, etc.)

The node outputs the extracted text, which you then wire straight into your Ollama node for summarisation or structuring. SealDoc handles the image conversion and Tesseract pass so you don’t need Execute Command nodes or a custom Docker image.

Self-hosted install: app.sealdoc.eu

Disclosure: I built the SealDoc node.

Hey, that sounds interesting. However, as I can see, it becomes paid once you reach a certain size. Also, I can’t access the website because after entering my company information, I end up stuck in an endless loop.

Hey Leon, the endless loop was a real bug. It hit a few people today and we pushed a fix just now. Hard-refresh or clear site data for app.sealdoc.eu if it still shows the old page.

On pricing: the free tier covers 50 documents/month with full OCR and text extraction, which should be enough to evaluate whether it fits your workflow. Paid plans kick in if you need higher volume or retention beyond 24h.

Let me know if you run into anything else.