Local OCR in n8n with Ollama: How to extract text from scanned PDFs without external services?


You can use the Execute Command node in n8n to run Tesseract OCR locally. Install Tesseract and Poppler on your n8n host (Tesseract does not read PDFs directly, so Poppler's pdftoppm converts the pages to images first), then chain it as: Read Binary File → Execute Command (OCR) → HTTP Request to Ollama. For multi-page PDFs, OCRmyPDF in a Docker sidecar is cleaner: if the sidecar exposes an HTTP endpoint, you can call it via the HTTP Request node without touching the n8n host. Everything stays 100% local.
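To make the Tesseract route concrete, here is a minimal Python sketch of what the Execute Command step could run. It assumes `pdftoppm` and `tesseract` are on the PATH, that Ollama listens on its default `http://localhost:11434`, and that a model named `llama3` is pulled; the function names, the model name, and the prompt are placeholders, not something from this thread.

```python
import json
import subprocess
import tempfile
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default Ollama port


def pdftoppm_cmd(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    # Poppler: render each PDF page to <out_prefix>-<n>.png
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def tesseract_cmd(image_path: str, lang: str = "eng") -> list[str]:
    # Using "stdout" as the output base makes Tesseract print the text
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_pdf(pdf_path: str, lang: str = "eng") -> str:
    """Rasterize the PDF, OCR every page, and join the page texts."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        subprocess.run(pdftoppm_cmd(pdf_path, prefix), check=True)
        texts = []
        # pdftoppm zero-pads page numbers, so a plain sort keeps page order
        for page in sorted(Path(tmp).glob("page-*.png")):
            result = subprocess.run(
                tesseract_cmd(str(page), lang),
                check=True, capture_output=True, text=True,
            )
            texts.append(result.stdout)
    return "\n\n".join(texts)


def ollama_request(text: str, model: str = "llama3") -> urllib.request.Request:
    # "llama3" is a placeholder; use whichever model you have pulled
    payload = {"model": model, "prompt": f"Summarize:\n\n{text}", "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(text: str, model: str = "llama3") -> str:
    with urllib.request.urlopen(ollama_request(text, model)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    import sys
    print(ask_ollama(ocr_pdf(sys.argv[1])))
```

Inside n8n you would more likely keep only the OCR part in the Execute Command node and let the HTTP Request node do the Ollama call; the last two functions just show the request shape.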

I solved this with a local Kreuzberg Docker container. You can add it to your Docker Compose file and then call it through the HTTP Request node, since the Kreuzberg container exposes an API. Additional settings go in the Kreuzberg config file. The project is on GitHub as kreuzberg-dev/kreuzberg: a polyglot document intelligence framework with a Rust core that extracts text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. It is available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or via CLI, REST API, or MCP server.

In general I have found it a really nice pattern to deploy microservices in the Docker network, which you can then call with the HTTP Request node (just use the name of the Docker service as the host in the base URL). Happy to give you more information if you think this is a viable way for you.
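As a rough illustration of the service-name pattern, the sketch below builds the same kind of multipart upload the HTTP Request node would send to a sidecar container. The service name `kreuzberg`, port `8000`, path `/extract`, and form field `data` are assumptions for illustration only; check the Kreuzberg documentation and your own compose file for the real values.

```python
import uuid
import urllib.request


def service_url(service: str, port: int, path: str) -> str:
    # Inside a Docker network the compose service name resolves as a hostname
    return f"http://{service}:{port}/{path.lstrip('/')}"


def multipart_body(field: str, filename: str, data: bytes,
                   content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_upload(url: str, field: str, filename: str, data: bytes) -> urllib.request.Request:
    body, ctype = multipart_body(field, filename, data)
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": ctype}
    )


if __name__ == "__main__":
    # Hypothetical values: service name, port, path, and field are assumptions
    url = service_url("kreuzberg", 8000, "/extract")
    with open("scan.pdf", "rb") as fh:
        request = build_upload(url, "data", "scan.pdf", fh.read())
    with urllib.request.urlopen(request) as resp:
        print(resp.read().decode("utf-8"))
```

The point is only that the URL host is the compose service name, so nothing has to be published on the Docker host for n8n to reach the OCR service.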

@Leon22 please share your JSON

I have now solved it. I installed Paperless locally on one of our servers and upload the documents there via an HTTP Request node.

The files are then automatically processed through OCR.

After that, I retrieve the extracted text via another HTTP Request node and then automatically delete the document from the system.

This setup has been working reliably with all file types so far.
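For anyone wanting to reproduce the upload, fetch, and delete sequence, here is a hedged sketch of the requests involved. It assumes "Paperless" means Paperless-ngx and its REST API (token auth, upload via `POST /api/documents/post_document/` with a `document` form field, extracted text in the `content` field of a document); the base URL, token, and helper names are hypothetical, so verify the endpoints against your installed version.

```python
import urllib.request
from typing import Optional

BASE_URL = "http://paperless:8000"  # hypothetical host and port
API_TOKEN = "changeme"              # hypothetical API token
UPLOAD_PATH = "/api/documents/post_document/"  # multipart upload, field "document"


def api_request(method: str, path: str) -> urllib.request.Request:
    # Assumption: Paperless-ngx style token authentication
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Token {API_TOKEN}", "Accept": "application/json"},
    )


def task_status_request(task_id: str) -> urllib.request.Request:
    # The upload returns a task id; poll it until OCR has finished
    return api_request("GET", f"/api/tasks/?task_id={task_id}")


def finished_document_id(tasks: list) -> Optional[int]:
    """Return the new document id once the consume task has succeeded."""
    for task in tasks:
        if task.get("status") == "SUCCESS" and task.get("related_document"):
            return int(task["related_document"])
    return None


def document_request(doc_id: int) -> urllib.request.Request:
    return api_request("GET", f"/api/documents/{doc_id}/")


def extract_text(document: dict) -> str:
    # The OCR result is expected in the "content" field
    return document.get("content", "")


def delete_request(doc_id: int) -> urllib.request.Request:
    return api_request("DELETE", f"/api/documents/{doc_id}/")
```

In n8n each of these maps to one HTTP Request node, with a Wait or loop between upload and fetch because OCR runs asynchronously.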

Thanks to everyone for the support and help!