Automated n8n workflow for analyzing, renaming, and AI-assisted processing of 10,000 OCR-scanned PDF documents

Hello dear n8n community,

I’m reaching out for support in building a comprehensive workflow using n8n to process around 10,000 locally stored PDF documents. All of these files have been scanned and OCR-processed using NAPS2. The goal is to automatically open each of these files and extract key information from the text, including: date, category (e.g., invoice, notice, court ruling), authority or institution, sender, recipient, reference number or case ID, subject or title, the name of the responsible person, and the number of pages.
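
To make the extraction step concrete, this is roughly the field structure I have in mind for each document (the names are only illustrative, not an existing schema):

```typescript
// Hypothetical target structure for the per-document extraction step.
// Field names are illustrative assumptions, not an existing schema.
interface DocumentFields {
  date: string;              // document date, e.g. "2024-03-15"
  category: string;          // e.g. "invoice", "notice", "court ruling"
  authority: string;         // authority or institution
  sender: string;
  recipient: string;
  referenceNumber: string;   // reference number or case ID
  subject: string;           // subject or title
  responsiblePerson: string;
  pageCount: number;
}
```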

The primary task is to automatically rename each PDF file using the extracted information, following this filename pattern:
[Date]_[Category]_[Authority or Sender]_[Subject or CaseID]_[PageCount].pdf
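
As a rough sketch of how the renaming itself could work, for example in a small script or an n8n Code node (paths and field names are placeholders, not a finished implementation):

```typescript
// Minimal sketch: build the target filename from extracted fields and rename the PDF.
// All paths and field names are placeholders.
import { promises as fs } from "fs";
import * as path from "path";

// Strip characters that are unsafe in filenames and collapse whitespace.
function sanitize(value: string): string {
  return value.trim().replace(/[\\\/:*?"<>|]+/g, "").replace(/\s+/g, "-");
}

function buildFilename(date: string, category: string, authorityOrSender: string,
                       subjectOrCaseId: string, pageCount: number): string {
  return [date, category, authorityOrSender, subjectOrCaseId, String(pageCount)]
    .map(sanitize)
    .join("_") + ".pdf";
}

async function renamePdf(originalPath: string, newName: string): Promise<string> {
  const target = path.join(path.dirname(originalPath), newName);
  await fs.rename(originalPath, target);
  return target;
}

// Example result: "2024-03-15_Invoice_TaxOffice_Case-123-45_3.pdf"
```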

This automated renaming is the main goal. However, to avoid redundant processing later, it would be ideal to capture all the extracted data in this first pass. Therefore, I’d also like to generate a structured Excel table listing all extracted data per file, along with a .txt file for each PDF that summarizes its contents. These text files would later be used for AI-based processing (e.g. content checking, reconstruction, or classification via GPT).
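
For the table, even a simple CSV that Excel can open directly would be enough for a first pass (a proper .xlsx could be produced by a spreadsheet node or library later). A sketch of writing the index row and the per-file .txt, with assumed output paths:

```typescript
// Minimal sketch: append one row per processed file to a CSV (which Excel opens
// directly) and write a per-file .txt summary for later AI processing.
// Output paths are assumptions.
import { promises as fs } from "fs";
import * as path from "path";

const INDEX_CSV = "/data/index.csv";
const SUMMARY_DIR = "/data/summaries";

async function recordResult(pdfPath: string, fields: Record<string, string>,
                            summaryText: string): Promise<void> {
  // Quote values so commas inside fields do not break the columns.
  const row = [pdfPath, ...Object.values(fields)]
    .map((v) => `"${v.replace(/"/g, '""')}"`)
    .join(",") + "\n";
  await fs.appendFile(INDEX_CSV, row, "utf8");

  await fs.mkdir(SUMMARY_DIR, { recursive: true });
  const txtName = path.basename(pdfPath, ".pdf") + ".txt";
  await fs.writeFile(path.join(SUMMARY_DIR, txtName), summaryText, "utf8");
}
```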

In addition, I plan to connect a second AI agent that, based on the extracted authority or institution, automatically searches and adds contact information: email address, phone number, fax number, website, street, postal code, the responsible director or department head, as well as the superior and subordinate authorities. These details will feed into a structured contact directory for future reference.
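
The contact directory entries could follow a structure along these lines (field names are illustrative only):

```typescript
// Hypothetical shape of one entry in the contact directory the second agent would fill in.
// Field names are illustrative only.
interface AuthorityContact {
  authority: string;                 // name of the authority or institution
  email?: string;
  phone?: string;
  fax?: string;
  website?: string;
  street?: string;
  postalCode?: string;
  directorOrDepartmentHead?: string;
  superiorAuthority?: string;
  subordinateAuthorities?: string[];
}
```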

If one or more required fields in a document cannot be identified or clearly assigned, the file should be automatically moved to a separate “Miscellaneous” folder, marked with an indication of which fields are missing. A follow-up workflow or AI bot should later be able to handle these cases and fill in the missing information.
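
A rough sketch of that fallback routing, with assumed field names and folder paths:

```typescript
// Minimal sketch: if required fields are missing, move the PDF to a "Miscellaneous"
// folder and write a sidecar note listing what is missing, so a follow-up workflow
// can fill the gaps later. Field names and paths are assumptions.
import { promises as fs } from "fs";
import * as path from "path";

const REQUIRED = ["date", "category", "authority", "subject"];
const MISC_DIR = "/data/Miscellaneous";

async function routeIfIncomplete(pdfPath: string,
                                 fields: Record<string, string | undefined>): Promise<boolean> {
  const missing = REQUIRED.filter((k) => !fields[k] || fields[k]!.trim() === "");
  if (missing.length === 0) return false;          // all fields present, keep normal flow

  await fs.mkdir(MISC_DIR, { recursive: true });
  const target = path.join(MISC_DIR, path.basename(pdfPath));
  await fs.rename(pdfPath, target);

  // Sidecar file marking which fields could not be identified.
  await fs.writeFile(target + ".missing.txt", missing.join("\n"), "utf8");
  return true;
}
```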

So my main questions to you are:
Does a similar workflow already exist? Can someone help me build it? And if not, would anyone be interested in creating this workflow together with me?

I deeply appreciate any tips, assistance, or collaborative offers. This task is part of a larger project in which n8n is intended to serve as the central automation platform, integrating structure, efficiency, and AI from the ground up.

Warm regards,

Hi, I am trying to achieve a similar workflow: I am developing a home document manager. Ultimately, what this workflow does is analyse incoming correspondence by reviewing a scanned PDF, then rename it accordingly and save it in the appropriate folder within Google Drive. The AI also determines whether any action is required and emails me to notify me. It can also determine whether the action is a task due by a certain date and provide a link in the email that lets me add the task to Tasker or my calendar. I have most of this working; however, my stumbling block is recognising the contents of the PDF. Since the scan is stored as an image inside a PDF wrapper, I get an error message saying that there is no binary data.
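
To narrow the problem down, I'm thinking a quick check in a Code node should show whether the items reaching the PDF step actually carry any binary data at all (this assumes n8n's $input API and that the file would normally sit in a binary property such as data):

```typescript
// Rough check for an n8n Code node ("Run Once for All Items"): report which incoming
// items carry binary data and under which property name. Assumes n8n's $input API.
const report = [];
for (const item of $input.all()) {
  const keys = Object.keys(item.binary ?? {});
  report.push({
    json: {
      file: item.json.fileName ?? "(unknown)",   // assumed field; adjust to your data
      hasBinary: keys.length > 0,
      binaryProperties: keys,                    // e.g. ["data"] when a file was read
    },
  });
}
return report;
```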

I would be happy to help you with your workflow, and perhaps you or somebody in this forum would be kind enough to help me with my issue.

Hi,

thank you very much for your kind response and for sharing your workflow with me – it sounds very well thought out and aligns closely with the vision I have for my own system. Your setup with automatic renaming, categorization, saving to Google Drive, and even detecting action items with calendar integration is exactly the kind of smart document processing I’m aiming for. Really impressive how far you’ve already come!

I’m currently working on implementing everything with n8n as the central automation platform. All of my documents have been scanned and OCR-processed using NAPS2. The trigger that fires when a new file is added to the folder already works on my side. However, the extraction of text content from the PDFs remains a major obstacle. Although tools like ChatGPT claim to support OCR-processed PDFs, in practice the AI often fails to extract the embedded text – especially when the PDF layout or encoding is more complex. Technically, it should work, but something still seems to be blocking the process.
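
One option I'm looking into is extracting the embedded text layer locally before involving the AI at all, for example with the pdf-parse npm package in a small script; if the returned text comes back empty, the PDF has no text layer and still needs OCR. A sketch of what I mean (not yet tested on my side):

```typescript
// Sketch: read the PDF from disk and pull out its embedded OCR text layer with the
// pdf-parse npm package, so the AI only ever sees plain text.
// If `text` comes back (almost) empty, the PDF is image-only and still needs OCR.
import { promises as fs } from "fs";
import pdf from "pdf-parse";

async function extractTextLayer(pdfPath: string): Promise<{ text: string; pages: number }> {
  const buffer = await fs.readFile(pdfPath);   // the raw PDF bytes
  const parsed = await pdf(buffer);            // parses the PDF and collects its text layer
  return { text: parsed.text, pages: parsed.numpages };
}
```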

Over the next few days, I’ll be double-checking my Docker folder mappings and file permissions to make sure n8n can access everything correctly. Meanwhile, I’m continuing to scan thousands of documents, and I hope to have a working solution by the end of the month – whether via n8n, OpenAI, or with help from someone in the community. I truly believe that this process is important, technically feasible, and should be relatively straightforward – even if, at the moment, it raises more questions than it answers.
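
For the Docker check, what I'll be verifying is essentially that the host folder with the scans is mapped into the n8n container and that n8n has permission to read and rename the files there; in docker-compose terms, something like this excerpt (paths are placeholders):

```yaml
# Excerpt from a docker-compose.yml: map the host folder with the scanned PDFs
# into the n8n container so the workflow can read and rename the files.
# Paths are placeholders.
services:
  n8n:
    image: n8nio/n8n
    volumes:
      - /home/user/scans:/data/scans   # host path : path inside the container
```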

If you’d be willing to share your workflow (or even just parts of it), I’d be very grateful. It might help us better understand the issues around text recognition and figure out what’s going wrong. Of course, I’ll also share any new insights I discover myself or receive from others right here in the forum.

Thanks again for your openness. I truly believe this kind of automation has huge potential to bring clarity, structure, and intelligent processing into otherwise chaotic document workflows – and that, with a bit more collaboration, we’re very close to achieving something practical and powerful.

Warm regards,
Chrystian