Hello dear n8n community,
I’m reaching out for support in building a comprehensive n8n workflow to process around 10,000 locally stored PDF documents. All of these files have been scanned and OCR-processed with NAPS2. The goal is to automatically open each file and extract the following information from its text: date, category (e.g., invoice, notice, court ruling), authority or institution, sender, recipient, reference number or case ID, subject or title, name of the responsible person, and number of pages.
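A rough Python sketch of a rule-based first pass over the OCR text, which could run before or alongside an LLM. Everything specific here is a hypothetical placeholder: the keyword lists, the `Az.`/`Aktenzeichen`/`Reference`/`Case ID` labels, and the assumption that dates appear as DD.MM.YYYY or YYYY-MM-DD. The first valid date is often, but not always, the document date. The page count is probably better read from the PDF itself (for example with pypdf's `len(PdfReader(path).pages)`) than from the text.

```python
import re
from datetime import date
from typing import Dict, Optional

# Hypothetical keyword map (category -> lowercase trigger words); placeholder values only.
CATEGORY_KEYWORDS = {
    "Invoice": ("invoice", "rechnung"),
    "Notice": ("notice", "bescheid"),
    "CourtRuling": ("court ruling", "judgment", "urteil", "beschluss"),
}

# Assumed date formats (DD.MM.YYYY and ISO YYYY-MM-DD) with the meaning of each group.
DATE_PATTERNS = (
    (re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b"), ("d", "m", "y")),
    (re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"), ("y", "m", "d")),
)

# Assumed layout: the reference starts a line, after a label such as "Az.:".
REFERENCE_PATTERN = re.compile(
    r"^(?:Az\.|Aktenzeichen|Reference|Case ID)(?=[\s:])\s*:?\s*(.+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def first_date(text: str) -> Optional[date]:
    """Return the valid date that appears earliest in the text, or None."""
    candidates = []
    for pattern, order in DATE_PATTERNS:
        for match in pattern.finditer(text):
            parts = dict(zip(order, map(int, match.groups())))
            try:
                candidates.append((match.start(), date(parts["y"], parts["m"], parts["d"])))
            except ValueError:  # impossible dates such as 31.02.2021 are skipped
                continue
    return min(candidates)[1] if candidates else None


def guess_category(text: str) -> Optional[str]:
    """Return the first category whose keywords occur in the text."""
    lowered = text.lower()
    for category, words in CATEGORY_KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return None


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Collect the fields this sketch can find; None marks a field as missing."""
    found_date = first_date(text)
    reference = REFERENCE_PATTERN.search(text)
    return {
        "date": found_date.isoformat() if found_date else None,
        "category": guess_category(text),
        "reference": reference.group(1) if reference else None,
    }
```

Fields that come back as `None` would be natural candidates for an LLM fallback rather than for guessing.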
The primary task is to automatically rename each PDF file based on the extracted information, using this filename pattern:
[Date]_[Category]_[Authority or Sender]_[Subject or CaseID]_[PageCount].pdf
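A minimal Python sketch of turning extracted fields into a filename that follows this pattern. The conventions are assumptions, not anything defined by n8n or NAPS2: underscores are reserved as field separators, so spaces and underscores inside a field become hyphens; characters Windows forbids in filenames are replaced; each field is capped at 40 characters; empty fields become `unknown`; and name collisions get a hypothetical `_v2`, `_v3` suffix, which may matter when many documents share a date and sender.

```python
import re

# Characters Windows does not allow in file names, plus control characters.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def clean(part: object, max_len: int = 40) -> str:
    """Make one field filename-safe; '_' is reserved as the field separator."""
    text = INVALID_CHARS.sub("-", str(part or "").strip())
    text = re.sub(r"[\s_]+", "-", text)  # spaces and underscores become hyphens
    text = re.sub(r"-{2,}", "-", text)   # collapse runs of hyphens
    return text[:max_len].strip("-.") or "unknown"


def build_filename(date_iso, category, authority_or_sender, subject_or_case_id, page_count) -> str:
    """[Date]_[Category]_[Authority or Sender]_[Subject or CaseID]_[PageCount].pdf"""
    fields = (date_iso, category, authority_or_sender, subject_or_case_id, page_count)
    return "_".join(clean(f) for f in fields) + ".pdf"


def unique_name(name: str, existing: set) -> str:
    """Append _v2, _v3, ... if the name is already taken (an assumed convention)."""
    if name not in existing:
        return name
    stem = name[:-4] if name.lower().endswith(".pdf") else name
    n = 2
    while f"{stem}_v{n}.pdf" in existing:
        n += 1
    return f"{stem}_v{n}.pdf"
```

To apply it, one could keep a set of the names already present in the target folder, call `pdf.rename(pdf.with_name(unique_name(name, existing)))`, and add each new name to the set.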
This automated renaming is the main goal. However, to save later steps and avoid processing the files twice, I’d like the first pass to produce everything at once: in addition to renaming, it should generate a structured Excel table listing all extracted data per file, along with a .txt file for each PDF that summarizes its contents. These text files would later be used for AI-based processing (e.g., content checking, reconstruction, or classification via GPT).
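A minimal Python sketch of the two outputs, written with the standard library only. Assumptions: the column names are illustrative, the table is written as a CSV that Excel can open rather than a native .xlsx, and each record may carry a `summary` string produced by a hypothetical LLM step. Depending on regional settings, Excel may expect `;` as the separator, which `csv.DictWriter(..., delimiter=";")` handles; a real .xlsx could instead be produced with pandas' `DataFrame.to_excel` (requires openpyxl).

```python
import csv
import io
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed column set, mirroring the extracted fields; names are illustrative.
FIELDS = [
    "new_filename", "original_filename", "date", "category", "authority", "sender",
    "recipient", "reference", "subject", "responsible_person", "page_count", "missing_fields",
]


def index_csv(records: Iterable[Dict]) -> str:
    """One row per PDF; extra keys such as 'summary' are left out of the table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)  # missing keys and None values become empty cells
    return buffer.getvalue()


def summary_text(record: Dict) -> str:
    """Metadata header followed by the (hypothetical) LLM-generated summary."""
    lines = []
    for key in FIELDS:
        value = record.get(key)
        lines.append(f"{key}: {'' if value is None else value}")
    return "\n".join(lines) + "\n\n" + (record.get("summary") or "") + "\n"


def write_outputs(records: List[Dict], out_dir: Path) -> None:
    """Write index.csv plus one <new_filename>.txt per record into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # utf-8-sig adds a BOM so Excel recognises UTF-8 (umlauts etc.).
    with (out_dir / "index.csv").open("w", encoding="utf-8-sig", newline="") as f:
        f.write(index_csv(records))
    for record in records:
        txt_name = Path(record["new_filename"]).stem + ".txt"
        (out_dir / txt_name).write_text(summary_text(record), encoding="utf-8")
```

Keeping the metadata header inside each .txt file means a later GPT step gets the extracted fields and the summary in a single input.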
In addition, I plan to connect a second AI agent that, based on the extracted authority or institution, automatically searches for and adds contact information: email address, phone number, fax number, website, street, postal code, the responsible director or department head, and the superior and subordinate authorities. These details will feed into a structured contact directory for future reference.
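A sketch of what the contact directory could look like as a data structure, assuming the lookup agent is wrapped in a function that takes an authority name and returns one entry. The `lookup` callable and the field names are hypothetical. The design point is that the agent is called once per distinct authority rather than once per PDF, since many of the 10,000 documents presumably come from the same institutions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class ContactEntry:
    """One row of the contact directory; fields mirror the list above."""
    authority: str
    email: Optional[str] = None
    phone: Optional[str] = None
    fax: Optional[str] = None
    website: Optional[str] = None
    street: Optional[str] = None
    postal_code: Optional[str] = None
    head: Optional[str] = None  # responsible director or department head
    superior_authorities: List[str] = field(default_factory=list)
    subordinate_authorities: List[str] = field(default_factory=list)


def normalize(name: str) -> str:
    """Case- and whitespace-insensitive key, so 'Finanzamt  X' and 'finanzamt x' match."""
    return " ".join(name.lower().split())


def build_directory(
    authorities: Iterable[str], lookup: Callable[[str], ContactEntry]
) -> Dict[str, ContactEntry]:
    """Call the (hypothetical) lookup agent once per distinct, non-empty authority."""
    directory: Dict[str, ContactEntry] = {}
    for name in authorities:
        key = normalize(name)
        if key and key not in directory:
            directory[key] = lookup(name)
    return directory
```

Simple normalization will not merge spelling variants such as abbreviations; those would need fuzzy matching or a manual review step.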
If one or more required fields in a document cannot be identified or clearly assigned, the file should be automatically moved to a separate “Miscellaneous” folder and marked to show which fields are missing. A follow-up workflow or AI bot should later be able to handle these cases and fill in the missing information.
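One possible way to implement the “Miscellaneous” routing in Python, assuming the marker is written into the filename itself (e.g. `MISSING-date-category__scan_0001.pdf`) so that a follow-up workflow can parse which fields to fill. The required-field names and the `MISSING-` prefix are hypothetical choices; a sidecar JSON file per PDF would work just as well.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical names of the fields the filename pattern needs.
REQUIRED = ("date", "category", "authority_or_sender", "subject_or_case_id", "page_count")


def missing_fields(fields: Dict[str, object], required=REQUIRED) -> List[str]:
    """Required fields that are absent, None, or an empty string."""
    return [name for name in required if fields.get(name) in (None, "")]


def misc_target(original: Path, missing: List[str], misc_dir: Path) -> Path:
    """Encode the missing fields in the new name, keeping the original name after '__'."""
    marker = "MISSING-" + "-".join(missing)
    return misc_dir / f"{marker}__{original.name}"


def route_file(pdf: Path, fields: Dict[str, object], misc_dir: Path) -> Optional[Path]:
    """Move incomplete files to misc_dir; return None if the file is complete."""
    missing = missing_fields(fields)
    if not missing:
        return None  # complete: the caller renames it with the normal pattern instead
    misc_dir.mkdir(parents=True, exist_ok=True)
    target = misc_target(pdf, missing, misc_dir)
    shutil.move(str(pdf), str(target))
    return target
```

Keeping the original filename after the `__` separator makes it easy to trace a file back to its scan later.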
So my main questions to you are:
Does a similar workflow already exist? Can someone help me build it? And if not, would anyone be interested in creating it together with me?
I would greatly appreciate any tips, assistance, or offers to collaborate. This task is part of a larger project in which n8n is intended to serve as the central automation platform, integrating structure, efficiency, and AI from the ground up.
Warm regards,