Extract from File, Tesseract node
The idea is:
Upgrade the pdfjs-dist library used by the core “Extract from File” node from the current ~5.3.x range to at least ^5.4.x. This would resolve a dependency conflict with the community Tesseract.js node and enable more advanced PDF handling workflows.
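To illustrate why the two ranges cannot be met by one installed copy of the library, here is a minimal sketch of how npm-style tilde and caret ranges behave. The helper names (`parseVersion`, `satisfiesRange`) are hypothetical and not part of n8n, pdfjs-dist, or the semver package. The sketch handles only the two range shapes quoted above and assumes a major version above 0.

```typescript
// Hypothetical helpers for illustration only. They cover just the
// "~M.m.x" and "^M.m.x" range shapes, with a major version greater than 0.
type Version = { major: number; minor: number; patch: number };

function parseVersion(text: string): Version {
  const parts = text.split(".").map((part) => Number(part));
  return { major: parts[0] ?? 0, minor: parts[1] ?? 0, patch: parts[2] ?? 0 };
}

function satisfiesRange(version: string, range: string): boolean {
  const operator = range.charAt(0); // "~" or "^"
  const rangeParts = range.slice(1).split(".");
  const rangeMajor = Number(rangeParts[0]);
  const rangeMinor = Number(rangeParts[1]);
  const v = parseVersion(version);

  if (operator === "~") {
    // Tilde with a wildcard patch: same major and minor, any patch.
    return v.major === rangeMajor && v.minor === rangeMinor;
  }
  if (operator === "^") {
    // Caret (major > 0): same major, minor at or above the stated one.
    return v.major === rangeMajor && v.minor >= rangeMinor;
  }
  throw new Error(`Unsupported range: ${range}`);
}

console.log(satisfiesRange("5.3.31", "~5.3.x")); // true
console.log(satisfiesRange("5.4.54", "~5.3.x")); // false
console.log(satisfiesRange("5.4.54", "^5.4.x")); // true
```

Under these rules no version satisfies both ~5.3.x and ^5.4.x, so the two packages end up with separate copies of pdfjs-dist unless the core range is raised.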
My use case:
I am building automated document processing workflows that require Optical Character Recognition (OCR) on PDF files. The best available path today is to use the “Extract from File” node to get text from a PDF and then use the Tesseract.js node for OCR on image-based pages. Newer versions of the Tesseract node can handle PDFs directly, which would reduce this workflow to a single step. That path is currently broken by the library version mismatch, which throws the error: The API version “5.4.54” does not match the Worker version “5.3.31”.
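The error text is consistent with an exact comparison between the version of the main (API) script and the version reported by the worker script. The following is a simplified sketch of that kind of check, not the pdfjs-dist source. The function name `assertMatchingVersions` and its parameters are hypothetical, and the version strings are the ones from the error above.

```typescript
// Simplified illustration only, not the pdfjs-dist implementation.
// The function name and parameters are hypothetical.
function assertMatchingVersions(apiVersion: string, workerVersion: string): void {
  // Exact string comparison: even a minor-version difference is rejected.
  if (apiVersion !== workerVersion) {
    throw new Error(
      `The API version "${apiVersion}" does not match the Worker version "${workerVersion}".`,
    );
  }
}

// Versions taken from the error message reported in this request.
try {
  assertMatchingVersions("5.4.54", "5.3.31");
} catch (error) {
  console.log((error as Error).message);
}
```

With a check of this shape, the error disappears only when both nodes resolve to the same pdfjs-dist version, which is what aligning the core dependency range would allow.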
I think it would be beneficial to add this because:
Resolves a Critical Conflict: It immediately fixes a runtime error that occurs when both the core and community nodes are used in the same environment, providing stability for users.
Unlocks Workflow Simplification: Once the versions are aligned, the Tesseract node can use the same, updated PDF.js library as the core. This allows users to point the Tesseract node directly at a PDF file, bypassing the need for a separate extraction step. This creates simpler, more robust, and more efficient OCR workflows.
Keeps Dependencies Current: Upgrading the library ensures n8n benefits from the latest bug fixes, security patches, and performance improvements in pdfjs-dist.
Any resources to support this?
The mismatch is reported by the pdfjs-dist library itself, which throws a version mismatch error when the main and worker scripts come from different versions. As the error above shows, a minor-version difference (5.4.54 vs. 5.3.31) is enough to trigger it.
The Tesseract.js community node’s changelog (since v1.4.0) confirms the addition of PDF support and its dependency on a newer pdfjs-dist version.
Are you willing to work on this?
I am not a core contributor, but I am willing to provide any additional testing and information required to support this change.