Hello, n8n community!
I’m trying to build a workflow to automate data extraction from tables within PNG files. I already have a working flow that handles DOCX (using Pandoc) and PDF (using Poppler), but I’m stuck on the image OCR step.
The Goal
The objective is to take a PNG image containing a complex table (with multi-line descriptions) and convert it into a structured JSON, where each object in the array represents a complete row from the table.
Example of the desired JSON:
```json
[
  {
    "Item": "1",
    "Descrição": "BUCHA DE FIXACAO HE 308...",
    "CATMAT": "601358",
    "Unidade": "UNI",
    "Quant.": "10",
    "Valor Unit.": "15,00",
    "Valor Total": "150,00"
  },
  { ...next item... }
]
```
(Note: Column names are in Portuguese)
Current Workflow Architecture
After a lot of debugging, we’ve settled on an architecture that executes without pathing or environment errors:
- Trigger (On changes...): Detects a new `.png` file in an input folder and provides its `path`.
- IF Node: Confirms the file extension is `.png`.
- Execute Command ("Run OCR"): This node runs a batch file (`run_ocr.bat`).
  - Command: `"C:\ocr_service\run_ocr.bat" "{{$json.path}}"`
  - The `run_ocr.bat` file, in turn, executes a Python script (`app.py`) located in a dedicated virtual environment. This script uses the `pytesseract` library to perform the OCR.
  - Expected Output: the `stdout` from this node should be a single text string containing the structured JSON of the table (see the sketch after this list).
- Code Node ("Parse JSON"): Takes the `stdout` from the previous node and uses `JSON.parse()` to convert the text string into a usable JSON object for n8n.
- Final Steps: `Convert to File` and `Write Binary File` nodes to save the result to disk.
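For the "Expected Output" step, the key point is that `app.py` prints nothing to `stdout` except the final JSON string, otherwise `JSON.parse()` in the Code node will choke on stray prints. Here is a minimal sketch of that contract only (the OCR and table logic are omitted; the file name follows the post, everything else is just an assumption about how the pieces should talk to each other):

```python
# Minimal sketch of the stdout contract app.py should honor.
# Assumption: n8n captures everything printed to stdout, so the script
# prints nothing there except the final JSON array; diagnostics go to stderr.
import json
import sys

def main():
    if len(sys.argv) < 2:
        print("usage: python app.py <image_path>", file=sys.stderr)
        sys.exit(1)

    image_path = sys.argv[1]
    rows = []  # the OCR / table-reconstruction logic would fill this list

    # A single line of JSON on stdout keeps JSON.parse() in the Code node simple.
    print(json.dumps(rows, ensure_ascii=False))

if __name__ == "__main__":
    main()
```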
The Current Problem and Roadblock
The workflow executes without "file not found" or permission errors. The `Execute Command` node successfully runs the Python script.
However, the final OUTPUT from the `Code` node shows a `data` field with an empty array (`data: []`). This proves that the problem is not with the n8n architecture, but with the programming logic inside the Python script (`app.py`).
The Python script is successfully running Tesseract and extracting the words from the image, but it's failing at the most complex task: reconstructing the table geometry. The current logic is not robust enough to correctly group the words into lines and assign them to the proper columns, especially with multi-line cells.
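For anyone looking at this: `image_to_data` returns one record per recognized word with its own bounding box and no notion of rows or columns, which is why all the grouping has to happen in your own code. A quick inspection sketch (assuming Pillow is installed and the Portuguese traineddata `por` is available; drop the `lang` argument to fall back to the default model):

```python
# Quick inspection of the raw word boxes Tesseract returns for the PNG.
# Assumptions: Pillow is installed and the Portuguese traineddata ("por")
# is available; drop the lang argument to use the default model instead.
import sys

import pytesseract
from PIL import Image

image = Image.open(sys.argv[1])
data = pytesseract.image_to_data(image, lang="por", output_type=pytesseract.Output.DICT)

for i, word in enumerate(data["text"]):
    if word.strip() and float(data["conf"][i]) > 0:
        # Every word has its own bounding box, but no row or column information:
        print(word, data["left"][i], data["top"][i], data["width"][i], data["height"][i])
```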
The Ask
I need help creating or fixing a Python script (`app.py`) that can:
- Receive an image file path as a command-line argument.
- Use `pytesseract` (`image_to_data`) to extract all words and their coordinates.
- Contain robust logic to analyze these coordinates and reconstruct the table structure, correctly handling multi-line cells and slight misalignments from the OCR.
- `print` the final result as a single, structured JSON text string so that n8n can capture it in the `stdout`.
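To get the discussion started, below is a rough sketch of what such an `app.py` could look like. To be clear about what is assumed rather than taken from the original setup: the `COLUMN_BOUNDS` pixel ranges are placeholders you would measure once on a sample PNG (or derive from the header row), `ROW_TOLERANCE` is a guess, `lang="por"` presumes the Portuguese traineddata is installed, and the rule "a new row starts when the Item column holds a number" is one heuristic among several. Only the column names and the overall input/output contract come from the post.

```python
# Rough sketch of app.py: extract word boxes, group them into visual lines,
# assign words to columns by x position, and merge continuation lines into
# the previous row's "Descrição" cell. All thresholds below are placeholders.
import json
import re
import sys
from collections import defaultdict

import pytesseract
from PIL import Image

# Placeholder pixel ranges per column -- measure these on one of your PNGs.
COLUMN_BOUNDS = [
    ("Item",        0,   80),
    ("Descrição",   80,  520),
    ("CATMAT",      520, 640),
    ("Unidade",     640, 730),
    ("Quant.",      730, 810),
    ("Valor Unit.", 810, 920),
    ("Valor Total", 920, 1100),
]
ROW_TOLERANCE = 12  # max vertical distance (px) between word centres on one visual line


def extract_words(image_path):
    """Return a list of word dicts with text and bounding-box geometry."""
    data = pytesseract.image_to_data(
        Image.open(image_path), lang="por", output_type=pytesseract.Output.DICT
    )
    words = []
    for i, text in enumerate(data["text"]):
        if text.strip() and float(data["conf"][i]) > 0:
            words.append({
                "text": text.strip(),
                "x": data["left"][i],
                "cy": data["top"][i] + data["height"][i] / 2,  # vertical centre
            })
    return words


def group_into_lines(words):
    """Cluster words whose vertical centres are within ROW_TOLERANCE of each other."""
    lines = []
    for word in sorted(words, key=lambda w: w["cy"]):
        if lines and abs(word["cy"] - lines[-1][-1]["cy"]) <= ROW_TOLERANCE:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [sorted(line, key=lambda w: w["x"]) for line in lines]


def column_of(word):
    """Map a word to a column name by its x coordinate, or None if outside the table."""
    for name, x_min, x_max in COLUMN_BOUNDS:
        if x_min <= word["x"] < x_max:
            return name
    return None


def rebuild_rows(lines):
    """Merge visual lines into logical rows; continuation lines extend 'Descrição'."""
    rows = []
    for line in lines:
        cells = defaultdict(list)
        for word in line:
            col = column_of(word)
            if col:
                cells[col].append(word["text"])
        joined = {col: " ".join(parts) for col, parts in cells.items()}
        if re.fullmatch(r"\d+\.?", joined.get("Item", "")):   # a new item row starts here
            rows.append({name: joined.get(name, "") for name, _, _ in COLUMN_BOUNDS})
        elif rows and joined.get("Descrição"):                 # multi-line description continues
            rows[-1]["Descrição"] = (rows[-1]["Descrição"] + " " + joined["Descrição"]).strip()
    return rows


def main():
    if len(sys.argv) < 2:
        print("usage: python app.py <image_path>", file=sys.stderr)
        sys.exit(1)
    rows = rebuild_rows(group_into_lines(extract_words(sys.argv[1])))
    print(json.dumps(rows, ensure_ascii=False))


if __name__ == "__main__":
    main()
```

If the column positions shift between documents, a more robust variant derives the x-ranges from the header words themselves instead of hard-coding them, and passing a table-friendly page segmentation mode to `image_to_data` (e.g. `config="--psm 6"`) sometimes helps Tesseract keep rows together.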
Thank you immensely for any help or guidance the community can provide!