Difficulty Converting Image Table (PNG) to Structured JSON with Tesseract and Python

Hello, n8n community!

I’m trying to build a workflow to automate data extraction from tables within PNG files. I already have a working flow that handles DOCX (using Pandoc) and PDF (using Poppler), but I’m stuck on the image OCR step.

The Goal

The objective is to take a PNG image containing a complex table (with multi-line descriptions) and convert it into structured JSON, where each object in the array represents a complete row of the table.

Example of the desired JSON:

[
  {
    "Item": "1",
    "Descrição": "BUCHA DE FIXACAO HE 308...",
    "CATMAT": "601358",
    "Unidade": "UNI",
    "Quant.": "10",
    "Valor Unit.": "15,00",
    "Valor Total": "150,00"
  },
  { ...next item... }
]

(Note: Column names are in Portuguese)

Current Workflow Architecture

After a lot of debugging, we’ve settled on an architecture that executes without pathing or environment errors:

  1. Trigger (On changes...): Detects a new .png file in an input folder and provides its path.
  2. IF Node: Confirms that the file extension is .png.
  3. Execute Command (“Run OCR”): Runs a batch file (run_ocr.bat).
  • Command: "C:\ocr_service\run_ocr.bat" "{{$json.path}}"
  • run_ocr.bat in turn executes a Python script (app.py) located in a dedicated virtual environment. This script uses the pytesseract library to perform the OCR.
  • Expected output: the stdout from this node should be a single text string containing the structured JSON of the table.
  4. Code Node (“Parse JSON”): Takes the stdout from the previous node and uses JSON.parse() to convert the text string into a usable JSON object for n8n.
  5. Final Steps: Convert to File and Write Binary File nodes save the result to disk.
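One constraint I'm aware of for the Parse JSON step: the JSON must be the only thing app.py writes to stdout, or JSON.parse() fails on the extra text. A minimal sketch of the serialization side (to_stdout_payload is just a hypothetical helper name):

```python
import json

def to_stdout_payload(rows):
    """Serialize the extracted rows as one JSON string.

    ensure_ascii=False keeps accented headers like "Descrição" readable
    instead of \\u escapes. Any stray debug print() elsewhere in app.py
    would corrupt this payload for n8n's JSON.parse().
    """
    return json.dumps(rows, ensure_ascii=False)

if __name__ == "__main__":
    print(to_stdout_payload([{"Item": "1", "Descrição": "BUCHA DE FIXACAO"}]))
```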

The Current Problem and Roadblock

The workflow executes without “file not found” or permission errors. The Execute Command node successfully runs the Python script.

However, the final output from the Code node shows a data field with an empty array (data: []). This points to the programming logic inside the Python script (app.py), not the n8n architecture, as the source of the problem.

The Python script is successfully running Tesseract and extracting words from the image, but it’s failing at the most complex task: reconstructing the table geometry. The current logic is not robust enough to correctly group the words into lines and assign them to the proper columns, especially with multi-line cells.
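To make the failure concrete: image_to_data with Output.DICT returns parallel lists (text, left, top, width, height, conf, ...), so the first hurdle is clustering those word boxes into visual lines. Below is the kind of tolerance-based pass I mean, as pure Python on already-extracted boxes; the y_tol threshold is an assumption that would need tuning to the image's resolution:

```python
def group_into_lines(words, y_tol=8):
    """Cluster OCR word boxes into visual lines.

    words: dicts with "left", "top", "text" -- a subset of the parallel
    lists that pytesseract.image_to_data(..., output_type=Output.DICT)
    returns. y_tol is an assumed pixel tolerance; a smarter pass would
    also take the word heights into account.
    """
    lines = []  # each entry: {"top": anchor y, "words": [...]}
    for w in sorted(words, key=lambda w: (w["top"], w["left"])):
        for line in lines:
            if abs(w["top"] - line["top"]) <= y_tol:
                line["words"].append(w)
                break
        else:
            lines.append({"top": w["top"], "words": [w]})
    for line in lines:  # read each line left to right
        line["words"].sort(key=lambda w: w["left"])
    return lines
```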

The Ask

I need help creating or fixing a Python script (app.py) that can:

  1. Receive an image file path as a command-line argument.
  2. Use pytesseract (image_to_data) to extract all words and their coordinates.
  3. Apply logic robust enough to analyze those coordinates and reconstruct the table structure, correctly handling multi-line cells and slight misalignments from the OCR.
  4. Print the final result as a single, structured JSON text string so that n8n can capture it from stdout.
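To make the ask concrete, here is a rough skeleton of the shape I'm after, not a working solution: the column edges, header names, and the "bare number starts a new row" heuristic are all assumptions based on the example JSON, and the line-clustering step is left as a TODO:

```python
# Column names from the example JSON; the x-edges would come from the
# header row's word positions in the real image.
HEADERS = ["Item", "Descrição", "CATMAT", "Unidade",
           "Quant.", "Valor Unit.", "Valor Total"]

def assign_column(word, col_edges):
    """Index of the rightmost column edge at or left of the word's centre."""
    centre = word["left"] + word["width"] / 2
    col = 0
    for i, edge in enumerate(col_edges):
        if centre >= edge:
            col = i
    return col

def rows_from_lines(lines, col_edges, headers=HEADERS):
    """Turn visual lines (lists of word boxes, left to right) into row dicts.

    Heuristic (an assumption): a line whose "Item" cell is a bare number
    starts a new row; anything else is a wrapped continuation appended to
    the previous row's description column.
    """
    rows = []
    for line in lines:
        cells = [[] for _ in headers]
        for w in line:
            cells[assign_column(w, col_edges)].append(w["text"])
        texts = [" ".join(c) for c in cells]
        if texts[0].strip().isdigit():
            rows.append(dict(zip(headers, texts)))
        elif rows and texts[1]:
            rows[-1][headers[1]] += " " + texts[1]
    return rows

def main(path):
    # Real OCR call -- needs Tesseract plus the pytesseract and Pillow
    # packages; imported here so the pure logic above stays testable.
    from PIL import Image
    import pytesseract
    data = pytesseract.image_to_data(Image.open(path),
                                     output_type=pytesseract.Output.DICT)
    words = [{"left": data["left"][i], "top": data["top"][i],
              "width": data["width"][i], "text": data["text"][i]}
             for i in range(len(data["text"]))
             if data["text"][i].strip() and float(data["conf"][i]) > 0]
    # TODO: cluster `words` into `lines` by their "top" coordinate and
    # derive `col_edges` from the header row -- the part I'm stuck on --
    # then: print(json.dumps(rows_from_lines(lines, col_edges),
    #                        ensure_ascii=False))
```

app.py would then call main() with the path taken from the command line.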

Thank you immensely for any help or guidance the community can provide!