Execute Command Node Failing With "No data found for item-index: '1'" Error, But Data Exists

  • n8n version: 1.47.2
  • Database (default: SQLite): default
  • n8n EXECUTIONS_PROCESS setting (default: own, main): default
  • Running n8n via (Docker, npm, n8n cloud, desktop app): npm
  • Operating system: Ubuntu
{
  "errorMessage": "No data found for item-index: \"1\" [item 1]",
  "errorDetails": {},
  "n8nDetails": {
    "itemIndex": 1,
    "runIndex": 0,
    "parameter": "command",
    "time": "07/07/2024, 16:43:08",
    "n8nVersion": "1.47.2 (Self Hosted)",
    "binaryDataMode": "default",
    "stackTrace": [
      "ExpressionError: No data found for item-index: \"1\"",
      "    at Object.get (/usr/local/lib/node_modules/n8n/node_modules/n8n-workflow/src/WorkflowDataProxy.ts:409:14)",
      "    at Proxy.eval (eval at getFunction (/usr/local/lib/node_modules/n8n/node_modules/@n8n/tournament/src/index.ts:30:16), <anonymous>:9:74)",
      "    at Proxy.eval (eval at getFunction (/usr/local/lib/node_modules/n8n/node_modules/@n8n/tournament/src/index.ts:30:16), <anonymous>:16:7)",
      "    at Tournament.execute (/usr/local/lib/node_modules/n8n/node_modules/@n8n/tournament/src/index.ts:42:13)",
      "    at evaluateExpression (/usr/local/lib/node_modules/n8n/node_modules/n8n-workflow/src/ExpressionEvaluatorProxy.ts:92:10)",
      "    at Expression.renderExpression (/usr/local/lib/node_modules/n8n/node_modules/n8n-workflow/src/Expression.ts:349:29)",
      "    at Expression.resolveSimpleParameterValue (/usr/local/lib/node_modules/n8n/node_modules/n8n-workflow/src/Expression.ts:322:28)",
      "    at Expression.getParameterValue (/usr/local/lib/node_modules/n8n/node_modules/n8n-workflow/src/Expression.ts:544:16)",
      "    at getNodeParameter (/usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/NodeExecuteFunctions.ts:2467:36)",
      "    at Object.getNodeParameter (/usr/local/lib/node_modules/n8n/node_modules/n8n-core/src/NodeExecuteFunctions.ts:3698:12)"
    ]
  }
}

Hello! I’m running into a frustrating “No data found for item-index: ‘1’” error in my n8n workflow, and I’m struggling to understand why it’s happening. Here’s my setup:

Workflow Goal:

The workflow is designed to:

  1. Monitor a folder (/var/www/run/work/) for new ZIP archives containing PDFs.
  2. Unzip each archive to a subfolder, renaming files to remove special characters.
  3. Extract text, images, and markdown from the PDFs.
  4. Optionally stitch multiple images per page into a single image.

Workflow Breakdown:

  1. Local File Trigger: Triggers the workflow when a new .zip file is added to the /var/www/run/work/ directory.
  2. Execute Command (Python script: hantera_zip_filer.py):
    • Unzips the ZIP archive, handling special characters and streaming to minimize memory usage.
    • Outputs a JSON object with details about the unzipped files and processing time.
  3. Function node (jsonformating): Parses the stdout JSON from hantera_zip_filer.py and restructures it for further processing.
  4. Execute Command (Python script: md_extrahera_bilder_fitz.py):
    • Extracts text, images, and markdown from the PDFs within the unzipped folder.
    • Outputs a JSON array containing results for each PDF processed.
  5. Function node (jsonformating1): Parses the JSON output from md_extrahera_bilder_fitz.py, potentially creating multiple items.
  6. Execute Command (Python script: sammanfoga_bilder.py):
    • Stitches together multiple images on a single page, if necessary.
    • Outputs a JSON array with results for each page processed, indicating whether images were stitched or not.
  7. Function node (jsonformating2): Parses the JSON output from sammanfoga_bilder.py.
  8. Execute Command (move_zip2done): Moves the original ZIP archive to the /var/www/run/done/ folder.
  9. Execute Command (move_folder2done): Moves the folder containing the extracted files to the /var/www/run/done/ folder.
  10. Clean up the /work/ folder and remove the processed files.
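The Function nodes in this chain have to parse the scripts' stdout, and the scripts print progress messages in addition to their final JSON payload. As a defensive sketch (shown in Python for illustration; an actual n8n Code node would be JavaScript, and `parse_last_json_line` is a hypothetical helper, not part of the workflow), one forgiving approach is to scan stdout from the end for the first line that parses as JSON:

```python
import json

def parse_last_json_line(stdout: str):
    """Return the parsed JSON from the last stdout line that is valid JSON.

    Progress messages printed before the JSON payload are skipped, so the
    parser tolerates scripts that mix log lines and JSON on stdout.
    """
    for line in reversed(stdout.strip().splitlines()):
        line = line.strip()
        if not line:
            continue
        try:
            return json.loads(line)
        except json.JSONDecodeError:
            continue
    raise ValueError("no JSON line found in stdout")

mixed = 'Unzipped: a.zip to /tmp/a\n{"utdataMapp": "/tmp/a"}'
print(parse_last_json_line(mixed))  # {'utdataMapp': '/tmp/a'}
```

The same effect can be had by keeping the scripts' stdout pure JSON and sending all log messages to stderr.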

The Problem:

The stitch_img node (Execute Command running sammanfoga_bilder.py) is consistently throwing the error “No data found for item-index: ‘1’”, even though there is data available from the preceding “jsonformating” node, which outputs a single item.

Questions:

  1. Could someone help me understand why this error is occurring? Is it possible that a node in the workflow is expecting more items than it’s receiving, even though the initial trigger (Local File Trigger) is correctly configured for a single item (file)?
  2. How can I make this workflow more robust so that it handles cases where a node doesn’t produce output for a particular item or page? Should I use an “IF” node to check for empty data?

Any insights or suggestions would be highly appreciated. I’m still getting my bearings with n8n and would welcome any expert advice!

Please let me know if you need more details about my workflow or the Python scripts.

hantera_zip_filer.py

import os
import zipfile
import time
import sys
from unidecode import unidecode

def ersatt_aao_och_mellanslag(text):
    """Replace åäö (upper and lower case) and spaces with _ in a string."""
    text = unidecode(text).lower()
    text = text.replace(" ", "_")
    return text

def hantera_zip_fil(zip_fil):
    """Rename and unzip a ZIP file, streaming the file contents."""
    start_tid = time.time()  # Start timing

    rot, fil = os.path.split(zip_fil)  # Get the folder path and file name

    mappnamn = ersatt_aao_och_mellanslag(os.path.splitext(fil)[0])
    utdata_mapp = os.path.join(rot, mappnamn)

    if not os.path.exists(utdata_mapp):
        os.makedirs(utdata_mapp)

    with zipfile.ZipFile(zip_fil, 'r') as zip_ref:
        for info in zip_ref.infolist():
            if "__MACOSX" in info.filename:
                continue

            if info.filename.endswith(".pdf"):
                try:
                    # Use the base name so entries nested in subfolders don't
                    # produce output paths to directories that don't exist
                    gammalt_filnamn = os.path.basename(info.filename)
                    nytt_filnamn = ersatt_aao_och_mellanslag(gammalt_filnamn)
                    # Stream the file contents in 1 MB chunks
                    with zip_ref.open(info, 'r') as infile, open(os.path.join(utdata_mapp, nytt_filnamn), 'wb') as outfile:
                        while True:
                            bit = infile.read(1048576)  # Read 1 MB at a time
                            if not bit:
                                break
                            outfile.write(bit)
                except Exception as e:
                    print(f"Error extracting file {info.filename}: {e}")

    print(f"Unzipped: {zip_fil} to {utdata_mapp}")
    
    # Rename the ZIP file *after* extraction
    nytt_namn = os.path.join(rot, ersatt_aao_och_mellanslag(fil))
    try:
        os.rename(zip_fil, nytt_namn)
        print(f"Renamed: {zip_fil} to {nytt_namn}")
    except Exception as e:
        print(f"Error renaming {zip_fil}: {e}")

    slut_tid = time.time()  # Stop timing
    print(f"Time taken to unpack the files: {slut_tid - start_tid} seconds")

if __name__ == "__main__":
    if len(sys.argv) > 1:
        zip_fil = sys.argv[1]
        hantera_zip_fil(zip_fil)  # Process only the given file
    else:
        print("Error: no file path given.")

md_extrahera_bilder_fitz.py

import os
import time
import sys
import pymupdf4llm
import fitz
import json

def ersatt_aao_och_mellanslag(text):
    """Replace åäö (upper and lower case) and spaces with _ in a string."""
    text = text.lower()
    ersattningar = {
        "å": "a",
        "ä": "a",
        "ö": "o",
        " ": "_"  # Replace spaces with _
    }
    for tecken, ersattning in ersattningar.items():
        text = text.replace(tecken, ersattning)
    return text

def extrahera_data(mapp):
    """Extract text and images from all PDF files in a specific folder."""
    resultat = []
    start_tid = time.time()

    # Check that the folder exists
    if os.path.exists(mapp):
        for filnamn in os.listdir(mapp):
            if filnamn.endswith(".pdf"):
                pdf_fil = os.path.join(mapp, filnamn)
                rot, fil = os.path.split(pdf_fil)
                dokumentnamn = ersatt_aao_och_mellanslag(os.path.splitext(fil)[0])

                # Timing for the Markdown extraction
                start_tid_markdown = time.time()
                try:
                    md_text = pymupdf4llm.to_markdown(pdf_fil, write_images=False)
                    with open(os.path.join(rot, f"{dokumentnamn}.md"), "w", encoding="utf-8") as f:
                        f.write(md_text)
                    slut_tid_markdown = time.time()

                    # Add the outcome to the result list
                    resultat.append({
                        "action": "extract_markdown",
                        "pdf_file": pdf_fil,
                        "markdown_file": os.path.join(rot, f"{dokumentnamn}.md"),
                        "duration": slut_tid_markdown - start_tid_markdown
                    })

                except Exception as e:
                    slut_tid_markdown = time.time()
                    resultat.append({
                        "action": "extract_markdown",
                        "pdf_file": pdf_fil,
                        "error": str(e),
                        "duration": slut_tid_markdown - start_tid_markdown
                    })

                # Timing for the image extraction
                start_tid_bilder = time.time()
                try:
                    # Extract images with PyMuPDF (fitz)
                    doc = fitz.open(pdf_fil)
                    for i, sida in enumerate(doc):
                        bildlista = sida.get_images(full=True)
                        for j, img in enumerate(bildlista):
                            xref = img[0]  # The image's reference number
                            basbild = doc.extract_image(xref)
                            bilddata = basbild["image"]
                            tillägg = basbild["ext"]

                            # Adjust the file name as needed (separate variable
                            # so the outer loop's filnamn is not overwritten)
                            bildfilnamn = f"{rot}/{dokumentnamn}_sida_{i+1}_bild_{j+1}.{tillägg}"

                            if bilddata:
                                with open(bildfilnamn, "wb") as f:
                                    f.write(bilddata)

                    slut_tid_bilder = time.time()
                    resultat.append({
                        "action": "extract_images",
                        "pdf_file": pdf_fil,
                        "duration": slut_tid_bilder - start_tid_bilder
                    })

                except Exception as e:
                    slut_tid_bilder = time.time()
                    resultat.append({
                        "action": "extract_images",
                        "pdf_file": pdf_fil,
                        "error": str(e),
                        "duration": slut_tid_bilder - start_tid_bilder
                    })
    else:
        # Log to stderr so stdout stays valid JSON for the Function node
        print(f"Error: folder '{mapp}' was not found.", file=sys.stderr)

    slut_tid = time.time()
    resultat.append({
        "action": "total_duration",
        "duration": slut_tid - start_tid
    })

    # Print the result list as JSON
    print(json.dumps(resultat))

if __name__ == "__main__":
    if len(sys.argv) > 1:
        mapp_med_pdf_filer = sys.argv[1]  # Get the folder path from the argument
        extrahera_data(mapp_med_pdf_filer)
    else:
        print("Error: no folder path given.", file=sys.stderr)
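This script emits one JSON array for the whole folder. If the jsonformating1 node splits that array into one item per entry (a common pattern; the sketch below is Python for illustration, with invented field values, while an actual Code node would be JavaScript), a single incoming item fans out into several items downstream, which is exactly the situation where a relative reference back to a one-item node breaks:

```python
import json

# Stand-in for the stdout of md_extrahera_bilder_fitz.py (values invented)
stdout = json.dumps([
    {"action": "extract_markdown", "pdf_file": "a.pdf", "duration": 0.5},
    {"action": "extract_images", "pdf_file": "a.pdf", "duration": 1.2},
    {"action": "total_duration", "duration": 1.7},
])

# One n8n item per array entry, the way a Code node might return them
items = [{"json": entry} for entry in json.loads(stdout)]
print(len(items))  # 3 items downstream, even though only 1 item came in
```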

sammanfoga_bilder.py

import os
import subprocess
import json
import sys

def sammanfoga_bilder_for_sida(bildmapp, sidnummer, max_bilder=3):
    """Stitch together the images for a specific page with ImageMagick."""

    bildfiler = [
        os.path.join(bildmapp, f)
        for f in os.listdir(bildmapp)
        if f"_sida_{sidnummer}_bild_" in f and f.lower().endswith((".png", ".jpg", ".jpeg"))
    ]

    # Sort the image files by page number and image number
    bildfiler.sort(key=lambda f: (int(f.split("_sida_")[1].split("_bild_")[0]),
                                  int(f.split("_bild_")[1].split(".")[0])))

    if len(bildfiler) >= max_bilder:  # Stitch if there are enough images
        try:
            utdatafil = os.path.join(bildmapp, f"sida_{sidnummer}_sammanfogad.png")
            subprocess.run(
                ["convert", *bildfiler, "-append", utdatafil],
                check=True
            )

            for fil in bildfiler:
                try:
                    os.remove(fil)
                except Exception as e:
                    print(f"Error removing file {fil}: {e}", file=sys.stderr)

            return {"success": True, "message": f"Images for page {sidnummer} stitched into {utdatafil}."}

        except subprocess.CalledProcessError as e:
            return {"success": False, "message": f"Error while stitching: {e}"}
    else:
        return {"success": True, "message": f"No images to stitch for page {sidnummer}."}

def bearbeta_mapp(mapp, max_bilder):
    """Process a folder and stitch its images together."""
    alla_resultat = []

    if os.path.basename(mapp) == "__MACOSX":
        return alla_resultat
    
    # Identify all unique page numbers from the file names in the folder
    sidnummer = set()
    for filnamn in os.listdir(mapp):
        if "_sida_" in filnamn:
            try:
                # Extract the page number from the file name
                sidnummer_str = filnamn.split("_sida_")[1].split("_bild_")[0]
                sidnummer.add(int(sidnummer_str))
            except (IndexError, ValueError):
                print(f"Warning: invalid file name: {filnamn}", file=sys.stderr)
                continue

    # Stitch the images for every unique page number
    for s in sidnummer:
        resultat = sammanfoga_bilder_for_sida(mapp, s, max_bilder)  # Pass the folder directly
        alla_resultat.append(resultat)

    # Debug output goes to stderr so stdout stays valid JSON
    print(f"alla_resultat: {alla_resultat}", file=sys.stderr)
    return alla_resultat

if __name__ == "__main__":
    if len(sys.argv) > 1:
        mapp_att_bearbeta = sys.argv[1]
        max_bilder = 3  # Hard-coded to 3 images
        alla_resultat = bearbeta_mapp(mapp_att_bearbeta, max_bilder)
        print(json.dumps(alla_resultat))
    else:
        print("Error: folder path missing.", file=sys.stderr)
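Not part of the original script, but the filename parsing that both the sort key and bearbeta_mapp rely on can be pulled into a small helper (hypothetical name `extrahera_sidnummer`) and unit-tested in isolation, which makes the "Warning: invalid file name" path easy to verify:

```python
def extrahera_sidnummer(filnamn):
    """Mirror of the page-number split used in bearbeta_mapp above."""
    try:
        return int(filnamn.split("_sida_")[1].split("_bild_")[0])
    except (IndexError, ValueError):
        return None  # the caller treats this as an invalid file name

print(extrahera_sidnummer("rapport_sida_3_bild_1.png"))  # 3
print(extrahera_sidnummer("rapport.pdf"))                # None
```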

testfile https://file.io/8wRYPUBBrVZ9

It looks like your topic is missing some important information. Could you provide the following, if applicable?

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

hello @E_B

Your stitch_img node has a relative reference to the node $node["jsonformating"] instead of the node $node["jsonformating1"] (or maybe that was intended). However, the jsonformating1 node may contain more than one item in its output. So when there are 2+ items, the flow will fail, as the first Code node, jsonformating, contains only one item.

@barn4k

So I need to pass {{ $item("0").$node["jsonformating"].json["utdataMapp"] }} along through all the steps?

I thought it was possible to just use the same data (workfolder/path) coming from {{ $item("0").$node["jsonformating"].json["utdataMapp"] }} in all the nodes that use that path, but do I have to pass that data along the execution chain through the nodes as input/output?

You are using a relative reference for the variable (the <node>.json property), but in your case you should use an absolute reference. Actually, I don’t know why you want to execute the same CLI command /var/www/run/mmupdf/bin/python3 /var/www/run/sammanfoga_bilder.py "{{$node["jsonformating"].json["utdataMapp"]}}" multiple times (if you don’t, set the node to execute once), but with an absolute reference the syntax looks like this:

/var/www/run/mmupdf/bin/python3 /var/www/run/sammanfoga_bilder.py "{{$('jsonformating').last().json.utdataMapp}}"

Nodes have different types of reference:

  • <node>.json.<property> - relative reference; accesses the property of the desired node at the same item index. If the desired node, or any node in the chain up to it, has a different number of items, you will get this error sooner or later.
  • <node>.last().json.<property> - accesses the last item of the desired node, regardless of the number of items anywhere in the node chain.
  • <node>.first().json.<property> - same as above, but accesses the first item of the desired node.
  • <node>.all() - returns all items of the desired node as an array.
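These reference styles can be illustrated with a toy model (Python for illustration only; this is not the real n8n API, and the item data is invented). A relative reference resolves against the upstream node's output at the *current* item index, so it fails as soon as the downstream node has more items than the upstream one, while a .last()-style lookup ignores the current index entirely:

```python
# Toy model of n8n's item-index resolution -- not the real API.
def relative_ref(upstream_items, item_index):
    """Like <node>.json: look up the same item index in the upstream node."""
    if item_index >= len(upstream_items):
        raise IndexError(f'No data found for item-index: "{item_index}"')
    return upstream_items[item_index]

def last_ref(upstream_items):
    """Like <node>.last().json: always take the last upstream item."""
    return upstream_items[-1]

jsonformating = [{"utdataMapp": "/var/www/run/work/rapport"}]  # one item

try:
    relative_ref(jsonformating, 1)  # stitch_img running for item 1
except IndexError as e:
    print(e)  # mirrors the workflow's error message

print(last_ref(jsonformating)["utdataMapp"])  # always resolves
```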

@barn4k I’m relatively new to n8n and am working on a workflow to process ZIP archives containing PDFs. I’m struggling to understand how batching and the “Execute Once” setting work with multiple “Execute Command” nodes.

My Workflow:

My workflow aims to:

  1. Monitor a folder: The workflow is triggered when multiple new ZIP archives are added to the /var/www/run/work/ directory.
  2. Extract and Process:
    • Extract the archive to a folder named after the ZIP file.
    • Extract text, images, and markdown from the PDFs within the extracted folder.
    • Stitch together images from the same page if there are more than 2.
  3. Move Processed Files: Move the processed ZIP archive and its corresponding folder to the /var/www/run/done/ directory.

Current Setup:

I have the following nodes in my workflow (full JSON attached):

  1. Local File Trigger: Triggers the workflow on new ZIP files in /var/www/run/work/.
  2. Wait Node: Added to allow time for large ZIP files to finish uploading.
  3. Loop Over Items (Batch Node): Configured to process one item at a time.
  4. Wait Node: Another wait node, potentially unnecessary.
  5. Execute Command (hantera_zip_filer.py):
    • Unzips the archive and renames files to remove special characters.
    • Outputs a JSON object with details about the unzipped files.
  6. Function Node (jsonformating): Parses the JSON output from the previous node.
  7. Execute Command (ext_txt_img): Extracts text, images, and markdown from PDFs.
  8. Execute Command (stitch_img): Stitches together images on a page if necessary.
  9. Function nodes (jsonformating1 and jsonformating2): Parse the JSON output from the Python scripts.
  10. Execute Command (move_zip2done): Moves the processed ZIP file to /var/www/run/done/.
  11. Execute Command (move_folder2done): Moves the processed folder to /var/www/run/done/.
  12. Several “Set” nodes (Edit Fields): Used to set variables based on data from other nodes.

Confusion and Questions:

  • Batching: I believe the “Loop Over Items” node with a batch size of 1 should process one ZIP file at a time. However, it seems like the “stitch_img” node is trying to execute for each individual image instead of once per folder. How can I ensure that “stitch_img” is run only once for each unzipped folder?
  • Execute Once: I’ve set executeOnce: true on the “stitch_img” node, but it still seems to be executing multiple times. Am I misunderstanding how this setting interacts with the batching process?
  • Wait Nodes: I’m unsure if the “Wait” nodes are actually necessary or if there’s a better way to ensure the ZIP files are fully uploaded before processing (I have flipped the setting in the trigger node).

I’d appreciate any help in clarifying how to correctly use batching and “Execute Once” to process one ZIP archive at a time with the scripts.

Thank you in advance for your help!

The Loop node will process one item at a time, but that’s true only for the items it receives as input. If you have multiple items inside the Loop, they will be processed normally: when a node outputs 2+ items, the next node is executed once for each item (with some exceptions).

It will be executed once per Loop iteration. But your workflow has some design issues, so the Loop wouldn’t work as intended.
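To make the item fan-out concrete, here is a toy sketch (Python for illustration; not n8n internals, and `run_node` is invented): inside one Loop iteration, a node that receives N items runs once per item, unless "Execute Once" is enabled, in which case it runs a single time and sees only the first item.

```python
# Toy model of per-item execution vs the "Execute Once" node setting.
def run_node(items, fn, execute_once=False):
    if execute_once:
        return [fn(items[0])]  # runs once, sees only the first item
    return [fn(item) for item in items]

pages = [{"page": 1}, {"page": 2}, {"page": 3}]  # jsonformating1 fanned out

print(run_node(pages, lambda it: it["page"]))                     # [1, 2, 3]
print(run_node(pages, lambda it: it["page"], execute_once=True))  # [1]
```

This is why stitch_img appears to run "for each individual image": the fan-out upstream, not the Loop node, determines how many times it executes.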

I’m not sure what you mean by “uploading”. If you mean the final move nodes, then you don’t need Wait nodes, as the CLI node waits until the command exits and returns 0 (that can be an issue if you have some interactive logic, since the node may hang n8n, e.g. if you run the ping command without a count flag).

As I understand your workflow logic, you should have something like this

But I didn’t get the purpose of the jsonformating2 node, as I don’t see any references to it.