Passport Submission

I am currently creating a Passport Submission Form, where my AI agent reads and extracts the passport data based on my system message prompt.

However, I would like to understand what should happen if a user uploads more than one image in a single submission. How should the workflow handle a single form submission that contains multiple images?

Specifically, I want to know how to design the workflow so the AI agent can process multiple uploaded images within one submission.


Great question — handling multiple images in a single form submission is a common pattern when building document processing workflows.

The key is to use a Split In Batches (or Split Out) node immediately after your form trigger to iterate over each image individually before passing them to your AI agent. Here’s the pattern:

  1. Form Trigger receives the submission (array of uploaded images)
  2. Split Out node splits the images array into individual items — set the field path to whatever key holds the image array in your form payload
  3. HTTP Request / Read Binary File — download/prepare each image as binary
  4. AI Agent (with your system prompt) processes one image at a time — extracts the passport data
  5. Merge node after the loop — aggregate all extracted passport records back into one item
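If you prefer a Code node over the Split Out node for step 2, the fan-out can be sketched like this. This is a minimal illustration, not n8n's actual node internals: `splitImages`, the `images` field name, and the `submissionId` key are all assumptions about your form payload.

```javascript
// Sketch: fan a single submission's image array out into one item per image,
// mirroring what the Split Out node does. Field names are placeholders.
function splitImages(submission, field = 'images') {
  return (submission[field] ?? []).map((image, index) => ({
    image,                        // the binary reference for this file
    index,                        // position within the original submission
    submissionId: submission.id,  // keep a link back to the original event
  }));
}

// Example: one submission with three uploads becomes three items.
const items = splitImages({ id: 'sub-1', images: ['a.jpg', 'b.jpg', 'c.jpg'] });
```

Each resulting item then flows through the download and AI Agent steps on its own before the merge.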

The reason you want to process one at a time (vs. sending all images to the agent at once): most vision models have context limits, and sending 5 passport images simultaneously often causes extraction errors or mixed-up data. Processing sequentially and then merging gives you cleaner, more reliable results.

If you want to preserve which extracted data belongs to which original file, you can use $item.json.filename or the index from the Split Out to tag each record before merging.
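That tagging step can be as small as this sketch; the field names (`sourceFile`, `sourceIndex`, `passportNumber`) are placeholders for whatever your extraction schema uses.

```javascript
// Sketch: attach the source filename and index to each extracted record
// so results can be traced back to their file after the merge.
function tagResult(extracted, filename, index) {
  return { ...extracted, sourceFile: filename, sourceIndex: index };
}

const tagged = tagResult({ passportNumber: 'X1234567' }, 'scan-01.jpg', 0);
```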

One more thing — if your form can receive both a front and back of a single passport as two separate images, you might want to pair them before extraction (use an Edit Fields node to merge both into one extraction request for that passport).
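One way to sketch that pairing in a Code node, assuming the filenames encode the side (the `…-front.jpg` / `…-back.jpg` convention here is made up; adapt the matching to whatever your form actually sends):

```javascript
// Sketch: group front/back images of the same passport into one object
// so both sides go into a single extraction request.
function pairSides(files) {
  const pairs = {};
  for (const f of files) {
    const m = f.match(/^(.*)-(front|back)\.\w+$/);
    if (!m) continue;               // skip files that don't follow the convention
    if (!pairs[m[1]]) pairs[m[1]] = {};
    pairs[m[1]][m[2]] = f;          // e.g. pairs['p1'].front = 'p1-front.jpg'
  }
  return Object.values(pairs);
}
```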

What form tool are you using for the submission? That’ll affect exactly how the binary data comes through.


@yelinaung When handling multiple images uploaded in a single form, it is best to split them and process each image individually. Start your flow with a Form Trigger that accepts multiple images, followed by a Split Out node that separates them into individual items. An AI Agent then processes each image on its own, and finally an Aggregate node brings all the extracted data back into a single item.

This approach gives a more accurate outcome, since each image is processed individually while staying linked to the original submission, and it works regardless of how many images are received.


Hi @yelinaung, what I would do is make the form accept multiple images, split those images into separate elements, and then pass them to the AI so it can process them one by one, item by item. That way the output always reflects all of the images that were uploaded.


Thank you for the information.


Let me know if that works, @yelinaung.

Thank you for the information :ok_hand:

Multi-Image Passport OCR Workflow using AI Agents & SSH Task Runners

The goal is to take multiple uploaded images from a dashboard, process them individually, and return structured JSON to a backend API.

The Workflow Logic:

Webhook Trigger: Receives the initial payload from a custom dashboard.

Image Preparation (Code Node): Parses the incoming data to create a clean list of image references.

SSH File Download: Uses a Task Runner/SSH node to securely fetch the binary image files.

AI Agent (OpenAI): The core engine. It uses a vision-capable model to “read” the passport details. I have disabled Memory here because each upload is a fresh transaction and we want to avoid data “bleeding” between different customers.

Data Formatting: A Code node ensures the AI’s output matches the specific schema required by our backend.

Aggregate & Send: Merges the individual results into a single array and POSTs it back to the developer’s API.
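The two Code nodes in that flow (image preparation and final aggregation) could look roughly like this. Everything here is a sketch under assumptions: the webhook payload shape (`body.files` with `url`/`name`) and the output schema (`passportNumber`, `fullName`, `sourceFile`) are stand-ins for whatever your dashboard and backend actually use.

```javascript
// Step 2 sketch: normalize the webhook payload into a clean list of
// image references, keeping only image files.
function prepareImages(payload) {
  return (payload.body?.files ?? [])
    .filter(f => /\.(jpe?g|png)$/i.test(f.name ?? ''))
    .map((f, i) => ({ ref: f.url, name: f.name, index: i }));
}

// Step 6 sketch: merge per-image extraction results into one array
// suitable for a single POST back to the backend API.
function buildPayload(results) {
  return {
    count: results.length,
    passports: results.map(r => ({
      passportNumber: r.passportNumber ?? null,
      fullName: r.fullName ?? null,
      sourceFile: r.sourceFile ?? null,   // traceability back to the upload
    })),
  };
}
```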

My Questions

While testing this, I’ve run into an issue with consistency.

Sometimes the AI Agent provides the correct data, but other times it returns incorrect information. When I try the exact same image again, it might give me the correct result the second time.

I’m looking for advice on:

  • AI System Message: How can I make my prompt “stricter” to ensure it doesn’t hallucinate or miss fields?

  • OpenAI Settings: Should I be looking at specific parameters (like Temperature) to ensure the output is the same every time?

  • Code Node Logic: Is there a better way to “pre-process” the images or the response format to help the AI stay accurate?
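On the Temperature point: setting temperature to 0, fixing a seed, and forcing JSON output all reduce run-to-run variance, though vision models are still not guaranteed to be bit-identical across calls. Here is a sketch of a deterministic-leaning request body; the field layout follows the public OpenAI Chat Completions API, but the model name and prompt text are assumptions for illustration.

```javascript
// Sketch: build a Chat Completions request tuned for repeatable extraction.
function buildVisionRequest(imageUrl, systemPrompt) {
  return {
    model: 'gpt-4o',                          // assumed vision-capable model
    temperature: 0,                           // minimize sampling randomness
    seed: 42,                                 // best-effort reproducibility
    response_format: { type: 'json_object' }, // force valid JSON output
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: [
        { type: 'text', text: 'Extract the passport fields as JSON.' },
        { type: 'image_url', image_url: { url: imageUrl } },
      ]},
    ],
  };
}
```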

I’d love to hear if anyone else has faced this “sometimes right, sometimes wrong” behavior with Vision models in n8n and how you solved it!

Any suggestions for this workflow would be appreciated :frowning:

Hi @yelinaung, as far as I can tell this is purely a system-prompt issue. To fix it, you need a system prompt curated for your specific use case. Consider using a system prompt generator that fits your needs, give the AI agent examples of how you want it to behave and output, and enforce the structure with an output parser; that should resolve it.
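For example, a stricter system message might enumerate the exact fields and explicitly forbid guessing. The field names below are illustrative, not a required schema:

```javascript
// Sketch: a strict system message for the AI Agent node.
// Enumerating keys and banning inference tends to cut hallucinated fields.
const systemPrompt = `
You are a passport OCR engine. Return ONLY valid JSON with exactly these keys:
passportNumber, surname, givenNames, nationality, dateOfBirth, dateOfExpiry.
If a field is unreadable, set it to null. Never guess or infer missing values.
Do not add any keys, commentary, or markdown outside the JSON object.
`.trim();
```

Pairing a prompt like this with a structured output parser gives the model one unambiguous target shape.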


Yes, I just changed and modified it a bit and it seems okay now, thank you.

I split the images with the Code node, and I use the Aggregate node to bring all the data back into a single item. Thank you.
