Passing an image through OCR with instructions

Hi there

I am trying to build an OCR engine that also accepts instructions. The use case: I want to send photos of business documents (e.g., invoices) together with a set of instructions on what to do with each document once it has been processed.

An example would be sending a picture of an invoice with instructions such as “Set a reminder in asana to pay this, include the data in our google sheet tracker”.

Currently the OCR part works well, and I can send the documents through an iPhone shortcut, but I am struggling to find a way to attach instructions to either the original photo or the processed document.

Any help would be appreciated.

Billel

You have an interesting idea there. I asked Claude your exact question, and this is what it came back with. Hopefully it’s helpful:

This is an interesting workflow challenge in n8n. You need a way to associate instructions with each document going through your OCR pipeline. Let me suggest a few approaches you could implement:

Option 1: Metadata in Filename or Headers

You could encode instructions directly in the filename or as metadata when sending the document from your iPhone shortcut.

For example, your shortcut could name files like: invoice_SET-REMINDER-ASANA_ADD-TO-SHEET.jpg. Then in n8n, you can parse this filename to extract instructions.
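As a rough illustration of the parsing step, here is a minimal Python sketch. It assumes the naming convention from the example above (underscores separate the document type from each instruction, hyphens separate words within an instruction); the function name `parse_filename_instructions` is hypothetical and not part of n8n.

```python
from pathlib import Path


def parse_filename_instructions(filename: str) -> dict:
    """Split a filename like 'invoice_SET-REMINDER-ASANA_ADD-TO-SHEET.jpg'
    into a document type and a list of instruction phrases.

    Assumed convention: '_' separates fields, '-' separates words.
    """
    stem = Path(filename).stem          # drop the '.jpg' extension
    doc_type, *tokens = stem.split("_")  # first field is the document type
    instructions = [token.replace("-", " ").lower() for token in tokens]
    return {"document_type": doc_type, "instructions": instructions}


result = parse_filename_instructions("invoice_SET-REMINDER-ASANA_ADD-TO-SHEET.jpg")
print(result)
```

The trade-off with this option is that filenames only hold short, fixed tokens, so free-form instructions do not fit well.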

Option 2: JSON Wrapper

Have your iPhone shortcut send a JSON object containing both the image data and instructions:

{
  "document": "base64-encoded-image-data",
  "instructions": "Set a reminder in asana to pay this, include the data in our google sheet tracker"
}
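To make the payload shape concrete, here is a small Python sketch of both sides: building the JSON on the sending side and unpacking it on the receiving side. The field names `document` and `instructions` come from the example above; the helper names `build_payload` and `read_payload` are illustrative assumptions, and the actual shortcut or n8n node would do the equivalent in its own tooling.

```python
import base64
import json
from typing import Tuple


def build_payload(image_bytes: bytes, instructions: str) -> str:
    """Wrap raw image bytes and free-text instructions in one JSON string."""
    return json.dumps({
        "document": base64.b64encode(image_bytes).decode("ascii"),
        "instructions": instructions,
    })


def read_payload(body: str) -> Tuple[bytes, str]:
    """Recover the image bytes and the instructions from the JSON body."""
    data = json.loads(body)
    return base64.b64decode(data["document"]), data["instructions"]


# Round trip with placeholder bytes standing in for a real photo.
body = build_payload(b"\xff\xd8placeholder", "Set a reminder in asana to pay this")
image, text = read_payload(body)
```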

Option 3: Separate But Linked Messages

Send two separate items through your workflow - the document first, followed by a text message with instructions, using a unique ID to link them.
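A minimal sketch of the linking logic, assuming each message carries a shared ID and a kind (`document` or `instructions`). The class `PendingStore` is hypothetical and keeps state in memory only; a real workflow would need some persistent store between the two messages.

```python
from typing import Dict, Optional


class PendingStore:
    """Hold half-received pairs until both parts with the same ID arrive."""

    def __init__(self) -> None:
        self._pending: Dict[str, dict] = {}

    def add(self, link_id: str, kind: str, value: str) -> Optional[dict]:
        """Record one part; return the complete pair once both are present."""
        entry = self._pending.setdefault(link_id, {})
        entry[kind] = value
        if "document" in entry and "instructions" in entry:
            return self._pending.pop(link_id)  # pair complete, stop tracking it
        return None


store = PendingStore()
first = store.add("abc123", "document", "base64-encoded-image-data")
second = store.add("abc123", "instructions", "Set a reminder in asana to pay this")
```

Here `first` is `None` because only the document has arrived, and `second` holds both parts once the instructions come in.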

Implementation Approach

The most flexible approach would be Option 2. Here’s how you could set it up:

  1. Modify your iPhone shortcut to:
     • Get the image
     • Ask for instructions text
     • Create a JSON payload with both
     • Send to your n8n webhook
  2. In n8n, your workflow would:
     • Parse the incoming JSON
     • Extract and process the image with your OCR
     • Parse the instructions using a text analysis node
     • Execute the appropriate actions based on identified instructions
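
The last two steps (parsing the instructions and choosing actions) could be sketched as follows. This uses plain keyword matching as a stand-in for the text analysis step, which is an assumption; an LLM or classifier could fill the same role. The action names and the `ACTION_KEYWORDS` table are hypothetical.

```python
from typing import List

# Hypothetical mapping from an action name to keywords that trigger it.
ACTION_KEYWORDS = {
    "create_asana_task": ("asana", "reminder"),
    "append_sheet_row": ("google sheet", "sheet", "tracker"),
}


def identify_actions(instructions: str) -> List[str]:
    """Return every action whose keywords appear in the instruction text."""
    text = instructions.lower()
    return [
        action
        for action, keywords in ACTION_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


actions = identify_actions(
    "Set a reminder in asana to pay this, include the data in our google sheet tracker"
)
print(actions)
```

Each returned action name could then drive a branch in the workflow, with the OCR output passed along as the data for that branch.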
