I am currently trying to build an OCR engine that also takes instructions. The use case: I want to send photos of business documents (e.g., invoices) along with a set of instructions on what to do with each document once it has been processed.
An example would be sending a picture of an invoice with instructions such as “Set a reminder in asana to pay this, include the data in our google sheet tracker”.
Currently the OCR bit works well and I can send the documents through an iPhone shortcut, but I am struggling to find a way to attach instructions to either the original photo or the processed document.
You have an interesting idea there. I asked Claude your exact question, and this is what it came back with. Hopefully it’s helpful:
This is an interesting workflow challenge in n8n. You need a way to associate instructions with each document going through your OCR pipeline. Let me suggest a few approaches you could implement:
Option 1: Metadata in Filename or Headers
You could encode instructions directly in the filename or as metadata when sending the document from your iPhone shortcut.
For example, your shortcut could name files like: invoice_SET-REMINDER-ASANA_ADD-TO-SHEET.jpg. Then in n8n, you can parse this filename to extract instructions.
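As a rough illustration of the filename parsing, here is a minimal Python sketch. It assumes the naming scheme from the example above (document type first, then instruction codes separated by underscores); the function name parse_instruction_filename is made up for this example and is not part of n8n. The same logic could be ported to an n8n Code node.

```python
from pathlib import PurePath


def parse_instruction_filename(filename: str) -> tuple[str, list[str]]:
    """Split 'doctype_CODE-ONE_CODE-TWO.ext' into a document type and instruction codes.

    Assumed convention (hypothetical): underscores separate the document type
    from each instruction code, and hyphens join the words inside one code.
    """
    stem = PurePath(filename).stem  # drop the file extension
    doc_type, *codes = stem.split("_")
    # Ignore empty pieces caused by doubled or trailing underscores.
    return doc_type, [code for code in codes if code]


doc_type, codes = parse_instruction_filename("invoice_SET-REMINDER-ASANA_ADD-TO-SHEET.jpg")
print(doc_type, codes)
```

One limitation worth keeping in mind: filenames only suit a small, fixed set of instruction codes, not free-form text.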
Option 2: JSON Wrapper
Have your iPhone shortcut send a JSON object containing both the image data and instructions:
{
  "document": "base64-encoded-image-data",
  "instructions": "Set a reminder in asana to pay this, include the data in our google sheet tracker"
}
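To make the wrapper concrete, here is a small Python sketch of both ends: building the payload and reading it back. The field names "document" and "instructions" come from the JSON above; the helper names build_payload and read_payload are hypothetical, and in practice the sending side would be the iPhone shortcut, not Python.

```python
import base64
import json


def build_payload(image_bytes: bytes, instructions: str) -> str:
    """Wrap raw image bytes and free-text instructions in one JSON string."""
    return json.dumps({
        # JSON cannot carry raw bytes, so the image is base64-encoded text.
        "document": base64.b64encode(image_bytes).decode("ascii"),
        "instructions": instructions,
    })


def read_payload(raw: str) -> tuple[bytes, str]:
    """Recover the image bytes and the instructions from the JSON string."""
    data = json.loads(raw)
    # validate=True rejects characters outside the base64 alphabet.
    image_bytes = base64.b64decode(data["document"], validate=True)
    return image_bytes, data["instructions"].strip()


raw = build_payload(b"fake image bytes", "Set a reminder in asana to pay this")
image, instructions = read_payload(raw)
print(len(image), instructions)
```

Base64 makes the image roughly a third larger, so very large photos may be worth resizing in the shortcut before sending.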
Option 3: Separate But Linked Messages
Send two separate items through your workflow - the document first, followed by a text message with instructions, using a unique ID to link them.
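A minimal sketch of the linking idea in Python, assuming each message carries the same unique ID and a label saying whether it is the document or the instructions. The PendingPairs class and the "document"/"instructions" labels are invented for illustration; in n8n the pending items would have to live in some persistent store between executions.

```python
import uuid


class PendingPairs:
    """Hold half-received pairs in memory until both parts share an ID."""

    def __init__(self) -> None:
        self._pending: dict[str, dict[str, object]] = {}

    def add(self, link_id: str, kind: str, content: object):
        """Store one part; return the full pair once both parts have arrived."""
        entry = self._pending.setdefault(link_id, {})
        entry[kind] = content
        if "document" in entry and "instructions" in entry:
            # Both halves are present: hand them over and forget the ID.
            return self._pending.pop(link_id)
        return None


pairs = PendingPairs()
link_id = str(uuid.uuid4())  # generated once by the sender, attached to both messages
print(pairs.add(link_id, "document", b"fake image bytes"))
print(pairs.add(link_id, "instructions", "Set a reminder in asana to pay this"))
```

This shows why the option is more fragile than a single payload: the workflow has to hold state and cope with a second message that never arrives.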
Implementation Approach
The most flexible approach would be Option 2. Here’s how you could set it up:
Modify your iPhone shortcut to:
1. Get the image
2. Ask for instructions text
3. Create a JSON payload with both
4. Send to your n8n webhook
In n8n, your workflow would:
1. Parse the incoming JSON
2. Extract and process the image with your OCR
3. Parse the instructions using a text analysis node
4. Execute the appropriate actions based on identified instructions
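The instruction-parsing step is the least defined part of this plan, so here is one very simple way it could look in Python. This is a keyword-matching stand-in, not a real text analysis node: the action names and keyword lists are assumptions made up for this example, and an LLM or NLP step could replace the matching logic.

```python
# Hypothetical mapping from an action name to keywords that trigger it.
ACTION_KEYWORDS = {
    "asana_reminder": ("asana", "reminder"),
    "sheet_append": ("google sheet", "sheet", "tracker"),
}


def identify_actions(instructions: str) -> list[str]:
    """Return every action whose keywords appear in the instruction text."""
    text = instructions.lower()  # match regardless of capitalization
    return [
        action
        for action, keywords in ACTION_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


actions = identify_actions(
    "Set a reminder in asana to pay this, include the data in our google sheet tracker"
)
print(actions)
```

Each returned action name could then drive a branch in the workflow, for example one branch that creates the Asana task and another that appends a row to the sheet. Plain keyword matching will misfire on ambiguous wording, so it is only a starting point.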