I’m looking for the fastest and most reliable way to extract a transaction number from a JPG file I receive regularly. Ideally, I want to pass the image directly to OpenAI (ChatGPT) for analysis – but as far as I know, that’s not supported natively in n8n.
Here’s what I’ve tried:
Used HTTP Request + ConvertAPI to turn the JPG into a PDF
Then tried Extract from File node – but since it’s an image-only PDF, no text gets extracted
Also tried PDF.co and PDFKit, same issue – no real text, just image
Finally tried Extract from File node directly on PDF output – again, no OCR results
My goal:
→ Extract a single transaction number from the image (OCR)
→ Optionally feed it to OpenAI for further parsing/logic
My questions:
What’s the fastest approach for this in n8n?
Is there a reliable OCR node or service you recommend?
Any way to get OpenAI or any LLM to process image input directly via the OpenAI node?
Personally, I prefer Gemini for AI OCR, there are also many other AI models that are good at OCR, like Mistral, even if it’s not supported directly, you can still call the HTTP endpoint..
I think this is the fastest option if you’re okay with relying only on AI and no other services..
Have a look at the link below for an example on using Mistral. Ive had good results using this. They offer an OCR which is really good, but if the image contains more data, you can use their vision API and provide a prompt to explain how to read the value you need