Best way to extract text (transaction number) from JPG – OCR or GPT?

Hi everyone,

I’m looking for the fastest and most reliable way to extract a transaction number from a JPG file I receive regularly. Ideally, I want to pass the image directly to OpenAI (ChatGPT) for analysis – but as far as I know, that’s not supported natively in n8n.

Here’s what I’ve tried:

  • Used HTTP Request + ConvertAPI to turn the JPG into a PDF

  • Then tried Extract from File node – but since it’s an image-only PDF, no text gets extracted

  • Also tried PDF.co and PDFKit, same issue – no real text, just image

  • Finally tried Extract from File node directly on PDF output – again, no OCR results

My goal:

→ Extract a single transaction number from the image (OCR)
→ Optionally feed it to OpenAI for further parsing/logic

My questions:

  • What’s the fastest approach for this in n8n?

  • Is there a reliable OCR node or service you recommend?

  • Any way to get OpenAI or any LLM to process image input directly via the OpenAI node?

Thanks in advance for any tips!

Hi @Luca2

Image analysis is natively supported in the OpenAI node and in Google Gemini node.
Have you tried working with them?

Personally, I prefer Gemini for AI OCR, there are also many other AI models that are good at OCR, like Mistral, even if it’s not supported directly, you can still call the HTTP endpoint..

I think this is the fastest option if you’re okay with relying only on AI and no other services..

2 Likes

Hi @Luca2 ,

Have a look at the link below for an example on using Mistral. Ive had good results using this. They offer an OCR which is really good, but if the image contains more data, you can use their vision API and provide a prompt to explain how to read the value you need

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.