Extract/Parse/Analyse PDF using OpenAI/ChatGPT vision API

Here is the use case I want to set up:

I upload invoices (PDF format) to Airtable. A workflow is triggered daily: it takes each document, parses it, and fills in the missing data in Airtable. It extracts:

  • Supplier name
  • Invoice number
  • Amount
  • Currency
  • Date

How can I use n8n and OpenAI to perform such a use case?
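For reference, the extraction target can be written down as a small JSON record. The field names below are my own assumptions based on the list above, not an n8n or Airtable requirement:

```python
import json

# Hypothetical target record for one invoice; the field names are
# assumptions matching the bullet list above.
invoice_schema = {
    "supplier_name": "string",
    "invoice_number": "string",
    "amount": "number",
    "currency": "ISO 4217 code, e.g. EUR",
    "date": "ISO 8601 date, e.g. 2024-01-31",
}

# Example of a filled-in record the parsing step should produce.
parsed = {
    "supplier_name": "ACME GmbH",
    "invoice_number": "INV-2024-0042",
    "amount": 1234.56,
    "currency": "EUR",
    "date": "2024-01-31",
}

print(json.dumps(parsed, indent=2))
```

Whatever option is chosen below, the LLM step only needs to reliably return this shape so the Airtable fields can be mapped one-to-one.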

Option A - Extract the text from the pdf

  1. Convert pdf to txt
  2. Use a regular OpenAI message with a JSON parser

Some PDF files are scans with no OCR layer, which means they are basically pictures. Extracting the text is useless in those cases.

=> Not the correct solution
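For PDFs that do have a text layer, the Option A request is straightforward. A minimal sketch, assuming the text has already been extracted in step 1 (the prompt wording and helper names here are my own, not from n8n):

```python
import json

def build_text_messages(pdf_text: str) -> list:
    """Build a chat-completions message list asking the model to
    return the five invoice fields as strict JSON."""
    system = (
        "Extract supplier_name, invoice_number, amount, currency and date "
        "from the invoice text. Reply with a single JSON object only."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": pdf_text},
    ]

def parse_reply(reply: str) -> dict:
    """Parse the model's JSON reply; raises ValueError if malformed."""
    return json.loads(reply)

messages = build_text_messages(
    "ACME GmbH\nInvoice INV-42\nTotal: 100.00 EUR\n2024-01-31"
)
fields = parse_reply(
    '{"supplier_name": "ACME GmbH", "invoice_number": "INV-42", '
    '"amount": 100.0, "currency": "EUR", "date": "2024-01-31"}'
)
print(fields["invoice_number"])
```

The weak point, as noted above, is step 1: for a scanned PDF, `pdf_text` comes back empty and there is nothing for the model to work with.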

Option B - Convert pdf to image

  1. Convert pdf to image
  2. Use the regular “Analyse image” node to get the data as text
  3. Use a regular OpenAI message with a JSON parser

I would rather avoid converting PDF to image because it requires an external npm package.
I would prefer a one-step approach for steps 2 and 3, with a JSON parser applied directly to the uploaded file request.

Is it possible to get JSON output from the “Analyse image” node?

=> Not the correct solution

Option C - Use assistant

What is the correct way to do it with assistants?

  1. Create an assistant
  2. Add a file
  3. Make a request
  4. Remove the file
  5. Remove the assistant

How do I do it correctly?

Thank you for the help

It looks like your topic is missing some important information. Could you provide the following, if applicable?

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

I was about to post that for invoices alone I recommend Rossum Elis, but I’m checking their website and they now process not just invoices but many kinds of documents, and the price has gone stratospheric… Anyway, I’m using it and it works very well.

Hey @LucBerge

If some of your PDFs are scanned images then Option B is really the only viable option.

Good news is that it isn’t that difficult and you don’t have to start from scratch! Check out my multimodal vision templates: you can copy the first half to handle the PDF-to-image conversion and just change the LLM node to output the desired datapoints.

Essentially, both use StirlingPDF for the PDF-to-image conversion. There is a public API (which is the default in the template for demonstration purposes), but I use docker compose to set up a private instance of StirlingPDF on my end.

This is the declaration I use, if it helps:

  stirlingpdf:
    image: frooodle/s-pdf:latest-ultra-lite
    ports:
      - "8080:8080"
    volumes:
      - .stirling/extraConfigs:/configs

Then in the HTTP request node of the template, I would change the URL from https://stirlingpdf.io to http://stirlingpdf:8080.

Hello @Jim_Le, thank you for your answer.

Does that mean the current OpenAI UI handles PDFs with both a PDF-to-text conversion and a PDF-to-image conversion?

n8n supports PDF-to-text natively but not PDF-to-image; you would need the help of another tool or service. In the template, we delegate the conversion task to StirlingPDF, which can be self-hosted.

The correct answer to this post is that the OpenAI API does not handle .pdf files the way the UI does. You need to convert the PDF to TXT (if it has a text layer) or to PNG (if it is a scan) first.
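That two-path decision can also be made automatically: extract the text layer first, and fall back to the image route when nothing (or almost nothing) comes back. A heuristic sketch; the character threshold is an assumption you would tune:

```python
def choose_route(page_texts: list[str], min_chars: int = 20) -> str:
    """Decide between the PDF->TXT and PDF->PNG paths.

    If the extracted text layer is essentially empty, the PDF is most
    likely a scan and must go through image conversion plus vision.
    """
    total = sum(len(t.strip()) for t in page_texts)
    return "text" if total >= min_chars else "image"

# A PDF with a real text layer goes down the cheap text path:
print(choose_route(["ACME GmbH\nInvoice INV-42\nTotal: 100.00 EUR"]))
# A scan with no OCR layer yields empty pages and takes the image path:
print(choose_route(["", "  "]))
```

In n8n terms, this would be an IF node after the native text-extraction step, routing empty results to the StirlingPDF conversion branch.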


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.