Extract/Parse/Analyse PDF using OpenAI/ChatGPT vision API

Here is the use case I want to set up:

I upload invoices (PDF format) to Airtable. A workflow is triggered daily: it takes each document, parses it, and fills in the missing data in Airtable. It extracts:

  • Supplier name
  • Invoice number
  • Amount
  • Currency
  • Date

How can I use n8n and OpenAI to perform such a use case?
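For reference, the extraction target can be written down as a small JSON record. The field names below are my own assumptions based on the list above, not an n8n or Airtable requirement:

```python
import json

# Hypothetical target record for one invoice; the field names are
# assumptions matching the bullet list above.
invoice_schema = {
    "supplier_name": "string",
    "invoice_number": "string",
    "amount": "number",
    "currency": "ISO 4217 code, e.g. EUR",
    "date": "ISO 8601 date, e.g. 2024-01-31",
}

# Example of a filled-in record the parsing step should produce.
parsed = {
    "supplier_name": "ACME GmbH",
    "invoice_number": "INV-2024-0042",
    "amount": 1234.56,
    "currency": "EUR",
    "date": "2024-01-31",
}

print(json.dumps(parsed, indent=2))
```

Whatever option is chosen below, the LLM step only needs to reliably return this shape so the Airtable fields can be mapped one-to-one.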

Option A - Extract the text from the pdf

  1. Convert pdf to txt
  2. Use a regular OpenAI message with a JSON parser

Some PDF files are scans with no OCR layer, which means they are basically pictures. Extracting the text is useless in those cases.

=> Not the correct solution
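For PDFs that do have a text layer, the Option A request is straightforward. A minimal sketch, assuming the text has already been extracted in step 1 (the prompt wording and helper names here are my own, not from n8n):

```python
import json

def build_text_messages(pdf_text: str) -> list:
    """Build a chat-completions message list asking the model to
    return the five invoice fields as strict JSON."""
    system = (
        "Extract supplier_name, invoice_number, amount, currency and date "
        "from the invoice text. Reply with a single JSON object only."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": pdf_text},
    ]

def parse_reply(reply: str) -> dict:
    """Parse the model's JSON reply; raises ValueError if malformed."""
    return json.loads(reply)

messages = build_text_messages(
    "ACME GmbH\nInvoice INV-42\nTotal: 100.00 EUR\n2024-01-31"
)
fields = parse_reply(
    '{"supplier_name": "ACME GmbH", "invoice_number": "INV-42", '
    '"amount": 100.0, "currency": "EUR", "date": "2024-01-31"}'
)
print(fields["invoice_number"])
```

The weak point, as noted above, is step 1: for a scanned PDF, `pdf_text` comes back empty and there is nothing for the model to work with.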

Option B - Convert pdf to image

  1. Convert pdf to image
  2. Use the regular “Analyse image” node to get the data as text
  3. Use a regular OpenAI message with a JSON parser

I would rather avoid converting PDF to image because it requires an external npm package.
I would prefer a one-step approach for steps 2 and 3, with a JSON parser applied directly to the uploaded file request.

Is it possible to get JSON output from the “Analyse image” node?

=> Not the correct solution

Option C - Use assistant

What is the correct way to do it with assistants?

  1. Create an assistant
  2. Add a file
  3. Make a request
  4. Remove the file
  5. Remove the assistant

How do I do it correctly?

Thank you for the help

It looks like your topic is missing some important information. Could you provide the following, if applicable?

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

I was about to post that for invoices alone I recommend Rossum Elis, but I’m checking their website and they now process not just invoices but many kinds of documents, and the price has gone stratospheric… Anyway, I’m using it and it works very well.

Hey @LucBerge

If some of your PDFs are scanned images then Option B is really the only viable option.

Good news is that it isn’t that difficult and you don’t have to start from scratch! Check out my multimodal vision templates: you can copy the first half to handle the PDF-to-image conversion and just change the LLM node to output the desired datapoints.

Essentially, both use StirlingPDF for the PDF-to-image conversion. There is a public API (which is the default in the template for demonstration purposes), but I use docker compose to set up a private instance of StirlingPDF on my end.

This is the declaration I use, if it helps:

  stirlingpdf:
    image: frooodle/s-pdf:latest-ultra-lite
    ports:
      - "8080:8080"
    volumes:
      - .stirling/extraConfigs:/configs

Then in the HTTP request node of the template, I would change the URL from https://stirlingpdf.io to http://stirlingpdf:8080.

Hello @Jim_Le, thank you for your answer.

Does that mean the current OpenAI UI handles PDFs with both a PDF-to-text conversion and a PDF-to-image conversion?

n8n supports PDF-to-text natively but not PDF-to-image; you would need the help of another tool or service. In the template, we delegate the conversion task to StirlingPDF, which can be self-hosted.

The correct answer to this post is that the OpenAI API does not handle .pdf files the way the UI does. You need to convert the PDF to TXT (if it has a text layer) or to PNG (if it is a scan) first.
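That two-path decision can also be made automatically: extract the text layer first, and fall back to the image route when nothing (or almost nothing) comes back. A heuristic sketch; the character threshold is an assumption you would tune:

```python
def choose_route(page_texts: list[str], min_chars: int = 20) -> str:
    """Decide between the PDF->TXT and PDF->PNG paths.

    If the extracted text layer is essentially empty, the PDF is most
    likely a scan and must go through image conversion plus vision.
    """
    total = sum(len(t.strip()) for t in page_texts)
    return "text" if total >= min_chars else "image"

# A PDF with a real text layer goes down the cheap text path:
print(choose_route(["ACME GmbH\nInvoice INV-42\nTotal: 100.00 EUR"]))
# A scan with no OCR layer yields empty pages and takes the image path:
print(choose_route(["", "  "]))
```

In n8n terms, this would be an IF node after the native text-extraction step, routing empty results to the StirlingPDF conversion branch.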


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.