I upload invoices (pdf format) to airtable. A workflow is triggered daily : It takes the document, parse the document and fill the missing data in airtable. It parses:
Supplier name
Invoice number
Amount
Currency
Date
How can I use n8n and OpenAI to perform such a use case?
Option A - Extract the text from the pdf
Convert pdf to txt
Use regular OpenAI message with JSON parser
Some pdf files are scans with no OCR, which mean it is basically a picture. Extracting the text is useless in some case
=> Not the correct solution
Option B - Convert pdf to image
Convert pdf to image
Use regular the node “Analyse image” to get the data as txt
Use regular OpenAI message with JSON parser
I do not want the need to convert pdf to image because it requires external npm package.
I would prefer to use a one step approch for 2 and 3 with a JSON parser directly on the upload file request.
is it possible to have a JSON output for the “Analyse image” node?
I was about to post that just for invoices I recommend Rossum Elis, but I’m checking on their website and now they process not just invoices but many kinds of documents and the price’s gone stratospheric… Anyway I’m using it and it works veeeeeeery well
If some of your PDFs are scanned images then Option B is really the only viable option.
Good news is that it isn’t that difficult and you don’t have to start from scratch! Check out my multimodal vision templates which you can copy the first half to handle the pdf->image conversion and just change the LLM node to spit out the desired datapoints.
Essentially, both use StirlingPDF for PDF->Image conversion. There is a public API (which is the default in the template for demonstration purposes) but I use docker compose to setup a private instance of stirlingPDF on my end.
n8n supports pdf-to-text natively but not pdf-to-image, you would need the help of another tool or service. In the template, we delegate the conversion task to stirlingPDF which can be self-hosted.
The correct answer to this post is that OpenAI API does not handle .pdf files as the UI does. You need to convert the PDF to TXT (if numerical) or PDF to PNG (if image) first.