Read data from an invoice and load it into a database or Excel

Describe the problem/error/question

I have a PDF invoice, and when I try to read the data from the PDF, the output JSON comes back as a single object.

What is the error message (if any)?

""\n\nInvoice\nPayment is due within 30 days from date of invoice. Late payment is subject to fees of 5% per month.\nThanks for choosing DEMO - Sliced Invoices | [email protected]\nPage 1/1\nFrom:\nDEMO - Sliced Invoices\nSuite 5A-1204\n123 Somewhere Street\nYour City AZ 12345\[email protected]\nInvoice NumberINV-3337\nOrder Number12345\nInvoice DateJanuary 25, 2016\nDue DateJanuary 31, 2016\nTotal Due$93.50\nTo:\nTest Business\n123 Somewhere St\nMelbourne, VIC 3000\[email protected]\nHrs/QtyServiceRate/PriceAdjustSub Total\n1.00\nWeb Design\nThis is a sample description...\n$85.000.00%$85.00\nSub Total$85.00\nTax$8.50\nTotal$93.50\nANZ Bank\nACC # 1234 1234\nBSB # 4321 432\nPaid"",

Please share your workflow

Share the output returned by the last node

Information on your n8n setup

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

Hi @Arudhra, welcome to the community!

I am sorry you’re having trouble. This is the expected behaviour, I am afraid: n8n’s Read PDF node does not automatically parse tables or other content of your PDF file; it just extracts the raw text.

Since parsing PDF invoices can be quite a challenge, you might want to consider using a dedicated third-party service focusing on this task. Mindee, for example, is integrated in n8n and can parse invoices like the example one you appear to be using here:

It would provide results like this:

Perhaps this works better for you?

Hi MutedJam

Thanks for the response

Are there any alternative options using free tools?

I have very little experience with invoice parsing, I am afraid. There are a couple of free PDF parsers out there, but you would need to test how well each one works for you.

Hi @MutedJam
I have replicated the workflow you created, but I am unable to reproduce the output you got.
Can you please help me here?

unable to replicate the output

Which result are you seeing instead? And which version of n8n exactly are you running?

I have downloaded the desktop version of n8n, and I got this result:

Hi @Arudhra, n8n’s Desktop version is deprecated. The last version is based on [email protected], meaning it would not have received this update for the Mindee node switching to a newer API version.

Perhaps you want to consider moving to a newer version of n8n? If you are using Windows, you could, for example, easily spin up a local n8n instance using Docker with the command shared over here.

Hi @MutedJam, I have installed n8n on one of our servers. How do I know whether I am using the latest Mindee node? I tried again after installing, and I am getting the same output as above.

Hi @Arudhra, to find out which node version you’re using you want to select your node on the n8n canvas, then copy it using Ctrl+C and finally paste the data you have copied into a text editor using Ctrl+V.

You should then see a line saying typeVersion which I have highlighted below:

"typeVersion": 2 suggests you are using an earlier version of the Mindee node. The current version should say "typeVersion": 3.

If you’re using an older node version, you can simply delete the old node from your canvas and add a new one instead. A newly added node should always use the latest node version.
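If you'd rather not scan the pasted JSON by eye, the check can be sketched in a few lines of standalone Python. This is only an illustration under some assumptions: that the copied data is a JSON object with a top-level `nodes` list, that each node carries `name`, `type` and `typeVersion` keys, and that the Mindee node's `type` contains the word "mindee". The `pasted` string is a made-up, trimmed example, not real output from this thread.

```python
import json


def mindee_type_versions(clipboard_text):
    """Return (node name, typeVersion) for each Mindee node in pasted n8n JSON."""
    data = json.loads(clipboard_text)
    # Assumption: nodes copied from the canvas sit in a top-level "nodes" list.
    nodes = data.get("nodes", [])
    return [
        (node.get("name"), node.get("typeVersion"))
        for node in nodes
        # Assumption: the Mindee node's type string contains "mindee".
        if "mindee" in node.get("type", "").lower()
    ]


# Hypothetical clipboard content, trimmed to the relevant keys.
pasted = """
{
  "nodes": [
    {"name": "HTTP Request", "type": "n8n-nodes-base.httpRequest", "typeVersion": 3},
    {"name": "Mindee", "type": "n8n-nodes-base.mindee", "typeVersion": 2}
  ],
  "connections": {}
}
"""

for name, version in mindee_type_versions(pasted):
    # 3 is the version named as current in this thread; later releases may differ.
    status = "ok" if version is not None and version >= 3 else "outdated, re-add the node"
    print(f"{name}: typeVersion {version} ({status})")
```

The threshold of 3 comes from this thread only, so treat it as a snapshot rather than a fixed rule.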

Thanks @MutedJam, I was able to get the data. If I want to insert the data from Mindee into a database, can I do it directly, or do I need to use a converter?

Hi @Arudhra, this should work fine in most cases, but the exact approach depends on your database of course. What exactly are you struggling with when processing the Mindee response?

I am currently trying to read multiple PDF files and load them into Excel or a database.
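To illustrate the general idea, most databases want one flat row per invoice, so the main step is picking the fields you need and mapping them to columns. Below is a rough standalone sketch using Python's built-in `sqlite3` module. The table layout, the `save_invoices` helper and the field names are hypothetical and are not Mindee's actual response format; the values are taken from the sample invoice above. Inside n8n, a database node would play the role of the insert step.

```python
import sqlite3


def save_invoices(conn, invoices):
    """Insert one flat row per invoice (hypothetical schema for illustration)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS invoices (
               invoice_number TEXT PRIMARY KEY,
               invoice_date   TEXT,
               total_due      REAL
           )"""
    )
    # Named placeholders let each dict map directly onto the columns.
    conn.executemany(
        "INSERT OR REPLACE INTO invoices "
        "VALUES (:invoice_number, :invoice_date, :total_due)",
        invoices,
    )
    conn.commit()


# Hypothetical flattened rows; values come from the sample invoice in this thread.
rows = [
    {"invoice_number": "INV-3337", "invoice_date": "2016-01-25", "total_due": 93.50},
]

conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
save_invoices(conn, rows)
print(conn.execute("SELECT * FROM invoices").fetchall())
```

Using the invoice number as the primary key with `INSERT OR REPLACE` means re-running the same file does not create duplicate rows, which is one possible design choice among several.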

Looks like you have it working to me. Your Spreadsheet File node would build an Excel file with the data coming from Mindee:

From opening the file in a spreadsheet application it looks fine:

I think I might have misunderstood something. What’s missing for you here?

I have two PDF files in the HTTP Request node, and the output that I get from Mindee is: no binary file exist

Hi @Arudhra - can you try accessing only one PDF file at a time in the HTTP Request node? I’ve also not used Mindee, but have you renamed the binary file so that it is no longer called data? If you’ve changed the name, you’ll need to update this. :+1:

If I use a single PDF link it works, but my problem is reading the PDF files when there are multiple files in a folder.

Are your invoices consistent in structure, or potentially totally different? I have a setup that uses the Read PDF node and a custom code block to retrieve the info I want. Works quite well for a free solution.

Otherwise I can recommend Eden AI, as they offer a whole host of different OCR services by Amazon, Google, Microsoft, etc.
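As a rough idea of what such a custom code step can look like for consistently structured invoices, here is a minimal standalone Python sketch that pulls a few fields out of the raw text with regular expressions. It is not the setup mentioned above, just an illustration. The patterns and the `parse_invoice` helper are assumptions based on the sample output at the top of this thread, where labels and values run together (for example `Invoice NumberINV-3337`), and they would need adjusting for other layouts.

```python
import re

# Assumed patterns, derived from the sample raw text posted in this thread.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice Number(\S+)",
    "order_number": r"Order Number(\S+)",
    "invoice_date": r"Invoice Date(.+)",
    "due_date": r"Due Date(.+)",
    "total_due": r"Total Due\$([\d,]+\.\d{2})",
}


def parse_invoice(text):
    """Extract known fields from raw invoice text; missing fields become None."""
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        result[field] = match.group(1).strip() if match else None
    return result


# Excerpt of the raw text returned for the sample invoice.
raw_text = (
    "Invoice NumberINV-3337\nOrder Number12345\n"
    "Invoice DateJanuary 25, 2016\nDue DateJanuary 31, 2016\n"
    "Total Due$93.50\n"
)

print(parse_invoice(raw_text))
```

This only works while every invoice uses the same labels in the same form, which is why the consistency question matters.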

@Jelle_de_Rijke they are consistent in format. Please let me know the solution.

Hi @Arudhra :wave: That would make sense, as the nodes following would not be expecting an array, but a single item. You can split out the binary files like this:

I’m not too sure if you’re getting multiple PDFs or a zip file, but you’d need to decompress the zip file (the above workflow shows you how to do this) before you place the code node.

The code node in particular would be what you’d need to add after your HTTP request node :+1:
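The logic of that splitting step can be sketched roughly as follows. This is standalone Python that models n8n items as plain dictionaries with `json` and `binary` keys, not the exact code from the workflow. The property names `file_0` and `file_1`, the file names and the `split_binary_items` helper are hypothetical; the idea is simply to emit one item per binary file and store each file under a single property name (`data` here) so the following node finds it where it expects it.

```python
def split_binary_items(items, target_key="data"):
    """Turn items holding several binary files into one item per file."""
    results = []
    for item in items:
        for key, file_info in item.get("binary", {}).items():
            results.append({
                # Keep the file name and original property name for reference.
                "json": {"fileName": file_info.get("fileName"), "sourceKey": key},
                # Store every file under the same property name.
                "binary": {target_key: file_info},
            })
    return results


# Hypothetical incoming item carrying two PDFs under different property names.
incoming = [{
    "json": {},
    "binary": {
        "file_0": {"fileName": "invoice-1.pdf", "mimeType": "application/pdf"},
        "file_1": {"fileName": "invoice-2.pdf", "mimeType": "application/pdf"},
    },
}]

for out in split_binary_items(incoming):
    print(out["json"]["fileName"], list(out["binary"]))
```

After this step each PDF travels as its own item, so a node that expects a single binary file runs once per invoice.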

If this doesn’t work for you, can you provide an example of the format of the data you’re working with?