I have a PDF invoice, and when I try to read the data from the PDF, the output JSON comes back as a single object.
What is the error message (if any)?
"text": "\n\nInvoice\nPayment is due within 30 days from date of invoice. Late payment is subject to fees of 5% per month.\nThanks for choosing DEMO - Sliced Invoices | [email protected]\nPage 1/1\nFrom:\nDEMO - Sliced Invoices\nSuite 5A-1204\n123 Somewhere Street\nYour City AZ 12345\[email protected]\nInvoice NumberINV-3337\nOrder Number12345\nInvoice DateJanuary 25, 2016\nDue DateJanuary 31, 2016\nTotal Due$93.50\nTo:\nTest Business\n123 Somewhere St\nMelbourne, VIC 3000\[email protected]\nHrs/QtyServiceRate/PriceAdjustSub Total\n1.00\nWeb Design\nThis is a sample description...\n$85.000.00%$85.00\nSub Total$85.00\nTax$8.50\nTotal$93.50\nANZ Bank\nACC # 1234 1234\nBSB # 4321 432\nPaid",
I am sorry you’re having trouble. I am afraid this is the expected behaviour: n8n’s Read PDF node does not automatically parse tables or other content of your PDF file; it only extracts the raw text.
Since parsing PDF invoices can be quite a challenge, you might want to consider using a dedicated third-party service focused on this task. Mindee, for example, is a service integrated in n8n, and it can parse invoices like the example one you appear to be using here:
I have very little experience with invoice parsing, I am afraid. There are a couple of free PDF parsers out there, but you would need to test how well each one works for you.
Perhaps you want to consider moving to a newer version of n8n? If you are using Windows, you could, for example, easily spin up a local n8n instance using Docker with the command shared over here.
Hi @MutedJam, I have installed it on one of our servers. How do I know whether I am using the latest Mindee node? I tried it after installing and I am still getting the same output as above.
Hi @Arudhra, to find out which node version you’re using, select your node on the n8n canvas, copy it using Ctrl+C, and then paste the copied data into a text editor using Ctrl+V.
You should then see a line saying typeVersion which I have highlighted below:
"typeVersion": 2 suggests you are using an earlier version of the Mindee node. The current version should say "typeVersion": 3.
If you’re using an older node version, you can simply delete the old node from your canvas and add a new node instead. A newly added node should always use the latest node version.
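If you have copied several nodes at once, reading the pasted JSON by eye gets tedious. Below is a minimal sketch that lists the typeVersion of each copied node. It assumes the copied data is a JSON object with a top-level "nodes" array; the helper name listNodeVersions and the sample values are hypothetical and only there for illustration.

```javascript
// Sketch: list name, type and typeVersion for every node in a copied n8n snippet.
// Assumption: the clipboard JSON has a top-level "nodes" array.
function listNodeVersions(clipboardJson) {
  const parsed = JSON.parse(clipboardJson);
  const nodes = Array.isArray(parsed.nodes) ? parsed.nodes : [];
  return nodes.map((node) => ({
    name: node.name,
    type: node.type,
    typeVersion: node.typeVersion,
  }));
}

// Illustrative input shaped like a copied node (values are made up).
const copiedNodes = JSON.stringify({
  nodes: [
    { name: "Mindee", type: "n8n-nodes-base.mindee", typeVersion: 2, parameters: {} },
  ],
  connections: {},
});

console.log(listNodeVersions(copiedNodes));
```

Any node reporting a lower typeVersion than a freshly added node of the same type would be a candidate for replacing.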
Thanks @MutedJam, I was able to get the data. If I want to insert the data from Mindee into a database, can I do it directly, or do I need to use a converter?
Hi @Arudhra, this should work fine in most cases, but the exact approach depends on your database of course. What exactly are you struggling with when processing the Mindee response?
Hi @Arudhra - can you try accessing only one PDF file at a time in the HTTP Request node? I’ve also not used Mindee, but have you renamed the binary file so that it is no longer called data? If you’ve changed the name, you’ll need to update this.
Are your invoices consistent in structure, or potentially totally different? I have a setup that uses the Read PDF node and a custom code block to retrieve the info I want. It works quite well for a free solution.
Otherwise, I can recommend Eden AI, as they offer a whole host of different OCR services from Amazon, Google, Microsoft, etc.
Hi @Arudhra, that would make sense, as the nodes following would not be expecting an array, but a single item. You can split out the binary files like this:
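To give an idea of the Read PDF plus custom code approach, here is a rough sketch that pulls a few fields out of the raw text using regular expressions. It is written against the sample invoice text posted earlier in this thread, where labels and values are fused on one line (for example "Invoice NumberINV-3337"). The function name parseInvoiceText and the output field names are hypothetical, and the patterns would need adjusting for invoices with a different layout.

```javascript
// Sketch: extract a few fields from the raw text returned by the Read PDF node.
// Assumption: each label sits at the start of a line, directly followed by its value.
function parseInvoiceText(text) {
  // Return the first capture group of the pattern, or null if nothing matches.
  const grab = (pattern) => {
    const match = text.match(pattern);
    return match ? match[1].trim() : null;
  };
  return {
    invoiceNumber: grab(/^Invoice Number(.+)$/m),
    orderNumber: grab(/^Order Number(.+)$/m),
    invoiceDate: grab(/^Invoice Date(.+)$/m),
    dueDate: grab(/^Due Date(.+)$/m),
    totalDue: grab(/^Total Due\$([\d.,]+)$/m),
  };
}

// Fragment of the sample text from this thread.
const sampleText =
  "Invoice NumberINV-3337\nOrder Number12345\nInvoice DateJanuary 25, 2016\n" +
  "Due DateJanuary 31, 2016\nTotal Due$93.50\n";

console.log(parseInvoiceText(sampleText));
```

In a Code node, the input would presumably come from the text property of the incoming item, as in the output shown at the top of this thread.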
I’m not too sure if you’re getting multiple PDFs or a zip file, but you’d need to decompress the zip file (the above workflow shows you how to do this) before you place the code node.
The Code node in particular would be what you’d need to add after your HTTP Request node.
If this doesn’t work for you, can you provide an example of the format of the data you’re working with?
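For reference, the splitting logic could look roughly like the sketch below. It turns one incoming item that carries several binary files into one item per file, each stored under the binary key data so that following nodes find it where they expect it. The function name splitBinaryItems and the sample file names are hypothetical; in a Code node set to run once for all items, the last step would be `return splitBinaryItems($input.all());`.

```javascript
// Sketch: emit one item per binary file instead of one item holding all files.
function splitBinaryItems(items) {
  const results = [];
  for (const item of items) {
    // Items without binary data are skipped.
    for (const key of Object.keys(item.binary || {})) {
      results.push({
        json: { fileName: item.binary[key].fileName },
        binary: { data: item.binary[key] },
      });
    }
  }
  return results;
}

// Illustrative input: a single item with two attached files.
const incomingItems = [
  {
    json: {},
    binary: {
      file_0: { fileName: "invoice-a.pdf" },
      file_1: { fileName: "invoice-b.pdf" },
    },
  },
];

console.log(splitBinaryItems(incomingItems).map((item) => item.json.fileName));
```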