How can I build an invoice data extractor tool for free?

I want to build a free invoice extractor: when I upload an invoice PDF, it should return the details I need in JSON format. I have tried many things, including a hard-coded approach, pre-trained models on Hugging Face, Docling, Donut, and LayoutLMv3, but I am not getting accurate results because the structure and contents of my invoices vary.
Any help would be great.


Have you tried LLM APIs, such as OpenAI, Claude, or DeepSeek?
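A minimal sketch of the LLM approach, assuming you already have the invoice's text (for example, from a PDF text extractor) and some function that sends a prompt to whichever LLM API you use. `call_llm` is a hypothetical placeholder, not a real SDK function, and the field names are illustrative. The sketch only covers building the prompt and parsing the reply defensively, since models sometimes wrap JSON in Markdown fences.

```python
import json
import re
from typing import Callable

# Illustrative field list; adjust to the details you need from your invoices.
FIELDS = ["invoice_number", "invoice_date", "vendor_name", "total_amount", "currency"]


def build_prompt(invoice_text: str, fields: list[str]) -> str:
    """Ask the model for one JSON object with exactly the requested keys."""
    keys = ", ".join(fields)
    return (
        "Extract the following fields from the invoice text below and reply "
        f"with a single JSON object using exactly these keys: {keys}. "
        "Use null for any field you cannot find.\n\n"
        f"Invoice text:\n{invoice_text}"
    )


def parse_json_reply(reply: str, fields: list[str]) -> dict:
    """Strip optional ```json fences, parse, and keep only the expected keys."""
    match = re.search(r"```(?:json)?\s*(.*?)```", reply, re.DOTALL)
    payload = match.group(1) if match else reply
    data = json.loads(payload)
    # Missing keys become None so downstream steps always see the same shape.
    return {field: data.get(field) for field in fields}


def extract_invoice(invoice_text: str, call_llm: Callable[[str], str]) -> dict:
    """call_llm is a hypothetical stand-in for your LLM client of choice."""
    reply = call_llm(build_prompt(invoice_text, FIELDS))
    return parse_json_reply(reply, FIELDS)
```

To try it without an API key, pass a stub such as `lambda prompt: '{"invoice_number": "INV-001"}'` as `call_llm`; every other field comes back as `None`.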

So, I just created a PDF-based n8n node that is completely free, but for now it can only do some basic things.

It is still in development:

  • Split Pages and Merge PDFs are not working yet.

Generate Invoice is already working, so if that is what you are looking for, you are welcome to try it now.

You can install it as a community node: “n8n-nodes-pdfbro”.

Let me know what other features you need.

The codebase is open source too (open for pull requests):

Hey! Variable invoice formats are a pain: pre-trained models and hard-coded approaches break the second you get a different layout.

What’s worked for me: skip the training/template approach entirely. Use an extraction API where you just define what fields you need (invoice number, date, items, totals) with JSON Schema, and it pulls those fields regardless of how the invoice looks.
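As an illustration of the "define the fields you need" idea, here is a hypothetical JSON Schema for the fields mentioned (invoice number, date, line items, totals), written as a Python dict, plus a small stdlib-only check that an extraction result has the required fields with the expected types. The field names are assumptions rather than any service's format, and the checker supports only the `type`, `required`, `properties`, and `items` keywords; for full validation, a library such as `jsonschema` would be the usual choice.

```python
# Hypothetical schema: field names are illustrative, not any vendor's format.
INVOICE_SCHEMA = {
    "type": "object",
    "required": ["invoice_number", "invoice_date", "line_items", "total"],
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["description", "amount"],
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "amount": {"type": "number"},
                },
            },
        },
        "total": {"type": "number"},
    },
}

# Map the JSON Schema type names used above to Python types.
TYPE_MAP = {"object": dict, "array": list, "string": str, "number": (int, float)}


def check(value, schema, path="$"):
    """Return a list of problems (empty if the value matches the schema subset)."""
    problems = []
    expected = TYPE_MAP[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for "number".
    is_bool_number = schema["type"] == "number" and isinstance(value, bool)
    if not isinstance(value, expected) or is_bool_number:
        return [f"{path}: expected {schema['type']}"]
    if schema["type"] == "object":
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}.{key}: missing")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                problems.extend(check(value[key], sub, f"{path}.{key}"))
    elif schema["type"] == "array" and "items" in schema:
        for i, item in enumerate(value):
            problems.extend(check(item, schema["items"], f"{path}[{i}]"))
    return problems
```

Running a check like this on every result makes it easy to flag invoices where the extractor missed a field, instead of silently passing incomplete JSON downstream.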

Setup in n8n is just an HTTP Request node, or there’s a community node. Send the PDF plus your schema and get back consistent JSON.
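The "send PDF + schema, get JSON back" step can be sketched outside n8n as well, assuming a generic JSON-over-HTTP extraction service. The URL, the `document`/`schema` body fields, and the Bearer auth header are all hypothetical placeholders, not the actual API of PDF Vector or any other provider; the point is only how a PDF (base64-encoded) and a schema fit into one request, which is roughly what an HTTP Request node would send.

```python
import base64
import json
import urllib.request

# Placeholder endpoint: each extraction service defines its own URL,
# auth header and payload shape, so check your provider's docs.
EXTRACT_URL = "https://api.example.com/v1/extract"


def build_extract_request(pdf_bytes: bytes, schema: dict, api_key: str) -> urllib.request.Request:
    """Package a PDF (base64) and a JSON Schema into one JSON POST body."""
    body = {
        "document": base64.b64encode(pdf_bytes).decode("ascii"),  # hypothetical field name
        "schema": schema,  # hypothetical field name
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON response (needs a real endpoint)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Some services expect a multipart file upload instead of base64 in a JSON body; in that case only `build_extract_request` changes.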

I use PDF Vector for this. It handles scanned and digital PDFs, tables, weird layouts, and all the other annoying stuff. The difference is that you’re not trying to match templates; you’re just telling it “find these fields” and it figures out the layout.

What invoice types are breaking for you most? Scanned ones? Multi-page? Tables?

I can highly recommend easybits for data extraction. It’s super easy to set up and can be called directly via its API from an HTTP Request node. You can find a step-by-step integration guide here: Extract Structured Data from Documents in Minutes

If you need any help, feel free to reach out. I’ve also created a workflow that does exactly what you’re trying to achieve, but it uses Telegram as the interface, allowing you to upload PDF files or photos directly through the chat. You can find the workflow here: Automated expense tracking with Telegram, easybits & Google Sheets | n8n workflow template