Hey everyone,
A few weeks ago I shared how I built an automation to help my friend catch duplicate invoices. That workflow saved him so much time that he came back with a new request: “Can you also sort my invoices by category? My tax lawyer needs them in separate folders and I’m doing it all by hand.”
His situation is pretty common – he receives invoices from doctors, restaurants, hotels, tradespeople, you name it. Every month he manually drags them into the right folders before handing everything off to his tax lawyer. Tedious, error-prone, exactly the kind of thing that should be automated.
Now, my team and I at easybits have been building a data extraction solution (easybits Extractor) – it’s designed to pull structured fields out of documents. But classification? That wasn’t really what we built it for. Still, I was curious, so I sat down and tested whether I could push it beyond extraction and into document classification territory.
Turns out it works surprisingly well.
The trick is simple: instead of defining extraction fields like “invoice_number” or “total_amount,” you create a single field called document_class and give it a detailed classification prompt. You describe your categories, what signals to look for in each one, and the decision rules. The Extractor analyzes the full document and returns exactly one label – or null if it’s unsure.
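To make the idea concrete, here's a small Python sketch that assembles that kind of prompt from a list of categories. The three labels match the example categories used later in this post. The signal descriptions, the rule wording and the function name are illustrative placeholders, not the prompt from the workflow's sticky note, so adjust them to your own documents. The output is just text you'd paste into the document_class field description.

```python
# Illustrative category definitions (hypothetical, not the sticky-note prompt):
# label -> signals the model should look for in the document.
CATEGORIES = {
    "medical_invoice": "doctor, clinic, pharmacy, treatment codes, patient name",
    "restaurant_invoice": "food and drink items, table number, tip, VAT on meals",
    "hotel_invoice": "room nights, check-in/check-out dates, city tax",
}


def build_classification_prompt(categories: dict[str, str]) -> str:
    """Assemble a single-field classification prompt from category definitions."""
    lines = [
        "Classify the document into exactly one of the following categories.",
        "",
    ]
    # One bullet per category: the label plus the signals that identify it.
    for label, signals in categories.items():
        lines.append(f"- {label}: look for {signals}")
    # Decision rules: a single bare label, or null when unsure.
    lines += [
        "",
        "Rules:",
        "- Return only the category label, with no extra text.",
        "- If the document does not clearly match one category, return null.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_classification_prompt(CATEGORIES))
```

Generating the prompt from a dictionary keeps the category list in one place, which helps once you start adding or removing categories.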
How the workflow works:
The n8n workflow is four nodes:
- Form Upload – User uploads a PDF, PNG, or JPEG through a hosted web form
- Extract to Base64 – The binary file gets converted to a base64 string
- Build Data URI – The MIME type is read from the upload and prepended to create a proper data URI
- Send to easybits – The data URI is POSTed to the Extractor API, which returns the classification result
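If you want to see what the two middle nodes do outside n8n, here's a rough Python equivalent. It's a sketch, not the workflow's actual code: the function names are hypothetical, and file_to_data_uri guesses the MIME type from the file extension, whereas the workflow reads it from the upload itself.

```python
import base64
import mimetypes
from pathlib import Path

# MIME types the upload form accepts (PDF, PNG, JPEG).
ALLOWED_MIME_TYPES = {"application/pdf", "image/png", "image/jpeg"}


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Mirror the 'Extract to Base64' and 'Build Data URI' steps."""
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    # Base64-encode the raw bytes, then prepend the data URI header.
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Guess the MIME type from the file extension, then build the data URI."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"cannot determine MIME type for {path}")
    return to_data_uri(Path(path).read_bytes(), mime_type)
```

For example, `to_data_uri(b"hello", "application/pdf")` returns `data:application/pdf;base64,aGVsbG8=`.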
That’s it for the base workflow. From there you can extend it however you want – route files to different Google Drive folders based on the label, send a Slack message when something comes back as null, log everything to a spreadsheet, whatever fits your setup.
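As one example of such an extension, the routing decision boils down to a lookup from label to destination, with a fallback for null. The folder names and the route function below are hypothetical; in n8n you'd typically express the same logic with a Switch node feeding into Google Drive and Slack nodes.

```python
from typing import Optional

# Hypothetical mapping from classification label to destination folder.
FOLDERS = {
    "medical_invoice": "Invoices/Medical",
    "restaurant_invoice": "Invoices/Restaurants",
    "hotel_invoice": "Invoices/Hotels",
}

# Hypothetical catch-all folder for documents that need a human look.
REVIEW_FOLDER = "Invoices/Needs review"


def route(label: Optional[str]) -> tuple[str, bool]:
    """Return (destination folder, needs_alert) for a classification label.

    A null label, or any label not in FOLDERS, goes to the review folder and
    is flagged so a notification (e.g. a Slack message) can be sent.
    """
    if label in FOLDERS:
        return FOLDERS[label], False
    return REVIEW_FOLDER, True
```

Sending unknown labels to the same review path as null means a typo in the prompt's category names shows up as an alert instead of a silently misfiled invoice.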
Setting up the pipeline in easybits Extractor:
- Go to extractor.easybits.tech and create a new pipeline
- Add one field to the mapping: document_class
- In the field description, paste your classification prompt – this is where you define your categories and how the model should identify each one
- The prompt tells the model to return exactly one category label (like medical_invoice, restaurant_invoice, hotel_invoice) or null if it can’t confidently classify the document
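Since the model can return null, and model output can occasionally come back with stray whitespace or different capitalization, it may be worth normalizing the value before you route on it. This sketch assumes you've already pulled the document_class value out of the Extractor response (the response format isn't covered here); normalize_label and ALLOWED_LABELS are hypothetical names.

```python
from typing import Optional

# Hypothetical set of labels defined in the classification prompt.
ALLOWED_LABELS = frozenset({"medical_invoice", "restaurant_invoice", "hotel_invoice"})


def normalize_label(
    raw: object, allowed: frozenset[str] = ALLOWED_LABELS
) -> Optional[str]:
    """Turn the raw document_class value into an allowed label or None.

    JSON null (None), non-string values, the literal string "null", and any
    label outside the allowed set are all treated as "could not classify".
    """
    if not isinstance(raw, str):
        return None
    # Tolerate surrounding whitespace and capitalization differences.
    label = raw.strip().lower()
    # "null" and "" are not in the allowed set, so they also map to None.
    return label if label in allowed else None
```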
I’ve included a full example prompt as a sticky note inside the workflow so you can just copy it and adjust the categories to your own use case. The example covers three invoice types, but you can add or remove categories as needed.
The workflow JSON is attached below – just import it into n8n, swap in your own Pipeline ID and API Key, and you’re good to go.
Would love to hear if anyone has a similar classification use case or ideas for extending this. Happy to answer questions about the setup.
Best,
Felix
