I'm trying to build a finance analyzer workflow (for myself and for practice). I'm already having trouble finding the right node to upload a PDF and extract its values. Here in the forum I've read about different ways to do it:
Option 1) A PDF community node. What is a community node? How reliable are they, and where do I find them?
Option 2) Mistral OCR.
Option 3) Claude, with a tool given to it.
What is your opinion on how these three compare? Is there another way? Thank you in advance, Best Jennifer
Community nodes are built by community members, not by the n8n team. You can find them in Settings → Community Nodes. They generally work, but they sometimes break after updates.
Mistral OCR is solid for scanned stuff. Good accuracy but pricey at volume.
Claude for extraction is overkill honestly. Better for analyzing after you have the data.
What I use:
Extract from File broke on me too many times with scanned PDFs. Switched to HTTP Request with a document API - way more reliable. Send PDF, get back clean JSON.
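For anyone wondering what "send PDF, get back clean JSON" amounts to outside of the HTTP Request node, here is a minimal Python sketch. The endpoint URL, the Bearer auth header, and the idea that the API accepts a raw PDF body are all assumptions for illustration; every document API differs, so check your provider's docs.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with your provider's real URL.
API_URL = "https://api.example.com/v1/extract"


def build_request(pdf_bytes: bytes, api_key: str, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request that carries the raw PDF as the request body."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="POST",
        headers={
            "Content-Type": "application/pdf",
            # Assumed auth scheme; many APIs use something different.
            "Authorization": f"Bearer {api_key}",
        },
    )


def extract(pdf_bytes: bytes, api_key: str) -> dict:
    """Send the PDF and parse the reply, assuming the API answers with JSON."""
    with urllib.request.urlopen(build_request(pdf_bytes, api_key), timeout=60) as resp:
        return json.load(resp)
```

In n8n itself you would configure the same thing in the HTTP Request node (method, headers, binary body) rather than write code.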
Key thing for finance: make sure tables don’t get flattened. You need line items matched with amounts, not scattered everywhere.
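A rough way to catch flattened tables is to sanity-check the extracted JSON before analyzing it. The sketch below assumes a hypothetical output shape (`line_items` with `description` and `amount`, plus a `total`); your extractor's field names will likely differ.

```python
from decimal import Decimal


def check_line_items(doc: dict) -> list[str]:
    """Return a list of problems found in an extracted document (empty if none)."""
    problems = []
    items = doc.get("line_items", [])
    if not items:
        problems.append("no line items found (table may have been flattened)")

    running_total = Decimal("0")
    for i, item in enumerate(items):
        description = item.get("description")
        amount = item.get("amount")
        # Each row needs both halves: the label and its amount.
        if not description or amount is None:
            problems.append(f"line {i}: description and amount are not paired")
            continue
        # Decimal avoids float rounding errors on currency values.
        running_total += Decimal(str(amount))

    total = doc.get("total")
    if total is not None and Decimal(str(total)) != running_total:
        problems.append(f"line items sum to {running_total}, document total is {total}")
    return problems
```

If the list comes back non-empty, the extraction probably lost the table structure and the document is worth re-processing or checking by hand.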
What kind of docs are you processing? Invoices, statements, receipts?
In my opinion, for any kind of document with complex data structures (structured or unstructured), Mistral OCR has given the best results. Google Document AI is also very good but might require some custom training or setup.
Have a look at the result below from a previous question, where Mistral’s Vision APIs were used to read data from a scanned invoice. This will also work on clean digital PDFs.