I’m new to n8n and looking to automate data extraction from documents coming from different providers. The page structure is always the same for each provider, but the layout varies between them. Documents can be scanned PDFs or photos of varying quality.
I’ve seen several AI-based templates, but I’d love some advice on the best existing template to start with, if possible.
I started with this one and had pretty good luck. It uses LlamaParse to extract the content of the PDF as structured Markdown, then uses a separate LLM to analyze that content and map fields onto a specific target JSON document.
Nobody can say which would be “best” for your situation, but this is a good starting point.
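In case it helps to see the second step outside of n8n, here is a rough Python sketch of the "match fields to a target JSON document" idea. It is not the template's actual implementation, just an illustration under some assumptions: `TARGET_FIELDS` is a made-up field list, and `ask_llm` is a hypothetical function you would supply that sends a prompt to whatever model you use and returns its text reply. The Markdown input is assumed to come from the parsing step.

```python
import json
from typing import Callable, Optional

# Hypothetical target fields; replace with whatever you need per provider.
TARGET_FIELDS = ["provider", "invoice_number", "total"]


def build_prompt(markdown: str, fields: list[str]) -> str:
    """Ask the model to reply with a single JSON object using the given keys."""
    return (
        "Extract the following fields from the document below. "
        "Reply with a single JSON object using exactly these keys: "
        + ", ".join(fields)
        + ". Use null for anything you cannot find.\n\n"
        + markdown
    )


def extract_fields(
    markdown: str,
    ask_llm: Callable[[str], str],  # hypothetical: prompt in, model reply out
    fields: Optional[list[str]] = None,
) -> dict:
    """Run the prompt through the model and normalize the reply to the target shape."""
    fields = fields if fields is not None else TARGET_FIELDS
    reply = ask_llm(build_prompt(markdown, fields))
    data = json.loads(reply)  # raises json.JSONDecodeError if the reply is not JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object from the model")
    # Keep only the expected keys; anything the model omitted becomes None.
    return {name: data.get(name) for name in fields}
```

The normalizing step at the end is the part I'd keep no matter which model you use: it drops unexpected keys and fills in missing ones, so every provider's documents end up in the same shape even when the model's reply is a little off.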
If this answers your question enough to get you started, please mark this post as the solution.