PDF to JSON workflow for complex legal documents

alem · September 28, 2025, 4:21pm

Hi n8n Community, I need to build a workflow that converts 20+ different Terms & Conditions PDFs (stored in Google Drive) into JSON. I’d appreciate your expertise and suggestions on how to build it.

The JSON output should be machine-readable and also understandable by the Ops/Legal teams, who will review the output and correct mistakes manually.

These documents have some complexities:

Multi-language content
Variable document structures (sometimes in side-by-side columns, sometimes in single columns)
The documents will often include lists and tables
Deep hierarchical sections with complex numbering (A1-2.1.1, etc.)
Legal text that must be preserved exactly - no paraphrasing allowed
Documents range from 20 to 100+ pages
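
One possible shape for the target JSON, given the requirements above, is a tree of section nodes that keeps each clause verbatim. The following is only a minimal sketch in Python, not a finished design. The field names (`number`, `language`, `page`, `text`, `sha256`, `children`) and the helpers `section_path`, `make_node`, and `build_tree` are hypothetical. It also assumes that both `-` and `.` separate hierarchy levels in a number like A1-2.1.1, and that sections have already been extracted from the PDF in document order. The sample clause text is placeholder data.

```python
import hashlib
import json
import re


def section_path(number: str) -> list[str]:
    """Split a section number such as 'A1-2.1.1' into its levels.

    Assumption: '-' and '.' both separate hierarchy levels.
    """
    return [part for part in re.split(r"[-.]", number) if part]


def make_node(number: str, language: str, page: int, text: str) -> dict:
    """Build one section node; the text is stored verbatim, never rewritten."""
    return {
        "number": number,
        "language": language,
        "page": page,
        "text": text,
        # Hash of the exact extracted text, so later changes can be detected.
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "children": [],
    }


def build_tree(sections: list[dict]) -> list[dict]:
    """Nest flat sections (in document order) under their closest ancestor."""
    roots = []
    seen = {}  # (language, path) -> node, so each language gets its own tree
    for s in sections:
        node = make_node(s["number"], s["language"], s["page"], s["text"])
        path = tuple(section_path(s["number"]))
        seen[(s["language"], path)] = node
        # Walk up the path until an already-seen ancestor is found.
        parent = None
        for cut in range(len(path) - 1, 0, -1):
            parent = seen.get((s["language"], path[:cut]))
            if parent is not None:
                break
        if parent is not None:
            parent["children"].append(node)
        else:
            roots.append(node)
    return roots


# Placeholder input: in a real workflow this would come from the PDF extraction step.
sample = [
    {"number": "A1", "language": "en", "page": 1, "text": "Placeholder heading"},
    {"number": "A1-2", "language": "en", "page": 1, "text": "Placeholder clause"},
    {"number": "A1-2.1", "language": "en", "page": 2, "text": "Placeholder sub-clause"},
    {"number": "A1-2.1.1", "language": "en", "page": 2, "text": "Placeholder item"},
    {"number": "A1", "language": "de", "page": 1, "text": "Platzhalter-Überschrift"},
]

tree = build_tree(sample)
# ensure_ascii=False keeps non-English text readable for reviewers.
print(json.dumps(tree, indent=2, ensure_ascii=False))
```

Storing a hash next to the verbatim text lets a reviewer's manual corrections be told apart from the original extraction, and keying nodes by language keeps side-by-side translations in separate trees.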

How would you approach building this workflow?

Thanks in advance!