Hi n8n Community, I need to build a workflow that converts 20+ different Terms & Conditions PDFs (stored in Google Drive) into JSON. I’d appreciate your expertise and suggestions on how to build it.
The JSON output should be machine-readable and also understandable by Ops/Legal Teams, who’ll review the quality and amend mistakes manually.
These documents have some complexities:
- Multi-language content
- Variable document structures (sometimes in side-by-side columns, sometimes in single columns)
- The documents will often include lists and tables
- Deep hierarchical sections with complex numbering (A1-2.1.1, etc.)
- Legal text that must be preserved exactly - no paraphrasing allowed
- Documents range from 20-100+ pages
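To make the target a bit more concrete, here is a rough sketch of the kind of nested JSON I have in mind. It is only an illustration: the `Section` fields (`number`, `language`, `text`, `children`) and the helpers `number_path` and `build_tree` are hypothetical names, and it assumes some earlier step has already extracted the sections in reading order with their numbers intact.

```python
import json
import re
from dataclasses import dataclass, field, asdict


@dataclass
class Section:
    """One clause of a T&C document (hypothetical shape, not a fixed schema)."""
    number: str    # numbering exactly as printed, e.g. "A1-2.1.1"
    language: str  # language tag for this clause, e.g. "en" or "de"
    text: str      # verbatim legal text, never paraphrased
    children: list["Section"] = field(default_factory=list)


def number_path(number: str) -> list[str]:
    """Split 'A1-2.1.1' into ['A1', '2', '1', '1'] so nesting depth can be compared."""
    return [part for part in re.split(r"[-.]", number) if part]


def build_tree(flat: list[Section]) -> list[Section]:
    """Nest a reading-order list of sections by treating number prefixes as parents."""
    roots: list[Section] = []
    stack: list[tuple[list[str], Section]] = []  # open ancestors, outermost first
    for section in flat:
        path = number_path(section.number)
        # Drop open sections that are not a strict prefix of the current number.
        while stack and not (
            len(stack[-1][0]) < len(path) and path[: len(stack[-1][0])] == stack[-1][0]
        ):
            stack.pop()
        if stack:
            stack[-1][1].children.append(section)
        else:
            roots.append(section)
        stack.append((path, section))
    return roots


def to_json(roots: list[Section]) -> str:
    """Serialize without escaping non-ASCII characters, so multi-language text stays readable."""
    return json.dumps([asdict(r) for r in roots], ensure_ascii=False, indent=2)
```

Keeping `number` as the printed string and `text` untouched is meant to let Ops/Legal compare each entry against the PDF directly, while the nesting stays usable for downstream systems.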
How would you approach building this workflow?
Thanks in advance!