PDF to JSON workflow for complex legal documents

Hi n8n Community, I need to build a workflow that converts 20+ different Terms & Conditions PDFs (stored in Google Drive) into JSON. I’d appreciate your expertise and suggestions on how to build it.

The JSON output should be machine-readable and also understandable by Ops/Legal Teams, who’ll review the quality and amend mistakes manually.

These documents have some complexities:

  • Multi-language content

  • Variable document structures (sometimes side-by-side columns, sometimes a single column)

  • The documents will often include lists and tables

  • Deep hierarchical sections with complex numbering (A1-2.1.1, etc.)

  • Legal text that must be preserved exactly - no paraphrasing allowed

  • Documents range from 20-100+ pages
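
To make the target more concrete, here is a rough sketch of the kind of JSON shape that could satisfy the points above. It is only an illustration, not a finished design. The field names (`number`, `language`, `text`, `sha256`, `children`) are hypothetical, the numbering pattern is guessed from the single "A1-2.1.1" example, and the sample clause text is placeholder content. The checksum is one possible way to let reviewers confirm that legal text was stored verbatim.

```python
import hashlib
import json
import re

# Assumed numbering pattern, based only on the "A1-2.1.1" example:
# a prefix such as "A1", a hyphen, then dot-separated numbers.
NUMBER_RE = re.compile(r"^(?P<prefix>[A-Z]\d+)-(?P<path>\d+(?:\.\d+)*)$")


def parse_number(number):
    """Split 'A1-2.1.1' into ['A1', '2', '1', '1']."""
    match = NUMBER_RE.match(number)
    if match is None:
        raise ValueError(f"unrecognised section number: {number!r}")
    return [match.group("prefix")] + match.group("path").split(".")


def make_node(number, language, text):
    """Build one section node; the text is stored unchanged with a checksum."""
    return {
        "number": number,
        "language": language,
        "text": text,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "children": [],
    }


def build_tree(sections):
    """Nest flat (number, language, text) tuples by numbering path.

    Sections are expected in document order (parents before children).
    A section whose parent was not seen is kept as a top-level node.
    """
    roots = []
    by_key = {}
    for number, language, text in sections:
        # Key by language as well, so parallel translations do not collide.
        key = (language, *parse_number(number))
        node = make_node(number, language, text)
        by_key[key] = node
        parent = by_key.get(key[:-1])
        if parent is None:
            roots.append(node)
        else:
            parent["children"].append(node)
    return roots


if __name__ == "__main__":
    # Placeholder clauses, not taken from any real document.
    sample = [
        ("A1-2", "en", "2. Payment"),
        ("A1-2.1", "en", "2.1 Fees are due within 30 days."),
        ("A1-2.1.1", "en", "2.1.1 Late fees apply."),
    ]
    print(json.dumps(build_tree(sample), indent=2, ensure_ascii=False))
```

Something like this could run in a Code node after text extraction. Lists and tables would need their own node types, which this sketch does not cover.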

How would you approach building this workflow?

Thanks in advance!
