PDF OCR -> JSON

Hello, I would like to read a PDF and pack the content into a JSON file. Which node do I have to use for this?

Hi @mac338 :wave:

Just to make sure, since you mentioned OCR :sweat_smile: Are you looking for optical character recognition, i.e. does your PDF contain pictures of text? If so, n8n can’t do this directly, but AWS Textract can, and there is an n8n node for it. Mindee can also do this for some documents.

If I’m off base, can you provide an example of what you’re looking for instead? :bowing_man:

If your PDF contains actual text rather than the images of text I suspected, the Read PDF node will do the job :slight_smile:
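In case it helps with the "pack the content into a JSON file" part, here is a minimal Python sketch of that step outside n8n. It assumes you already have the extracted text as a plain string; the function name `pack_pdf_text` and the field names (`source`, `needs_ocr`, `line_count`, `lines`) are made up for illustration and are not what any n8n node outputs. The `needs_ocr` flag is only a rough hint: if no text came out at all, the pages are probably images and you would need OCR.

```python
import json


def pack_pdf_text(text: str, source_name: str) -> str:
    """Pack already-extracted PDF text into a JSON string (hypothetical layout)."""
    # Keep only non-empty lines, with surrounding whitespace trimmed.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    record = {
        "source": source_name,
        # No extractable text usually means the pages are images of text.
        "needs_ocr": len(lines) == 0,
        "line_count": len(lines),
        "lines": lines,
    }
    return json.dumps(record, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Sample text standing in for whatever the PDF extraction step returned.
    sample = "Invoice 1001\n\nTotal: 42.00 EUR\n"
    print(pack_pdf_text(sample, "invoice.pdf"))
```

Writing the returned string to a file gives you the JSON file. How you split the text (by line, by page, by field) depends on your document, so treat the line-based split as a placeholder.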
