How to parse a PDF page by page?

Hey there! I’m looking to create a large database of labeled slides from all the presentations we give at my company.

For this, I have PDF files of 50 to 200 pages that I need to parse so that each slide becomes one row in my database. I then want to run a specific labeling workflow on each slide.

Which node or tool would you use to parse the PDF?

Thanks!

Information on your n8n setup

  • n8n version: 1.67.1
  • Database (default: SQLite): Default
  • n8n EXECUTIONS_PROCESS setting (default: own, main): default
  • Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
  • Operating system: macOS Sequoia 15.1

Welcome to the community @Jeremy_Foucray !

Tip for sharing information

Pasting your n8n workflow


Make sure to copy your n8n workflow and paste it into a code block, between a pair of triple backticks. You can also do this by clicking </> (preformatted text) in the editor and pasting your workflow.

```
<your workflow>
```

The same applies to any JSON output you would like to share with us.

Make sure that you have removed any sensitive information from your workflow and include dummy or pinned data with it!


Use the Extract from File node.
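If the extracted text arrives as one long string instead of one item per page, a follow-up step (for example an n8n Code node set to Python) can split it into one record per slide. The sketch below is a minimal illustration, not the node's documented behavior: it assumes pages are separated by form-feed characters, which is what the `pdftotext` command-line tool emits by default. Check whether your extractor produces the same separators before relying on it. `split_pages` is a hypothetical helper name.

```python
def split_pages(text: str) -> list[dict]:
    """Split text extracted from a PDF into one record per page (slide)."""
    # Assumption: pages are separated by the form-feed character (U+000C),
    # as pdftotext outputs by default.
    pages = text.split("\f")
    # pdftotext ends the last page with a form feed, which leaves an
    # empty trailing chunk after the split; drop it.
    if pages and pages[-1].strip() == "":
        pages = pages[:-1]
    # Number pages from 1 so each row maps directly to a slide number.
    return [
        {"page": number, "text": page.strip()}
        for number, page in enumerate(pages, start=1)
    ]


if __name__ == "__main__":
    sample = "Slide one title\nBullet point\fSlide two title\f"
    for row in split_pages(sample):
        print(row)
```

Each returned dictionary can then become one row in the database. Inside an n8n Code node, each row would typically be wrapped as `{"json": row}` so that every slide flows through the labeling workflow as a separate item.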


Thank you very much!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.