Split Information in a PDF File

Hi everyone,

I need help to automatically separate the questions that are contained in a PDF. I have a PDF file with multiple questions, and my goal is to split its content so that each question becomes an individual item in an n8n workflow (for example, to create simulated tests).

Here’s what I’ve tried so far:

  1. I converted the PDF to Markdown.
  2. I attempted to create a JavaScript function in n8n to split the Markdown content into individual questions using a regular expression that looks for the marker “QUESTÃO” followed by a number.
  3. However, the function isn’t working as expected, and I haven’t been able to successfully extract the questions.

Could someone please help me build an n8n workflow that can:

  • Read the PDF (or the converted Markdown),
  • Identify each question based on the marker (e.g., “QUESTÃO” followed by a number),
  • And separate these questions into individual items for further processing?

Any guidance, tips, or example workflows would be greatly appreciated!

Thank you in advance.

Hello and welcome to the community!

Please share your workflow. There is a good chance that a small correction to your implementation will get it working.

Please also pin some sample data, e.g. the Markdown output, so that others can reproduce the problem.

Here is how: "How to get help on n8n community forum fast" (GitHub)
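
In the meantime, here is a minimal sketch of the split step, in case it helps you compare against your own function. It is not based on your actual workflow, so treat everything in it as an assumption: the helper name splitQuestions is hypothetical, the marker is assumed to sit at the start of a line (optionally after Markdown markup such as "##" or "**"), and the input field name "data" in the commented n8n part is only a guess at where your Markdown ends up.

```javascript
// Hypothetical helper: split Markdown text into one entry per question.
// Assumption: each question starts on its own line with "QUESTÃO" + number,
// optionally preceded by Markdown markup such as "## " or "**".
function splitQuestions(markdown) {
  // PDF extraction sometimes yields "Ã" as "A" + combining tilde;
  // NFC normalization turns that into the single character the regex expects.
  const text = markdown.normalize("NFC");

  // Zero-width split: cut at every line start that is followed by the marker,
  // so the marker itself stays at the beginning of each chunk.
  const chunks = text.split(/^(?=[#* \t]*QUEST[ÃA]O\s+\d+)/m);

  const questions = [];
  for (const chunk of chunks) {
    const match = chunk.match(/^[#* \t]*QUEST[ÃA]O\s+(\d+)/);
    if (!match) continue; // skip any preamble before the first marker
    questions.push({ number: Number(match[1]), text: chunk.trim() });
  }
  return questions;
}

// Possible use inside an n8n Code node ("Run Once for All Items" mode).
// The field name "data" is an assumption; use whatever your previous node outputs.
//
// const markdown = $input.first().json.data;
// return splitQuestions(markdown).map((q) => ({ json: q }));
```

Each returned object becomes one n8n item once it is wrapped in { json: ... }, which is what lets the following nodes process the questions one by one. If your marker can also appear in the middle of a line, the line-start anchor would need to be relaxed, at the cost of possible false splits when a question's text mentions another question.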