Chunking Large PDFs with Page Number metadata?

Describe the problem/error/question

Apologies if this is a common or dumb question; I wasn't able to find the answer by searching. I'm building a RAG pipeline that chunks large PDF documents (200+ pages). Is there a way to extract the page number as metadata for each chunk? I want to retrieve the most relevant chunks to score them or ask specific questions against them, cite them, and include the page number in the response.
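A minimal sketch of one way to do this, assuming the PDF text has already been extracted page by page (one string per page, page 1 first). The names `Chunk`, `chunk_pages` and `cite`, the `page_start`/`page_end` metadata keys, and the character-based chunk size and overlap are hypothetical choices for illustration, not part of n8n's API. The idea is to join the pages while recording where each page starts, then look up which pages each chunk covers:

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_pages(pages, source, chunk_size=1000, overlap=200):
    """Split per-page PDF text into overlapping character chunks.

    Hypothetical helper: `pages` is a list of strings, one per page, in page
    order. Each chunk's metadata records the first and last page it touches.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not pages:
        return []

    # Join the pages with "\n" and remember the offset where each page starts.
    starts = []
    pos = 0
    for text in pages:
        starts.append(pos)
        pos += len(text) + 1  # +1 for the "\n" separator
    full = "\n".join(pages)

    def page_at(offset):
        # Count of pages starting at or before `offset` = 1-based page number.
        return bisect_right(starts, offset)

    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(full), step):
        end = min(start + chunk_size, len(full))
        chunks.append(Chunk(
            text=full[start:end],
            metadata={
                "source": source,
                "page_start": page_at(start),
                "page_end": page_at(end - 1),
            },
        ))
        if end == len(full):
            break  # this chunk already reaches the end of the document
    return chunks


def cite(chunk):
    """Format a citation such as 'report.pdf, p. 4' or 'report.pdf, pp. 4-5'."""
    m = chunk.metadata
    if m["page_start"] == m["page_end"]:
        return f'{m["source"]}, p. {m["page_start"]}'
    return f'{m["source"]}, pp. {m["page_start"]}-{m["page_end"]}'


if __name__ == "__main__":
    pages = ["Introduction text ...", "Methods text ...", "Results text ..."]
    for chunk in chunk_pages(pages, "report.pdf", chunk_size=20, overlap=5):
        print(cite(chunk), repr(chunk.text))
```

Both `page_start` and `page_end` are kept because an overlapping chunk can straddle a page break. If this metadata is stored alongside each chunk's embedding, the answer step can cite pages for whichever chunks are retrieved.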

What is the error message (if any)?

No error.

Please share your workflow

n/a

Share the output returned by the last node

Information on your n8n setup

  • n8n version:
  • Database (default: SQLite): Supabase
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app): n8n cloud
  • Operating system: Windows
