Simplest way to split a long PDF into smaller chunks

Hi!
I’m working with long PDFs to extract text/scans into a database, but right now my problem is that the PDFs are just too long.

What would be the ideal workflow to split a PDF (from my Drive) into 5-page PDFs, then extract the data from each and merge it together in my Supabase?

Thank you!

For free, you can use Google Docs, but it’s not the actual intended use case. For something easy, you can look at PDF.co.

Interesting, but I’m not sure what you mean. What would be the n8n workflow to split a PDF into chunks with Google Docs?

Hi @TeenTyrant, great use case — splitting long PDFs into chunks is a smart strategy for OCR or structured text extraction!

:jigsaw: Here’s a simple step-by-step workflow in n8n to split a PDF from Google Drive into 5-page chunks, extract the content, and store it in Supabase:


:brick: Ideal Workflow Structure

  1. Google Drive Node
  • Download the PDF file from your Drive.
  2. PDF Split (via external API or Code node)
  • Use a Code node with pdf-lib, or call an external API (like PDF.co, PDF4me, or Cloudmersive) to split the PDF every 5 pages.
  3. Loop over each PDF chunk
  • Use the SplitInBatches node to loop through each chunked PDF.
  4. PDF Extract Text
  • Use the PDF Extract node (or an external OCR service if needed) to extract content from each chunk.
  5. Merge Text
  • Use a Merge node (in Append mode) or a Code node to concatenate the extracted data.
  6. Insert into Supabase
  • Use the Supabase node to insert the final result into your database.
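
To make the "split every 5 pages" idea concrete, the chunk boundaries are just arithmetic over page numbers. Here’s a minimal Python sketch; the 12-page total is a made-up example, and in a real workflow you’d read the page count from the PDF itself (e.g. with a PDF library) or from your split API’s response:

```python
# Sketch: compute 5-page chunk boundaries for a PDF.
# The page total below is hardcoded as an example only.

def chunk_ranges(total_pages, chunk_size=5):
    """Return 1-indexed, inclusive (first_page, last_page) tuples."""
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]

print(chunk_ranges(12))  # [(1, 5), (6, 10), (11, 12)]
```

You can hand these ranges to whichever splitting API or library you pick; the last chunk is simply shorter when the page count isn’t a multiple of 5.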

:bulb: Bonus Tip

If the splitting step is a blocker, you can:

  • Use pdf-lib in a Code node (if you’re self-hosted and comfortable installing custom libraries)
  • Or, for low-code users: PDF.co has an n8n node and a built-in PDF splitter with a free tier, and it works well with long docs

Would you like a ready-to-import example for splitting + OCR + Supabase insert?

Let me know your preferred text extraction method (OCR or raw PDF text), and whether you’re okay using external APIs or only native nodes.
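
For what it’s worth, the "Merge Text" step above (joining the text extracted from each chunk back together in page order) can be sketched in plain Python; the chunk texts here are made-up placeholders:

```python
# Sketch of the "Merge Text" step: concatenate the text extracted from
# each 5-page chunk, keeping the original page order.
# The sample texts below are made-up placeholders.

def merge_chunk_texts(chunks):
    """chunks: dict mapping 1-based chunk index -> extracted text."""
    return "\n\n".join(chunks[i] for i in sorted(chunks))

extracted = {
    2: "text from pages 6-10",
    1: "text from pages 1-5",
    3: "text from pages 11-12",
}
print(merge_chunk_texts(extracted))
```

Sorting by chunk index means the result is correct even if the chunks finish processing out of order, which can happen when looping with SplitInBatches.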

Homie, if you’re gonna paste AI, at least don’t make it obvious.

Are you just copy-pasting from ChatGPT lol? I thought from the last community stream on YouTube they said they don’t like this? @bartv

You could try splitting the content.

For example, name each section something like Split 1, Split 2, etc. If you’re using a vector store, this kind of structure can really help it pick things up more accurately.

Working with PDFs, you can split them into chunks first. I’m pretty sure you can either:

  • call an external service, or
  • run a bit of Python code in a Code node or an external script to handle the splitting.

This may help a bit further: https://www.youtube.com/watch?v=VBw5PEV-zKw&pp=ygUQc3BsaXQgcGRmIGluIG44bg%3D%3D

Also to note: in test mode, the GUI editor handles around 20 MB okay (from what I’ve heard), but anything over that may cause browser issues (test with small data first). When running in production it should handle much larger data correctly, since more of it runs on the backend :slight_smile:

Is it not valid to write the answer and format it with ChatGPT? My native language is Spanish, and in my opinion it helps me clarify the idea of the answer. This is not simply copy and paste; I know what I’m saying and doing. I don’t know what case they’re trying to compare me with, but if language limitations prevent us from sharing what we know how to do, then I would understand that this is not the place or spirit of community.

And this profile is not a bot or anything like that, this is my LinkedIn: https://www.linkedin.com/in/eatorres510/

You’re doing it an awful lot, I’ve noticed, over the last few days. Up to you, but maybe try to format it less GPT-like ahaa :slight_smile: It’s not my call, but they could ban you.

📣 We Need Your Help! (Answer & Earn) - #3 by jimvh.

"We will! However, we debated if we should let it answer questions on the forum and we decided against it as that would most likely quickly harm the community dynamic (it would basically turn the forum into a public chatbot, and remove all incentives for humans to interact with each other).

Instead, we’re exploring a chatbot that tries to help you before you post, quickly solving the common issues, while still leaving the interesting conversations on the forum itself.

Does that make sense to you?"

There is also a built-in n8n node for ConvertAPI, which offers over 500 API conversion tools.

Here’s a workflow you can use to split a PDF file:

Don’t know why it’s not appearing correctly, but it’s working.

I find this interesting because many of the problems here are repetitive, and I’ve noticed that we don’t bother checking whether they’ve already been resolved before asking.

I’ve also seen that, even with the official documentation available, most people don’t read how to configure the nodes.

If you’d allow me, I could help with something like this, develop it, and contribute.

Ohh nice :slight_smile: I haven’t used ConvertAPI yet :slight_smile: Hope your day’s been going well, bro :slight_smile:

It’s up to you, but I notice it makes it hard to read what actually needs doing. My suggestion would be to use formatting, of course. My English is bad (grammar etc.), and I’ve always used computers to aid me (Grammarly, GPTs), but if you look back at what you posted, not many reply because it’s just confusing.

I’m trying to help out and learn to know more than @mohamed3nan :joy: he knows too much haha. But in all seriousness, learn how to use n8n, as a lot of the replies I’ve seen posted are a bit off; this will really help the community more :slight_smile:

Looking for the simplest way to split a long PDF into smaller chunks? Try the Softaken PDF Splitter Software. It is designed for quick and hassle-free PDF separation. With this tool, you can split any PDF file by page number, page range, or even into single pages. It’s fully offline, works on all Windows versions, and doesn’t affect the original layout, text, or images. Perfect for both personal and professional use. There’s also a free demo version available so you can test its features before buying.
