Extract from PDF: Text extracted is minimal

aya · October 25, 2024, 12:54pm

I don’t have access to your original PDF, so I’m not sure what your expected output would be. Do you mean you’re expecting more text in the output?

I just tested with a PDF I had, and the ‘Extract from PDF’ operation in the node works fine for me, so I don’t think there’s an issue with the node itself.

I also see that you’re getting an output from the node, so I don’t think you’re doing anything wrong with the node’s configuration. The issue is more likely the PDF itself. If the PDF contains scanned images of text rather than actual text, the extraction won’t find much. In that case you’ll need something like optical character recognition (OCR) to extract the text, which n8n doesn’t support natively. It’s hard to say without looking at the PDF itself, though.
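
If you want a quick way to tell whether a PDF is probably scanned, one rough check is how much text came out per page. Here is a minimal sketch of that heuristic in Python. The function name, the `text` and `num_pages` inputs, and the 50-characters-per-page threshold are all my own assumptions for illustration, so map them to whatever your node actually outputs and tune the threshold for your documents.

```python
def looks_scanned(text: str, num_pages: int, min_chars_per_page: int = 50) -> bool:
    """Heuristic: True when the extracted text is too sparse to be a real text layer.

    Hypothetical helper: `text` is whatever the extraction step returned and
    `num_pages` is the page count. The threshold is an arbitrary guess.
    """
    if num_pages <= 0:
        raise ValueError("num_pages must be positive")
    # Count only non-whitespace characters so blank lines do not inflate the total.
    visible_chars = sum(1 for ch in text if not ch.isspace())
    return visible_chars / num_pages < min_chars_per_page


# Example: three pages that yielded only a short header get flagged.
print(looks_scanned("Invoice 2024\n\n", num_pages=3))
```

If this returns `True` for your file, the PDF is likely image-only and would need an OCR step before the text can be extracted. It is only a heuristic, so a text PDF with nearly empty pages would be flagged as well.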