Extract from File -> Extract from PDF

Simon_Coton · October 17, 2024, 3:56pm

Hey Sandeep,

I’d suggest something along the lines of what you already alluded to:

1. Extract the text from the PDF.
2. Submit the same PDF to an OCR API to return any text that sits inside images.
3. Merge the two results, e.g. by PDF name, so all the info for a file ends up together.
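
To make the merge step concrete, here is a minimal Python sketch of combining the two results by PDF name. The field names (`file_name`, `text`, `ocr_text`) and the sample values are assumptions for illustration only; the actual output of the extract step and of whichever OCR API you pick will look different, so treat this as the shape of the idea rather than something to paste in as-is.

```python
from collections import defaultdict


def merge_by_pdf_name(text_results, ocr_results):
    """Combine text-layer and OCR results into one record per PDF name.

    Both inputs are lists of dicts with hypothetical keys
    "file_name" and "text".
    """
    merged = defaultdict(lambda: {"text": "", "ocr_text": ""})
    for item in text_results:
        merged[item["file_name"]]["text"] = item["text"]
    for item in ocr_results:
        merged[item["file_name"]]["ocr_text"] = item["text"]
    return dict(merged)


# Made-up sample data standing in for the two branches of the workflow.
text_results = [{"file_name": "invoice.pdf", "text": "Invoice 1001"}]
ocr_results = [{"file_name": "invoice.pdf", "text": "Total: 42.00"}]
merged = merge_by_pdf_name(text_results, ocr_results)
```

A PDF that only shows up in one of the two lists still gets a record, with the missing side left as an empty string.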

As for notifying you when an image is present: what does the “Extract from File” node return when there’s an image? Does it tell you there’s an image, or does it return nothing? If it returns nothing, you might be able to add a step that calls an AI node, uploads the PDF, and has the AI node report which pages contain images.
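
If it turns out the node simply returns empty text for image-only PDFs, a cheaper first check than an AI call might be to flag files whose extracted text is empty or very short. The sketch below is only a heuristic under that assumption: `needs_ocr` and the `min_chars` threshold are hypothetical names and values, and it won't catch PDFs that mix normal text with images.

```python
def needs_ocr(extracted_text, min_chars=20):
    """Return True when the text layer is missing or too short to be useful.

    min_chars is an arbitrary threshold chosen for illustration.
    """
    return len((extracted_text or "").strip()) < min_chars


# Made-up examples: an empty result versus a normal text result.
empty_flag = needs_ocr("")
text_flag = needs_ocr("A full paragraph of real text from the PDF.")
```

Files that get flagged could then be routed to the OCR branch or trigger a notification.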