Extract from File -> Extract from PDF

Hey Sandeep,

I’d suggest something along the lines of what you already alluded to:

  1. Extract the text from the PDF
  2. Submit the same PDF to an OCR API of some sort to get the text inside any images
  3. Merge the results, e.g. by PDF name (so you get all the info together)
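To make the merge step a bit more concrete, here's a rough sketch of combining the two result sets by PDF name. The field names (`file_name`, `text`, `ocr_text`) and the function name are just placeholders I made up, since I don't know what your nodes or OCR API actually return, so adjust them to match your data:

```python
from collections import defaultdict


def merge_by_pdf_name(text_results, ocr_results):
    """Combine extracted text and OCR text into one record per PDF.

    Both inputs are lists of dicts with hypothetical keys
    "file_name" and "text" (assumed shape, not the real node output).
    """
    merged = defaultdict(lambda: {"text": "", "ocr_text": ""})

    # Text pulled straight out of the PDF
    for item in text_results:
        merged[item["file_name"]]["text"] = item.get("text", "")

    # Text recovered from images by the OCR service
    for item in ocr_results:
        merged[item["file_name"]]["ocr_text"] = item.get("text", "")

    # One record per PDF, sorted by name for stable output
    return [{"file_name": name, **fields} for name, fields in sorted(merged.items())]
```

A PDF that only shows up in one of the two lists still gets a record, with the missing side left as an empty string.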

As for notifying you when an image is present: what does the “Extract from File” node return when the PDF contains an image? Does it tell you there’s an image, or does it return nothing? If it returns nothing, you might be able to add a step that calls an AI node, uploads the PDF, and has the AI node report which pages contain images.
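If it turns out the node just returns empty or near-empty text for image-only PDFs, a cheaper check to try before an AI node might be to flag those files by text length. This is only a heuristic sketch, and it assumes that behaviour, which I haven't confirmed; the function name and the 20-character threshold are made up for illustration:

```python
def likely_needs_ocr(extracted_text, min_chars=20):
    """Guess whether a PDF is image-only based on how little text came back.

    Assumption: image-only PDFs yield empty or near-empty extracted text.
    min_chars is an arbitrary threshold; tune it for your documents.
    """
    # Treat None the same as an empty string
    cleaned = (extracted_text or "").strip()
    return len(cleaned) < min_chars
```

Note this won't catch PDFs that mix real text with images, so the AI node idea would still be the way to go for those.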