Can't Upload PDF w/ Image to Vector Store


Describe the problem/error/question

My current workflow uses an HTTP node to GET a PDF from an S3 URL. Once the document has been “downloaded”, I can see it in the binary data of that node's output and of the next node's input.
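Before debugging the loader, it can help to confirm that the HTTP step really returned a PDF and not, say, an XML or HTML error page. The sketch below is plain standard-library Python, not an n8n node. `fetch_pdf`, `looks_like_pdf` and the 1024-byte search window are illustrative choices. It looks for the `%PDF-` header that PDF files begin with.

```python
import urllib.request

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """True if the PDF header appears near the start of the bytes.

    The header is normally at offset 0, but many readers tolerate a few
    junk bytes before it, so this checks the first 1024 bytes.
    """
    return PDF_MAGIC in data[:1024]


def fetch_pdf(url: str, timeout: float = 30.0) -> bytes:
    """Download a file over HTTP(S) and fail loudly if it is not a PDF."""
    # urlopen raises HTTPError for 4xx/5xx, e.g. an expired presigned S3 link.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
        content_type = resp.headers.get("Content-Type", "")
    if not looks_like_pdf(data):
        raise ValueError(
            f"Got {len(data)} bytes (Content-Type {content_type!r}) but no PDF header"
        )
    return data
```

Usage would be something like `pdf_bytes = fetch_pdf("https://example-bucket.s3.amazonaws.com/scan.pdf")`, where the URL is a hypothetical placeholder.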

The next node is a Vector Store (I’m seeing the same issue with both the built-in store and Pinecone). Configuring the vector store node itself is no problem; the trouble starts with its data loader sub-node. No matter what I change in the text splitter or the data loader, these files never produce any output data (see image below).
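One way to see why no splitter setting helps: a text splitter only cuts up text that the loader has already extracted. Below is a generic fixed-size character splitter with overlap, a sketch rather than n8n's actual implementation; `split_text` and its `chunk_size`/`chunk_overlap` defaults are illustrative. Given an empty string, which is roughly what a text extractor gets from an image-only PDF, it returns no chunks, so nothing reaches the embedding step.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    text = text.strip()
    if not text:
        return []  # nothing was extracted, so there is nothing to embed
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

With real text in place of the empty string, the same settings start producing chunks, e.g. `split_text("abcdefghij", chunk_size=4, chunk_overlap=1)` gives `["abcd", "defg", "ghij"]`.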

A few helpful pieces of context about the PDF:

  • The PDF has no text on it; it’s just an image. I suspect this may be the issue, but I don’t know why. Maybe images can’t be used with vector stores (I’m still new to this concept).
  • The PDF has sensitive data, so I cannot share it for reference.
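The suspicion in the first bullet can be checked locally without sharing the file. Text-based PDFs normally reference fonts (`/Font`) and draw text with the `Tj`/`TJ` operators inside `BT ... ET` blocks; a scanned, image-only PDF usually has neither, only image XObjects. The heuristic below is a rough sketch, not a real PDF parser, and the function names are hypothetical. It inflates Flate-compressed streams with `zlib` and looks for those markers, ignoring encryption and other stream filters. If it returns `False`, a loader that only reads the text layer has nothing to extract. (Some scanners embed an invisible OCR text layer, in which case it returns `True`.)

```python
import re
import zlib

# Stream bodies: bytes between "stream" and "endstream". The lookbehind stops
# the tail of "endstream" from being mistaken for the start of a new stream.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
# A text object (BT ... ET) that shows text with Tj or TJ.
TEXT_OP_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)


def pdf_payloads(data: bytes) -> list[bytes]:
    """Return the raw file plus every stream body that inflates with zlib."""
    payloads = [data]
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates trailing bytes after the zlib data.
            payloads.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not FlateDecode, or a filter this sketch does not handle
    return payloads


def has_text_layer(data: bytes) -> bool:
    """Heuristic: True if the PDF seems to contain fonts and text-drawing operators."""
    payloads = pdf_payloads(data)
    has_font = any(b"/Font" in p for p in payloads)
    has_text_ops = any(TEXT_OP_RE.search(p) is not None for p in payloads)
    return has_font and has_text_ops
```

Typical use: `has_text_layer(open("scan.pdf", "rb").read())`, with `scan.pdf` standing in for the real file.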

Please share your workflow

Share the output returned by the last node

Information on your n8n setup

  • n8n version:
  • Running n8n via (Docker, npm, n8n cloud, desktop app): n8n cloud


Hi, you need to extract the text from the image with OCR and then send that text to Pinecone. You can use an external service that provides an API, such as OCR Space, or try building your own tool with Google Cloud Vision for a lower cost.
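Following that suggestion, a minimal sketch of the OCR step against the OCR Space HTTP API might look like this. It assumes the `https://api.ocr.space/parse/image` endpoint, the `apikey`, `base64Image`, `filetype` and `language` form fields, and a JSON response with `IsErroredOnProcessing`, `ErrorMessage` and `ParsedResults[].ParsedText`; verify these, and the service's page and file-size limits, against the current OCR Space documentation. The function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed OCR Space endpoint; check the current API docs before relying on it.
OCR_SPACE_ENDPOINT = "https://api.ocr.space/parse/image"


def build_ocr_space_request(pdf_bytes: bytes, api_key: str, language: str = "eng") -> urllib.request.Request:
    """Build a form-encoded POST that uploads the PDF as a base64 data URI."""
    fields = {
        "apikey": api_key,
        "base64Image": "data:application/pdf;base64," + base64.b64encode(pdf_bytes).decode("ascii"),
        "filetype": "PDF",  # tell the API the payload is a PDF, not an image
        "language": language,
    }
    return urllib.request.Request(
        OCR_SPACE_ENDPOINT,
        data=urllib.parse.urlencode(fields).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_ocr_space_response(payload: dict) -> str:
    """Join the recognized text of every page; raise if the API reported an error."""
    if payload.get("IsErroredOnProcessing"):
        raise RuntimeError(f"OCR failed: {payload.get('ErrorMessage')}")
    pages = payload.get("ParsedResults") or []
    return "\n".join(page.get("ParsedText", "") for page in pages)


def ocr_pdf(pdf_bytes: bytes, api_key: str, timeout: float = 120.0) -> str:
    """Send the PDF to OCR Space and return plain text ready for chunking and embedding."""
    request = build_ocr_space_request(pdf_bytes, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return parse_ocr_space_response(payload)
```

The returned string is what would go into the vector store in place of the binary PDF; in n8n that would presumably mean pointing the data loader at this text (for example as JSON) rather than at the binary image.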
