Read Data from Invoice and load into a database or Excel

@Jon my problem statement is: I want to read the PDF files from a folder, extract the data, and load it into Excel or a database.

file:///C:/Users/KAS823/Downloads/1St1-zMwWl3gaEKPLgPxlx2fbxpDE6ss8%20(4).html

It opens in a new tab

@Arudhra That sounds OK, but we are here to provide support, guidance, and examples, not to build a complete workflow. We can, however, put you in touch with one of our experts who create workflows as part of the services they offer their customers.

It looks like you are not actually downloading the PDFs and are instead just saving the HTML page. Looking at your HTTP Request node, that makes sense, as you are using the URL for the page rather than downloading the documents.

I have very quickly put this example together. It lists the files from your Google Drive, downloads them as PDF files, then sends them to Mindee.

The Output of each node is…

Here are our 2 files including the filenames and the IDs, the IDs are used to download the files in the next node.

Here you can see the ID from the previous node has been used in the ID field and on the right we have 2 binary items called data with the correct file names and file types.

We then send that to Mindee which gives us the output below from the OCR process and we can then use that output like normal in following nodes.

This should get you most of the way there.


I am getting this error. I am using a service account because I am having difficulty setting up an auth account. Google Drive error: Cannot read properties of undefined (reading 'pipe')

Sorry for the inconvenience. I have been trying to fix it but have been unable to.

Hey @Arudhra,

Are you running on cloud or self hosting and which version of n8n are you running?

self hosted version

Which version? And are you using docker or npm now? If it is docker what is your config?

Yes, it is Docker. I think it is version 3.7.

@Jon
/opt/n8n-docker-caddy

    version: "3.7"

    services:
      caddy:
        image: caddy:latest
        restart: unless-stopped
        ports:
          - "80:80"
          - "443:443"
        volumes:
          - caddy_data:/data
          - ${DATA_FOLDER}/caddy_config:/config
          - ${DATA_FOLDER}/caddy_config/Caddyfile:/etc/caddy/Caddyfile

      n8n:
        image: docker.n8n.io/n8nio/n8n
        restart: always
        ports:
          - 5678:5678
        environment:
          - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
          - N8N_PORT=5678
          - N8N_PROTOCOL=https
          - NODE_ENV=production
          - WEBHOOK_URL=https://${SUBDOMAIN}.${DOMAIN_NAME}/
          - GENERIC_TIMEZONE=${GENERIC_TIMEZONE}
        volumes:
          - n8n_data:/home/node/.n8n
          - ${DATA_FOLDER}/local_files:/files

    volumes:
      caddy_data:
        external: true
      n8n_data:
        external: true

Can you try setting the N8N_DEFAULT_BINARY_DATA_MODE environment variable to filesystem and see if that helps?
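As a rough sketch of where that setting would go, assuming the docker-compose file posted above is the one in use: the variable is added as one more entry in the `environment` list of the `n8n` service. Only the last entry is new; the other entries are unchanged and are abbreviated here to a single line for context.

```yaml
services:
  n8n:
    environment:
      # Existing entries (N8N_HOST, N8N_PORT, ...) stay as they are.
      - N8N_PROTOCOL=https
      # New entry: store binary data (such as downloaded PDFs) on disk
      # instead of keeping it in memory.
      - N8N_DEFAULT_BINARY_DATA_MODE=filesystem
```

The container has to be recreated after editing the file for the new environment variable to take effect.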

I tried it. I am still getting HTML as the output when I pass the Google Drive link.

Can you share that output so I can see it and your workflow?

output

Hey @Arudhra,

Why are you using the http request node still? If you follow the example workflow I shared it won’t need it. If you do want to use the HTTP request node then you will need to correctly implement the Google Drive API which will take more work than using the node we provide.

  1. If I use the HTTP Request node, I get the output as HTML, so the PDF cannot be read.
  2. If I connect it directly through Gmail, I get the error "Cannot read properties of undefined (reading 'pipe')".

Hey @Arudhra,

You are getting HTML because you are loading a web page with the HTTP Request node and not using the API, which is what actually returns the files.

The best option is to use the node, which will be easier to troubleshoot. Did you set the environment option I mentioned previously to see if that helps?

Hi @Jon
Please let me know which environment option I need to set.

Hey @Arudhra,

It is the one above…

N8N_DEFAULT_BINARY_DATA_MODE to filesystem

You mentioned after I posted it that you tried it but were still getting the output as HTML, which suggests you may have set it and then gone back to your own workflow instead of the example.
