@Jon My problem statement is: I want to read the PDF files from a folder, extract the data, and load it into Excel or a database.
file:///C:/Users/KAS823/Downloads/1St1-zMwWl3gaEKPLgPxlx2fbxpDE6ss8%20(4).html
It opens in a new tab
@Arudhra That sounds OK, but we are here to provide support, guidance and examples, not to build a complete workflow. We can, however, put you in touch with one of our experts who create workflows as part of the services they offer their customers.
It looks like you are not actually downloading the PDFs and are instead just saving the HTML page. Looking at your HTTP Request node, that makes sense, as you are using the URL for the page rather than downloading the documents.
I have very quickly put this example together, which lists the files from your Google Drive, downloads them as PDF files, then sends them to Mindee.
The Output of each node is…
Here are our 2 files including the filenames and the IDs, the IDs are used to download the files in the next node.
Here you can see the ID from the previous node has been used in the ID field, and on the right we have 2 binary items called data with the correct file names and file types.
We then send that to Mindee which gives us the output below from the OCR process and we can then use that output like normal in following nodes.
This should get you most of the way there.
I am getting this error. I am using a service account, as I am finding it difficult to set up the auth account. Google Drive error: Cannot read properties of undefined (reading 'pipe')
Sorry for the inconvenience, I have been trying to fix it but have been unable to.
Hey @Arudhra,
Are you running on cloud or self hosting and which version of n8n are you running?
self hosted version
Which version? And are you using docker or npm now? If it is docker what is your config?
Yes, it is Docker, I think version 3.7
@Jon
/opt/n8n-docker-caddy
version: "3.7"
services:
  caddy:
    image: caddy:latest
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - caddy_data:/data
      - ${DATA_FOLDER}/caddy_config:/config
      - ${DATA_FOLDER}/caddy_config/Caddyfile:/etc/caddy/Caddyfile
  n8n:
    image: docker.n8n.io/n8nio/n8n
    restart: always
    ports:
      - 5678:5678
    environment:
      - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
      - N8N_PORT=5678
      - N8N_PROTOCOL=https
      - NODE_ENV=production
      - WEBHOOK_URL=https://${SUBDOMAIN}.${DOMAIN_NAME}/
      - GENERIC_TIMEZONE=${GENERIC_TIMEZONE}
    volumes:
      - n8n_data:/home/node/.n8n
      - ${DATA_FOLDER}/local_files:/files
volumes:
  caddy_data:
    external: true
  n8n_data:
    external: true
Can you try setting N8N_DEFAULT_BINARY_DATA_MODE to filesystem and see if that helps?
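With a Docker Compose file like the one posted above, that would most likely be one extra entry in the environment list of the n8n service. This is only a sketch that assumes the rest of the file stays as posted; the other entries are left out here for brevity.

```yaml
services:
  n8n:
    environment:
      # Keep the existing entries (N8N_HOST, N8N_PORT, ...) and add this one.
      # It tells n8n to store binary data (such as downloaded PDFs) on disk
      # instead of in memory.
      - N8N_DEFAULT_BINARY_DATA_MODE=filesystem
```

The container has to be recreated before the new value is picked up, for example with docker compose up -d run from the folder that holds the compose file.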
I tried it, but I am getting HTML as the output when I pass the Google Drive link.
Can you share that output so I can see it and your workflow?
Hey @Arudhra,
Why are you still using the HTTP Request node? If you follow the example workflow I shared, it won’t be needed. If you do want to use the HTTP Request node, then you will need to correctly implement the Google Drive API, which will take more work than using the node we provide.
- If I use the HTTP Request node, I get the output as HTML, so I am unable to read the PDF
- If I connect it directly through Gmail, I get the error "Cannot read properties of undefined (reading 'pipe')"
Hey @Arudhra,
You are getting HTML because you are loading a webpage with the HTTP Request node and not using the API, which is what actually returns the files.
The best option is to use the node, which will be easier to troubleshoot. Did you set the environment option I mentioned previously to see if that helps?
Hi @Jon
Please let me know which environment option I need to set.
Hey @Arudhra,
It is the one above…
N8N_DEFAULT_BINARY_DATA_MODE to filesystem
You mentioned after I posted it that you tried it but were still getting the output as HTML, which would suggest you may have set it and then gone back to your own workflow instead of the example.