Hi everyone,
I’m building a workflow to process PDFs from Google Drive, but I’m stuck at the OCR step. I would appreciate any help!
My Goal:
My workflow should:
- List PDF files from a Google Drive folder.
- Download each file.
- Send the file to OCR.space for text extraction.
The Problem:
The workflow fails right after the download step. I have confirmed that the Google Drive (Download) node successfully outputs an item with a binary property named “data”. However, both the HTTP Request node and the n8n-nodes-capivision community node fail to process this binary data.
Attempt 1: Using the HTTP Request Node
Node JSON:
{
  "nodes": [
    {
      "parameters": {
        "method": "POST",
        "url": "https://api.ocr.space/parse/image",
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "filetype",
              "value": "PDF"
            }
          ]
        },
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [
            {
              "name": "apikey",
              "value": "***"
            }
          ]
        },
        "sendBody": true,
        "contentType": "binaryData",
        "inputDataFieldName": "data",
        "options": {}
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [
        -60,
        -360
      ],
      "id": "7cf798e0-2a3e-4eeb-a11e-1928d1b20f41",
      "name": "HTTP Request"
    }
  ],
  "connections": {},
  "pinData": {},
  "meta": {
    "templateCredsSetupCompleted": true,
    "instanceId": "2674d96ba7a1ba5577f605ffbda0202ce2f08b27a32be40d2414b0a0c2a4a342"
  }
}
Error Message:
E216: Unable to recognize the file type
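For comparison, here is a minimal Python sketch of the kind of request I believe OCR.space expects, as opposed to the raw binary body my node sends. It builds a multipart/form-data body in which the PDF travels as a `file` part with a `.pdf` filename, and `filetype` is a form field rather than a query parameter. The field names `file` and `filetype` are my reading of the OCR.space docs and are an assumption; the helper name `build_ocr_multipart` and the default filename are made up for illustration.

```python
import uuid


def build_ocr_multipart(pdf_bytes, filename="document.pdf"):
    """Build a multipart/form-data body with a `filetype` field and a `file` part.

    Returns (content_type, body). The field names are an assumption based on
    my reading of the OCR.space docs, not verified against the live API.
    """
    boundary = uuid.uuid4().hex.encode()
    crlf = b"\r\n"
    body = b"".join([
        # Plain form field that states the file type explicitly.
        b"--" + boundary + crlf,
        b'Content-Disposition: form-data; name="filetype"' + crlf + crlf,
        b"PDF" + crlf,
        # File part: the filename (with its .pdf extension) travels with the bytes.
        b"--" + boundary + crlf,
        b'Content-Disposition: form-data; name="file"; filename="'
        + filename.encode() + b'"' + crlf,
        b"Content-Type: application/pdf" + crlf + crlf,
        pdf_bytes + crlf,
        # Closing boundary.
        b"--" + boundary + b"--" + crlf,
    ])
    return "multipart/form-data; boundary=" + boundary.decode(), body


content_type, body = build_ocr_multipart(b"%PDF-1.4 placeholder bytes")

# Sending it would look roughly like this (not executed here):
# import urllib.request
# req = urllib.request.Request(
#     "https://api.ocr.space/parse/image",
#     data=body,
#     headers={"apikey": "***", "Content-Type": content_type},
#     method="POST",
# )
```

If this is right, my node differs in two ways: the body is raw binary with no filename attached, and `filetype` is sent in the query string instead of the form body.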
Attempt 2: Using the n8n-nodes-capivision Community Node
Node JSON:
{
  "nodes": [
    {
      "parameters": {
        "engine": "ocrspace"
      },
      "type": "n8n-nodes-capivision.capivisionOcr",
      "typeVersion": 1,
      "position": [
        660,
        -240
      ],
      "id": "4c16bb9c-f52c-4be9-bcbc-f0c197faa7eb",
      "name": "CAPIVISION OCR",
      "credentials": {
        "ocrSpaceApi": {
          "id": "8e7n2LGpYxKBikqL",
          "name": "OCR.space account"
        }
      }
    }
  ],
  "connections": {},
  "pinData": {},
  "meta": {
    "templateCredsSetupCompleted": true,
    "instanceId": "2674d96ba7a1ba5577f605ffbda0202ce2f08b27a32be40d2414b0a0c2a4a342"
  }
}
Error Message:
[
  {
    "OCRExitCode": 99,
    "IsErroredOnProcessing": true,
    "ErrorMessage": [
      "Unable to recognize the file type",
      "E216:Unable to detect the file extension, or the file extension is incorrect, and no 'file type' provided in request. Please provide a file with a proper content type or extension, or provide a file type in the request to manually set the file extension."
    ],
    "ProcessingTimeInMilliseconds": "0"
  }
]
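In case it helps anyone reproduce this, here is a small Python sketch of how I read that response. It is only an illustration, assuming the response is always a list of objects shaped like the one above; the function name `ocr_errors` is made up, and the second message is shortened to its opening words.

```python
import json

# Abbreviated copy of the response above (second message shortened).
response_text = """
[
  {
    "OCRExitCode": 99,
    "IsErroredOnProcessing": true,
    "ErrorMessage": [
      "Unable to recognize the file type",
      "E216:Unable to detect the file extension"
    ],
    "ProcessingTimeInMilliseconds": "0"
  }
]
"""


def ocr_errors(payload):
    """Collect error messages from every item flagged as errored."""
    items = payload if isinstance(payload, list) else [payload]
    errors = []
    for item in items:
        if item.get("IsErroredOnProcessing"):
            messages = item.get("ErrorMessage") or []
            # Tolerate a single string as well as a list of strings.
            errors.extend([messages] if isinstance(messages, str) else messages)
    return errors


errors = ocr_errors(json.loads(response_text))
for message in errors:
    print(message)
```

The processing time of 0 ms makes me think the request is rejected before any OCR runs, which would match a problem with how the file is uploaded rather than with the PDF itself.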
My Environment:
- n8n Version: 1.99.1
- Deployment: Docker on an OCI (Oracle Cloud Infrastructure) instance.
My Question:
Am I missing something obvious? Is there a known issue with how binary data from the Google Drive node is handled in this version? What is the correct way to pipe binary data from a download into another node for upload?
Thank you for your time and help!