N8N PDF form intake

Hello everyone, I would like to know how PDF data is processed when it is uploaded via an n8n form. I am trying to create a workflow where the user uploads raw PDF files, then the agent extracts the information and builds a new document so the user can create a new guide. The idea is that the system ingests 4 or 5 PDFs, and the agent can pick and choose the right data to build the final “report”.

I am using OCR.space through an HTTP call to identify and read the PDF data, but I keep getting errors like this:

Not a valid base64 image. The accepted base64 image format is ‘data:<content_type>;base64,<base64_image_content>’. Where ‘content_type’ like ‘image/png’ or ‘image/jpg’ or ‘application/pdf’ or any other supported typ

The code from the PREPARE PDF NODE and output:

// Get the binary data from the form upload
const fieldName = 'Upload PDF Document';

console.log('Available JSON keys:', Object.keys($json));
console.log('Available Binary keys:', Object.keys($binary || {}));

// Get the binary data
let pdfBinary = null;

if ($binary && $binary[fieldName]) {
  pdfBinary = $binary[fieldName];
} else if ($binary && Object.keys($binary).length > 0) {
  const binaryKeys = Object.keys($binary);
  pdfBinary = $binary[binaryKeys[0]];
}

if (!pdfBinary) {
  throw new Error(`No PDF file found. Available binary keys: ${Object.keys($binary || {}).join(', ')}`);
}

console.log('PDF Binary structure:', Object.keys(pdfBinary));

// For OCR.space API, we need to read the actual file content and convert to base64
// n8n stores the file content internally, we need to access it properly
return {
  json: {
    country: $json['Country Name'],
    company: $json['Company Name'],
    email: $json['Email for Delivery'],
    needsOCR: true,
    fileInfo: {
      fileName: pdfBinary.fileName,
      fileSize: pdfBinary.fileSize,
      mimeType: pdfBinary.mimeType,
      id: pdfBinary.id
    }
  },
  binary: {
    pdfFile: {
      data: pdfBinary, // Pass the binary object for n8n to handle
      mimeType: pdfBinary.mimeType || 'application/pdf',
      fileName: pdfBinary.fileName || 'employment_terms.pdf'
    }
  }
};

Information on your n8n setup

  • n8n version: 1.95.2 (Cloud)

I recommend the following:

1: “Prepare PDF Data” Node (Full Code)
This node should detect the uploaded file, convert it to base64, and format it as a data URI: data:<mime_type>;base64,<base64_content>

const fieldName = 'Upload PDF Document';
let pdfBinary = null;
if ($binary && $binary[fieldName]) {
  pdfBinary = $binary[fieldName];
} else if ($binary && Object.keys($binary).length > 0) {
  const binaryKeys = Object.keys($binary);
  pdfBinary = $binary[binaryKeys[0]];
}
if (!pdfBinary) {
  throw new Error(`No PDF file found. Available binary keys: ${Object.keys($binary || {}).join(', ')}`);
}
const base64Content = pdfBinary.data;
const base64Formatted = `data:${pdfBinary.mimeType || 'application/pdf'};base64,${base64Content}`;
return {
  json: {
    country: $json['Country Name'],
    company: $json['Company Name'],
    email: $json['Email for Delivery'],
    needsOCR: true,
    fileInfo: {
      fileName: pdfBinary.fileName,
      fileSize: pdfBinary.fileSize,
      mimeType: pdfBinary.mimeType,
      id: pdfBinary.id,
    },
    base64Image: base64Formatted // This value will be used by the API
  }
};
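
Before sending the value on, it may help to check that the string really matches the format quoted in the error message. The following is only a rough sketch in plain JavaScript; `isBase64DataUri` is a hypothetical helper name, not something provided by n8n or OCR.space:

```javascript
// Hypothetical helper: checks that a value looks like
// data:<content_type>;base64,<base64_content>
function isBase64DataUri(value) {
  if (typeof value !== 'string') return false;
  // content type (e.g. application/pdf), the literal ";base64,",
  // then at least one base64 character and up to two "=" of padding
  const pattern = /^data:[\w.+-]+\/[\w.+-]+;base64,[A-Za-z0-9+/]+={0,2}$/;
  return pattern.test(value);
}
```

If this returns false for the `base64Image` value, the `data` property probably did not contain the file's base64 content. One possible cause (an assumption worth checking on your instance) is that n8n keeps the file in external storage and `data` only holds a reference to it.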

2: “HTTP Request (OCR PDF)” Node
Configuration:
Method: POST

URL: https://api.ocr.space/parse/image
Authentication: Header Auth (if you have an API Key)
Header Auth Name: apikey
Header Auth Value: your OCR.Space API key
Send Body: true
Body Content Type: Form-Data

Body Parameters:
Key, Value
base64Image, ={{$json.base64Image}}
language, eng (or the required ISO code)
isOverlayRequired, false
scale, true
OCREngine, 2 (more precise than 1)
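
For anyone who wants to reproduce the same request outside the HTTP Request node, here is a rough sketch of the equivalent call. The URL, header name, and field names are taken from the settings above. The function names `buildOcrFields` and `callOcr` are made up for illustration, and the sketch assumes Node 18+ for the global `fetch` and `FormData`:

```javascript
// Builds the form fields listed above. Form values are sent as strings.
function buildOcrFields(base64Image, language = 'eng') {
  if (typeof base64Image !== 'string' || !base64Image.startsWith('data:')) {
    throw new Error('base64Image must start with "data:<content_type>;base64,"');
  }
  return {
    base64Image,
    language,
    isOverlayRequired: 'false',
    scale: 'true',
    OCREngine: '2',
  };
}

// Sends the fields as multipart form data with the API key in the header.
// Returns the parsed JSON body without assuming anything about its shape.
async function callOcr(base64Image, apiKey) {
  const form = new FormData();
  for (const [key, value] of Object.entries(buildOcrFields(base64Image))) {
    form.append(key, value);
  }
  const response = await fetch('https://api.ocr.space/parse/image', {
    method: 'POST',
    headers: { apikey: apiKey },
    body: form,
  });
  return response.json();
}
```

The early check in `buildOcrFields` is only there to surface a malformed value before the API rejects it.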

Official Documentation

Hi Erick,
Thanks for your help. Unfortunately, I get the same error message as before from the OCR PDF HTTP call. I will try with a simple text extraction as the PDFs are digitally generated.

Why don't you just use the built-in node to read the content of the PDF?

However, if you want to stick with OCR.space, then simply convert the binary file to base64 before sending it to the API.
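
A minimal sketch of that conversion, in case it helps. The helper `bufferToDataUri` is a made-up name. The commented lines show how it might be used in a Code node with `this.helpers.getBinaryDataBuffer`, and the binary property name there is an assumption based on the form field in the original post:

```javascript
// Hypothetical helper: turns a Buffer into the data URI format
// data:<content_type>;base64,<base64_content>
function bufferToDataUri(buffer, mimeType = 'application/pdf') {
  return `data:${mimeType};base64,${buffer.toString('base64')}`;
}

// Assumed usage inside an n8n Code node (property name is a guess,
// adjust it to whatever key shows up under the item's binary data):
// const buffer = await this.helpers.getBinaryDataBuffer(0, 'Upload PDF Document');
// return [{ json: { base64Image: bufferToDataUri(buffer) } }];
```

Reading the file through a buffer avoids depending on what the `data` property of the binary object happens to contain.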
