Describe the problem/error/question
I am trying to parse a PDF document into structured JSON output using an LLM. This works well, but it only ever parses the first paragraph of the document.
Example: I am trying to parse PDF Link into the following format:
{
  "title": "I. Compliance and reporting obligations",
  "section": "1. General compliance expectation",
  "paragraph_number": "2.",
  "text": "Guidelines reflect EBA’s view on appropriate supervisory practices in the ESFS and application of Union law."
}
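To make the target concrete: for several paragraphs, the desired result is a list of such objects rather than a single one. Below is a minimal Python sketch of that shape together with a check for it. The function name `validate_paragraphs` and the second, placeholder entry are hypothetical and not taken from the PDF or from n8n.

```python
import json

# The four keys every paragraph object is expected to carry.
REQUIRED_KEYS = {"title", "section", "paragraph_number", "text"}


def validate_paragraphs(raw: str) -> list[dict]:
    """Parse JSON text and check that it is a list of paragraph objects."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of paragraph objects")
    for i, item in enumerate(data):
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError(f"item {i} must have exactly the keys {sorted(REQUIRED_KEYS)}")
    return data


sample = json.dumps([
    {
        "title": "I. Compliance and reporting obligations",
        "section": "1. General compliance expectation",
        "paragraph_number": "2.",
        "text": "Guidelines reflect EBA’s view on appropriate supervisory "
                "practices in the ESFS and application of Union law.",
    },
    # Placeholder entry for illustration only, not real document content.
    {"title": "<title>", "section": "<section>", "paragraph_number": "<n>.", "text": "<text>"},
])

paragraphs = validate_paragraphs(sample)
print(len(paragraphs))
```

A single object like the example above would be rejected by this check, which is exactly the situation the workflow currently produces.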
In my prompt I explicitly tell the LLM (I tried Gemini Pro as well as ChatGPT 4o) to create 20 JSON objects for the first 20 paragraphs. The parsing works, but it stops after outputting the first JSON object. What do I have to do in order to create all 20 JSON objects (or, once it starts working, all >130 in one go)?
If I enter this prompt into ChatGPT directly, it works just fine.
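One difference between the workflow and the direct ChatGPT test is the Structured Output Parser, whose `jsonSchemaExample` in the workflow below shows a single object. As an untested assumption rather than a confirmed diagnosis, a parser configured from a single-object example may only accept one object, so an example that wraps the paragraphs in an array might be worth trying. A minimal Python sketch that builds such an example follows; the wrapper key `paragraphs` is hypothetical.

```python
import json

# The single-object example currently used in the Structured Output Parser.
example_paragraph = {
    "title": "I. Compliance and reporting obligations",
    "section": "1. General compliance expectation",
    "paragraph_number": "2.",
    "text": "Guidelines reflect EBA’s view on appropriate supervisory "
            "practices in the ESFS and application of Union law.",
}

# Assumption: wrapping the object in an array signals that many paragraphs
# are expected. "paragraphs" is an illustrative key name, not an n8n setting.
parser_example = {"paragraphs": [example_paragraph]}

# JSON text that could be pasted into the parser's example field.
parser_example_json = json.dumps(parser_example, indent=2, ensure_ascii=False)
print(parser_example_json)
```

With this shape the result would arrive as one item containing a `paragraphs` list, which would still need to be split into separate items in a later step.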
What is the error message (if any)?
Please share your workflow
{
  "nodes": [
    {
      "parameters": {
        "operation": "download",
        "fileId": {
          "__rl": true,
          "value": "={{ $json.id }}",
          "mode": "id"
        },
        "options": {}
      },
      "id": "4069217f-22e0-41b7-9b0b-86f3ab2674df",
      "name": "Download PDF 1 from Google Drive",
      "type": "n8n-nodes-base.googleDrive",
      "typeVersion": 3,
      "position": [180, 40],
      "credentials": {
        "googleDriveOAuth2Api": {
          "id": "E0r3UzaAAWTYH1dR",
          "name": "Google Drive account"
        }
      }
    },
    {
      "parameters": {
        "resource": "fileFolder",
        "searchMethod": "query",
        "queryString": "not name contains 'outsourcing'",
        "filter": {
          "folderId": {
            "__rl": true,
            "value": "1nFQIIShX8vUdUflNFNxMMs4ePePSJ08w",
            "mode": "list",
            "cachedResultName": "Regulatorikvergleich",
            "cachedResultUrl": "https://drive.google.com/drive/folders/1nFQIIShX8vUdUflNFNxMMs4ePePSJ08w"
          },
          "whatToSearch": "files"
        },
        "options": {}
      },
      "type": "n8n-nodes-base.googleDrive",
      "typeVersion": 3,
      "position": [-20, 40],
      "id": "17ec3382-833f-43e1-a23e-202d512a03f7",
      "name": "Search files and folders1",
      "credentials": {
        "googleDriveOAuth2Api": {
          "id": "E0r3UzaAAWTYH1dR",
          "name": "Google Drive account"
        }
      }
    },
    {
      "parameters": {},
      "type": "n8n-nodes-base.manualTrigger",
      "typeVersion": 1,
      "position": [-220, 40],
      "id": "26b00624-75f5-4409-84d4-47ba985b05c2",
      "name": "When clicking ‘Execute workflow’"
    },
    {
      "parameters": {
        "promptType": "define",
        "text": "=Extract data from the attached pdf. Focus on identifying:\n\n\n\n1) Titles with Roman numerals (like ‘Title I - …’),\n\n2) Sections (like ‘1 Proportionality’) and\n\n3) Paragraph numbers (like ‘22.’, ‘23.’, etc.) with their\n\n4) full text content. Preserve the hierarchical structure. Create a JSON for every paragraph in the following structure:\n\n \n\n\"title\": \n\n\"section\":\n\n\"paragraph_number\"\n\n\"text\" \n\n\n\nAn example would be:\n\n{ \n\n\"title\": \"I. Compliance and reporting obligations\", \"section\": \"1. General compliance expectation\", \n\n\"paragraph_number\": \"2.\", \n\n\"text\": \"Guidelines reflect EBA’s view on appropriate supervisory practices in the ESFS and application of Union law.\" \n\n}\n\n\n\nStart with the first 20 paragraphs and create 20 JSON objects",
        "hasOutputParser": true,
        "batching": {}
      },
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "typeVersion": 1.7,
      "position": [600, 40],
      "id": "8653a3eb-0a29-43df-8d6e-bc77598bedf4",
      "name": "Basic LLM Chain",
      "retryOnFail": false,
      "onError": "continueErrorOutput"
    },
    {
      "parameters": {
        "modelName": "models/gemini-2.5-pro",
        "options": {}
      },
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "typeVersion": 1,
      "position": [600, 260],
      "id": "ac3b3b45-dceb-4970-82ab-bf09ad06276b",
      "name": "Google Gemini Chat Model",
      "credentials": {
        "googlePalmApi": {
          "id": "t262Q1qfR4DsPIrr",
          "name": "Google Gemini(PaLM) Api account"
        }
      }
    },
    {
      "parameters": {
        "jsonSchemaExample": "{ \n\"title\": \"I. Compliance and reporting obligations\", \"section\": \"1. General compliance expectation\", \n\"paragraph_number\": \"2.\", \n\"text\": \"Guidelines reflect EBA’s view on appropriate supervisory practices in the ESFS and application of Union law.\" \n}"
      },
      "type": "@n8n/n8n-nodes-langchain.outputParserStructured",
      "typeVersion": 1.3,
      "position": [760, 260],
      "id": "60dbdbfa-429f-4464-9b64-794bb23d9384",
      "name": "Structured Output Parser"
    }
  ],
  "connections": {
    "Download PDF 1 from Google Drive": {
      "main": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Search files and folders1": {
      "main": [
        [
          {
            "node": "Download PDF 1 from Google Drive",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "When clicking ‘Execute workflow’": {
      "main": [
        [
          {
            "node": "Search files and folders1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Basic LLM Chain": {
      "main": []
    },
    "Google Gemini Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Structured Output Parser": {
      "ai_outputParser": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "ai_outputParser",
            "index": 0
          }
        ]
      ]
    }
  },
  "pinData": {},
  "meta": {
    "templateCredsSetupCompleted": true,
    "instanceId": "b3c05fcf9e901b5177ce1eb054ce11551b5ac2dcb5e1d188626664a6b4ccbbbc"
  }
}
Share the output returned by the last node
Information on your n8n setup
- n8n version:
- Database (default: SQLite):
- n8n EXECUTIONS_PROCESS setting (default: own, main):
- Running n8n via (Docker, npm, n8n cloud, desktop app):
- Operating system: