The Problem: A Structured Output Parser node attached to an LLM Chain node does NOT include the entire JSON Schema in the prompt sent to the LLM.
The JSON Schema (exactly as entered in the Structured Output Parser node)
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "items": { "$ref": "#/$defs/invoice" },
  "$defs": {
    "address": {
      "type": "object",
      "properties": {
        "address1": {
          "type": "string"
        },
        "address2": {
          "type": "string"
        },
        "city": {
          "type": "string"
        },
        "state": {
          "type": "string"
        },
        "zipcode": {
          "type": "string"
        }
      }
    },
    "lineItem": {
      "type": "object",
      "properties": {
        "partNumber": {
          "type": "string"
        },
        "description": {
          "type": "string"
        },
        "price": {
          "type": "number"
        },
        "discount": {
          "type": "number"
        },
        "quantity": {
          "type": "number"
        }
      }
    },
    "invoice": {
      "type": "object",
      "properties": {
        "invoiceDate": {
          "type": "string",
          "format": "date"
        },
        "invoiceNumber": {
          "type": "string"
        },
        "purchaseOrderNumber": {
          "type": "string"
        },
        "supplierName": {
          "type": "string"
        },
        "supplierTaxId": {
          "type": "string"
        },
        "supplierAddress": {
          "$ref": "#/$defs/address"
        },
        "customerName": {
          "type": "string"
        },
        "customerTaxId": {
          "type": "string"
        },
        "customerAddress": {
          "$ref": "#/$defs/address"
        },
        "lineItems": {
          "type": "array",
          "items": {
            "$ref": "#/$defs/lineItem"
          }
        },
        "subtotal": {
          "type": "number"
        },
        "tax": {
          "type": "number"
        },
        "fees": {
          "type": "number"
        },
        "total": {
          "type": "number"
        }
      },
      "required": [
        "invoiceNumber",
        "purchaseOrderNumber"
      ]
    }
  }
}
```
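If the schema-to-prompt conversion does not resolve `$ref`/`$defs` (the logged prompt, where the array's `items` disappeared, is consistent with that, but it has not been confirmed here), one workaround worth trying is to paste a schema with every reference already inlined. A minimal TypeScript sketch of that inlining follows; `inlineDefs` is a hypothetical helper, not an n8n API, and it assumes only local `#/$defs/...` references, no recursive definitions, and no keywords placed next to a `$ref`.

```typescript
// Hypothetical helper (not an n8n API): replaces every local "#/$defs/<name>"
// reference with a copy of that definition, so no $ref resolution is needed.
// Assumptions: only "#/$defs/..." refs, no recursive definitions, and no
// sibling keywords next to "$ref" (any siblings are dropped).
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function inlineDefs(schema: JsonObject): Json {
  const defs = (schema["$defs"] ?? {}) as JsonObject;
  const prefix = "#/$defs/";

  function resolve(node: Json): Json {
    if (Array.isArray(node)) return node.map(resolve);
    if (node === null || typeof node !== "object") return node; // primitives as-is

    const ref = node["$ref"];
    if (typeof ref === "string" && ref.startsWith(prefix)) {
      const target = defs[ref.slice(prefix.length)];
      if (target === undefined) throw new Error(`Unknown reference: ${ref}`);
      return resolve(target); // definitions may themselves contain refs
    }

    const out: JsonObject = {};
    for (const [key, value] of Object.entries(node)) {
      if (key === "$defs") continue; // definitions are no longer needed
      out[key] = resolve(value);
    }
    return out;
  }

  return resolve(schema);
}

// Usage: replace this abbreviated schema with the full one above and copy
// the printed result into the Structured Output Parser node.
const original: JsonObject = {
  type: "array",
  items: { $ref: "#/$defs/invoice" },
  $defs: {
    invoice: { type: "object", properties: { invoiceNumber: { type: "string" } } },
  },
};
console.log(JSON.stringify(inlineDefs(original), null, 2));
```

Applied to the full invoice schema, `items` would become the complete invoice object, and `supplierAddress` and `customerAddress` would each carry their own copy of the address properties. Whether the nested properties then survive into the prompt is something to re-check in the LLM Chain logs.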
The logs from an execution of the LLM Chain node
The schema added to the LLM prompt is only a small subset of the full schema: it omits everything except the type (a generic “array”) of the root “output” property’s value.
Your output will be parsed and type-checked according to the provided schema instance, so make sure all fields in your output match the schema exactly and there are no trailing commas!
Here is the JSON Schema instance your output must adhere to. Include the enclosing markdown codeblock:
```json
{"type":"object","properties":{"output":{"type":"array"}},"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}
```
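For comparison, the logged instructions look like the entered root schema placed under a single `output` property. The sketch below rebuilds that wrapper with a hypothetical `wrapAsOutput` function, inferred from the log above and not taken from n8n's source. Fed a bare `{ "type": "array" }`, it reproduces the logged string exactly, which shows how little of the original schema reached the model.

```typescript
// Hypothetical reconstruction of the wrapper seen in the log; the key order
// matches the logged string. This is not n8n's actual implementation.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function wrapAsOutput(root: Json): { [key: string]: Json } {
  return {
    type: "object",
    properties: { output: root },
    additionalProperties: false,
    $schema: "http://json-schema.org/draft-07/schema#",
  };
}

// What the prompt actually contained: an array with no "items" at all.
console.log(JSON.stringify(wrapAsOutput({ type: "array" })));
// → {"type":"object","properties":{"output":{"type":"array"}},"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}

// What one would expect if the items schema had survived (invoice abbreviated).
console.log(
  JSON.stringify(
    wrapAsOutput({
      type: "array",
      items: { type: "object", properties: { invoiceNumber: { type: "string" } } },
    }),
  ),
);
```

According to the logged schema, the model's reply must be an object of the form `{ "output": ... }` rather than a bare array, since the root is an object with `additionalProperties` set to `false`.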
Questions
- Is there some limitation on which JSON Schema features are supported in this context? The linked page about JSON Schema shows definitions and $ref usage, so one would assume those are supported in n8n.
- Is the JSON Schema supposed to describe the structure of the entire LLM output (which, in this case, should be an array of invoice objects)? There seems to be no way to customize the instructions so that they say “the JSON Schema represents a single item, but the output may contain more than one such item”. When I tried making the schema represent a single invoice, the node output (an array) had 16 items, each holding a small piece of the one invoice in the input, instead of a single invoice object in an array of length 1.
- Is there a guide with more (or at least more current) information than the docs page titled Structured Output Parser node? As far as I can tell, those docs no longer match the current UI: there is no longer any way to choose between generating and defining a schema, as the docs suggest. The only option now (v1.73.1) appears to be defining it in JSON Schema format.
Information on your n8n setup
- n8n version: 1.73.1
- Database (default: SQLite): Postgres
- n8n EXECUTIONS_PROCESS setting (default: own, main): main (not queue mode)
- Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
- Operating system: Raspberry Pi OS (Debian)