AI-generated n8n workflows always break, so I built an open-source fix

Every time you ask Claude, GPT, or Gemini to generate an n8n workflow, you get back JSON that looks right but fails the second you import it. Wrong node types, broken connections, missing UUIDs, invalid versions. You spend 20 minutes debugging what was supposed to save you time.

I got tired of fixing these by hand, so I built Kairos, an open-source system that gives any LLM actual n8n expertise.

Without Kairos: You ask an AI for a workflow → it gives you broken JSON → you paste it in → nodes error out → you spend 20 minutes fixing it manually.

With Kairos: You describe what you want → your AI generates the workflow using Kairos’s n8n knowledge → 23 structural rules catch issues before deployment → it auto-corrects mistakes → deploys directly to your instance. 10-20 seconds, no manual fixing.

It’s built 286 workflows so far across integrations like Slack, Google Sheets, Gmail, Stripe, Notion, PostgreSQL, Telegram, GitHub, and more. And it learns from every successful build.


MCP

(The fastest way to try it, no coding required)

If you use Claude Desktop, Claude Code, Cursor, or any other MCP-compatible tool, you can connect Kairos in 60 seconds. Your AI becomes an n8n expert: it knows your exact node types, versions, and connection rules.

Add this to your MCP config (claude_desktop_config.json or Cursor settings):

```json
{
  "mcpServers": {
    "kairos": {
      "command": "npx",
      "args": ["--yes", "--package=@kairos-sdk/core", "kairos-mcp"],
      "env": {
        "N8N_BASE_URL": "https://your-instance.app.n8n.cloud",
        "N8N_API_KEY": "your-n8n-api-key",
        "KAIROS_MCP_ALLOW_DEPLOY": "true"
      }
    }
  }
}
```

That’s it. Now when you ask Claude to build a workflow, it pulls from Kairos’s knowledge base, validates the output, and can deploy it straight to your n8n instance.

No Anthropic API key needed. Your host LLM (Claude, GPT, whatever) generates the workflow and Kairos provides the n8n expertise and guardrails.

Destructive operations (deploy, activate, delete) are gated behind permission flags that default to blocked, so nothing touches your instance unless you explicitly allow it.
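To make the default-deny behaviour concrete, here is a minimal sketch of how such gating could work. It is not Kairos's actual code: only KAIROS_MCP_ALLOW_DEPLOY appears in the config above, and the other two flag names, the `isAllowed` and `assertAllowed` helpers, and the exact-match on "true" are assumptions for illustration.

```typescript
// Hypothetical sketch of default-deny gating; not taken from the Kairos source.
type DestructiveOp = 'deploy' | 'activate' | 'delete';
type Env = Record<string, string | undefined>;

// Only KAIROS_MCP_ALLOW_DEPLOY is named in the post; the other two are assumed.
const FLAG_FOR_OP: Record<DestructiveOp, string> = {
  deploy: 'KAIROS_MCP_ALLOW_DEPLOY',
  activate: 'KAIROS_MCP_ALLOW_ACTIVATE', // assumed name
  delete: 'KAIROS_MCP_ALLOW_DELETE', // assumed name
};

// An operation is allowed only when its flag is exactly the string "true".
// A missing or misspelled flag therefore means "blocked".
function isAllowed(op: DestructiveOp, env: Env): boolean {
  return env[FLAG_FOR_OP[op]] === 'true';
}

// Throws before any request would be sent to the n8n instance.
function assertAllowed(op: DestructiveOp, env: Env): void {
  if (!isAllowed(op, env)) {
    throw new Error(`${op} is blocked; set ${FLAG_FOR_OP[op]}="true" to enable it`);
  }
}
```

With the config shown earlier, deploy would pass this check while activate and delete would stay blocked.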


What makes this different from just asking an AI directly? Why can’t you just use n8n’s built-in AI instead?

You can, but n8n’s built-in AI generates a workflow and drops it in the editor. If something’s wrong, you fix it manually. It doesn’t validate structure, it doesn’t self-correct, it doesn’t learn from past builds, and you can’t call it from your own code.

LLMs don’t actually understand n8n. They’ve seen workflow JSON in training data, but they hallucinate node types, use deprecated versions, and get connection formats wrong constantly.
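For anyone who hasn't looked inside an exported workflow, here is a rough sketch of the shape involved and of what a structural check can look like. The types cover only the fields discussed in this post, and `checkStructure` is an illustrative stand-in, not one of Kairos's actual rules. The two checks (IDs look like UUIDs, connections point at nodes that exist) are my own picks from the failure modes listed above.

```typescript
// Reduced sketch of an n8n workflow export; real exports carry more fields.
interface WorkflowNode {
  id: string; // n8n generates a UUID per node
  name: string; // connections refer to nodes by name, not by id
  type: string; // e.g. "n8n-nodes-base.slack"
  typeVersion: number;
  position: [number, number];
  parameters: Record<string, unknown>;
}

interface ConnectionTarget { node: string; type: string; index: number; }

// connections[sourceNodeName][outputType][outputIndex] is a list of targets.
type Connections = Record<string, Record<string, (ConnectionTarget[] | null)[]>>;

interface Workflow { name: string; nodes: WorkflowNode[]; connections: Connections; }

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Illustrative check only: returns human-readable issues, empty when none found.
function checkStructure(wf: Workflow): string[] {
  const issues: string[] = [];
  const names = new Set(wf.nodes.map((n) => n.name));
  for (const n of wf.nodes) {
    if (!UUID_RE.test(n.id)) issues.push(`node "${n.name}" has no valid UUID`);
  }
  for (const [source, outputs] of Object.entries(wf.connections)) {
    if (!names.has(source)) issues.push(`connection source "${source}" is not a node`);
    for (const branches of Object.values(outputs)) {
      for (const targets of branches) {
        for (const t of targets ?? []) {
          if (!names.has(t.node)) issues.push(`connection target "${t.node}" is not a node`);
        }
      }
    }
  }
  return issues;
}
```

The doubly nested array under each output type is the part generated JSON most often flattens by mistake.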

Kairos is a knowledge system, not just a validator:

  • 23 structural rules that encode n8n-specific domain knowledge: the exact things LLMs get wrong every time (trigger placement, connection format, node type/version combos, UUID requirements, credential configs)

  • Auto-correction: When validation fails, it fixes the issues and re-validates automatically, up to 3 attempts. No manual intervention

  • Syncs your live n8n instance: Fetches your actual installed node types and versions so it generates against your real setup, not a generic guess

  • Learns from past builds: Every successful workflow goes into a retrieval library. 286 builds in, it pulls from proven patterns using hybrid search (TF-IDF + node fingerprinting + outcome history + cluster reranking) to find the most relevant examples for any new request
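The auto-correction bullet boils down to a bounded validate, fix, re-validate loop. Here is a minimal sketch of that control flow, assuming "up to 3 attempts" counts validation passes including the first one. `validateWithCorrection` and its types are hypothetical names, and the loop is synchronous for brevity, whereas a real correction step would presumably be an async LLM call.

```typescript
// Hypothetical sketch of a bounded correction loop; names are illustrative.
type Issue = { rule: string; message: string };
type Validate<T> = (candidate: T) => Issue[];
type Correct<T> = (candidate: T, issues: Issue[]) => T;

interface LoopResult<T> {
  workflow: T;
  attempts: number; // validation passes performed, including the first
  ok: boolean;
  issues: Issue[]; // issues left after the final attempt
}

function validateWithCorrection<T>(
  initial: T,
  validate: Validate<T>,
  correct: Correct<T>,
  maxAttempts = 3,
): LoopResult<T> {
  let current = initial;
  let issues = validate(current);
  let attempts = 1;
  // Feed the failures back into the corrector until clean or out of attempts.
  while (issues.length > 0 && attempts < maxAttempts) {
    current = correct(current, issues);
    issues = validate(current);
    attempts += 1;
  }
  return { workflow: current, attempts, ok: issues.length === 0, issues };
}
```

Returning the leftover issues instead of throwing lets the caller decide whether a still-failing workflow is shown to the user or discarded.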

The difference between asking Claude directly and asking Claude with Kairos is like the difference between asking someone who’s read about n8n and asking someone who’s built hundreds of workflows on it.


The Numbers

We ran a 20-prompt benchmark suite, covering simple triggers, multi-step conditional logic, and AI agents with memory, to measure how often Claude generates structurally valid n8n JSON.

Without the Kairos library (raw Claude + validator + correction loop):

First-try pass rate: 55% (11/20)
Needed correction loop: 45% of the time
Average generation time: 30.6 seconds
Average attempts per workflow: 1.45

(Almost half the time, Claude got something wrong, such as bad node types, broken connections, or missing fields, and the correction loop had to fix it.)

With the Kairos library seeded (same prompts, same validator):

First-try pass rate: 100% (20/20)
Needed correction loop: 0%
Average generation time: 20.7 seconds
Average attempts per workflow: 1.0

The library eliminated the correction loop entirely and cut generation time by a third. Every workflow passed all 23 structural rules on the first attempt.
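If you want to check the arithmetic, the figures above can be reproduced in a few lines. Everything here is derived from the numbers already quoted. The last value is an inference of mine: it assumes the attempt count includes the first try, which the post doesn't spell out.

```typescript
// Figures copied from the benchmark above; the derived values are plain arithmetic.
const prompts = 20;
const baseline = { firstTryPasses: 11, avgSeconds: 30.6, avgAttempts: 1.45 };
const seeded = { firstTryPasses: 20, avgSeconds: 20.7, avgAttempts: 1.0 };

const passRate = (passes: number): number => passes / prompts;

// Relative reduction in average generation time: (30.6 - 20.7) / 30.6
const timeReduction = (baseline.avgSeconds - seeded.avgSeconds) / baseline.avgSeconds;

// Total attempts in the baseline run: 1.45 * 20 = 29
const baselineAttempts = Math.round(baseline.avgAttempts * prompts);

// Workflows that failed the first try: 20 - 11 = 9
const correctedWorkflows = prompts - baseline.firstTryPasses;

// Attempts beyond the first, averaged over the workflows that needed correction.
const extraAttemptsPerCorrected = (baselineAttempts - prompts) / correctedWorkflows;
```

The time reduction works out to about 32.4%, which is the "a third" quoted above, and the baseline numbers are consistent with each failing workflow needing one correction pass on average.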

That’s what 286 workflows of accumulated knowledge does: it’s not just validating, it’s teaching the LLM how to get it right the first time.

The library includes workflows up to 50+ nodes, RAG chatbots with vector stores, multi-agent systems with conditional routing, AI assistants managing Gmail/Calendar/Tasks, and automated outreach pipelines with multi-agent QA loops.
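To give a feel for how a library like this can be searched, here is a small sketch of the TF-IDF part only, using cosine similarity over workflow descriptions. It leaves out node fingerprinting, outcome history, and cluster reranking entirely, and the `LibraryEntry` shape, function names, and smoothed IDF formula are my assumptions, not Kairos's implementation.

```typescript
// Illustrative TF-IDF ranking only; the real hybrid search has more signals.
interface LibraryEntry { id: string; description: string; }

const tokenize = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Smoothed inverse document frequency per term across the library.
function buildIdf(docs: string[][]): Map<string, number> {
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  const idf = new Map<string, number>();
  for (const [term, count] of df) {
    idf.set(term, Math.log((1 + docs.length) / (1 + count)) + 1);
  }
  return idf;
}

// Term frequency weighted by IDF; terms unknown to the library are ignored.
function vectorize(tokens: string[], idf: Map<string, number>): Map<string, number> {
  const vec = new Map<string, number>();
  for (const t of tokens) {
    const w = idf.get(t);
    if (w !== undefined) vec.set(t, (vec.get(t) ?? 0) + w);
  }
  return vec;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, w] of a) {
    dot += w * (b.get(term) ?? 0);
    normA += w * w;
  }
  for (const w of b.values()) normB += w * w;
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Returns library entries sorted from most to least similar to the query.
function rank(query: string, library: LibraryEntry[]): { id: string; score: number }[] {
  const idf = buildIdf(library.map((e) => tokenize(e.description)));
  const q = vectorize(tokenize(query), idf);
  return library
    .map((e) => ({ id: e.id, score: cosine(q, vectorize(tokenize(e.description), idf)) }))
    .sort((x, y) => y.score - x.score);
}
```

The top few results would then be handed to the LLM as worked examples for the new request.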


Also available as SDK and CLI

If you’re a developer who wants to integrate workflow generation into your own tools:

SDK:

```typescript
import { Kairos } from '@kairos-sdk/core'

const kairos = new Kairos({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  n8nBaseUrl: 'https://your-instance.app.n8n.cloud',
  n8nApiKey: process.env.N8N_API_KEY,
})

const result = await kairos.build(
  'When a new row is added to Google Sheets, send a Slack notification to #updates'
)
```

CLI:

```bash
npx @kairos-sdk/core build "Send a Slack message every morning at 9am" --dry-run
```

Links

GitHub: Kruttz/Kairos (AI powered n8n workflow engine. Use it as an MCP server, SDK, or CLI)

npm: https://www.npmjs.com/package/@kairos-sdk/core

Still early and actively building. If you try it, I’d genuinely like to know what workflows you throw at it, especially what breaks. That’s how the system gets better.

hello i can help with that

The expression syntax issue is the piece that trips LLMs up most: they write $item("0")["json"]["field"] or other outdated accessor patterns instead of $json.field or $('NodeName').item.json.field. The validation catching this before deployment is the most valuable part of what you’ve built.

One question: are the 23 structural rules documented anywhere in the repo? It would be useful to reference them directly when crafting prompts for workflow generation, especially for cases where the LLM needs to know which node parameter types expect strings vs objects.

Good point, that’s probably the biggest gap in what we validate right now. The 23 rules are all structural (node IDs, connections, credential shapes, type versions, etc.) so it catches broken JSON before it hits n8n, but it doesn’t look inside parameter expressions. That’s on the roadmap.

The full rule table is in the README; just scroll to “Validator Rules”. Each rule also has mitigation text and pipeline stage mappings in src/validation/rule-metadata.ts that get fed back into the LLM prompt when a rule keeps failing.

The pattern learning system actually tracks which rules fail most often and warns the LLM about them in future builds, so if expression syntax were added as a rule, the system would automatically start learning from those failures too.
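As a rough picture of that feedback loop, here is a sketch of a failure counter that turns the most frequent rule failures into prompt warnings. It is an illustration only: `RuleFailureTracker`, the rule IDs, and the warning wording are made up for the example and are not what the repo ships.

```typescript
// Hypothetical sketch of failure tracking; rule IDs and wording are invented.
class RuleFailureTracker {
  private counts = new Map<string, number>();

  // Call once per build with the IDs of every rule that failed validation.
  record(failedRuleIds: string[]): void {
    for (const id of failedRuleIds) {
      this.counts.set(id, (this.counts.get(id) ?? 0) + 1);
    }
  }

  // Most frequently failing rules first.
  topFailures(limit: number): { ruleId: string; count: number }[] {
    return [...this.counts.entries()]
      .map(([ruleId, count]) => ({ ruleId, count }))
      .sort((a, b) => b.count - a.count)
      .slice(0, limit);
  }

  // Builds warning lines to prepend to the next generation prompt.
  promptWarnings(mitigations: Record<string, string>, limit = 3): string[] {
    return this.topFailures(limit).map(({ ruleId, count }) =>
      `Rule ${ruleId} failed ${count} time(s) in past builds. ${mitigations[ruleId] ?? ''}`.trim(),
    );
  }
}
```

A new expression-syntax rule would slot into this without changes, since the tracker only sees rule IDs.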

Would be curious what specific expression patterns you see LLMs mess up most; that’d help prioritize what to validate first.

The most common ones I see:

  1. Outdated accessor syntax - using $node["NodeName"].json.field instead of $('NodeName').item.json.field. LLMs trained on older content still generate the deprecated pattern constantly.
  2. Wrong item index assumptions - writing $json.items[0] expecting an array when the data is already flattened into individual items by n8n’s item model.
  3. Missing .first() or .all() calls - referencing $('NodeName') directly without specifying item context, which throws at runtime.

The deprecated accessor one is by far the most frequent - I’d prioritize that for expression validation.
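A first pass at catching the deprecated accessor could be as simple as a regex scan over every string in a node's parameters. The sketch below is hypothetical and not part of Kairos's 23 rules. `findLegacyAccessors` and `suggestModern` are illustrative names, and the suggested rewrite to $('NodeName').item follows the replacement named in item 1 above, so treat it as a hint for review, not a guaranteed equivalent.

```typescript
// Hypothetical expression check; not one of the existing structural rules.
interface NodeLike { name: string; parameters: Record<string, unknown>; }
interface Finding { node: string; path: string; match: string; }

// Matches the legacy accessor $node["Name"] or $node['Name'].
const LEGACY_NODE_ACCESSOR = /\$node\[\s*(["'])(.+?)\1\s*\]/g;

// Walks every parameter value and reports each string containing the pattern.
function findLegacyAccessors(nodes: NodeLike[]): Finding[] {
  const findings: Finding[] = [];
  const walk = (node: string, value: unknown, path: string): void => {
    if (typeof value === 'string') {
      for (const m of value.matchAll(LEGACY_NODE_ACCESSOR)) {
        findings.push({ node, path, match: m[0] });
      }
    } else if (Array.isArray(value)) {
      value.forEach((v, i) => walk(node, v, `${path}[${i}]`));
    } else if (value !== null && typeof value === 'object') {
      for (const [k, v] of Object.entries(value)) walk(node, v, `${path}.${k}`);
    }
  };
  for (const n of nodes) walk(n.name, n.parameters, 'parameters');
  return findings;
}

// Suggestion only: node names containing a single quote would need escaping.
function suggestModern(expression: string): string {
  return expression.replace(
    LEGACY_NODE_ACCESSOR,
    (_match: string, _quote: string, name: string) => `$('${name}').item`,
  );
}
```

Each finding carries the node name and parameter path, which is the kind of detail that could be fed back into a correction prompt.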