n8n AI workflows scale great… until your system prompt becomes a monster. Anyone using "skills" or modular routing for large AI agents?

(I used AI to clean up my message because my English is not very good.)

I’ve been loving n8n for building AI-powered workflows — the AI Agent node + tools is super powerful for small-to-medium scale stuff.

Right now my setup uses one big system prompt that configures the agent’s personality, rules, available actions, output format, etc. Works beautifully when the scope is limited.

But as soon as the agent needs to handle more domains (customer support + data analysis + scheduling + content generation + …), the system prompt balloons to thousands of tokens. Every single call pays the full price, even for trivial queries → higher costs, slower responses, harder to maintain/debug.

What I’m dreaming of (and maybe some of you already built):

  • A “skills” system (like Anthropic’s Claude skills / tool-use patterns) where the agent has:

    • A short core prompt (“You are a helpful coordinator…”)

    • A router / classifier that decides which “skill” to activate

    • Only loads/injects the relevant skill instructions (as dynamic context, sub-prompt, or even a separate AI Agent node) for that task

In n8n terms, this could look like:

  • Main AI Agent with minimal prompt + tool-calling to “route to skill X”

  • Or Switch + IF nodes that classify intent first → then call specialized AI Agent sub-workflows with their own focused system prompts

  • Or dynamic prompt injection from a database / GitHub (like the “Use skills in n8n agent node” template)

Has anyone implemented something like this successfully at scale?

  • Multi-agent hierarchies?

  • Intent classification upfront then route to slimmed-down agents?

  • External skill files pulled on-demand?

  • Prompt caching tricks that actually work in n8n?

Would love to see example workflows or patterns — happy to share mine too if it helps spark ideas.

The AI Agent Tool node is what you want here: it lets your main agent call other agents as tools, so each one gets its own focused system prompt and only fires when needed. Way cleaner than routing manually with Switch/IF nodes. There's also a "Use skills in n8n agent node" template on n8n.io that does exactly the skill-loading pattern you described; worth checking out before building from scratch.


yeah this hits once you go past 2-3 domains.

what worked for me: tiny core prompt (3-4 lines max), domain instructions stored externally, fetched and injected dynamically based on intent. a cheap classifier call upfront to route is way more reliable than one giant agent trying to do everything.
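once the classifier returns a label, the routing step is basically a lookup. rough sketch of how i'd do it in a Code node (the domain labels and sub-workflow names here are made up, swap in your own):

```javascript
// Map of intent labels to the sub-workflow each one should trigger.
// All names below are placeholders, not real workflow IDs.
const SKILL_ROUTES = {
  support: "wf_customer_support",
  analysis: "wf_data_analysis",
  schedule: "wf_scheduling",
  content: "wf_content_generation",
};

// Pick a route from the classifier's output; normalize casing and
// fall back to a default skill when the label is unknown or missing.
function routeIntent(classifierOutput, fallback = "wf_customer_support") {
  const label = (classifierOutput.domain || "").toLowerCase().trim();
  return SKILL_ROUTES[label] || fallback;
}
```

the fallback matters more than you'd think: a classifier will occasionally return garbage, and silently dropping the request is worse than routing it to a safe default.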

also — structure prompts with stable parts first (persona, format rules), dynamic parts last. both anthropic and openai cache top-down so you get cache hits on the static prefix.

@Pavel_Kuzko nailed it with the external instructions approach. we’re doing something similar — each domain gets its own sub-workflow with a focused prompt, and a small classifier up front routes to the right one. core prompt stays like 5-6 lines, everything else lives outside.

biggest win honestly wasn’t even performance, it was debugging. when something breaks you immediately know which skill caused it instead of digging through a 2000 token mega-prompt. and adding new capabilities is just another sub-workflow, no touching the main agent.

the routing adds a bit of latency but for anything past 3 domains it’s 100% worth it. also +1 on the cache point — static prefix first, dynamic context last.

@Benjamin_Behrens the debugging point is the best argument for this pattern — when a 2000-token monolith misbehaves you have no idea which section caused it. one more thing that compounds the benefit: iteration speed. changing a domain-specific sub-prompt and testing it in isolation takes seconds. changing one section of a giant prompt and hoping you didn’t break the other 12 domains is a different game.

for @paklong2556 — the per-domain sub-workflow approach Benjamin described scales really well in n8n. start with 3-4 domains max, get the classifier routing reliably first, then add more. the temptation is to build the full taxonomy upfront and it always bites.

the iteration speed point is underrated — testing a sub-prompt change in isolation vs. hoping a tweak in a monolith didn’t break 12 other domains is a completely different workflow.

@paklong2556 the “classifier first” advice is key. don’t build the full domain taxonomy before you know routing is reliable — get 3-4 domains working cleanly, then expand.

the sub-workflow router pattern works really well once you nail the classifier. one addition to @Benjamin_Behrens’ debugging insight: versioning becomes much simpler too. when something breaks you know exactly which domain-prompt to roll back instead of git-blaming a 2000-token mega-prompt. also worth considering: keep your classifier lightweight and simple — it’s the most critical part. if the router makes mistakes everything downstream suffers. spend the time getting intent classification right even if it means a separate fine-tuned classifier node.

@Vaibhavi_Pai the "agent as unit of organisation" framing makes a lot of sense for larger setups. The sub-workflow approach being discussed here is actually n8n's native path to that same separation: each domain sub-workflow gets its own system prompt, tool access, and scope, while the core orchestrator stays lightweight. The classifier routes intent, sub-workflows handle execution; you're composing "agents", not just logic branches. What platform are you working with now? Curious what the main practical difference looks like at scale.

@Benjamin_Behrens @Pavel_Kuzko @Vaibhavi_Pai

Thank you all so much for the detailed insights and real-world experience, this is exactly what I needed!

The classifier-first + sub-workflow pattern with domain-specific agents sounds perfect. Love the debugging, iteration speed, and versioning wins you mentioned, those alone make it worth the small routing overhead.

Quick question: Does anyone have a favorite prompt template or example for the intent classifier node (e.g., what system prompt + categories/output format works reliably for routing to domains/skills)?

@paklong2556 for the classifier we keep it dead simple: a short prompt listing the available domains (3-5 words each) and an instruction to return JSON with a single key, `domain`. Add 2-3 few-shot examples of ambiguous inputs with the correct domain; that handles most edge cases. Run it with a fast, cheap model like Haiku, since the response is tiny and you don't need advanced reasoning for intent detection.
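To give a concrete feel for it, here's a rough sketch of that prompt plus the defensive parsing we do on the model's reply in a Code node. The domain list and examples are illustrative placeholders, not our production set:

```javascript
// Illustrative classifier system prompt: short domain list, strict
// JSON output instruction, a couple of few-shot examples.
const CLASSIFIER_PROMPT = `You route user messages to exactly one domain.
Domains:
- support: account issues, complaints, refunds
- analysis: reports, metrics, data questions
- schedule: bookings, calendar, reminders
Return only JSON: {"domain": "<one of the above>"}

Examples:
User: "my invoice looks wrong" -> {"domain": "support"}
User: "book a call for tuesday" -> {"domain": "schedule"}`;

const VALID_DOMAINS = ["support", "analysis", "schedule"];

// Parse the model's reply defensively: invalid JSON or an
// out-of-vocabulary label both fall back to a safe default.
function parseDomain(reply, fallback = "support") {
  try {
    const { domain } = JSON.parse(reply);
    return VALID_DOMAINS.includes(domain) ? domain : fallback;
  } catch {
    return fallback;
  }
}
```

The fallback branch is what keeps the router from crashing the whole workflow when the model occasionally wraps its JSON in prose.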