AI-powered first-line DevOps support: a classifier + 4 specialized assistants (built with n8n)

Sharing a system I’ve been running in production for a couple of months — an AI agent that handles first-line DevOps support in our team’s Slack channel. The whole thing is built on n8n + MCP servers, and I’ve just uploaded the templates here so you can pick the pieces that fit your setup.

What it does

The team’s Slack channel was eating a lot of triage time — and a sizeable chunk of incoming messages weren’t even DevOps tickets, but each still needed a quick look. So the system works like this:

  1. A classifier reads each incoming message (and pulls thread context when needed via slack-mcp), categorizes it into one of seven buckets — CI/CD, incident, infra question, modification request, announcement, etc. — and routes it to the right sub-workflow.
  2. Each category-specific assistant has its own toolset (GitHub MCP, Kubernetes MCP, Grafana MCP, etc.) and its own system prompt tuned for that type of request.
    1. CI/CD Investigator
    2. The rest of the workflows can be found on GitHub.
  3. A shared attachments analyzer processes screenshots and log files attached to the request before the main agent runs, so a “build failed” message plus a screenshot becomes structured context the LLM can work with.
  4. A dedicated error reporter workflow catches failures of any of the above and posts a short notice back into the user’s Slack thread, so nothing silently hangs.
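
To make the routing step in item 1 concrete, here is a rough sketch of mapping a classifier label to a sub-workflow. It is plain Python, not the n8n workflow itself. The category keys come from the five buckets named above (the other two are not listed, so they are left out), while the workflow names and the human-triage fallback are hypothetical placeholders.

```python
# Hypothetical routing table: category keys follow the buckets named in the
# post; the workflow names on the right are made up for illustration.
ROUTES = {
    "ci_cd": "ci-cd-investigator",
    "incident": "incident-assistant",
    "infra_question": "infra-question-assistant",
    "modification_request": "modification-request-handler",
    "announcement": "announcement-handler",
}

# Assumption: anything the classifier cannot place goes to a person.
FALLBACK_WORKFLOW = "human-triage"


def normalize(label: str) -> str:
    """Turn a free-form label such as 'CI/CD' into a key such as 'ci_cd'."""
    key = label.strip().lower()
    for separator in ("/", " ", "-"):
        key = key.replace(separator, "_")
    return key


def route(label: str) -> str:
    """Return the sub-workflow name for a label, or the fallback if unknown."""
    return ROUTES.get(normalize(label), FALLBACK_WORKFLOW)
```

Normalizing the label before the lookup means small formatting differences in the LLM output ("CI/CD" vs "ci-cd") do not break routing, and unknown labels fall through to a human instead of being dropped.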

Current numbers

  • ~25% of requests fully closed without an engineer
  • ~40% reach the on-call with diagnostics already attached
  • Avg response time under 3 min
  • ~$250/month LLM spend at our volume

A few things I learned along the way

  • Pulling thread context before classification matters a lot — a reply like “same issue here” is meaningless without history. Worth giving the classifier its own slack-mcp access rather than pre-fetching.
  • The built-in HTTP Request node doesn’t handle timeouts gracefully — a single unreachable endpoint can take down the whole agent chain. Wrapping it in a sub-workflow that always returns a structured success/failure/timeout result fixed it.
  • Setting up an Error Workflow in the main workflows’ settings is one of those things that feels optional during dev and becomes critical the moment something runs in prod.
  • For the classifier itself, Sonnet and GPT-5 Codex give comparable quality — model choice mattered far less than the system prompt and the tool access.
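
The "always returns a structured success/failure/timeout result" idea from the HTTP Request bullet can be sketched outside n8n too. This is a minimal Python illustration of the contract, not the actual sub-workflow. The function name `call_with_result`, the result field names, and the `slow_endpoint` stand-in are all assumptions made for the example.

```python
import concurrent.futures
import time
from typing import Any, Callable, Dict


def call_with_result(task: Callable[[], Any], timeout_s: float) -> Dict[str, Any]:
    """Run task() and always return a structured result instead of raising."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(task)
    # Wait at most timeout_s seconds for the task to finish.
    done, _ = concurrent.futures.wait([future], timeout=timeout_s)
    # Do not block on a stuck task; it keeps running in its background thread.
    pool.shutdown(wait=False)

    if not done:
        return {"status": "timeout", "timeout_s": timeout_s}

    error = future.exception()
    if error is not None:
        return {"status": "failure", "error": f"{type(error).__name__}: {error}"}

    return {"status": "success", "data": future.result()}


def slow_endpoint() -> str:
    """Hypothetical stand-in for a call to an unreachable or slow endpoint."""
    time.sleep(0.5)
    return "late response"
```

The caller then branches on `status` rather than on exceptions, so `call_with_result(slow_endpoint, 0.05)` comes back as a timeout result that the agent can report instead of stalling the whole chain.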

A few more branches are coming (infra incident assistant, RAG-based knowledge assistant for routine infra questions, modification request handler with auto-ticketing) — I’ll add them to this thread when they’re stable.

Happy to answer questions about the setup, the system prompts, or specific tradeoffs. Curious to hear how others handle Slack/chat triage in n8n — anyone doing something similar with a different routing strategy?