How are you testing AI agents in n8n today?

Hey n8n community :waving_hand:

We’ve been building OverseeX, an early-access platform focused on testing and monitoring AI agents, especially those embedded inside automated workflows.

One challenge we kept running into with agent-based workflows (LLMs, tools, external APIs) is that traditional testing doesn’t hold up: outputs aren’t deterministic, real API calls get expensive fast, and most issues only surface after deployment.

What we’re trying to do with OverseeX:

  • Auto-generate tests by observing real agent behavior

  • Intelligently mock external APIs to reduce cost

  • Continuously monitor agents in production with alerts

We’re still early and actively looking for feedback from people building AI-powered workflows with n8n.

Would love to learn:

  • How are you currently testing AI agents in n8n?

  • What breaks most often in your AI workflows?

If this resonates, we’d be happy to share more or dig into real use cases with you.

You can check out our product here: https://overseex.com

Thanks!