Evaluations in Agentic Workflows
For anyone who couldn’t make it to n8n Builders Berlin, we’ve got another session recording ready for you. This one is from the Advanced Track, where JP van Oosten (Engineering Manager of the AI team at n8n) walks through how he uses evaluations to make AI workflows more reliable.
He covers common issues like inconsistent LLM outputs, context drift, and edge cases, and also shows how to compare prompts and models in a structured way. There’s a live demo too, where JP sets up evaluations in n8n and explains how he interprets the results.
If you’re building AI workflows or agents and want a clearer way to test, monitor, and improve them, this is a great session to watch.
