Request for feedback: Workflow evaluation beta

One of the features we’re most excited about — evaluation for AI workflows — is now in beta.

Evaluation helps you get AI workflows working reliably: it lets you measure the performance of your workflow and check whether you have corner cases covered. You do this by running a test dataset through the workflow and optionally calculating a quality score.
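To make that concrete, here's a rough sketch of the kind of quality score a metric could compute, written as it might look in an n8n Code node (run once for all items). The expected and actual field names are just placeholders for whatever columns your test dataset contains, not something the evaluation nodes require:

// Hypothetical exact-match quality metric (illustration only).
// `expected` and `actual` are placeholder fields from an imagined test dataset.
const items = $input.all();

const scored = items.map((item) => {
  const { expected, actual } = item.json;

  // Score 1 when the workflow's output matches the expected answer, else 0.
  const score =
    String(actual ?? '').trim().toLowerCase() ===
    String(expected ?? '').trim().toLowerCase()
      ? 1
      : 0;

  return { json: { ...item.json, score } };
});

return scored;

Averaging the score column across all test rows then gives you a single number you can track from run to run.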

You can find a lot more information about evaluation in n8n and see a video of how it works in the docs.

If you’d like to help test this feature, upgrade to version 1.95.1. Cloud users don’t need to do anything more, while self-hosted users must be on the registered community license and will need to open the browser developer console, run the following command, and then refresh the page:

window.featureFlags.override('032_evaluation_mvp', true)

You’ll then see a new tab in the workflow editor:
[Screenshot: the new Evaluations tab in the workflow editor]

Evaluation is split into two parts: light evaluation and metric-based evaluation. Light evaluation is free for everyone, as is metric-based evaluation for one workflow. We’ve tried to make as much of this feature free as possible while also earning something from the people who are getting a lot of value out of it. So to add metric-based evaluation to multiple workflows, you’ll need a Pro plan.

Feel free to try it out — we would love your feedback!

33 Likes

Whoop, exciting stuff. Going to try this :slight_smile: Thanks @sirdavidoff

Do we have the option to buy the Pro plan and enable the features in a self-hosted environment?

3 Likes

That sounds great! Ty @sirdavidoff for the heads up

It’s a great start, though tbh I was expecting a bit more. Sorry to say that.
I basically had the same features built with Baserow before, just by using a few regular nodes.
The only things missing were a nicer way to see the evaluation runs in a separate view and that chart of metrics changing over time. So that is definitely a plus here!
Would love to see alternative options to Google Sheets like Airtable and NocoDB.

5 Likes

Good!
Is it about the human-in-the-loop check block or just an individual workflow?

I wish we could just trace with Opik or something.

1 Like

I couldn’t find documentation. Can you help me out?

Have you tried the link in the original post? The docs should also be linked from the ‘Evaluations’ tab.

1 Like

Is it about the human-in-the-loop check block or just an individual workflow?

Human in the loop is something you do at runtime, when the workflow executes. Evaluation is something you do while building your workflow, to check that when it runs it will do what it’s supposed to.

2 Likes

This is just the MVP! There’s a lot more we want to do here.

Awesome, but no tab shows up. I am on the startup license / self-hosted. I am stuck going back and forth fixing bugs that keep coming back like boomerangs. Really need this feature.

Edit: I did window.featureFlags.override('032_evaluation_mvp', true) in the console

Haha, sorry, I tried the link from the node itself and it threw a 404 error.

Qq, is there a plan for implementing reranking or a more sophisticated evaluation workflow? Because the AI agent will have to constantly learn through interactions, and someone should be able to say what’s right and wrong at some point.

Or if something is already done in this direction can you point me to it?

Thanks,
V.

Sorry to hear that — we’ll double check the startup license to make sure it wasn’t overlooked.

1 Like

Could you say a little more about what you’re looking for here?

My bad, it’s resolved. Going to test it out now. Our use case is a full workflow test that calls other workflows, let’s see what this thing can do. :rocket:

Awesome, thank you guys! I’ve enjoyed using it before also, but had to reinstall all nodes after updating n8n =/

Hey @N8ner

Can you tell me a bit more about this? It sounds unexpected.

Which version did you update from?
How are you running n8n (npm / docker)?
Which nodes did you have to reinstall, and were they installed using the community nodes feature or added manually to the custom nodes folder?

1 Like

Hey, Jon! I meant that it happened to me before, when I went from 1.89 to 1.93, as far as I remember, not now. That’s why it’s good news for me to see it on the canvas with this update! I host n8n on a Railway server and run it with Docker, but used npm for installing packages. I’ve answered your question and also found the reason why it occurred. Thank you!
The error message looks like this: