I’m not sure how to implement this in n8n. I was thinking of two ways:
1. Set a “Mock Node” tool before each AI Agent tool that checks whether we are in a test execution; if so, call the mock node instead of the AI Agent tool node.
2. Set one global “Mock Node” tool that every agent connects to. This tool checks whether the environment is “mock enabled” and looks in the JSON input for a valid mock definition. If it finds one, the mock is invoked.
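For the second idea, the dispatch logic itself can be a small, deterministic piece of code. A minimal sketch, assuming a `mockDefinitions` map keyed by tool name and a `mockEnabled` flag (these names are my assumptions, not n8n APIs; in a real Code Node you would read `$env` or the incoming JSON instead):

```javascript
// Mock definitions keyed by the real tool's name (assumed shape).
const mockDefinitions = {
  "greeting-generator": (input) => `Hello, ${input.name}! (mocked)`,
};

// Deterministic dispatch: plain code decides, not the model.
function dispatchTool(toolName, input, realTool, mockEnabled) {
  const mock = mockDefinitions[toolName];
  if (mockEnabled && mock) return mock(input); // mock-enabled run with a match
  return realTool(input);                      // otherwise, the real tool
}

// Stand-in for the real greeting-generator tool.
const realGreeting = (input) => `Hello, ${input.name}!`;
console.log(dispatchTool("greeting-generator", { name: "Ada" }, realGreeting, true));
// → "Hello, Ada! (mocked)"
```

Because the mock is looked up by the real tool’s name, the model never sees a separately named “Mock Node”, and the choice between mock and real tool happens in code rather than in the model.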
Both ideas have issues:
- The name of the tool will be different from the actual tool’s (“Mock Node” vs. greeting-generator).
- Relying on the model to invoke the mock can cause non-deterministic behavior: sometimes it might call the real tools instead of the mock tools.
How would you suggest I handle this?
P.S. As a feature request, I can suggest the following:
Add a feature to the AI Agent Tool that lets it be connected to a “Code Node” (or any node that can easily return output). Whenever such a tool is invoked, the AI Agent Tool would systematically check whether the current execution is “mock-enabled” and, if so, activate the mock instead of the agent.
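The requested behavior could look roughly like this wrapper, where the mock check is systematic rather than left to the model’s discretion (all names here, such as `makeAgentTool` and `isMockEnabled`, are hypothetical and not an existing n8n API):

```javascript
// Hypothetical sketch of the feature request: the AI Agent Tool wrapper
// checks the mock flag in plain code, so during a mock-enabled execution
// the real agent can never be invoked.
function makeAgentTool(runAgent, mockOutput) {
  return (input, isMockEnabled) => {
    if (isMockEnabled) return mockOutput; // deterministic short-circuit
    return runAgent(input);               // normal execution path
  };
}

// Stand-in for the real agent call, plus a static mock output
// mimicking what a connected Code Node might return.
const greet = makeAgentTool(
  (input) => `LLM-generated greeting for ${input.name}`,
  "Hi there! (static mock output)"
);

console.log(greet({ name: "Ada" }, true));  // → "Hi there! (static mock output)"
console.log(greet({ name: "Ada" }, false)); // → "LLM-generated greeting for Ada"
```

The key point is that the branch happens before any model is involved, which is what makes the test path reproducible.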
There is no such thing as deterministic when it comes to LLMs. It is always a matter of interpretation of your instructions combined with the LLM’s training data, as well as the intensity of solar flares and whether Mercury is in retrograde or not.
That’s a really clear suggestion, thank you! I like the idea of having the AI Agent Tool smart-check whether execution is mock-enabled before running. It would definitely make testing smoother and avoid unnecessary activations. Appreciate you laying it out so clearly.
I agree that trying to achieve a single truth (determinism) with an LLM is difficult. That said, I think we should still aim for results that are as deterministic as possible, especially for workflow evaluation and testing.
In this case, I’m not looking for determinism within the LLM itself, just within the tool node that wraps the LLM invocation.
Mocking is a common approach in software engineering and testing strategies, and having it would allow scaling n8n workflows to production grade.