It’s been a minute, and it’s nice to be back. I have a question for the community.
I use the Langfuse and Comet Opik observability & evaluation platforms, which require using their Python SDK decorators, API wrapper classes, and callback facilities.
This pattern is easy to integrate into, say, LlamaIndex or LangChain, but those code-heavy frameworks are often unnatural, even suboptimal, for what is essentially a visual UI build effort (as with n8n), especially in multi-agent cases.
So, in what ways can these platforms be integrated into n8n for observability & evaluation, covering both specific workflow subsets and entire workflows end-to-end?
This is something I would also be really interested in! Being able to build out our use cases quickly is a great bonus of n8n, but I’ve been unable to find a way to log traces the way we could if we were working in Python, which is quite an important element for us.
I haven’t tried this myself, but have you experimented with using code nodes, webhooks, or other possible integrations that n8n has? It might get messy, but I’m just curious if you’ve given it a shot.
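To make the code-node/webhook idea concrete, here is a minimal sketch of how a workflow step could shape its data into a trace-style payload before an HTTP Request node ships it to an observability backend. The field names (`traceId`, `input`, `output`, `metadata`) are my own illustration, not the exact ingestion schema of Langfuse or Opik — you'd map them to whatever the platform's ingestion API actually expects.

```python
# Hypothetical sketch: transform one n8n item into a trace-shaped
# payload. Field names here are assumptions for illustration, not
# any platform's real ingestion schema.
import json
import uuid
from datetime import datetime, timezone

def n8n_item_to_trace(item: dict, workflow_name: str) -> dict:
    """Build a trace-shaped payload from one n8n execution item."""
    return {
        "traceId": str(uuid.uuid4()),          # unique id per run
        "name": workflow_name,                  # which workflow produced it
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input": item.get("prompt"),            # what went into the LLM step
        "output": item.get("completion"),       # what came back
        "metadata": {"source": "n8n"},
    }

# A downstream HTTP Request node (or requests.post outside n8n) would
# then POST json.dumps(payload) to the platform's ingestion endpoint.
payload = n8n_item_to_trace(
    {"prompt": "Summarize this ticket", "completion": "Customer wants a refund."},
    "support-triage-flow",
)
print(json.dumps(payload, indent=2))
```

The messiness you'd hit in practice is correlating multiple nodes into one trace: you'd need to generate the `traceId` once at the start of the workflow and pass it along through every item.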
In my opinion, a game-changing feature for any of these UI-based builders would be the ability to fully export the underlying generated code. This would allow the workflows to be run and integrated into other contexts. I think this is a huge missing piece in the ecosystem right now.
I’ve found that you can extract execution data using the n8n API node, which feels like a short-term fix.
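For anyone wanting to try that route outside the canvas: n8n's public REST API exposes executions at `GET /api/v1/executions`, authenticated with an `X-N8N-API-KEY` header. The sketch below just builds the request (host, key, and the `includeData` flag are placeholders/assumptions; the response shape can vary by n8n version), leaving the actual fetch commented out.

```python
# Hedged sketch: pull execution data from n8n's public REST API so it
# can be forwarded to a tracing backend. Base URL and API key are
# placeholders; query parameters are assumptions to verify against
# your n8n version's API docs.
import urllib.request

def build_executions_request(base_url: str, api_key: str,
                             limit: int = 10) -> urllib.request.Request:
    """Prepare (but don't send) a request for recent executions."""
    url = f"{base_url}/api/v1/executions?limit={limit}&includeData=true"
    return urllib.request.Request(
        url,
        headers={"X-N8N-API-KEY": api_key, "Accept": "application/json"},
    )

req = build_executions_request("http://localhost:5678", "YOUR_API_KEY")
# Uncomment to actually fetch against a running n8n instance:
# import json
# with urllib.request.urlopen(req) as resp:
#     executions = json.loads(resp.read())
#     # ...forward each execution to your tracing backend here...
```

A small scheduled script polling this endpoint and re-posting executions to Langfuse/Opik is clunky, but it does get you end-to-end visibility without touching the workflows themselves.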
But I would really like to be able to use something like LangSmith or Arize Phoenix, because they can trace the full pipeline and make it easier to work with others to identify where the responses aren’t quite what we would expect.
Thank you. I hope the @n8n people are reading, understanding, and taking this discussion seriously, because a built-in tie-in to the platforms we’ve mentioned would be compelling for many would-be customers.