Be able to track cached tokens in the AI Agent logs, mainly for evaluation.
My use case:
I’m building a chatbot with RAG/CAG (cache-augmented generation). When I add a Prompt Cache Key, I can see on the OpenAI platform that the model's cache is being used, but I can’t see it (or track it) in the JSON response from OpenAI in n8n.
There’s only tokenUsageEstimate, and no "prompt_tokens_details": { "cached_tokens": xxxx } field.
I think it would be beneficial to add this because:
Evaluations on production-ready, token-heavy use cases (e.g. RAG/CAG).
Efficiency gains.
Yeah this would be useful, especially for production RAG/chatbot stuff where caching actually affects cost.
If n8n only exposes tokenUsageEstimate, I don’t think you’ll be able to track cached tokens cleanly from the AI Agent node yet. A workaround might be calling OpenAI directly with an HTTP Request node for the runs you care about, then logging prompt_tokens_details.cached_tokens yourself.
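Here is a rough sketch of the logging half of that workaround, assuming you already have the raw Chat Completions JSON body (for example from an HTTP Request node). The function name summarize_cache_usage and the sample numbers are made up for illustration. Only the usage.prompt_tokens and usage.prompt_tokens_details.cached_tokens fields come from the OpenAI response.

```python
from typing import Any


def summarize_cache_usage(response: dict[str, Any]) -> dict[str, float]:
    """Pull cached-token figures out of a raw Chat Completions response body.

    Hypothetical helper: it only reads fields from the JSON and makes no API call.
    Missing fields fall back to 0 so older or partial responses do not raise.
    """
    usage = response.get("usage") or {}
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens", 0)

    # Share of the prompt that was served from the cache (0.0 if no prompt tokens).
    cache_ratio = cached_tokens / prompt_tokens if prompt_tokens else 0.0

    return {
        "prompt_tokens": prompt_tokens,
        "cached_tokens": cached_tokens,
        "uncached_tokens": prompt_tokens - cached_tokens,
        "cache_ratio": cache_ratio,
    }


# Illustrative response body; the numbers are invented, not real measurements.
sample_response = {
    "usage": {
        "prompt_tokens": 2000,
        "completion_tokens": 150,
        "total_tokens": 2150,
        "prompt_tokens_details": {"cached_tokens": 1280},
    }
}

summary = summarize_cache_usage(sample_response)
print(summary)
```

You could store the returned dict next to each run in whatever log or evaluation table you already use, so cache hits can be compared across runs.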
Strongly in favor of this. In high-volume RAG flows with long system prompts, cached_tokens can represent 60-70% of total tokens, so not seeing that breakdown makes cost tracking meaningless. The workaround via direct HTTP Request works but defeats the purpose of using the AI Agent node for cleaner flow management.
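To show why the breakdown matters for cost, here is a small sketch. The price and the cache discount are parameters you have to supply yourself: the values below are placeholders in relative units, not real OpenAI pricing, and effective_prompt_cost is a hypothetical helper name.

```python
def effective_prompt_cost(
    prompt_tokens: int,
    cached_tokens: int,
    price_per_token: float,
    cached_discount: float,
) -> float:
    """Prompt cost when cached tokens are billed at a discounted rate.

    cached_discount is the fraction taken off the normal price for cached
    tokens (0.0 = no discount, 1.0 = free). Both rates are assumptions
    provided by the caller, not values read from any API.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    if not 0.0 <= cached_discount <= 1.0:
        raise ValueError("cached_discount must be between 0.0 and 1.0")

    uncached_tokens = prompt_tokens - cached_tokens
    cached_price = price_per_token * (1.0 - cached_discount)
    return uncached_tokens * price_per_token + cached_tokens * cached_price


# Placeholder scenario: 65% of the prompt is cached, price of 1.0 unit per
# token, and an assumed 50% discount on cached tokens.
naive_cost = effective_prompt_cost(10_000, 0, 1.0, 0.5)
actual_cost = effective_prompt_cost(10_000, 6_500, 1.0, 0.5)
print(naive_cost, actual_cost)
```

Under these placeholder rates, a cost estimate that ignores cached_tokens overstates the prompt cost by roughly a third, which is the gap you cannot see when only tokenUsageEstimate is exposed.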
Exactly my use case, and exactly what I did to work around the problem. Going through an HTTP Request node is less practical but far more useful for evaluation. I agree that enabling token details may not be easy for the n8n team. It may be a limitation of the LangChain framework that the AI Agent node is built on. But it's worth asking for. We’ll see.