Many models support what providers call prompt caching, which is extremely beneficial if you have long system prompts, refer to the same video multiple times, or run any other token-intensive task. Is it possible to somehow set up a prompt caching system within n8n?
**Information on your n8n setup**
- **n8n version:** 1.81.4
- **Database (default: SQLite):** SQLite
- **n8n EXECUTIONS_PROCESS setting (default: own, main):** own
- **Running n8n via (Docker, npm, n8n cloud, desktop app):** Google Cloud
If you are using OpenAI, prompt caching is automatic.
In n8n, the prompt you configure is sent in full with each execution; there isn’t a built-in way to “cache” it separately to save tokens. This is inherent to how LLM APIs work: every call needs the full context (system, user, and assistant messages) to generate a response. However, OpenAI already saves tokens behind the scenes when you send repeated, similar calls.
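One practical consequence worth noting: OpenAI's automatic caching matches on an identical prompt *prefix* (per their docs, for prompts above roughly 1024 tokens), so you benefit most when the static parts of your prompt come first and anything that changes per execution comes last. A minimal sketch of that message ordering — the constants and function name here are illustrative, not part of n8n or the OpenAI API:

```python
# Sketch: order chat messages so the reusable content forms a stable prefix.
# Automatic prompt caching matches identical prefixes, so anything that
# varies per call should go at the end of the message list.

STATIC_SYSTEM_PROMPT = "You are a video-analysis assistant. ..."  # long, reused verbatim
STATIC_REFERENCE = "Transcript of the video: ..."                 # reused context

def build_messages(user_question: str) -> list[dict]:
    """Put reusable content first so repeated calls share a cacheable prefix."""
    return [
        # Identical between calls -> counts toward the cached prefix.
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        {"role": "user", "content": STATIC_REFERENCE},
        # Only the final message varies per execution.
        {"role": "user", "content": user_question},
    ]

msgs = build_messages("Summarize the first five minutes.")
```

In an n8n workflow this just means keeping the system prompt and any long reference text byte-for-byte identical across executions, with the per-run question appended last.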
Gemini isn’t mentioned in those docs, BUT:
Today I ran the same chat pipeline with Sonnet 3.7 and Gemini 2.5 Pro via OpenRouter, and for similar requests in the middle of the chat (7k tokens in, 500 out) Sonnet cost 43 (forty-three) times more.
Going by the listed model prices ($1 in / $10 out per 1M tokens for Gemini, $3 in / $15 out for Sonnet), the gap should be around 2–3x, not 43x.
I also saw Gemini requests (at the start of the dialog) that were only about 10x cheaper than Sonnet, not 43x.
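The arithmetic behind that expected ratio, for the request shape above (7k input tokens, 500 output tokens) at the quoted per-million-token prices:

```python
# Back-of-the-envelope cost comparison at the quoted list prices
# (Gemini 2.5 Pro: $1 in / $10 out per 1M tokens; Sonnet 3.7: $3 in / $15 out).

def cost(tokens_in: int, tokens_out: int, price_in: float, price_out: float) -> float:
    """Request cost in USD, with prices given per 1M tokens."""
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

gemini = cost(7_000, 500, price_in=1.0, price_out=10.0)  # 0.007 + 0.005  = $0.0120
sonnet = cost(7_000, 500, price_in=3.0, price_out=15.0)  # 0.021 + 0.0075 = $0.0285
print(f"Gemini ${gemini:.4f} vs Sonnet ${sonnet:.4f} -> {sonnet / gemini:.1f}x")
# Raw prices give only ~2.4x, so an observed 43x gap means most of the
# Gemini input tokens must have been billed at a (cached) discount.
```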
So, for Gemini, automatic caching is clearly being applied when using the OpenRouter model node.
Unfortunately, I found out that OpenRouter was using my Google AI Studio account key, and I have $300 in trial credits there, so the main cost was being deducted from my Google AI Studio account.
After playing a bit with API keys and OpenRouter’s fallback mechanics, I now see the cost directly in OpenRouter, since it uses Google Vertex primarily.