Using Prompt Caching in N8N


Many models support what they call prompt caching, which is extremely beneficial if you have long system prompts, refer to the same video multiple times, or run any other token-intensive task.
Is it possible to somehow set up a prompt caching system within n8n?

Information on your n8n setup

  • n8n version: 1.81.4
  • Database (default: SQLite): SQLite
  • n8n EXECUTIONS_PROCESS setting (default: own, main): Own
  • Running n8n via (Docker, npm, n8n cloud, desktop app): Google Cloud
  • Operating system: Windows 10

If you are using OpenAI, prompt caching is automatic.

In n8n, when you configure a prompt, it gets sent with each execution, and there isn’t a built-in method to “cache” it separately to save tokens.

This is inherent to how LLM APIs work: every call needs the full context (system, user, and assistant messages) to generate a response.

But OpenAI already saves tokens behind the scenes when repeated calls share the same prompt prefix.
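If you want to verify that, the chat completions response reports how many prompt tokens were served from the cache. A quick sketch outside of n8n (model name and prompts are placeholders; per OpenAI's docs, caching only kicks in once the shared prefix is roughly 1024 tokens or longer):

    # Sketch: checking OpenAI's automatic prompt caching from the API response.
    # The long, static system prompt goes first so repeated calls share a prefix.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    long_system_prompt = "<your long, static system prompt>"

    for question in ["First question", "A second, similar question"]:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[
                {"role": "system", "content": long_system_prompt},
                {"role": "user", "content": question},
            ],
        )
        # On the second call the shared prefix should show up here as cached tokens.
        print(resp.usage.prompt_tokens, resp.usage.prompt_tokens_details.cached_tokens)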

Take a look at this documentation:

👉 If my reply answers your question, please remember to mark it as a solution.

I contacted OpenAI's developers and they told me otherwise.

What did they tell you? That the official documentation is outdated and they no longer do prompt caching?

What if we are using Google Gemini? We would need a new config option in the Gemini model node to set the cache name, since the request looks like this:

    {
      "contents": [
        {
          "parts": [{
            "text": "Please summarize this transcript"
          }],
          "role": "user"
        }
      ],
      "cachedContent": "'$CACHE_NAME'"
    }
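For context, explicit caching with the Gemini API is a two-step flow: you first create a cachedContents resource (the response gives you the name that goes into cachedContent above), and then reference that name in generateContent. A rough sketch of the REST calls, which could be reproduced with two HTTP Request nodes in n8n until the model node exposes this (model, TTL and transcript are placeholders, and Google enforces a model-dependent minimum size for cached content):

    # Sketch: explicit Gemini context caching via the REST API.
    # Step 1 stores the big reusable content once; step 2 reuses it by name.
    import os
    import requests

    API_KEY = os.environ["GOOGLE_API_KEY"]
    BASE = "https://generativelanguage.googleapis.com/v1beta"
    MODEL = "models/gemini-1.5-flash-001"  # placeholder; must match in both calls

    # Step 1: create the cache from the large, reusable content (e.g. a transcript).
    create = requests.post(
        f"{BASE}/cachedContents?key={API_KEY}",
        json={
            "model": MODEL,
            "contents": [{"role": "user", "parts": [{"text": "<long transcript>"}]}],
            "ttl": "600s",
        },
    )
    cache_name = create.json()["name"]  # e.g. "cachedContents/abc123"

    # Step 2: reference the cache instead of resending the transcript every time.
    answer = requests.post(
        f"{BASE}/{MODEL}:generateContent?key={API_KEY}",
        json={
            "contents": [{"role": "user", "parts": [{"text": "Please summarize this transcript"}]}],
            "cachedContent": cache_name,
        },
    )
    print(answer.json())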

Researching the same topic.

This is what OpenRouter docs say about caching:

Gemini is not mentioned there, but:
Today I played with the same chat pipeline using Sonnet 3.7 and Gemini 2.5 Pro via OpenRouter, and for similar requests in the middle of the chat (7k tokens in, 500 out) the cost was 43 (forty-three) times higher with Sonnet.
Going by the model prices ($1 in / $10 out for Gemini, $3 in / $15 out for Sonnet), it should be around 3x, not 43x.
I also saw Gemini requests at the start of the dialog that were only around 10x cheaper than Sonnet, not 43x.
So for Gemini, automatic caching is clearly being applied when using the OpenRouter Model node.

My system prompt alone is 4k tokens.
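That difference lines up with how the providers handle caching: Gemini gets implicit caching applied for you, while Anthropic models only cache segments that are explicitly marked with cache_control breakpoints in the request, which I assume the stock chat model node does not add. A rough sketch of the request shape for marking a long system prompt as cacheable via OpenRouter (model slug and prompt are placeholders):

    # Sketch: explicit cache_control breakpoint for an Anthropic model via OpenRouter.
    # Gemini does not need this; its caching is applied implicitly.
    import os
    import requests

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "anthropic/claude-3.7-sonnet",  # placeholder slug
            "messages": [
                {
                    "role": "system",
                    "content": [
                        {
                            "type": "text",
                            "text": "<the 4k-token system prompt>",
                            "cache_control": {"type": "ephemeral"},
                        }
                    ],
                },
                {"role": "user", "content": "Next question in the chat"},
            ],
        },
    )
    print(resp.json().get("usage"))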


Unfortunately, I found out that OpenRouter was using my Google AI Studio account key, and since I have $300 in trial credits there, most of the cost was deducted from the Google AI Studio account.

After playing a bit with API keys and their fallback mechanics in OpenRouter, I now see the cost directly in OpenRouter, as it primarily uses Google Vertex.
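If it helps anyone checking the same thing, you can also look up a single generation's cost from the OpenRouter API instead of digging through the dashboard. A small sketch (I am assuming the generation-stats endpoint and the total_cost field here; the generation id is the "id" field returned by each chat completion response):

    # Sketch: looking up the cost of one OpenRouter generation by its id.
    import os
    import requests

    def generation_cost(generation_id: str) -> dict:
        resp = requests.get(
            "https://openrouter.ai/api/v1/generation",
            params={"id": generation_id},
            headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        )
        return resp.json()["data"]  # token counts and cost details

    stats = generation_cost("<generation id from a previous response>")
    print(stats.get("total_cost"), stats.get("native_tokens_prompt"))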

Do the chat model nodes already have this prompt caching included?

I guess the question is how to enable explicit context caching to ensure savings.
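One way to confirm a cache is actually being hit is to look at the usage metadata Gemini returns with each generateContent response; the cached portion of the prompt is reported separately. A small sketch of reading it (field names per the Gemini API docs; the sample values are made up):

    # Sketch: reading Gemini usage metadata to confirm a cache hit.
    # cachedContentTokenCount should only appear when cachedContent was used.
    def report_cache_savings(generate_content_response: dict) -> None:
        usage = generate_content_response.get("usageMetadata", {})
        prompt = usage.get("promptTokenCount", 0)
        cached = usage.get("cachedContentTokenCount", 0)
        print(f"prompt tokens: {prompt}, served from cache: {cached}")

    # Shape of the field in a real response (values invented for illustration):
    report_cache_savings({"usageMetadata": {"promptTokenCount": 7000, "cachedContentTokenCount": 6500}})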