Prompt and Context Caching in N8N

Following up on a closed topic: Using Prompt Caching in N8N

I’m looking for a way to leverage the models’ context caching (e.g. the “Context caching overview” in Google Cloud’s Vertex AI documentation) in order to avoid resending large prompts with each LLM execution.

Are there any news or best-practices around this?

Hi @Zohar, have you tried LiteLLM? It seems promising.


I found out about the model’s own prompt and context caching layer. I don’t see how to use it in n8n apart from making a direct HTTP request.
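For the direct HTTP request route, here is a rough sketch of the JSON bodies you could send from an n8n HTTP Request node against the Gemini API’s `cachedContents` endpoint. The model name, TTL, and cache ID below are placeholders, and this only builds the payloads — authentication and the actual POST are up to your node configuration.

```python
import json

# Body for creating a cached-content entry (POST .../v1beta/cachedContents).
# Model name and TTL are placeholder values -- adjust for your account.
create_cache_body = {
    "model": "models/gemini-1.5-flash-001",
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "<your large, reusable context goes here>"}],
        }
    ],
    "ttl": "300s",  # keep the cache alive for 5 minutes
}

# Later generateContent calls reference the cache instead of resending the
# full context. "cachedContents/example-id" stands in for the "name" field
# returned by the create call above.
generate_body = {
    "cachedContent": "cachedContents/example-id",
    "contents": [
        {"role": "user", "parts": [{"text": "Answer using the cached context."}]}
    ],
}

print(json.dumps(create_cache_body, indent=2))
```

You would paste these as the JSON body of two HTTP Request nodes: one that runs once to create the cache, and one per LLM execution that references it.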

@Zohar You can do that by passing an AI Agent as a tool to the main AI Agent.

Enabling the Responses API switches the chat model from the classic Chat Completions endpoint to the Responses endpoint, which exposes the model-side state options. I also recommend reading this conversation:

https://www.reddit.com/r/n8n/comments/1qys5cj/google_geminis_implicit_context_caching/
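To illustrate the model-side state that the Responses endpoint exposes, here is a sketch of two request bodies for OpenAI’s `/v1/responses`. The model name and the `resp_abc123` ID are placeholders; the point is that the follow-up call chains to the stored response via `previous_response_id` instead of resending the whole conversation.

```python
import json

# First call: store the response server-side so later calls can reference it.
first_request = {
    "model": "gpt-4o",  # placeholder model name
    "input": "Summarize this 50-page document: <large context here>",
    "store": True,      # keep state on the server
}

# Follow-up call: "resp_abc123" stands in for the "id" returned by the
# first call. The server already holds the prior context, so the request
# body stays small.
followup_request = {
    "model": "gpt-4o",
    "input": "Now list the three key risks.",
    "previous_response_id": "resp_abc123",
}

print(json.dumps(followup_request, indent=2))
```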