How can I set up the **AI Agent in n8n** so that my long, **static prompt** is not sent with every request, avoiding unnecessary token usage?

My prompt remains the same and is quite long, but it is currently included in every execution. In ChatGPT, you can create an Assistant that stores the prompt, making the process more efficient and saving both time and tokens.

Is there a similar way to achieve this in n8n—storing the prompt once and only sending new input while maintaining context?

Thanks for your insights! :rocket:

In OpenAI’s Assistants API (and similar systems), even though the prompt might seem “stored”, it is still included in every API call as part of the conversation context.

This means its tokens are always counted toward your usage.

In n8n, when you configure a prompt, it gets sent with each execution, and there isn’t a built-in method to “cache” it separately to save tokens.

This is inherent to how LLM APIs work: every call needs the full context (system, user, and assistant messages) to generate a response.
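To make that concrete, here is a minimal sketch in TypeScript using the official `openai` SDK (the model name and prompt text are placeholders, not anything specific to your workflow). Notice how the long system prompt has to travel inside every single request body:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// The long, static prompt lives in your code (or n8n node settings),
// but it is still serialized into every request body.
const LONG_STATIC_PROMPT = "…your long, static instructions…";

async function ask(userInput: string): Promise<string | null> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model
    messages: [
      { role: "system", content: LONG_STATIC_PROMPT }, // sent (and billed) on every call
      { role: "user", content: userInput },
    ],
  });
  return response.choices[0].message.content;
}
```

This is exactly what the AI Agent node does under the hood on each execution, which is why there is no setting that can make the prompt tokens disappear.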

Configuring the prompt in an assistant does not avoid token usage. The static prompt is always included and contributes to your token count.

I know it’s frustrating, because it would be great to save those tokens. The good news is that OpenAI already reduces the cost behind the scenes through automatic prompt caching: when you send repeated, similar calls that share a long identical prefix, the cached portion of the prompt is billed at a discount.
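As a sketch of how to benefit from that (assuming OpenAI’s automatic prompt caching, which matches on identical prefixes and only kicks in for prompts above roughly 1024 tokens): keep the static prompt as an unchanged prefix and put only the variable input at the end. The `cached_tokens` usage field shown below is what recent API versions report; the discount lowers the price of those tokens, it does not remove them from the request.

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Keep this byte-for-byte identical between calls so the prefix can be cached.
const LONG_STATIC_PROMPT = "…your long, static instructions…";

async function askWithCaching(userInput: string): Promise<string | null> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model
    messages: [
      { role: "system", content: LONG_STATIC_PROMPT }, // stable prefix → cacheable
      { role: "user", content: userInput },            // variable part goes last
    ],
  });

  // Reports how much of the prompt was served from the cache.
  // Cached input tokens are billed at a discount, not for free.
  console.log("cached tokens:", response.usage?.prompt_tokens_details?.cached_tokens);

  return response.choices[0].message.content;
}
```

The same principle applies inside n8n: as long as your AI Agent node sends the static prompt first and unchanged on every execution, the caching happens automatically on OpenAI’s side with no workflow changes needed.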

Take a look at OpenAI’s prompt caching documentation: https://platform.openai.com/docs/guides/prompt-caching

:point_right: If my reply answers your question, please remember to mark it as a solution.

