My prompt is quite long and stays the same, but it is currently included in every execution. In ChatGPT, you can create an Assistant that stores the prompt, making the process more efficient and saving both time and tokens.
Is there a similar way to achieve this in n8n—storing the prompt once and only sending new input while maintaining context?
In OpenAI’s Assistants API (and similar systems), even though the prompt might seem “stored”, it is still included in every API call as part of the conversation context.
This means its tokens are always counted toward your usage.
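For illustration, here is a minimal sketch using the openai Node SDK in TypeScript (the assistant name, model, and prompt text are placeholders, not anything from your workflow): the instructions live server-side, but the model still receives them, and you are still billed for them, on every run.

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// The long static prompt is stored server-side as the assistant's
// "instructions", so you no longer resend it yourself...
const assistant = await client.beta.assistants.create({
  name: "My Workflow Assistant", // placeholder
  model: "gpt-4o-mini",          // placeholder
  instructions: "<your long static prompt here>",
});

// ...but on every run, those instructions are injected into the model's
// context and counted as input tokens, exactly as if you had sent them.
console.log(`Created assistant ${assistant.id}`);
```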
In n8n, when you configure a prompt, it gets sent with each execution, and there isn’t a built-in method to “cache” it separately to save tokens.
This is inherent to how LLM APIs work: every call needs the full context (system, user, and assistant messages) to generate a response.
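As a concrete sketch (again the openai Node SDK in TypeScript; the model and prompt text are placeholders), this is effectively what happens on each n8n execution: the API is stateless between calls, so the static system prompt travels with every request.

```ts
import OpenAI from "openai";

const client = new OpenAI();

const STATIC_PROMPT = "<your long static prompt here>";

// Each call must carry the full context, so the static system prompt is
// resent, and billed as input tokens, every single time.
const response = await client.chat.completions.create({
  model: "gpt-4o-mini", // placeholder
  messages: [
    { role: "system", content: STATIC_PROMPT },
    { role: "user", content: "New input for this execution" },
  ],
});

console.log(response.choices[0].message.content);
```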
To be clear: configuring the prompt in an assistant does not avoid token usage. The static prompt is injected into every run and contributes to your token count.
I know it’s frustrating, because it would be great to save those tokens. The good news is that OpenAI already saves you some of that cost behind the scenes: automatic prompt caching discounts input tokens when repeated calls share a sufficiently long, identical prompt prefix.
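If you want to see that caching at work, here is a hedged sketch (same SDK; the model and prompt text are placeholders, and the ~1024-token threshold is OpenAI's documented minimum for automatic caching): keep the static prompt as the first message and byte-identical across executions, and the usage object reports the cached portion.

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Prompts shorter than ~1024 tokens are not eligible for automatic caching.
const STATIC_PROMPT = "<your long static prompt here>";

// Keep the static prompt at the very start of the request so consecutive
// calls share an identical prefix, which is what the cache matches on.
const response = await client.chat.completions.create({
  model: "gpt-4o-mini", // placeholder
  messages: [
    { role: "system", content: STATIC_PROMPT },
    { role: "user", content: "New input for this execution" },
  ],
});

// The usage object reports how much of the input was served from the cache.
const cached = response.usage?.prompt_tokens_details?.cached_tokens ?? 0;
console.log(`Cached input tokens: ${cached} of ${response.usage?.prompt_tokens ?? 0}`);
```

So the practical takeaway for n8n: you can't avoid sending the prompt, but if you keep it unchanged and at the front of the context, the repeated portion is discounted automatically with no configuration on your side.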