Hello n8n Support,
I’m currently using the AI Agent node in n8n to interact with OpenAI’s GPT-4o model. The agent is working with a very large prompt (including extensive context and 37 dynamic instructions from external tools), which causes the token usage to spike significantly on each request.
As a result, I’m regularly hitting the OpenAI rate limit:

> OpenAI: Rate limit reached for gpt-4o in organization [...] on tokens per min (TPM): Limit 30000, Used 20191, Requested 22443
Unfortunately, I cannot place a delay between the Agent and the language model inside the AI Agent setup. I’ve tried adding a custom wait_Tool, but the agent calls it only inconsistently, not on every execution, even when explicitly prompted to do so.
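For reference, this is roughly what my wait_Tool looks like (a minimal sketch; the function name and the 15-second default are just what I chose, not anything n8n-specific):

```javascript
// Sketch of the body of a custom wait_Tool (e.g. via a Code Tool in n8n).
// The agent is instructed to call this before each model-heavy step;
// it simply blocks for a fixed delay and returns a confirmation string.
async function waitTool(delayMs = 15000) {
  // Pause without busy-waiting, then report back to the agent.
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  return `Waited ${delayMs} ms to stay under the TPM limit.`;
}
```

The tool itself works when called; the problem is purely that the agent decides on its own whether to invoke it.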
I also explored the “Wait Between Tries (ms)” option in the Agent settings, but it maxes out at 5000 ms, and increasing it further is not allowed via the UI.
My question is:
How can I reliably introduce a longer delay (e.g. 10–30 seconds) between Agent executions or before each call to the OpenAI model (inside AI Agent), to stay within the rate limits — especially when using large prompts and multiple tool calls?
Any suggestions or workarounds would be greatly appreciated.
Best regards,
**Information on your n8n setup**
- **n8n version:** 1.88.0
- **Database (default: SQLite):** PostgreSQL
- **Running n8n via (Docker, npm, n8n cloud, desktop app):** Docker
- **Operating system:** cloud server (Render)