Hi,
OpenAI’s API has rate limits.
For chat completions on the free trial, the API rate limits are currently:
- Requests per minute (RPM): 3
- Tokens per minute (TPM): 40,000
Each API response tells you how many tokens the call consumed.
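For reference, the usage shows up in the response body itself. A minimal sketch of reading it (Node 18+, TypeScript; the model and prompt are just placeholders):

```ts
// Sketch: make one chat completion call and read its token usage.
async function main() {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: "Describe this product: ..." }],
    }),
  });
  const data = await res.json();
  // The usage field reports prompt_tokens, completion_tokens and
  // total_tokens for this single call.
  console.log(data.usage.total_tokens);
}
main();
```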
My use case is the following: I loop through a list of products and send each one to the OpenAI API, one by one, asking GPT to describe it. For each product I make 4 separate requests to OpenAI (so 4 requests per loop iteration), which triggers errors because of the rate limits above.
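In pseudo-code, my current (unthrottled) loop looks roughly like this; `buildPrompts` and `describeProduct` are hypothetical stand-ins for my actual prompts and the API call above:

```ts
// Hypothetical sketch of the naive loop: four chat completion calls per
// product, issued back to back with no throttling, which trips 3 RPM fast.
type Product = { name: string };

// Placeholder: builds the four prompts sent for one product.
const buildPrompts = (p: Product): string[] => [
  `Describe ${p.name}`,
  `List the features of ${p.name}`,
  `Write a tagline for ${p.name}`,
  `Summarize ${p.name}`,
];

// Placeholder for the actual chat completion call shown above.
async function describeProduct(prompt: string): Promise<void> { /* ... */ }

async function run(products: Product[]) {
  for (const product of products) {
    for (const prompt of buildPrompts(product)) {
      await describeProduct(prompt); // 4 requests per product, no pacing
    }
  }
}
```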
I’ve looked at how this would be handled in Bubble and Xano, and in both cases it seems to involve building a pretty complex queuing system, which makes me feel like I’m fighting the API.
I just want to know how this would be done in n8n, to see whether I really won’t have to fight the APIs anymore.
How do you make n8n respect both of these rate limits (requests per minute and tokens per minute)?
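Just to make the question concrete: outside of a tool like n8n, my understanding is that this usually means something like a dual sliding-window limiter, with one budget for requests and one for tokens. A rough sketch (all names, the token estimate, and the polling interval are my own placeholders):

```ts
// Sketch of a dual sliding-window limiter: a call may proceed only when both
// the request budget (3/min) and the token budget (40,000/min) have room.
class DualRateLimiter {
  private events: { at: number; tokens: number }[] = [];

  constructor(
    private maxRequests = 3,      // requests per minute
    private maxTokens = 40_000,   // tokens per minute
    private windowMs = 60_000,
  ) {}

  // Wait until a call estimated at `tokens` fits within both budgets.
  async acquire(tokens: number): Promise<void> {
    for (;;) {
      const now = Date.now();
      // Drop events that have fallen out of the 60-second window.
      this.events = this.events.filter((e) => now - e.at < this.windowMs);
      const usedTokens = this.events.reduce((s, e) => s + e.tokens, 0);
      if (
        this.events.length < this.maxRequests &&
        usedTokens + tokens <= this.maxTokens
      ) {
        this.events.push({ at: now, tokens });
        return;
      }
      // Poll until the window frees up.
      await new Promise((r) => setTimeout(r, 250));
    }
  }
}
```

This is exactly the kind of machinery I’d rather not hand-roll in Bubble or Xano, and that I’m hoping n8n takes care of.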
Note: The OpenAI API may also be called by other n8n workflows of mine, apart from the product description one, consuming requests and tokens from the same API key’s rate limit. Ideally it should be possible to run parallel calls to the OpenAI API (not just one by one) whenever there is enough free room in the rate limit, of course.
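Reusing the hypothetical names from the sketches above, the ideal behavior would look something like this: a single limiter shared by every caller, with calls running in parallel whenever the budgets have room:

```ts
// Hypothetical usage: parallel calls gated by one shared limiter, so requests
// run concurrently only when the shared RPM/TPM budgets allow it.
const limiter = new DualRateLimiter();

async function describeAll(products: Product[]) {
  await Promise.all(
    products.map(async (p) => {
      for (const prompt of buildPrompts(p)) {
        await limiter.acquire(500); // 500 = rough per-call token estimate
        await describeProduct(prompt);
      }
    }),
  );
}
```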
Any guidance would be appreciated!