How to handle OpenAI's API rate limits in n8n?

Hi,

OpenAI’s API has rate limits.

In the case of chat completions on the free trial, the API rate limits currently are:

  1. Requests per minute: 3
  2. Tokens per minute: 40,000
    Each API response tells you how many tokens the call consumed (see the example below).

[Screenshot: token usage reported in an OpenAI API response]
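For reference, the token count shows up in the usage object of every chat completion response. A minimal sketch of reading it, assuming the official openai Node.js package (the model and prompt are just placeholders):

```typescript
// Sketch: reading token usage from a chat completion response.
// Assumes the official "openai" npm package; model and prompt are illustrative.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Describe this product: ..." }],
});

// Each response reports how many tokens the call consumed.
console.log(completion.usage);
// e.g. { prompt_tokens: 12, completion_tokens: 85, total_tokens: 97 }
```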

My use case is the following: I am looping through a list of products, sending each product to the OpenAI API, one by one, to ask GPT to describe it.

For each product, I make 4 different requests to OpenAI (so 4 requests per loop), which causes errors because of the rate limits mentioned above.

I’ve evaluated how this would be handled in Bubble and Xano, and in both cases it seems to involve building a pretty complex queuing system that makes me feel like I’m fighting with the API.

I just wanted to know how this would be done in n8n, to see if I really won’t have to fight with the APIs anymore.

How do you make n8n respect both of these rate limits (requests per minute and tokens per minute)?

Any help would be appreciated.

Note: The OpenAI API might be called by other n8n workflows of mine, apart from the one used for product descriptions, so they all consume tokens and requests from the same API key’s rate limit. The solution should be able to run parallel calls (not one by one) to the OpenAI API whenever there is enough headroom in the rate limit, of course.

Any guidance would be appreciated!

Hi @pachocastillosr, welcome to the community!

I’m afraid n8n doesn’t currently have any functionality to track request rates across different workflows. You could build this yourself by storing a counter of the number of requests/credits you’ve used in the past minute and checking that before making requests. But you’d have to store this counter in a DB somewhere.
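As a rough sketch of that idea, assuming Redis as the shared store (the key names, the 2-second poll, and the free-trial limits from above are all illustrative):

```typescript
// Sketch: a shared per-minute counter in Redis, checked before each
// OpenAI call. It works across workflows because the counters live in
// the DB, not inside any single n8n execution.
import Redis from "ioredis";

const redis = new Redis(); // assumes a reachable Redis instance

const RPM_LIMIT = 3;       // free-trial requests per minute
const TPM_LIMIT = 40_000;  // free-trial tokens per minute

// Returns true if the call may proceed within the current minute window.
async function tryAcquire(estimatedTokens: number): Promise<boolean> {
  const minute = Math.floor(Date.now() / 60_000); // current minute bucket
  const reqKey = `openai:req:${minute}`;
  const tokKey = `openai:tok:${minute}`;

  // Reserve one request slot and the estimated tokens.
  const requests = await redis.incr(reqKey);
  const tokens = await redis.incrby(tokKey, estimatedTokens);
  await redis.expire(reqKey, 120); // old buckets expire on their own
  await redis.expire(tokKey, 120);

  if (requests > RPM_LIMIT || tokens > TPM_LIMIT) {
    // Over budget: release the reservation and tell the caller to wait.
    await redis.decr(reqKey);
    await redis.decrby(tokKey, estimatedTokens);
    return false;
  }
  return true;
}

// Usage: poll until a slot is free, then make the OpenAI call.
async function withRateLimit<T>(
  estimatedTokens: number,
  call: () => Promise<T>,
): Promise<T> {
  while (!(await tryAcquire(estimatedTokens))) {
    await new Promise((r) => setTimeout(r, 2_000)); // back off briefly
  }
  return call();
}
```

Because every workflow checks the same Redis keys, parallel calls can go through whenever there is headroom, which should cover the requirement in your note.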

Hi @sirdavidoff, any guidance on how to apply that to an AI Writer node? I’ve set the retry to a timeout of 5000 ms (which seems to be the max), but the rate limit still gets hit.

As I have a “Delegate to Writers” node before it, I don’t think I can build in the logic you’ve described, or is it possible?

Thanks for your help 🙂

Ideally the AI nodes themselves would handle rate limiting and backoff. In OpenAI’s case, that can be done by inspecting the response headers: https://platform.openai.com/docs/guides/rate-limits/rate-limits-in-headers

E.g. wait until x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens are both > 0, or until both x-ratelimit-reset-requests and x-ratelimit-reset-tokens have counted down to 0.
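A rough sketch of what that could look like (the reset-header parsing below is simplified and assumed; OpenAI returns durations like "1s", "6m0s" or "20ms", and this toy parser doesn’t handle every variant):

```typescript
// Sketch: wait-and-retry driven by OpenAI's rate-limit response headers.
// Treat this as a starting point, not a complete implementation.

// Parse a reset duration like "1s" or "6m0s" into milliseconds.
// NOTE: simplified; e.g. millisecond values like "20ms" are not handled.
function parseResetMs(value: string | null): number {
  if (!value) return 1_000; // assume a short wait if the header is missing
  const match = value.match(/(?:(\d+)m)?(?:([\d.]+)s)?/);
  const minutes = Number(match?.[1] ?? 0);
  const seconds = Number(match?.[2] ?? 0);
  return (minutes * 60 + seconds) * 1_000;
}

// Wait until whichever window (requests or tokens) resets later.
function resetWaitMs(res: Response): number {
  return Math.max(
    parseResetMs(res.headers.get("x-ratelimit-reset-requests")),
    parseResetMs(res.headers.get("x-ratelimit-reset-tokens")),
  );
}

async function callWithBackoff(
  request: () => Promise<Response>,
): Promise<Response> {
  while (true) {
    const res = await request();

    if (res.status === 429) {
      // Hard limit hit: wait for the reset, then retry the call.
      await new Promise((r) => setTimeout(r, resetWaitMs(res)));
      continue;
    }

    // Proactive pause: if either remaining counter has hit 0, sleep
    // before letting the next call through.
    const remainingReqs =
      Number(res.headers.get("x-ratelimit-remaining-requests") ?? 1);
    const remainingToks =
      Number(res.headers.get("x-ratelimit-remaining-tokens") ?? 1);
    if (remainingReqs <= 0 || remainingToks <= 0) {
      await new Promise((r) => setTimeout(r, resetWaitMs(res)));
    }
    return res;
  }
}
```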