I’m looping over many items, sending them one after the other to the assistant, which replies with its response to each.
When there are many items, I hit the allowed per-minute limit and the workflow fails with an error like this:
Rate limit reached for gpt-4o in organization org-074riJH0pDwfWfdDE0IN6QmQ on tokens per min (TPM): Limit 30000, Used 4338, Requested 28609. Please try again in 5.894s. Visit https://platform.openai.com/account/rate-limits to learn more.
So, I enabled “Retry on fail” in the node and tried to set higher limits.
Unfortunately, I discovered that the input field automatically caps the value at “5” for “Max. Retries” and at “5000” for “Wait Between Tries (ms)”.
Is it possible to raise these limits? It would be very useful (and I don’t understand why a hard cap was put on this setting).
Rate limits and web scraping are topics that concern many people. One option is to use proxies or SaaS providers like Apify to handle the problem for you.
Another approach is to manage it yourself and get creative. To a certain extent, this can be done quite easily. For example, in a workflow, you can iterate through a list using a loop node and send HTTP requests one by one. If an error occurs, you wait and try again.
However, to use this effectively, you should check the error type and set a retry limit; otherwise, you risk creating an endless loop! The same can happen if the wait time is too short or if the IP address gets blocked. A rough sketch of this pattern is below.
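For illustration, here is a minimal plain-TypeScript sketch of that retry pattern (not n8n code; the URL, backoff timings, and retry cap are placeholder assumptions):

```typescript
// Retry sketch: retry only on HTTP 429 ("Too Many Requests"), up to
// maxRetries, waiting longer after each failed attempt.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url: string, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url);
    // Only retry on rate-limit errors; anything else should fail fast
    // instead of looping forever.
    if (response.status !== 429) return response;
    // Exponential backoff: 1s, 2s, 4s, ... (placeholder values)
    await sleep(1000 * 2 ** attempt);
  }
  throw new Error(`Still rate-limited after ${maxRetries} retries`);
}
```

Inside a workflow, the same cap-and-wait idea can be built with a loop, an error check, and a Wait node between attempts.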
If the issue only occurs rarely, you could also consider using a “Human in the Middle” node. This allows the workflow to notify you: “Something went wrong, should I continue?”
The configuration shown in the workflow you posted is actually very similar to mine.
The main difference is that I’m only using the “Split out” node.
Looking at the workflow you posted, though, I notice that after the “Split out” node there is a “Loop over items” node: I assumed the loop was performed by the “Split out” node itself.
But evidently this is not the case. So the actual question is: when I use a Split out node, am I sending all the requests at the same time? I expected them to be sent one after the other, but, as I now understand it, they are all sent concurrently?
Yes, a node processes all of its incoming items at once. This makes everything fast, efficient, and high-performing. However, when dealing with HTTP requests, it can quickly lead to rate limits.
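To make the difference concrete, here is a hedged plain-TypeScript comparison (not n8n code; the URLs and the delay are placeholder assumptions). Firing everything with `Promise.all` mirrors the all-at-once behavior, while an awaited loop with a pause mirrors “Loop over items” plus a wait:

```typescript
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
const urls = ["https://example.com/a", "https://example.com/b"]; // placeholder items

// Concurrent: every request is fired at once. Fast, but this is
// exactly what tends to trip per-minute rate limits.
async function sendAllAtOnce(): Promise<Response[]> {
  return Promise.all(urls.map((url) => fetch(url)));
}

// Sequential: one request at a time, with a pause between calls.
// Slower, but it stays under the limit.
async function sendOneByOne(): Promise<Response[]> {
  const responses: Response[] = [];
  for (const url of urls) {
    responses.push(await fetch(url));
    await sleep(2000); // placeholder delay between calls
  }
  return responses;
}
```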
Now I’m trying to implement waiting logic using Loop over items, but I’m having trouble understanding what the current index is… I’m going to open a new thread to keep things in order.
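In plain TypeScript, the logic I’m trying to reproduce would look something like this (just an illustration outside n8n; the items and the delay are placeholders):

```typescript
// Iterate with an explicit index so each item knows its position,
// waiting between iterations.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
const items = ["first", "second", "third"]; // placeholder items

async function processWithIndex(): Promise<void> {
  for (const [index, item] of items.entries()) {
    console.log(`Processing item ${index + 1} of ${items.length}: ${item}`);
    await sleep(1000); // wait before the next item (placeholder delay)
  }
}
```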
I will link it here, to keep a common thread in place…