The delay between receiving the webhook and the workflow starting is too high, sometimes over 90 seconds. During that time the webhook executions stay in “new” status.
We set QUEUE_HEALTH_CHECK_ACTIVE to true and increased the concurrency to 75.
We saw a positive impact on execution speed, but the queue still accumulated up to around 70 executions and held them for up to 2 minutes 30 seconds before executing everything within a second. Very strange behavior IMO.
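For anyone who wants to quantify the same pattern, here is a minimal sketch of how the start delay could be measured from exported timestamps. It assumes you can obtain, for each execution, the time the webhook was received and the time the workflow actually started. The `(received_at, started_at)` pair format and the sample values are hypothetical and are not taken from n8n's API or database schema.

```python
from datetime import datetime
from statistics import median


def start_delays(executions):
    """Return the delay in seconds between receipt and start for each record.

    `executions` is an iterable of (received_at, started_at) ISO 8601 strings.
    This pair format is an assumption made for illustration only.
    """
    delays = []
    for received_at, started_at in executions:
        received = datetime.fromisoformat(received_at)
        started = datetime.fromisoformat(started_at)
        delays.append((started - received).total_seconds())
    return delays


# Hypothetical sample: three webhooks received a minute apart
# that all start almost simultaneously, as in the batching described above.
sample = [
    ("2024-01-01T10:00:00", "2024-01-01T10:02:30"),
    ("2024-01-01T10:01:00", "2024-01-01T10:02:30"),
    ("2024-01-01T10:02:00", "2024-01-01T10:02:31"),
]

delays = start_delays(sample)
print("max delay (s):", max(delays))
print("median delay (s):", median(delays))
```

If the start times cluster within a second while the delays are spread out, as in this made-up sample, that matches the "accumulate, then flush" behavior rather than a uniformly slow worker.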
I understand that EXECUTIONS_DATA_PRUNE_MAX_COUNT is quite high, but it corresponds to around 15 days of history at our volume.
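The relationship between the prune count and the retained history is simple arithmetic: retained days is roughly the maximum execution count divided by the number of executions per day. A minimal sketch follows; the function name and the numbers are hypothetical placeholders, not our actual configuration.

```python
def retention_days(prune_max_count: int, executions_per_day: float) -> float:
    """Estimate how many days of history a given prune count retains."""
    if executions_per_day <= 0:
        raise ValueError("executions_per_day must be positive")
    return prune_max_count / executions_per_day


# Hypothetical values for illustration only.
print(retention_days(prune_max_count=1_500_000, executions_per_day=100_000))
```

Halving the count under the same assumed volume would roughly halve the retained history, which is the trade-off to weigh before lowering the value.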
This issue is really annoying; we’re thinking of creating a separate instance dedicated to the webhooks that need faster execution.
If you have any other thoughts or suggestions, they would be really helpful!
We ran multiple workers for quite a while, but that caused duplicated executions (see this other discussion).
Regarding webhook processors, this is something we don’t want, since we need not only a response but also a quick execution of the called process.
It really feels like there’s a blind spot here: an important bug that many users should be hitting, yet I can’t find many other discussions about it in the community. This is weird.