When a particular complicated scenario gets executed, it blocks any other scenarios to finish

Describe the issue/error/question

We have a setup with “main” mode. In that setup we have dozens of scenarios, some are scheduled, some are triggered by webhooks.

If I just run load test on simple “wait for 5s and make http request” test scenario, I’m able to get good results, i.e. many executions happens at the same time properly, no CPU load, no memory load, all’s well.

But. We have this particular ‘complicated’ scenario, which runs for ~15s and does some HTTP queries, some cyclic logic, whatever. It’s scheduled to run every minute.

If it runs, it immediately blocks all parallel executions. Which can be seen on this load test:

> hey -z 3m  -c 20 https://...
Summary:
  Total:	182.5697 secs
  Slowest:	18.1360 secs
  Fastest:	5.6872 secs
  Average:	7.1627 secs
  Requests/sec:	2.7880
  

Response time histogram:
  5.687 [1]	|
  6.932 [428]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  8.177 [20]	|■■
  9.422 [0]	|
  10.667 [0]	|
  11.912 [0]	|
  13.156 [9]	|■
  14.401 [2]	|
  15.646 [18]	|■■
  16.891 [1]	|
  18.136 [30]	|■■■


Latency distribution:
  10% in 5.7676 secs
  25% in 5.8591 secs
  50% in 5.9584 secs
  75% in 6.1272 secs
  90% in 13.6303 secs
  95% in 17.2409 secs
  99% in 17.9711 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0287 secs, 5.6872 secs, 18.1360 secs
  DNS-lookup:	0.0002 secs, 0.0000 secs, 0.0092 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0043 secs
  resp wait:	6.6999 secs, 5.3913 secs, 17.8255 secs
  resp read:	0.3636 secs, 0.2820 secs, 0.9104 secs

Status code distribution:
  [200]	509 responses

So, as you can see, we run here 20 concurrent web hook queries, for 3 minutes.

Majority of them finishes just fine in 7-8 seconds. But there are outliers — that respond in 13-18 seconds. Those are executions which happened to coincide with “complicated” scenario.

I suppose the problem lies in my scenario, and it somehow takes whole nodejs process resources to itself, blocking any i/o operations. Why is that?

Please share the workflow

Here’s our ‘complicated’ scenario: Complicated n8n scenario · GitHub

(couldn’t paste it here, its too long)

Information on your n8n setup

  • n8n version: 0.199
  • Database you’re using (default: SQLite): Postgres
  • Running n8n with the execution process [own(default), main]: main
  • Running n8n via [Docker, npm, n8n.cloud, desktop app]: Docker

We also tried queue mode — doesn’t help, since it overtakes whole worker (bad solution we have right now is having 3 workers with concurrency=1 — that way we guarantee that 3 concurrency are available).

I guess that “own” mode would work, but since the problem is with this particular scenario, I would want to know why does it happen.

Thanks!

It seems you have multiple custom nodes (CUSTOM.retailCrm and CUSTOM.ecwid). Is it possible that they are doing something blocking? You are also calling other workflows, is it possible that something blocking is happening in there? Did you check in past executions where the most time is spent in that workflow? Is a good chance that the node which executes the longest is where the problem is.

Yeah, that what I thought, but I can’t think of a way (except for slow and painful commenting out things one by one) to actually determine what blocks.

Our custom nodes are nothing but thin APIs wrappers, doing HTTP requests to external APIs.

From ~13s that execution of this scenario takes, as far as I can see, it more or less uniformly spreaded: some 3-4s to our custom node execution at the beginning of a scenario, some later.

And I don’t have enough knowledge on node’s & n8n execution model, i.e. — what might be the actual reason to block other scenarios execution? (and by saying ‘block’ I mean just that: they are waiting for this one to complete, not just slowing or lagging).

Do you have any idea on how can I determine it?

Let’s suppose, that problem is actually within our custom node. It does nothing but something like:

  if (operation === "getProduct") {
    const productId = parameter("productId")

    return apiRequest(context, stores, {
      method: "GET",
      url: `/products/${productId}`
    })
  }

and

const apiRequest = async (
  context: IExecuteFunctions,
  stores: string[],
  request: RequestOptions
): Promise<INodeExecutionData[][]> => {
  const credentials = await context.getCredentials("ecwidApi")
  const storeCredentials = parseStoresCredentials(credentials.stores as string)

  const successfulResponses: INodeExecutionData[] = []
  const errors: INodeExecutionData[] = []

  const requests = stores.map(async storeName => {
    const storeCredential = storeCredentials.find(c => c.name === storeName)
    if (!storeCredential) {
      throw new NodeOperationError(context.getNode(), `unknown store name: ${storeName}`)
    }

    const response = await apiRawRequest(storeCredential, context, request)

    if (!response.success) {
      errors.push({json: {status: response.status, message: response.message}})
      return
    }

    if (response.success) {
      const {data} = response
      if (request.isArray && Array.isArray(data.items)) {
        const patchedItems = data.items.map(item => patchItem(item, storeCredential))
        const mappedItems = context.helpers.returnJsonArray(patchedItems)

        successfulResponses.push(mappedItems)
        return
      }

      successfulResponses.push({json: patchItem(data, storeCredential)})
    }
  })

  await Promise.all(requests)

  return [successfulResponses.flat(), errors]
}

Any idea why this might block execution of other scenarios?

And even if while this node executes (which takes 4-5s out of 13s total), all other work is paused, that’s doesn’t explain why other scenarios wait for this one to complete.