Executions hang when triggered by webhook or cron in queue mode

Describe the issue/error/question

I’m self-hosting n8n in queue mode. Workflows run from the web UI by clicking the “execute workflow” button work just fine. Webhook tests, which are routed to the main web instance, also work fine.

However, if a workflow is initiated by cron or by a production webhook, it does not execute: it hangs and runs forever.

In the logs, I see the worker process and webhook process starting the jobs.

What is the error message (if any)?

No error messages.

Please share the workflow

Share the output returned by the last node

No output, it just hangs. My reverse proxy times out.

Information on your n8n setup

  • 0.181.2. The problem started with 0.181.0, as far as I can tell. I upgraded hoping it would fix it.
  • postgres
  • Own process, queue mode
  • Running n8n on Docker, with a reverse proxy in front to split webhook test and webhook production traffic, as well as to terminate SSL.

Hah, figured it out.

I wasn’t starting the n8n worker and webhook nodes correctly.

:wink: I was just trying to reproduce it and was unable to. Now I at least know why.

Sorry to waste your time!

Just a thought though: perhaps a dashboard of worker and webhook statuses?

Like sidekiq?

Shows RSS, average latency, # of workers, etc.

Or maybe an error if you are running in queue mode and no workers are running?

Thanks!

Hey @bcromie, n8n comes with an optional endpoint providing metrics aimed at Prometheus. You can enable it by setting the environment variable N8N_METRICS=true.
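If it helps, here is a minimal sketch of reading that endpoint from a script. It assumes the metrics are served at `http://localhost:5678/metrics` (adjust the host, port and path for your setup) and that the response uses the standard Prometheus text format. The function names `parse_metrics` and `fetch_metrics` are made up for this example and are not part of n8n.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text-format samples into {name_with_labels: float}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        brace = line.rfind("}")
        if brace != -1:
            # Labelled sample: keep everything up to the closing brace as the name.
            name, rest = line[: brace + 1], line[brace + 1:].split()
        else:
            parts = line.split()
            name, rest = parts[0], parts[1:]
        if not rest:
            continue
        try:
            # rest[0] is the value; an optional timestamp may follow and is ignored.
            samples[name] = float(rest[0])
        except ValueError:
            continue
    return samples


def fetch_metrics(url="http://localhost:5678/metrics", timeout=5):
    """Fetch and parse the metrics page. The default URL is an assumption."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Example usage (requires a running instance with N8N_METRICS=true):
# for name, value in sorted(fetch_metrics().items()):
#     print(name, value)
```

Which metric names show up depends on your n8n version, so it is worth printing everything once before alerting on a specific one.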

It’s not quite as comprehensive as the dashboard you have in mind, but it might be a good starting point.

You might also want to upvote existing feature requests (like this one) or raise your own if you have something else in mind.

Woo-Hoo!!

Thanks!

@bcromie could you clarify what the issue was and how you solved it? We are experiencing similar issues and can only seem to resolve them by restarting n8n.

I had set up queue mode, but my init system was starting the worker nodes without the “worker” argument. So there wasn’t any worker process to pick up and execute the workflows.
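For anyone wanting to catch this kind of mistake early, here is a rough sketch of a sanity check over the command lines an init system or compose file would run. It assumes the usual `n8n start`, `n8n worker` and `n8n webhook` subcommands, and it treats any other n8n invocation as a main instance, which is a simplification. The helper names `n8n_role` and `missing_roles` are hypothetical.

```python
import shlex


def n8n_role(command):
    """Return "main", "worker" or "webhook" for an n8n command line, else None."""
    args = shlex.split(command)
    for i, arg in enumerate(args):
        # Match a bare "n8n" or a path/image name ending in "/n8n".
        if arg == "n8n" or arg.endswith("/n8n"):
            sub = args[i + 1] if i + 1 < len(args) else ""
            if sub in ("worker", "webhook"):
                return sub
            # Assumption: anything else (including no subcommand) is a main instance.
            return "main"
    return None


def missing_roles(commands, required=("main", "worker")):
    """List the required roles that none of the given commands would start."""
    started = {n8n_role(c) for c in commands}
    return [role for role in required if role not in started]


# A "worker" service launched without the worker argument comes up as a
# second main instance, so nothing consumes the queue:
print(missing_roles(["n8n start", "n8n start"]))
# A correct queue-mode layout leaves nothing missing:
print(missing_roles(["n8n start", "n8n worker", "n8n webhook"]))
```

This only inspects the configured commands, not live processes, so it would not detect a worker that started correctly and later crashed.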

I see. It looks like we aren’t running in queue mode, so I don’t think our issue is related after all. Appreciate the response though!