Execution runs indefinitely

Describe the problem/error/question

I have a very simple workflow.

Manual trigger → Execute query using Microsoft SQL Server. A few issues:

  1. When clicking Execute Node on the Microsoft SQL Server node, it just keeps running and it seems the previous node (the Manual trigger) doesn’t run.
  2. When clicking Execute Workflow, it ran once but the workflow wasn’t saved in the execution log.
  3. I tried running it again, but since that first run the workflow just runs indefinitely and the Manual trigger doesn’t seem to execute at all; it just stays grey.

What is the error message (if any)?

None

Share the output returned by the last node

No output

Information on your n8n setup

  • n8n version: 0.224.4
  • Database (default: SQLite): MySQL
  • n8n EXECUTIONS_PROCESS setting (default: own, main): unsure
  • Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
  • Operating system:

Hi @pbirdie, welcome to the community!

I am sorry you’re having trouble. I tried reproducing this with the below docker compose setup:

services:
  n8n:
    image: n8nio/n8n:0.224.4
    ports:
      - 5678:5678
    environment:
      - N8N_USER_MANAGEMENT_DISABLED=true
  mssql:
    image: mcr.microsoft.com/mssql/server:2019-latest
    environment:
      - ACCEPT_EULA=Y
      - SA_PASSWORD=yourStrong(!)Password
      - MSSQL_PID=Express
    ports:
      - 1433:1433

I have then added the below workflow:
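
The workflow itself is just a Manual Trigger wired into a Microsoft SQL node running a trivial query. For anyone who wants to rebuild it, the exported JSON looks roughly like the sketch below; the node type names are n8n's built-in ones, but the exact parameter names and positions are from memory, so treat them as approximate:

{
  "name": "MSSQL test",
  "nodes": [
    {
      "name": "Manual Trigger",
      "type": "n8n-nodes-base.manualTrigger",
      "typeVersion": 1,
      "position": [460, 300],
      "parameters": {}
    },
    {
      "name": "Microsoft SQL",
      "type": "n8n-nodes-base.microsoftSql",
      "typeVersion": 1,
      "position": [680, 300],
      "parameters": {
        "operation": "executeQuery",
        "query": "SELECT 1 AS test;"
      }
    }
  ],
  "connections": {
    "Manual Trigger": {
      "main": [[{ "node": "Microsoft SQL", "type": "main", "index": 0 }]]
    }
  }
}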

As for the behaviour you have reported:

  1. This appears to be different for me. When hitting the Execute Node button on the SQL Server node, it executes as expected for me:

  2. This is expected. By default, manual workflow executions are not stored in the execution list. You can however configure this behaviour in the workflow settings (see the sketch after this list for an instance-wide alternative):

Once this setting is updated, the execution should appear in the list:

  3. See 1 - this is working fine for me.
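
If you would rather make this the default for the whole instance instead of per workflow, there is an environment variable for it as well. A minimal docker compose sketch, assuming the variable name EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS (worth double-checking against the docs for your version):

services:
  n8n:
    image: n8nio/n8n:0.224.4
    ports:
      - 5678:5678
    environment:
      - N8N_USER_MANAGEMENT_DISABLED=true
      # save execution data for manually started runs across all workflows
      - EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS=true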

Based on the behavior you have reported I suspect what’s happening for your items 1 + 3 is that your n8n instance isn’t able to send status updates to the frontend. This happens through Server Sent Events which might not be processed properly by your reverse proxy (or other components sitting between your n8n instance and your browser).

So perhaps you can try running your workflow locally rather than on a webserver? If this works for you locally, but not on your webserver you might need to check the relevant configuration on your web server. You could also try switching to Websockets using the N8N_PUSH_BACKEND=websocket environment variable.
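
With the compose file from above, that change is a one-line addition to the n8n service; a quick sketch (only the last environment line is new):

services:
  n8n:
    image: n8nio/n8n:0.224.4
    ports:
      - 5678:5678
    environment:
      - N8N_USER_MANAGEMENT_DISABLED=true
      # push UI status updates over WebSockets instead of Server Sent Events
      - N8N_PUSH_BACKEND=websocket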

Okay, interesting.

I have enabled saving manual executions. I can see the executions have run successfully when looking at the execution list; however, the workflow itself will just appear as running with the nodes still grey.

It seems to only work once in the UI, then subsequent executions do not…

Would this be related to point 3 you have listed?

@jan

I found this thread, which explains my issue much better and seems to describe the same problem.

Any insight into this?

Hey @pbirdie,

Can you share more information on your setup? I take it you are probably using a reverse proxy; if you bypass the proxy and connect directly, do you have the same issue? We have seen this happen with some reverse proxies, so it could just be a config issue.

Here is some more information on our setup:

  • Self-hosted Docker image
  • We are not using any CDN; traffic goes directly to an AWS ALB in front of ECS containers
  • We are using a MySQL database
  • The UI requires VPN access

Further description of issue:

  • When first loading the workflow after connecting to the VPN, manual execution works fine 2-3 times consecutively (sometimes), but after that the workflow just hangs in the executing state. If I stop the execution, the workflow shows as complete, and the executions also appear in the executions list
  • Activating and deactivating workflows: workflows continue to run on the server (based on the execution list) even after being deactivated

I am seeing the following entries in the cluster logs but am not sure if they are related.

2023-05-23T22:08:45.112471Z 129178 [Note] [MY-010914] [Server] Aborted connection 129178 to db: 'unconnected' user: 'rdsadmin' host: 'localhost' (Got an error reading communication packets). (sql_connect.cc:835)

When stopping the workflow after it gets into the hanging state during a manual run, I see the following error in the dev console:

xhr.js:187 POST https://n8n.test.mysale.team/rest/executions-current/159/stop 500

Also, as mentioned above, stopping the execution while it is hanging can show that the workflow executed; however, after a few runs this no longer works and I get the error:

Problem Stopping Execution: The execution ID # could not be found.

The only way to fix this is refreshing multiple times, disconnecting from the VPN and reconnecting, or a combination of those.

We really need this solved ASAP. Unfortunately we have been trying to fix this for over a week and can’t find clear solutions in the community.

Hey @pbirdie,

I would probably look into that MySQL error, as n8n needs to be able to talk to the database to function, so part of the issue could be there. I have seen the execution cancellation error and the infinite spinning a lot, and it is normally a proxy issue.

As a test, can you bypass the ALB, use the IP of the n8n instance directly, and see if that works? I have a feeling this will solve the problem, unless you are using Lightsail, which has other issues.

Did you also try setting N8N_PUSH_BACKEND to websocket in the end?
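
If the ALB does turn out to be the culprit, the idle timeout is worth checking, since it can cut off long-lived connections like the ones n8n's push updates use (it defaults to 60 seconds). Purely as an illustration, and assuming a CloudFormation-managed load balancer with a placeholder logical name, raising it looks like this:

Resources:
  N8nLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      LoadBalancerAttributes:
        # give long-lived connections more headroom before the ALB drops them
        - Key: idle_timeout.timeout_seconds
          Value: "300"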

We did try that but it did not work.

I noticed that when the workflow errored, if I stopped it while it was in the hanging state, I would have to check the executions list to see that the workflow actually ran, despite it saying Error: workflow # could not be found.
However, on success, it seemed that stopping the workflow while in the hanging state would display the nodes in green in the workflow UI itself. Very strange…

We are going to re-deploy all resources again using Postgres and reference this:

I hope this can help us isolate the issue by moving to infrastructure used by most in the community and get to the bottom of it. We can’t see any security permissions on the ALB side that would interfere with n8n…
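
For reference, pointing n8n at Postgres is just a handful of environment variables on the container; a rough docker compose sketch with placeholder connection details (the DB_* names are the documented n8n ones):

services:
  n8n:
    image: n8nio/n8n:0.224.4
    environment:
      # switch the n8n database from MySQL to Postgres (placeholder values)
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=your-postgres-host
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_PASSWORD=change-me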

@MutedJam looks like the new deployment to Postgres has fixed the issue…

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.