[Urgent] n8n docker, self-hosted instance restarting every 3-8 minutes

Describe the issue/error/question

For the past hour, my n8n instance has been stuck in a loop, restarting every 3-8 minutes.
No changes of any kind were made in the past few hours,
and there are no failed workflows in the ‘Workflow Executions’ list.

The logs don’t show anything too suspicious, but I could be wrong.
(Could share them in PM)
How to debug this?

Information on your n8n setup

  • n8n version: 0.169.0
  • Database you’re using (default: SQLite): Postgres
  • Running n8n with the execution process [own(default), main]: main
  • Running n8n via [Docker, npm, n8n.cloud, desktop app]: Docker

Hi @shrey-42, I am sorry to hear you’re having trouble here.

For issues specific to your n8n.cloud instance rather than n8n in general, the best course of action would be to reach out via n8n.cloud so we can identify the problematic instance.

Or is this related to a self-hosted instance of yours? What’s the docker log output just before the restart? And what does your configuration look like (how do you run n8n, which environment variables do you set)? Are you doing anything specific before the restart occurs?
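
If it helps, the lines just before each restart can be pulled out of a saved log with a few lines of Python. This is only a minimal sketch under some assumptions: the container output was saved to a file first (for example with `docker logs <container> > n8n.log 2>&1`, where `<container>` is a placeholder), and every start-up writes a line containing `Initializing n8n process`. The helper name `lines_before_restarts` is made up for illustration and is not part of n8n.

```python
import re
from collections import deque

# Colour codes that the console log output may contain.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")
# Assumption: n8n logs this text once per process start.
RESTART_MARKER = "Initializing n8n process"


def lines_before_restarts(log_lines, context=20):
    """Return one list per restart holding up to `context` lines that preceded it."""
    recent = deque(maxlen=context)
    chunks = []
    for raw in log_lines:
        line = ANSI_ESCAPE.sub("", raw.rstrip("\n"))
        if RESTART_MARKER in line:
            chunks.append(list(recent))
            recent.clear()
        recent.append(line)
    return chunks


# Tiny inline demo; a real run would pass an open log file instead.
demo = [
    "debug | Wait tracker querying database\n",
    "Error: This socket has been ended by the other party\n",
    "info | Initializing n8n process\n",
]
for number, chunk in enumerate(lines_before_restarts(demo), start=1):
    print(f"--- before restart {number} ---")
    print("\n".join(chunk))
```

For a real log, something like `lines_before_restarts(open("n8n.log", encoding="utf-8", errors="replace"))` should work, and the resulting chunks can be redacted before posting.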

Hi @MutedJam , thanks for your quick reply.

This is a self-hosted instance.
Can I send you the details, as well as the log output, via a PM?

Hey @shrey-42, I don’t have a lot of time for work on the forum today unfortunately :frowning:

So if you could just post these details here on the forum (redacting everything confidential of course), a lot more people will be able to help.

Sure.

instance:

  • Docker
  • Postgres
  • DO droplet
  • 4GB RAM, 2vCPU, 50GB
  • v0.169.0
ENV details:

# Folder where data should be saved
DATA_FOLDER=/root/n8n/

# The top level domain to serve from
DOMAIN_NAME=****************

# The subdomain to serve from
SUBDOMAIN=****************

# DOMAIN_NAME and SUBDOMAIN combined decide where n8n will be reachable from
# above example would result in: https://n8n.example.com

# The user name to use for authentication - IMPORTANT ALWAYS CHANGE!
N8N_BASIC_AUTH_USER=****************

# The password to use for authentication - IMPORTANT ALWAYS CHANGE!
N8N_BASIC_AUTH_PASSWORD=****************

# Optional timezone to set which gets used by Cron-Node by default
# If not set, New York time will be used
GENERIC_TIMEZONE=Asia/Calcutta

# The email address to use for the SSL certificate creation
SSL_EMAIL=****************

# Execute In Same Process
EXECUTIONS_PROCESS=main

EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS=true

EXECUTIONS_DATA_SAVE_ON_ERROR=all
EXECUTIONS_DATA_SAVE_ON_SUCCESS=all

EXECUTIONS_DATA_PRUNE=true
EXECUTIONS_DATA_MAX_AGE=336

EXECUTIONS_TIMEOUT=3599
EXECUTIONS_TIMEOUT_MAX=143990

POSTGRES_USER=****************
POSTGRES_PASSWORD=****************
POSTGRES_DB=n8n
POSTGRES_PORT=5432

POSTGRES_NON_ROOT_USER=****************
POSTGRES_NON_ROOT_PASSWORD=****************

NODE_FUNCTION_ALLOW_BUILTIN=*

NODE_FUNCTION_ALLOW_EXTERNAL=dayjs,lodash,he,uuid,object-deep-compare,validator,gm,node-qpdf,generate-unique-id,pdfkit,jsdom,jsonata,ssh2-sftp-client,axios,winston,form-data,faker,jimp,nod>

# Set the logging level to 'debug'
N8N_LOG_LEVEL=debug

# Set log output to both console and a log file
N8N_LOG_OUTPUT=console,file

# Set a save location for the log file
N8N_LOG_FILE_LOCATION=./n8n/logs/n8n.log

# Set a 50 MB maximum size for each log file
N8N_LOG_FILE_MAXSIZE=50

# Set 60 as maximum number of log files to be kept
N8N_LOG_FILE_MAXCOUNT=60

N8N_PAYLOAD_SIZE_MAX=32

N8N_DEFAULT_BINARY_DATA_MODE=filesystem
N8N_USER_MANAGEMENT_DISABLED=true

  • It is difficult to share the logs in a public forum, as most of the workflow names contain identifying information
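
One possible way around this is to mask the names before posting. The sketch below is a hedged illustration, not an n8n feature: it assumes you can supply the list of sensitive names yourself (for example, the workflow names), and `redact` is a hypothetical helper. Each name is replaced by a short stable token, so repeated mentions of the same workflow stay recognisable as the same workflow in the redacted log.

```python
import hashlib
import re


def redact(text, sensitive_names):
    """Replace every occurrence of each sensitive name with a stable token."""
    # Longest names first, so a name that contains another is replaced whole.
    names = sorted({n for n in sensitive_names if n}, key=len, reverse=True)
    if not names:
        return text
    pattern = re.compile("|".join(re.escape(n) for n in names))

    def token(match):
        digest = hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()[:8]
        return f"<redacted:{digest}>"

    return pattern.sub(token, text)


# Inline demo with an invented workflow name.
line = 'debug | Start processing workflow "Acme Corp invoices"'
print(redact(line, ["Acme Corp invoices"]))
```

Anything not in the supplied list is left untouched, so the output is still worth a read-through before sharing.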

Hey @shrey-42, if sharing your logs publicly is a problem you can dm them to me, but I don’t know when I’ll be able to take a closer look yet.

Update:
I’ve updated to n8n v0.170.0, around 30 minutes ago, and since then the instance hasn’t restarted.

Monitoring now.

Will get back in touch if it does, with the log files.

Glad to hear, many thanks for confirming! Still odd, but I suppose there’s not much point in spending too much time on further debugging :slight_smile:

Hi @MutedJam,
I started facing this issue again some time ago.

Today I managed to retrieve the relevant logs:

Log 1
2022-05-13T12:48:56.006Z | debug    | Proxying request to axios {"file":"NodeExecuteFunctions.js","function":"proxyRequestToAxios"}
2022-05-13T12:49:04.847Z | debug    | Wait tracker querying database for waiting executions {"file":"WaitTracker.js","function":"getwaitingExecutions"}
node:internal/process/promises:279
            triggerUncaughtException(err, true /* fromPromise */);
            ^

Error: This socket has been ended by the other party
    at Socket.writeAfterFIN [as write] (node:net:487:14)
    at JSStreamSocket.doWrite (node:internal/js_stream_socket:175:19)
    at JSStream.onwrite (node:internal/js_stream_socket:33:57)
    at TLSSocket.Socket._final (node:net:457:28)
    at callFinal (node:internal/streams/writable:694:27)
    at prefinish (node:internal/streams/writable:723:7)
    at finishMaybe (node:internal/streams/writable:733:5)
    at TLSSocket.Writable.end (node:internal/streams/writable:631:5)
    at TLSSocket.Socket.end (node:net:631:31)
    at endWritableNT (node:internal/streams/readable:1371:12)
    at processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'EPIPE',
  source: 'socket'
}
ln: /home/node/.n8n: File exists
2022-05-13T12:49:29.743Z | info     | Initializing n8n process {"file":"start.js"}
2022-05-13T12:49:32.726Z | debug    | No codex available for: N8nTrainingCustomerDatastore.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T12:49:32.728Z | debug    | No codex available for: N8nTrainingCustomerMessenger.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T12:49:34.526Z | debug    | No codex available for: Directus.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T12:49:34.528Z | debug    | No codex available for: Test.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T12:49:36.551Z | debug    | Wait tracker querying database for waiting executions {"file":"WaitTracker.js","function":"getwaitingExecutions"}
n8n ready on 0.0.0.0, port 5678
Version: 0.176.0
Log 2
2022-05-13T13:00:51.600Z | debug    | Start processing node "SplitInBatches" {"node":"SplitInBatches","workflowId":212,"file":"WorkflowExecute.js"}
2022-05-13T13:00:51.601Z | debug    | Running node "SplitInBatches" started {"node":"SplitInBatches","workflowId":212,"file":"WorkflowExecute.js"}
2022-05-13T13:00:51.602Z | debug    | Running node "SplitInBatches" finished successfully {"node":"SplitInBatches","workflowId":212,"file":"WorkflowExecute.js"}
2022-05-13T13:00:51.602Z | verbose  | Workflow execution finished successfully {"workflowId":212,"file":"WorkflowExecute.js","function":"processSuccessExecution"}
2022-05-13T13:00:51.603Z | debug    | Executing hook (hookFunctionsSave) {"executionId":"692834","workflowId":212,"file":"WorkflowExecuteAdditionalData.js","function":"workflowExecuteAfter"}
2022-05-13T13:00:51.603Z | debug    | Save execution data to database for execution ID 692834 {"executionId":"692834","workflowId":212,"finished":true,"stoppedAt":"2022-05-13T13:00:51.602Z","file":"WorkflowExecuteAdditionalData.js","function":"workflowExecuteAfter"}
2022-05-13T13:00:51.613Z | debug    | Executing hook (hookFunctionsPush) {"executionId":"692834","workflowId":212,"file":"WorkflowExecuteAdditionalData.js","function":"workflowExecuteAfter"}
node:internal/process/promises:279
            triggerUncaughtException(err, true /* fromPromise */);
            ^

Error: This socket has been ended by the other party
    at Socket.writeAfterFIN [as write] (node:net:487:14)
    at JSStreamSocket.doWrite (node:internal/js_stream_socket:175:19)
    at JSStream.onwrite (node:internal/js_stream_socket:33:57)
    at TLSSocket.Socket._final (node:net:457:28)
    at callFinal (node:internal/streams/writable:694:27)
    at prefinish (node:internal/streams/writable:723:7)
    at finishMaybe (node:internal/streams/writable:733:5)
    at TLSSocket.Writable.end (node:internal/streams/writable:631:5)
    at TLSSocket.Socket.end (node:net:631:31)
    at endWritableNT (node:internal/streams/readable:1371:12)
    at processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'EPIPE',
  source: 'socket'
}
ln: /home/node/.n8n: File exists
2022-05-13T13:01:15.440Z | info     | Initializing n8n process {"file":"start.js"}
2022-05-13T13:01:17.795Z | debug    | No codex available for: N8nTrainingCustomerDatastore.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T13:01:17.797Z | debug    | No codex available for: N8nTrainingCustomerMessenger.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T13:01:19.418Z | debug    | No codex available for: Directus.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T13:01:19.420Z | debug    | No codex available for: Test.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T13:01:21.527Z | debug    | Wait tracker querying database for waiting executions {"file":"WaitTracker.js","function":"getwaitingExecutions"}
n8n ready on 0.0.0.0, port 5678
Version: 0.176.0

Would appreciate your help on this!

Hey @shrey-42, unfortunately I am not sure what could cause this behaviour and have asked internally for help. Are you by any chance running n8n in queue mode?

Nope, not using queue mode.

So the first suspect would be a connection breaking based on the Socket.writeAfterFIN error.

Could you confirm if you are using any of the AMQP, EmailReadImap, Kafka, MQTT, n8n, RabbitMQ, Redis, SSE or Workflow trigger nodes in your active workflows?

Of these, I am using the Email (EmailReadImap), n8n and Workflow trigger nodes.

And does the situation improve if you disable the respective workflows? If so, could you then enable them one by one until the problem occurs in order to identify the problematic one?

That would be a rather tedious task, I assume, because:

  1. the crashes aren’t that frequent, so there is no way to be sure that disabling a node is what stopped them
  2. these 3 nodes are spread over at least 50 workflows. Even identifying which workflows use them would be tricky.
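
For the second point, a short script over an export of the workflows might narrow things down. The following is a rough sketch rather than a tested recipe. It assumes the workflows were exported to a single JSON file with the n8n CLI (`n8n export:workflow --all --output=workflows.json`), that each exported workflow carries `id`, `name`, `active` and a `nodes` list whose entries have a `type`, and that the three nodes use the internal type names listed in `SUSPECT_TYPES`. Those type names are my assumption and should be checked against one of your own exports; `find_suspect_workflows` and `load_workflows` are hypothetical helper names.

```python
import json

# Assumed internal type names for the three trigger nodes mentioned above.
SUSPECT_TYPES = {
    "n8n-nodes-base.emailReadImap",
    "n8n-nodes-base.n8nTrigger",
    "n8n-nodes-base.workflowTrigger",
}


def find_suspect_workflows(workflows, suspect_types=SUSPECT_TYPES, active_only=True):
    """Return (id, name, matching node types) for workflows using a suspect node."""
    hits = []
    for wf in workflows:
        if active_only and not wf.get("active", False):
            continue
        used = sorted(
            {n.get("type") for n in wf.get("nodes", []) if n.get("type") in suspect_types}
        )
        if used:
            hits.append((wf.get("id"), wf.get("name"), used))
    return hits


def load_workflows(path):
    """Read an export file; a single-workflow export is wrapped in a list."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return data if isinstance(data, list) else [data]


# Inline demo with invented data; a real run would use load_workflows("workflows.json").
demo = [
    {"id": 1, "name": "mail intake", "active": True,
     "nodes": [{"type": "n8n-nodes-base.emailReadImap"}]},
    {"id": 2, "name": "report", "active": True,
     "nodes": [{"type": "n8n-nodes-base.httpRequest"}]},
]
for wf_id, name, types in find_suspect_workflows(demo):
    print(wf_id, name, ", ".join(types))
```

This only produces the list of candidate workflows; the first point still stands, since infrequent crashes make it hard to confirm that disabling any one of them is what helped.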

Okay, so if it doesn’t happen that often I’ll run some tests on my own next week to see if I can intentionally break this.

Sorry for the trouble and thanks for sharing these details!

Thanks. Shall look forward to your analysis :slight_smile:

Hey @shrey-42, unfortunately I was unable to reproduce this on my end. Based on this similar report from @baflo I suspect this could be related to the EmailReadImap node. However, since the error message is slightly different, it would be of great help if you could identify the workflows on your end running this node and temporarily disable them.