[Urgent] n8n docker, self-hosted instance restarting every 3-8 minutes

shrey-42 · March 28, 2022, 6:05am

Describe the issue/error/question

For the past hour, my n8n instance has been stuck in a loop, restarting every 3-8 minutes.
No changes of any kind were made in the past few hours.
There are also no failed workflows under ‘Workflow Executions’.

The logs don’t show me anything too suspicious, but I could be wrong.
(Could share them in PM)
How to debug this?

Information on your n8n setup

n8n version: 0.169.0
Database you’re using (default: SQLite): Postgres
Running n8n with the execution process [own(default), main]: main
Running n8n via [Docker, npm, n8n.cloud, desktop app]: Docker

MutedJam · March 28, 2022, 6:17am

Hi @shrey-42, I am sorry to hear you’re having trouble here.

For issues specific to your n8n.cloud instance rather than n8n in general, the best course of action would be to reach out via n8n.cloud so we can identify the problematic instance.

Or is this related to a self-hosted instance of yours? What’s the docker log output just before the restart? And what does your configuration look like (how do you run n8n, which environment variables do you set)? Are you doing anything specific before the restart occurs?

shrey-42 · March 28, 2022, 6:22am

Hi @MutedJam , thanks for your quick reply.

This is a self-hosted instance.
Can I send you the details, as well as the log output, via a PM?

MutedJam · March 28, 2022, 6:24am

Hey @shrey-42, unfortunately I don’t have a lot of time for work on the forum today.

So if you could just post these details here on the forum (redacting everything confidential of course), a lot more people will be able to help.

shrey-42 · March 28, 2022, 6:37am

Sure.

instance:

Docker
Postgres
DO droplet
4GB RAM, 2vCPU, 50GB
v0.169.0

ENV details:


# Folder where data should be saved
DATA_FOLDER=/root/n8n/

# The top level domain to serve from
DOMAIN_NAME=****************

# The subdomain to serve from
SUBDOMAIN=****************

# DOMAIN_NAME and SUBDOMAIN combined decide where n8n will be reachable from
# above example would result in: https://n8n.example.com

# The user name to use for authentication - IMPORTANT ALWAYS CHANGE!
N8N_BASIC_AUTH_USER=****************

# The password to use for authentication - IMPORTANT ALWAYS CHANGE!
N8N_BASIC_AUTH_PASSWORD=****************

# Optional timezone to set which gets used by Cron-Node by default
# If not set New York time will be used
GENERIC_TIMEZONE=Asia/Calcutta

# The email address to use for the SSL certificate creation
SSL_EMAIL=****************

# Execute In Same Process
EXECUTIONS_PROCESS=main

EXECUTIONS_DATA_SAVE_MANUAL_EXECUTIONS=true

EXECUTIONS_DATA_SAVE_ON_ERROR=all
EXECUTIONS_DATA_SAVE_ON_SUCCESS=all

EXECUTIONS_DATA_PRUNE=true
EXECUTIONS_DATA_MAX_AGE=336

EXECUTIONS_TIMEOUT=3599
EXECUTIONS_TIMEOUT_MAX=143990

POSTGRES_USER=****************
POSTGRES_PASSWORD=****************
POSTGRES_DB=n8n
POSTGRES_PORT=5432

POSTGRES_NON_ROOT_USER=****************
POSTGRES_NON_ROOT_PASSWORD=****************

NODE_FUNCTION_ALLOW_BUILTIN=*

NODE_FUNCTION_ALLOW_EXTERNAL=dayjs,lodash,he,uuid,object-deep-compare,validator,gm,node-qpdf,generate-unique-id,pdfkit,jsdom,jsonata,ssh2-sftp-client,axios,winston,form-data,faker,jimp,nod>

# Set the logging level to 'debug'
N8N_LOG_LEVEL=debug

# Set log output to both console and a log file
N8N_LOG_OUTPUT=console,file

# Set a save location for the log file
N8N_LOG_FILE_LOCATION=./n8n/logs/n8n.log

# Set a 50 MB maximum size for each log file
N8N_LOG_FILE_MAXSIZE=50

# Set 60 as maximum number of log files to be kept
N8N_LOG_FILE_MAXCOUNT=60

N8N_PAYLOAD_SIZE_MAX=32

N8N_DEFAULT_BINARY_DATA_MODE=filesystem
N8N_USER_MANAGEMENT_DISABLED=true

It's difficult to share the logs on a public forum, as most of the workflow names contain identifying information.
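One possible way around this is to pseudonymize names before posting. The following is only a sketch in Python, assuming each log line has the shape `timestamp | level | message {json metadata}`. The helper names `pseudonym` and `redact_line` are hypothetical (not part of n8n), and the key names in `SENSITIVE_KEYS` are assumptions that would need adjusting to whatever actually appears in the logs. The output should still be reviewed by hand before sharing.

```python
import hashlib
import json
import re

# Assumption: metadata keys whose string values should be masked.
# Adjust this tuple to the keys that really occur in your own log lines.
SENSITIVE_KEYS = ("node", "workflowName")

# Matches a JSON-looking object at the very end of a line.
TRAILING_JSON = re.compile(r"\{.*\}\s*$")


def pseudonym(value: str) -> str:
    """Return a stable placeholder, so one name maps to one token on every line."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"redacted-{digest}"


def redact_line(line: str) -> str:
    """Mask sensitive metadata values and their quoted copies in the message."""
    line = line.rstrip("\n")
    match = TRAILING_JSON.search(line)
    if not match:
        return line  # e.g. stack-trace lines or plain startup messages
    try:
        meta = json.loads(match.group(0))
    except json.JSONDecodeError:
        return line  # trailing braces that are not valid JSON are left alone
    message = line[: match.start()]
    for key in SENSITIVE_KEYS:
        value = meta.get(key)
        if isinstance(value, str) and value:
            token = pseudonym(value)
            meta[key] = token
            # Only replace the exact quoted name to avoid touching other words.
            message = message.replace(f'"{value}"', f'"{token}"')
    return message + json.dumps(meta, separators=(",", ":"))
```

Because the placeholder is derived from a hash of the original name, the same node or workflow keeps the same token throughout the file, so the sequence of events stays readable for whoever is helping.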

MutedJam · March 28, 2022, 6:51am

Hey @shrey-42, if sharing your logs publicly is a problem you can dm them to me, but I don’t know when I’ll be able to take a closer look yet.

shrey-42 · March 28, 2022, 7:07am

Update:
I’ve updated to n8n v0.170.0, around 30 minutes ago, and since then the instance hasn’t restarted.

Monitoring now.

Will get back in touch if it does, with the log files.

MutedJam · March 29, 2022, 7:29am

Glad to hear, many thanks for confirming! Still odd, but I suppose there’s not much point in spending too much time on further debugging.

shrey-42 · May 13, 2022, 2:14pm

Hi @MutedJam,
I started facing this issue again some time ago.

Today I managed to retrieve the relevant logs:

Log 1

2022-05-13T12:48:56.006Z | debug    | Proxying request to axios {"file":"NodeExecuteFunctions.js","function":"proxyRequestToAxios"}
2022-05-13T12:49:04.847Z | debug    | Wait tracker querying database for waiting executions {"file":"WaitTracker.js","function":"getwaitingExecutions"}
node:internal/process/promises:279
            triggerUncaughtException(err, true /* fromPromise */);
            ^

Error: This socket has been ended by the other party
    at Socket.writeAfterFIN [as write] (node:net:487:14)
    at JSStreamSocket.doWrite (node:internal/js_stream_socket:175:19)
    at JSStream.onwrite (node:internal/js_stream_socket:33:57)
    at TLSSocket.Socket._final (node:net:457:28)
    at callFinal (node:internal/streams/writable:694:27)
    at prefinish (node:internal/streams/writable:723:7)
    at finishMaybe (node:internal/streams/writable:733:5)
    at TLSSocket.Writable.end (node:internal/streams/writable:631:5)
    at TLSSocket.Socket.end (node:net:631:31)
    at endWritableNT (node:internal/streams/readable:1371:12)
    at processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'EPIPE',
  source: 'socket'
}
ln: /home/node/.n8n: File exists
2022-05-13T12:49:29.743Z | info     | Initializing n8n process {"file":"start.js"}
2022-05-13T12:49:32.726Z | debug    | No codex available for: N8nTrainingCustomerDatastore.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T12:49:32.728Z | debug    | No codex available for: N8nTrainingCustomerMessenger.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T12:49:34.526Z | debug    | No codex available for: Directus.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T12:49:34.528Z | debug    | No codex available for: Test.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T12:49:36.551Z | debug    | Wait tracker querying database for waiting executions {"file":"WaitTracker.js","function":"getwaitingExecutions"}
n8n ready on 0.0.0.0, port 5678
Version: 0.176.0

Log 2

2022-05-13T13:00:51.600Z | debug    | Start processing node "SplitInBatches" {"node":"SplitInBatches","workflowId":212,"file":"WorkflowExecute.js"}
2022-05-13T13:00:51.601Z | debug    | Running node "SplitInBatches" started {"node":"SplitInBatches","workflowId":212,"file":"WorkflowExecute.js"}
2022-05-13T13:00:51.602Z | debug    | Running node "SplitInBatches" finished successfully {"node":"SplitInBatches","workflowId":212,"file":"WorkflowExecute.js"}
2022-05-13T13:00:51.602Z | verbose  | Workflow execution finished successfully {"workflowId":212,"file":"WorkflowExecute.js","function":"processSuccessExecution"}
2022-05-13T13:00:51.603Z | debug    | Executing hook (hookFunctionsSave) {"executionId":"692834","workflowId":212,"file":"WorkflowExecuteAdditionalData.js","function":"workflowExecuteAfter"}
2022-05-13T13:00:51.603Z | debug    | Save execution data to database for execution ID 692834 {"executionId":"692834","workflowId":212,"finished":true,"stoppedAt":"2022-05-13T13:00:51.602Z","file":"WorkflowExecuteAdditionalData.js","function":"workflowExecuteAfter"}
2022-05-13T13:00:51.613Z | debug    | Executing hook (hookFunctionsPush) {"executionId":"692834","workflowId":212,"file":"WorkflowExecuteAdditionalData.js","function":"workflowExecuteAfter"}
node:internal/process/promises:279
            triggerUncaughtException(err, true /* fromPromise */);
            ^

Error: This socket has been ended by the other party
    at Socket.writeAfterFIN [as write] (node:net:487:14)
    at JSStreamSocket.doWrite (node:internal/js_stream_socket:175:19)
    at JSStream.onwrite (node:internal/js_stream_socket:33:57)
    at TLSSocket.Socket._final (node:net:457:28)
    at callFinal (node:internal/streams/writable:694:27)
    at prefinish (node:internal/streams/writable:723:7)
    at finishMaybe (node:internal/streams/writable:733:5)
    at TLSSocket.Writable.end (node:internal/streams/writable:631:5)
    at TLSSocket.Socket.end (node:net:631:31)
    at endWritableNT (node:internal/streams/readable:1371:12)
    at processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'EPIPE',
  source: 'socket'
}
ln: /home/node/.n8n: File exists
2022-05-13T13:01:15.440Z | info     | Initializing n8n process {"file":"start.js"}
2022-05-13T13:01:17.795Z | debug    | No codex available for: N8nTrainingCustomerDatastore.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T13:01:17.797Z | debug    | No codex available for: N8nTrainingCustomerMessenger.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T13:01:19.418Z | debug    | No codex available for: Directus.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T13:01:19.420Z | debug    | No codex available for: Test.node.js {"file":"LoadNodesAndCredentials.js","function":"addCodex"}
2022-05-13T13:01:21.527Z | debug    | Wait tracker querying database for waiting executions {"file":"WaitTracker.js","function":"getwaitingExecutions"}
n8n ready on 0.0.0.0, port 5678
Version: 0.176.0

Would appreciate your help on this!
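Both excerpts show the same pattern: an uncaught `EPIPE` error, immediately followed by `Initializing n8n process`. To collect the lines leading up to every restart from saved container output, a minimal Python sketch could look like the following. It assumes the startup line `Initializing n8n process` marks each restart, and `context_before_restarts` is a hypothetical helper name, not an n8n or Docker tool.

```python
import re
from collections import deque
from typing import Iterable, List

# ANSI colour sequences, which n8n adds around the level and message text.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")

# Assumption: this startup message is printed once per (re)start.
RESTART_MARKER = "Initializing n8n process"


def context_before_restarts(lines: Iterable[str], context: int = 25) -> List[List[str]]:
    """Return, for each restart marker, up to `context` lines that preceded it."""
    recent = deque(maxlen=context)  # rolling window of the latest lines
    blocks: List[List[str]] = []
    for raw in lines:
        line = ANSI_ESCAPE.sub("", raw.rstrip("\n"))
        if RESTART_MARKER in line:
            if recent:  # skip a marker that is the very first line
                blocks.append(list(recent))
            recent.clear()
        else:
            recent.append(line)
    return blocks
```

The function accepts any iterable of lines, so an open file handle works directly, for example one holding container output saved with `docker logs` (redirecting both stdout and stderr, since Node.js writes the crash trace to stderr). Comparing the returned blocks side by side makes it easier to see whether the same node or request always precedes the crash.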

MutedJam · May 13, 2022, 2:21pm

Hey @shrey-42, unfortunately I am not sure what could cause this behaviour and have asked internally for help. Are you by any chance running n8n in queue mode?

shrey-42 · May 13, 2022, 2:23pm

Nope, not using queue mode.

MutedJam · May 13, 2022, 2:33pm

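Based on the Socket.writeAfterFIN error, the first suspect would be a breaking connection: something wrote to a TLS socket that the other side had already closed (EPIPE), and the resulting unhandled promise rejection took the process down.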

Could you confirm if you are using any of the AMQP, EmailReadImap, Kafka, MQTT, n8n, RabbitMQ, Redis, SSE or Workflow trigger nodes in your active workflows?

shrey-42 · May 13, 2022, 2:39pm

Of these, I'm using the Email, n8n and Workflow trigger nodes.

MutedJam · May 13, 2022, 2:42pm

And does the situation improve if you disable the respective workflows? If so, could you then enable them one by one until the problem occurs in order to identify the problematic one?

shrey-42 · May 13, 2022, 2:45pm

That would be a rather tedious task, I assume, because:

The crashes aren’t that frequent, so there is no way to be sure that disabling a node is what stopped them.
These 3 nodes are spread over at least 50 workflows. Even identifying which workflows use them would be tricky.

MutedJam · May 13, 2022, 2:46pm

Okay, so if it doesn’t happen that often I’ll run some tests on my own next week to see if I can intentionally break this.

Sorry for the trouble and thanks for sharing these details!

shrey-42 · May 13, 2022, 2:47pm

Thanks. Looking forward to your analysis.

MutedJam · May 17, 2022, 8:12am

Hey @shrey-42, unfortunately I was unable to reproduce this on my end. Based on this similar report from @baflo I suspect this could be related to the EmailReadImap node. However, since the error message is slightly different, it would be of great help if you could identify the workflows on your end running this node and temporarily disable them.
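A minimal sketch of one way to find those workflows without opening each one in the editor: scan a JSON export of the workflows for the node type. This assumes an export produced with n8n's `export:workflow --all` CLI command, that each workflow object carries `id`, `name`, `active` and a `nodes` list, and that the node's internal type is `n8n-nodes-base.emailReadImap`. The helper names `load_export` and `workflows_using` are hypothetical, and the type string should be verified against an actual exported workflow.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Assumption: internal type name of the IMAP trigger node; check it against
# the "type" field of a node in your own export before relying on it.
SUSPECT_TYPES: Set[str] = {"n8n-nodes-base.emailReadImap"}


def load_export(path: Path) -> List[dict]:
    """Read an export file that holds either one workflow or a list of them."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return data if isinstance(data, list) else [data]


def workflows_using(
    workflows: Iterable[dict],
    suspect_types: Set[str],
    only_active: bool = True,
) -> Dict[str, List[str]]:
    """Map 'id: name' of each matching workflow to the names of its suspect nodes."""
    hits: Dict[str, List[str]] = {}
    for wf in workflows:
        if only_active and not wf.get("active", False):
            continue  # inactive workflows have no running trigger
        matching = [
            node.get("name", "?")
            for node in wf.get("nodes", [])
            if node.get("type") in suspect_types
        ]
        if matching:
            hits[f"{wf.get('id', '?')}: {wf.get('name', '?')}"] = matching
    return hits
```

Calling `workflows_using(load_export(Path("workflows.json")), SUSPECT_TYPES)` (the file name is only an example) would give a short list of candidates to disable, which is far less work than checking 50 workflows by hand. Adding further type names to the set covers the other trigger nodes mentioned earlier in the thread.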

shrey-42 · June 3, 2022, 1:43am

Hi @MutedJam, is there any resolution for this?
The issue still persists.

MutedJam · June 3, 2022, 8:22am

Hi @shrey-42, I am afraid not. As per my previous message, I was unfortunately not able to reproduce this. As a next step, it would be great if you could confirm whether this is indeed caused by the EmailReadImap node by disabling the respective workflows and checking whether the behaviour persists.