Keeps getting "Unknown" — n8n crashing every night

I trigger a bunch of workflows every night, but they keep ending up in the "Unknown" state. I’ve attached an error workflow to each of them, but it never alerts us about anything.

The logs contain this:

Error: The workflow can not be activated because it does not contain any nodes which could start the workflow. Only workflows which have trigger or webhook nodes can be activated.
    at ActiveWorkflowRunner.add (/usr/local/lib/node_modules/n8n/dist/src/ActiveWorkflowRunner.js:285:23)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async /usr/local/lib/node_modules/n8n/dist/src/Server.js:354:21
    at async /usr/local/lib/node_modules/n8n/dist/src/ResponseHelper.js:76:26
ERROR RESPONSE
Error: The workflow can not be activated because it does not contain any nodes which could start the workflow. Only workflows which have trigger or webhook nodes can be activated.
    at ActiveWorkflowRunner.add (/usr/local/lib/node_modules/n8n/dist/src/ActiveWorkflowRunner.js:285:23)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async /usr/local/lib/node_modules/n8n/dist/src/Server.js:354:21
    at async /usr/local/lib/node_modules/n8n/dist/src/ResponseHelper.js:76:26
The session "iimlzwwkw" is not registred.
The session "iimlzwwkw" is not registred.
The session "iimlzwwkw" is not registred.
The session "iimlzwwkw" is not registred.
The session "iimlzwwkw" is not registred.

Then the pod is restarted, but the workflow never runs.

Hey @haf!

May I know which version of n8n you’re using? If you’re using an older version, can you update and try with the latest version? We fixed some issues in the latest version.

Let me know how it goes :slight_smile:

Sure, I’ve upgraded from 0.110.3 to 0.113 — does this include your fixes?

Could the crash handler be made to log what crashed and/or which workflow doesn’t have a trigger?

Answering your first question, that got fixed in 0.112.0.

I am not sure about your second question. Can you please elaborate?

Ah, great. Well, we upgraded now.

Second question is about this error message:

Error: The workflow can not be activated because it does not contain any nodes which could start the workflow. Only workflows which have trigger or webhook nodes can be activated.

Could this error message be made to log what its context is and what it’s talking about?

Did your issue get resolved after upgrading?

For your second question, the error message says that you are trying to enable a workflow that is not using a trigger node like the Webhook node, Cron node, etc. The workflow you’re trying to activate is using the Start node to start the workflow and hence it can’t be activated.

Yes, I checked this morning and we’re still having issues despite the upgrade. It just dies and is restarted by Kubernetes. This means we manually have to go in and run these workflows every morning. How do I debug this?

Can you share some more information about the workflows that run into the Unknown state? This will help us debug the issue better. If you can share the workflow as well, it will help us replicate your issue. @krynble fixed the issues in the latest version, so he has a better understanding of this.

We can share it, yes; could you e-mail me at henrik at logary dot tech, and I’ll send you the workflow? I’m also happy to do a screen share and I can explain the setup better.

We’ve tried to reproduce it ourselves, both by running the workflows manually (works well) and, during the daytime, by manually running the triggering workflow (a workflow that calls localhost:5678 with input data), which works too. The only thing that fails is running them at night on a cron timer (though the time of day is probably a red herring). Once we’ve restarted n8n, we can run them again.

We’ve also set up an “echo ‘Alive!’” workflow that runs every 15 minutes just to verify that n8n is actually responsive, and it is, until it is supposed to run the “large” workflows.

Generally, what can cause workflows to exit like this?

  - containerID: docker://d4b5376033e23ac302f9eafb166836f7d5d3ac84828340952bddd0df84c58037
    image: n8nio/n8n:0.113.0
    imageID: docker-pullable://n8nio/n8n@sha256:421c83fe1df815b6918393ecc05aaf3a64529250c597f78d6aa991543dc1d5df
    lastState:
      terminated:
        containerID: docker://6a43adae9dd376044a86fa1ab6c29797234a7fa0c7aca00868ac55e41a8e98de
        exitCode: 137
        finishedAt: "2021-04-01T20:50:03Z"
        reason: OOMKilled
        startedAt: "2021-04-01T00:00:20Z"

OOM. GG me. OMG.

So this is not enough for n8n to ship a few megabytes of data in parallel:

    resources:
      limits:
        cpu: "1"
        memory: 500Mi
      requests:
        cpu: 10m
        memory: 20Mi

Lesson learned! :slight_smile:
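
For anyone landing here with the same symptoms (exit code 137, reason OOMKilled), here is a minimal sketch of what a roomier container spec might look like. The container name and every number in it are assumptions for illustration, not values from this thread or recommendations from the n8n team; size them from the memory usage you actually observe. The `NODE_OPTIONS` entry uses Node.js's `--max-old-space-size` flag (in MiB) to keep the V8 heap below the container limit, and whether it is needed depends on your workflows.

```yaml
# Sketch only: all values below are assumed, tune them to your own workload.
containers:
  - name: n8n                      # hypothetical container name
    image: n8nio/n8n:0.113.0
    env:
      - name: NODE_OPTIONS
        # Cap the V8 old-space heap (MiB) below the memory limit, so Node
        # garbage-collects harder before the kernel OOM killer steps in.
        value: "--max-old-space-size=1536"
    resources:
      limits:
        cpu: "1"
        memory: 2Gi                # assumed; 500Mi was where the pod got OOMKilled
      requests:
        cpu: 100m
        memory: 512Mi              # assumed; closer to real usage than 20Mi
```

Keeping the memory request close to real usage also means the scheduler places the pod on a node that actually has that much memory free.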
