n8n keeps crashing and going offline

Good evening, how are you?

My n8n instance has often been down: the access page becomes unreachable and just loads forever. When I try to save some workflows I get an error; eventually the save goes through, but the workflow ends up keeping both the old configuration and the new one. For example, if I set a trigger to 08:35 and then changed it to 08:40, the workflow fired at both times: 08:35 and 08:40.

I thought it was due to the number of accumulated executions and to the version being out of date, so I cleaned up the old executions (they dropped from 570 thousand to 12 thousand)
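
(As an aside, old executions can also be pruned automatically via environment variables; a minimal sketch for a docker-compose setup, assuming the service is simply named n8n and that a 7-day retention is acceptable:)

services:
  n8n:
    image: n8nio/n8n
    environment:
      - EXECUTIONS_DATA_PRUNE=true   # enable automatic pruning of old execution data
      - EXECUTIONS_DATA_MAX_AGE=168  # keep executions for at most 168 hours (7 days)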

I applied the 26 updates that had accumulated and I'm now on the most recent version: 1.33.1

I also upgraded my DigitalOcean droplet; it currently has 8 GB Memory / 4 Intel vCPUs / 160 GB Disk / NYC1 (about 2 to 4x larger than the initial size)

When the error happens, CPU usage in DigitalOcean reaches about 30%, so I don't believe the problem is there

Anyway, I did all the updates and upgrades, but sometimes, out of the blue, even without many workflows running, the site simply goes offline: it stops receiving data and I can't access it, forcing me to turn the droplet off and on again. What can it be?

Information on your n8n setup

  • n8n version: 1.33.1
  • Database (default: SQLite): no
  • n8n EXECUTIONS_PROCESS setting (default: own, main): no
  • Running n8n via (Docker, npm, n8n cloud, desktop app): docker compose
  • Operating system: DigitalOcean droplet

(screenshot of the page stuck loading)

It loops forever and never opens

It only resolves when I turn the droplet off and on

Hey @PedroHenrique26,

Have you checked the logs yet to see if there are any errors?

How do I check this?

It depends on how you have configured it. If you are running it with plain Docker, then you can check them with the docker logs <container_name> command

If you have configured it through some cloud service, then that service should provide the logs.
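
(A minimal sketch of those commands, assuming a plain Docker setup; the container name is whatever docker ps reports:)

docker ps                                # list running containers and their names
docker logs --tail 200 <container_name>  # show the last 200 log lines
docker logs -f <container_name>          # follow the logs live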

I woke up and it was offline. I went to the logs inside the DigitalOcean console and searched for "n8n" together with "error" or "disconnected", and these logs appeared.
Is it possible to find out the reason from the image?

While the logs were loading, I noticed the event below repeated many times right up to the current date, but once loading finished and I searched for the keywords it no longer appeared. I managed to copy a piece while it was still loading; its date is older, but the same thing happened with yesterday's and today's dates, in case that is relevant:

It seems these logs are from the Docker host itself. Can you run the docker ps command and check whether there is an n8n container?

If so, you can use docker logs <container_name> to see the n8n logs.


After running docker ps:

After running docker logs root_n8n_1:

Well… the error is not very specific, but it seems there are a lot of executions in the queue.

Can you check the memory usage? n8n mostly uses memory rather than CPU/disk, although 8 GB should usually be fine.
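
(For example, from the droplet console, assuming the container is named root_n8n_1 as it is later in this thread:)

free -h                              # overall host memory usage
docker stats --no-stream root_n8n_1  # memory/CPU usage of the n8n container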

The only thing I can advise for now is to collect the logs if n8n goes offline again and post them here; maybe they will show something else.
You can export the logs to the /tmp/n8n.log file with this command:

docker logs --since 1h root_n8n_1 > /tmp/n8n.log

You can replace the 1h in the --since flag with the actual time since n8n went down (e.g. 2h, 3h or more), because the container will have restarted and the relevant logs may otherwise be cut off.
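
(--since also accepts an absolute timestamp if you know roughly when it went down; the date and time below are only placeholders:)

docker logs --since 2024-03-26T06:00:00 root_n8n_1 > /tmp/n8n.log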

Thousands of lines with the same message I sent earlier appeared (the one nobody could identify), but a different message also showed up:

Removed triggers and pollers for workflow "22"
2024-03-26T12:26:09.204Z [Rudder] info: Your message must be < 32kb. This is currently surfaced as a warning. Please update your code {
  userId: '9b8c11ec1d7722d48b6973dbf70400faa913ae64e46fef1f40a9b1d654824ea2#3af181f5-9769-43d2-aa32-64ab62fd7fe7',
  event: 'User saved workflow',
  properties: {
    user_id: '3af181f5-9769-43d2-aa32-64ab62fd7fe7',
    workflow_id: '22',

Earlier I imagined the problem could be this flow and deactivated it; until then the system had kept going offline. It is scheduled to run every hour, and the problem always came after it started running. …

Before the update there was no problem. What could it be?

I noticed that the webhook trigger in some of my flows had been renamed: instead of being called "Webhook", some were called "Webhook1" or "Webhook2", which meant the data did not continue through the flow because the origin node was unknown, so I corrected the names on them

Was there any change in the version update that could have impacted workflow 22? How can I share it with you so we can check whether it really is the problematic workflow, and how to solve it?

Can you share workflow 22? I'm wondering why it has so many nodes.
You can copy the workflow with Ctrl+A > Ctrl+C and then paste it here after pressing the </> button

But there is a limit on message length, so you may need to copy it into a file and share the file from some storage service.

I used a Cron node that triggers once every hour, at 10 minutes past the hour, between 7 am and 11 pm
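
(For reference, assuming the intent is "at minute 10 of every hour from 07:00 to 23:00", the cron expression behind that schedule would look like the line below; if the intent is actually "every 10 minutes within those hours", the minute field would be */10 instead.)

10 7-23 * * *   # minute 10 of every hour between 07:00 and 23:00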

The flow consists of sending some information, waiting about 50 seconds, and then checking custom fields in "BotConversa" via an HTTP Request

If the custom field has a value different from the expected one, it means there is an error in "BotConversa", usually that my WhatsApp number has been banned or disconnected. In that case the flow makes an HTTP request that places a call to my cell phone via Zenvia to inform me

Each line refers to a WhatsApp number, totaling around 30 lines with the same function

I sent the first 5 lines of the flow by message

Oh…

Well… that could definitely be an issue. Can you show how the Schedule node is configured?
In total you have roughly 25-30 minutes of wait time (30 lines × ~50 s each) plus some response time from the HTTP nodes, so a single run may take 40 minutes or even longer.

I'm wondering whether overlaps happen (the first execution still running when the next one starts), which could lead to performance issues.

But generally it's better to redesign the workflow to remove the duplication and common patterns (you basically have one line of logic repeated 30+ times with different parameters). I can help with optimizing that flow, but it will take some time.

@Jon or @bartv, when we update n8n to a newer version, are the nodes automatically updated to the latest version, or do they keep the old one?

Before the newest update (26 updates had accumulated) it worked normally; I don't know if anything changed because of the update. Normally the flow takes around 30 minutes, and since it only runs once an hour I have never had an overlap. Also, since I did the update (yesterday) I have never managed to complete the flow, so we can rule out the overlapping issue, as the system already crashes right at the start.

I deleted the trigger and tried to run it manually; when it started (before even reaching the first node) there was a problem and it went offline

It's only a suggestion, but maybe n8n or one of its sub-modules has a limit on how big a workflow can be.

btw what database are you using?

I duplicated the workflow and removed nodes, leaving 10 lines of nodes. I ran the workflow manually and it worked perfectly.
Then I copied another 10 lines from the original workflow to test, and ran it manually again

In other words, with 20 lines it worked perfectly

But when I tried to add the remaining lines it gave an error again, before the workflow even started; just clicking start/stop made it crash…

Before the update it worked perfectly, all day, every day

I don't understand why the error occurs right at the beginning of the execution; shouldn't the rest of the data in the flow be loaded over time?

I'm a layman; I don't know how to check the database, and in fact I barely know what that means

I use n8n via Docker Compose on DigitalOcean and access the droplet through the web console. How do I check which database I'm using, so I can tell you?
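
(One way to check from the web console, assuming the compose file is in the current directory and the container is named root_n8n_1; if DB_TYPE is not set anywhere, n8n is using the default SQLite database:)

grep -i "DB_TYPE" docker-compose.yml      # see whether a database type is configured
docker exec root_n8n_1 env | grep '^DB_'  # show the DB_* variables the container actually sees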

To start a workflow, n8n has to validate it first (it won't let you activate a workflow that has errors), so there could be an issue there, but it looks more like a bug.

Also, if you are using the default database, that can cause issues too, but since it already handled tons of executions, it seems that's not it.
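
(For completeness, if moving off SQLite ever becomes necessary, n8n is pointed at Postgres purely through environment variables; a hedged docker-compose sketch where the host name, database name and credentials are placeholders:)

services:
  n8n:
    image: n8nio/n8n
    environment:
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres      # hostname of the Postgres service (placeholder)
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_PASSWORD=change-me # placeholder credential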

I duplicated the flow and divided it into two parts, with 16 lines and 15 lines respectively: one starting at 10 minutes past each hour and the second at 30 minutes past each hour

In the first hour both ran correctly, without n8n going offline

I believe the problem lies more in processing the number of lines than in a specific part of the workflow

Is the output from some nodes rather big? n8n has an environment variable, N8N_PAYLOAD_SIZE_MAX, which defaults to 16 (MB)
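
(A hedged example of raising it in a docker-compose setup; 64 is an arbitrary value chosen only for illustration:)

services:
  n8n:
    image: n8nio/n8n
    environment:
      - N8N_PAYLOAD_SIZE_MAX=64   # maximum payload size in MB (default is 16)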