Cron job workflows connecting to PostgreSQL are failing at random intervals

vibinchander · March 13, 2023, 7:50am

We have twenty cron jobs, each running at an interval of 5 to 9 minutes. These jobs fail at random times during the day: on some days it does not happen at all, on others it happens frequently. What is the best way to find out at which point in the workflow the error occurs?

The workflow setup is simple: a Cron trigger configured for every 5 minutes → a PostgreSQL node that runs one DELETE query and one INSERT query.

For now it does not throw any specific error message; it simply reports that the workflow has failed. Any pointers on how to configure failure handling for the workflow would also help us.

Information on your n8n setup

n8n version: 0.214.0
Database you’re using (default: PostgreSQL):
Running n8n with the execution process [own (default)]:
Running n8n via [npm]:

Jon · March 13, 2023, 12:14pm

Hey @vibinchander,

Do you already have execution logging enabled for your workflows? It might also be worth checking the console output of n8n, as it may show something there. I would probably start with an upgrade, though, and maybe swap from every X to Cron to see if either of those helps.

vibinchander · April 3, 2023, 5:19am

@Jon - appreciate the response. I will change the every X to Cron and see whether that mitigates the issue.

The error in the console log often says that it cannot connect to the DB (a connection error). Is there a known difference in connectivity between running n8n in Docker and running n8n via npm?

Jon · April 3, 2023, 9:59am

Hey @vibinchander,

It shouldn’t matter whether you are using npm or Docker. What is the full connection error?
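
While collecting the full error, one way to narrow down an intermittent "cannot connect to the DB" message is to probe the database port independently of n8n, from the same host or container. The following is a minimal Python sketch under stated assumptions: the helper names `can_reach` and `probe` are hypothetical, and a successful TCP connect only shows network reachability, not that authentication works or that the server has free connection slots.

```python
import socket
import time


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def probe(host: str, port: int, attempts: int, delay: float) -> list[tuple[float, bool]]:
    """Probe repeatedly and return (unix_timestamp, reachable) pairs."""
    results = []
    for _ in range(attempts):
        results.append((time.time(), can_reach(host, port)))
        time.sleep(delay)
    return results
```

For example, `probe("db.example.internal", 5432, attempts=60, delay=5.0)` would check a placeholder host on PostgreSQL's default port for five minutes. Failed probes that line up with failed executions would point at the network or the database server rather than at the workflow.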

Leandro_Hoffmann · April 20, 2023, 2:29pm

After more investigation, we found this error in the logs:

2023-04-20T15:26:01+01:00 ENOENT: no such file or directory, stat '/home/node/.n8n/binaryData/meta/binarymeta_1309983_1682000743463'
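
The trailing number in that file name looks like a Unix timestamp in milliseconds. That reading is an assumption about how n8n names these files, not something confirmed in this thread. Under that assumption, a small Python sketch can decode it and compare it with the timestamp at the start of the log line:

```python
from datetime import datetime, timedelta, timezone

log_line = (
    "2023-04-20T15:26:01+01:00 ENOENT: no such file or directory, "
    "stat '/home/node/.n8n/binaryData/meta/binarymeta_1309983_1682000743463'"
)

# Timestamp at the start of the log line (ISO 8601 with a UTC offset).
logged_at = datetime.fromisoformat(log_line.split(" ", 1)[0])

# ASSUMPTION: the last underscore-separated field is epoch milliseconds.
millis = int(log_line.rstrip("'").rsplit("_", 1)[1])
file_time = datetime.fromtimestamp(millis // 1000, tz=timezone.utc) + timedelta(
    milliseconds=millis % 1000
)

# Time between the timestamp embedded in the file name and the logged error.
gap = logged_at - file_time
print("file name timestamp (UTC):", file_time.isoformat())
print("seconds before the logged error:", gap.total_seconds())
```

If the assumption holds, the timestamp in the file name falls roughly 18 seconds before the logged error, which gives a narrow window to search in the execution list for whatever ran at that moment.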

vibinchander · April 21, 2023, 1:39pm

Thanks for sharing the log details @Leandro_Hoffmann.
I tried changing the trigger from every X to a cron expression as you suggested, @Jon, but it did not work. Any suggestions are much appreciated.

Jon · April 21, 2023, 2:21pm

Hey @vibinchander and @Leandro_Hoffmann,

That error looks unrelated to the Postgres node. Is there more to the workflow that has not been shared? I would expect that error to appear if you were working with files.

Leandro_Hoffmann · April 24, 2023, 10:00am

Here is the workflow for reference.

We have ~20 copies of this workflow. Each one runs every 5 minutes and just triggers the Postgres node, which queries the DB. There is no file access.

Jon · April 24, 2023, 10:07am

@Leandro_Hoffmann I wouldn’t expect that workflow to cause that error. During your investigation, what led you to that line, and what linked it to that workflow? I can’t see anything in the flow that would cause it.

When the workflow fails, what does the workflow execution log say? If it is failing, there should be something for that in the UI.

Leandro_Hoffmann · April 24, 2023, 10:17am

When we run it manually, it works. When we leave it for the cron to run, sometimes it works and sometimes we get the error I sent earlier.

We saw this morning that if we click retry on an execution that failed, we get this error in the logs:

The first is the cron error from when the workflow fails; the other two are from us trying to reprocess the failed execution. If we then go to the workflow and trigger it manually, it works.

Jon · April 24, 2023, 10:23am

So when it fails to run from the cron, what does the execution log say in the n8n UI? It sounds like there are possibly two different issues here. Once we know what the first issue is, we can look into the second one so that things don’t get mixed up.

As for the file error, is n8n configured to run in queue mode?

Leandro_Hoffmann · April 24, 2023, 10:51am

It’s not in queue mode. We have only one instance of n8n doing everything, and it never reaches its CPU or memory limits. We cannot see any error log for the failed execution.
Here is a screenshot of the execution for one that failed:

Jon · April 24, 2023, 11:51am

Hey @Leandro_Hoffmann,

That is the retry. Can you try one of the others to see if it shows a different message?

Leandro_Hoffmann · April 24, 2023, 2:27pm

None of them show an error message or logs. They are all cron-triggered executions that failed. If you check the times, it runs every 5 minutes, and some of the runs worked.

Jon · April 24, 2023, 3:16pm

That is most unusual. I would have expected anything running for more than a second to have more output.

Can you share the environment options that are currently set?

Leandro_Hoffmann · April 25, 2023, 9:32am

Here are the options we currently use:

N8N Version: 0.222.3

GENERIC_TIMEZONE : *******
N8N_BASIC_AUTH_ACTIVE : true
N8N_BASIC_AUTH_PASSWORD : ********
N8N_BASIC_AUTH_USER : ******
N8N_DEFAULT_BINARY_DATA_MODE : filesystem
N8N_PAYLOAD_SIZE_MAX : 32
WEBHOOK_URL : *********

Jon · April 25, 2023, 10:04am

If you change the binary data mode (N8N_DEFAULT_BINARY_DATA_MODE) to default, does that change anything?

Leandro_Hoffmann · April 25, 2023, 12:34pm

Changed now; I will let you know if it happens again. I also updated n8n to the latest version.

Leandro_Hoffmann · April 26, 2023, 8:16am

The issue is still happening, unfortunately.
Now, on the latest version, we are seeing some extra logs:

Jobs are failing with no logs, but now, if I retry a failed job, it works. It does not show the error:
Cannot read properties of undefined (reading 'nodeExecutionStack')

Jon · April 26, 2023, 9:31am

Hey @Leandro_Hoffmann,

This is good progress. Can you enable debug logging (see Logging - n8n Documentation)? That should show a bit more information.

I wouldn’t expect to see slow queries on a Postgres database, so just to confirm: in the issue template, the database field was left with just Postgres in it. Are you using Postgres for n8n itself as well, or just the default SQLite database? If it is SQLite, that could explain the issue, as it gets slower as it gets bigger, and I noticed there were no data pruning options in the env vars you sent over.
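
These checks can be run against the environment options listed earlier in the thread. The Python sketch below is illustrative only. The function name `review_n8n_env` is hypothetical, and the variable names `DB_TYPE`, `EXECUTIONS_DATA_PRUNE` and `N8N_LOG_LEVEL` with the defaults assumed for them (`sqlite`, `false`, `info`) should be verified against the n8n documentation for the installed version.

```python
def review_n8n_env(env: dict[str, str]) -> list[str]:
    """Return human-readable findings for a mapping of n8n environment variables."""
    findings = []

    # ASSUMPTION: when DB_TYPE is unset, n8n stores its own data in SQLite.
    if env.get("DB_TYPE", "sqlite").lower() == "sqlite":
        findings.append("DB_TYPE is unset or sqlite: n8n itself is using SQLite")

    # ASSUMPTION: pruning of old execution data is off unless explicitly enabled.
    if env.get("EXECUTIONS_DATA_PRUNE", "false").lower() != "true":
        findings.append("EXECUTIONS_DATA_PRUNE is not true: execution data is never pruned")

    # ASSUMPTION: the default log level is less verbose than debug.
    if env.get("N8N_LOG_LEVEL", "info").lower() != "debug":
        findings.append("N8N_LOG_LEVEL is not debug: debug logging is off")

    return findings


# Variable names as reported in the thread; the values there were masked,
# so placeholders are used for the masked ones.
reported = {
    "GENERIC_TIMEZONE": "<masked>",
    "N8N_BASIC_AUTH_ACTIVE": "true",
    "N8N_BASIC_AUTH_PASSWORD": "<masked>",
    "N8N_BASIC_AUTH_USER": "<masked>",
    "N8N_DEFAULT_BINARY_DATA_MODE": "filesystem",
    "N8N_PAYLOAD_SIZE_MAX": "32",
    "WEBHOOK_URL": "<masked>",
}

for finding in review_n8n_env(reported):
    print("-", finding)
```

With the options as reported, all three findings are raised, which is consistent with the suspicion that n8n's own data lives in an unpruned SQLite file.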