Problem description and error message
I’ve got an issue where some executions fail to run and are not saved correctly into n8n’s execution table.
These executions are missing the startedAt field, which means they never actually got executed.
The execution looks like this:
{
  "id": 3117422,
  "finished": false,
  "mode": "webhook",
  "retryOf": null,
  "retrySuccessId": null,
  "startedAt": null,
  "stoppedAt": "2025-02-18T08:09:09.594Z",
  "waitTill": null,
  "status": "error",
  "workflowId": "xgp0Axf0smn5COeX",
  "deletedAt": null,
  "createdAt": "2025-02-18T08:08:15.230Z"
}
Since the startedAt date is not saved, the n8n editor shows the current time when opening the execution. The execution ID is the same as in the JSON shared above.
Error in the n8n editor:
These executions also cannot be retried, as the execution data is not saved (it crashes even before the workflow starts).
Logs for this execution from the main instance:
I haven’t checked the worker logs, as I don’t know which worker picked up these executions:
2025-02-18T08:08:15.237146000Z Enqueued execution 3117422 (job 2385736)
2025-02-18T08:09:09.559918000Z Execution 3117422 (job 2385736) failed
2025-02-18T08:09:09.560073000Z Error: timeout exceeded when trying to connect
2025-02-18T08:09:09.560270000Z at /usr/local/lib/node_modules/n8n/node_modules/pg-pool/index.js:45:11
2025-02-18T08:09:09.560605000Z at PostgresDriver.obtainMasterConnection (/usr/local/lib/node_modules/n8n/node_modules/@n8n/typeorm/driver/postgres/PostgresDriver.js:883:28)
2025-02-18T08:09:09.560850000Z at PostgresQueryRunner.query (/usr/local/lib/node_modules/n8n/node_modules/@n8n/typeorm/driver/postgres/PostgresQueryRunner.js:178:36)
2025-02-18T08:09:09.561036000Z at UpdateQueryBuilder.execute (/usr/local/lib/node_modules/n8n/node_modules/@n8n/typeorm/query-builder/UpdateQueryBuilder.js:83:33)
2025-02-18T08:09:09.561217000Z at ExecutionRepository.setRunning (/usr/local/lib/node_modules/n8n/dist/databases/repositories/execution.repository.js:244:9)
2025-02-18T08:09:09.561380000Z at JobProcessor.processJob (/usr/local/lib/node_modules/n8n/dist/scaling/job-processor.js:87:27)
2025-02-18T08:09:09.561557000Z at Queue.<anonymous> (/usr/local/lib/node_modules/n8n/dist/scaling/scaling.service.js:115:17)
2025-02-18T08:09:09.561749000Z
2025-02-18T08:09:09.579236000Z Problem with execution 3117445: timeout exceeded when trying to connect. Aborting.
2025-02-18T08:09:09.579508000Z timeout exceeded when trying to connect (execution 3117445)
2025-02-18T08:09:09.579631000Z Problem with execution 3117422: timeout exceeded when trying to connect. Aborting.
2025-02-18T08:09:09.579753000Z timeout exceeded when trying to connect (execution 3117422)
Error message returned to webhooks:
{
  "message": "Error in workflow"
}
Suspected trigger:
I have some workflows with a runtime of about 3 to 8 minutes, which mostly fetch data from a paginated endpoint and, after some modification, write it to a database (about 40k items with about 50 fields each).
When executing one of these workflows manually (which runs it on the main instance), the main instance does not react to any incoming webhooks for about a minute.
I’ve checked a few executions, and the issue seems to occur only while one of these heavy workflows is running. Looking at the logs for such an execution, it appears to fail after a few minutes, yet a few minutes after failing the logs suggest it executed successfully. It even finishes twice with the same job ID:
2025-02-18T08:00:24.029900000Z Enqueued execution 3117247 (job 2385586)
2025-02-18T08:04:19.821338000Z Problem with execution 3117247: This execution failed to be processed too many times and will no longer retry. To allow this execution to complete, please break down your workflow or scale up your workers or adjust your worker settings.. Aborting.
2025-02-18T08:04:19.822207000Z This execution failed to be processed too many times and will no longer retry. To allow this execution to complete, please break down your workflow or scale up your workers or adjust your worker settings. (execution 3117247)
2025-02-18T08:04:19.822374000Z job stalled more than maxStalledCount (execution 3117247)
2025-02-18T08:04:55.244516000Z Execution 3117247 (job 2385586) finished successfully
2025-02-18T08:09:17.301791000Z Execution 3117247 (job 2385586) finished successfully
Temporary “solution”
It’s not quite a solution; it just hides/removes these failed executions so they don’t get shown on top of all other executions.
Remove all executions from the n8n database which have errored and have startedAt = null:
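A sketch of that cleanup query, assuming n8n’s default Postgres schema where executions live in the execution_entity table (verify the table name against your own database, and take a backup before running):

```sql
-- Delete errored executions that never started (startedAt is NULL).
-- Assumes n8n's default Postgres table name "execution_entity".
DELETE FROM execution_entity
WHERE status = 'error'
  AND "startedAt" IS NULL;
```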
Proposed Mid-Term Solutions:
- Optimize the heavy workflows (maybe not always possible) – currently WIP for my workflows
- Limit workers to at most one execution at a time (not recommended by the docs)
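A hedged sketch of the second option: worker concurrency can be capped via the worker command’s --concurrency flag (default 10), and the Postgres pool can be enlarged via DB_POSTGRESDB_POOL_SIZE since the pg-pool “timeout exceeded when trying to connect” error can indicate pool exhaustion. The service name, image tag, and values below are placeholders for your own compose file:

```yaml
# docker-compose fragment (hypothetical service name and values)
worker:
  image: n8nio/n8n:1.78.1
  command: worker --concurrency=1   # default is 10; 1 serializes job pickup per worker
  environment:
    - DB_POSTGRESDB_POOL_SIZE=4     # default is 2; raise if the pool is exhausted
```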
Maybe related:
Information on your n8n setup
- n8n version: 1.78.1
- Database (default: SQLite): Postgres / Redis
- n8n EXECUTIONS_PROCESS setting (default: own, main): main - queue
- Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
- Operating system: Ubuntu 22.04.3 LTS