Status vs. Finished

Has anyone encountered an issue where the status of a workflow is running and the finished state is true? These seem to be “hung” workflows that are stuck in a running state forever.

If you're running PostgreSQL, you can run

SELECT "workflowId", COUNT(*) AS running_count
FROM public.execution_entity
WHERE status = 'running' AND finished = TRUE
GROUP BY "workflowId"
ORDER BY running_count DESC;

to find these.

What is the difference between status and the finished flag? Any idea of the root cause of this scenario?

This is a known database state inconsistency that happens when n8n crashes or restarts while workflows are executing. I’ve encountered this in production environments and can help you resolve it.

The Problem Explained

In n8n’s execution model:

  • status: Tracks the execution state (running, success, error, etc.)
  • finished: Boolean flag indicating if the execution has completed

When you have status='running' AND finished=true, it means:

  1. The workflow finished executing (finished=true)
  2. But the final status update never happened (status still shows 'running')

This typically occurs when:

  • n8n process crashes mid-execution
  • Database connection drops during status update
  • Docker container restarts unexpectedly
  • Out-of-memory (OOM) kills the process

How to Fix These Stuck Workflows

You need to manually update the inconsistent records. Run this SQL on your PostgreSQL database:

UPDATE execution_entity 
SET status = CASE 
  WHEN "stoppedAt" IS NOT NULL THEN 'success'
  ELSE 'error'
END
WHERE status = 'running' AND finished = true;

This will:

  • Set status to success if the workflow completed normally (has a stoppedAt timestamp)
  • Set status to error if it didn’t complete properly
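Before running the UPDATE, it's worth previewing which rows will change and wrapping the change in a transaction so you can roll back if anything looks off. A sketch, assuming the default public schema and the standard execution_entity columns:

```sql
-- Preview the rows the fix would touch and the status each would receive
SELECT id, "workflowId", "startedAt", "stoppedAt",
       CASE WHEN "stoppedAt" IS NOT NULL THEN 'success' ELSE 'error' END AS new_status
FROM execution_entity
WHERE status = 'running' AND finished = true;

-- Apply inside a transaction; ROLLBACK instead of COMMIT if the preview looked wrong
BEGIN;
UPDATE execution_entity
SET status = CASE WHEN "stoppedAt" IS NOT NULL THEN 'success' ELSE 'error' END
WHERE status = 'running' AND finished = true;
COMMIT;
```

The preview also gives you a record of which executions were affected, which is handy when you later dig through logs for the root cause.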

Preventing Future Occurrences

1. Enable Queue Mode (if not already):

EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=your-redis-host

Queue mode is more resilient to crashes.

2. Increase Memory Limits (Docker):

deploy:
  resources:
    limits:
      memory: 2G

3. Enable Execution Data Pruning:

EXECUTIONS_DATA_PRUNE=true
EXECUTIONS_DATA_MAX_AGE=168

EXECUTIONS_DATA_MAX_AGE is in hours, so 168 keeps one week of history. Pruning won't prevent the inconsistency, but it stops orphaned execution records from accumulating indefinitely.

4. Monitor Resource Usage:
Track CPU/memory to catch issues before they cause crashes.
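On the database side, point 4 can be made actionable with a couple of health-check queries wired into whatever alerting you use. A sketch; the 6-hour threshold is an assumption, tune it to your longest-running workflows:

```sql
-- Executions already in the inconsistent state (alert if non-zero)
SELECT COUNT(*) AS inconsistent_count
FROM execution_entity
WHERE status = 'running' AND finished = true;

-- Executions that have been "running" suspiciously long and may be genuinely hung
SELECT COUNT(*) AS long_running_count
FROM execution_entity
WHERE status = 'running' AND finished = false
  AND "startedAt" < NOW() - INTERVAL '6 hours';
```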

Root Cause Diagnosis

To find why this is happening, check:

# Docker logs for crashes
docker logs n8n-container | grep -i "error\|crash\|killed"

# Check for OOM kills
dmesg | grep -i "out of memory"

# PostgreSQL connection errors
docker logs postgres-container | grep -i "connection"

I've implemented these fixes for clients running high-volume n8n instances on AWS and GCP, and stuck workflows disappeared once the underlying crash/restart issue was addressed. The key is fixing that root cause, not just cleaning up the database.

Let me know what you find in the logs, and I can help pinpoint the exact cause!

Amazing response - thank you @hoiyothaheem !

We already run in queue mode. How does enabling data pruning help?

We expect workers to crash from time to time as our workflows are highly dynamic so increasing CPU/memory to maximum required would be wasteful most of the time.

@ryanflomenco Ah gotcha - makes total sense if your workflows are bursty and dynamic!

So the data pruning helps because it automatically cleans up those orphaned execution records over time. Without it, stuck executions just pile up in the DB forever.

For your use case (expecting occasional crashes), you probably just need to run that cleanup SQL query periodically - maybe set up a cron job to fix stuck executions every few hours? That way you don’t have to manually clean them up each time.

Something like:

# Run every 6 hours
0 */6 * * * psql -d n8n -c "UPDATE execution_entity SET status='error' WHERE status='running' AND finished=true;"

That should keep things clean without needing to over-provision resources.
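One refinement, if it's useful: setting everything to error loses the success/error distinction from the CASE query earlier in the thread, and without an age guard the cron could race an execution that is mid-update. A slightly safer variant (same assumptions about the schema) would be:

```sql
-- Only touch rows that have been inconsistent for at least an hour,
-- and keep the success/error distinction based on "stoppedAt"
UPDATE execution_entity
SET status = CASE WHEN "stoppedAt" IS NOT NULL THEN 'success' ELSE 'error' END
WHERE status = 'running'
  AND finished = true
  AND "startedAt" < NOW() - INTERVAL '1 hour';
```

Dropped into the same cron line via psql -c, it behaves the same way but preserves a more accurate final status for each execution.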