n8n scaling problems - Cannot GET

Describe the issue/error/question

Hello everyone,
I am running n8n in a docker compose setup on an EC2 instance (Amazon Linux 2 - ECS Optimized).
I was able to make it work in normal (non-queued) mode, and when I started scaling I successfully switched the DB engine to Postgres: my endpoints were accessible from the internet and everything worked correctly.

Then I followed the rest of the guide to finalize the scaling process, but got stuck with a “Cannot GET” error message when trying to access any production webhook, with no idea how to progress.

I noticed some Postgres and Redis errors and warnings being logged after spinning up Docker, which I have since resolved, but that didn’t help with this particular issue. One Redis warning that is still there is:

redis_1      | 1:C 20 Sep 2022 08:56:56.477 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1      | 1:C 20 Sep 2022 08:56:56.477 # Redis version=7.0.4, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1      | 1:C 20 Sep 2022 08:56:56.477 # Configuration loaded
redis_1      | 1:M 20 Sep 2022 08:56:56.478 * monotonic clock: POSIX clock_gettime
redis_1      | 1:M 20 Sep 2022 08:56:56.482 * Running mode=standalone, port=6379.
redis_1      | 1:M 20 Sep 2022 08:56:56.483 # Server initialized
redis_1      | 1:M 20 Sep 2022 08:56:56.483 # WARNING Your system is configured to use the 'xen' clocksource which might lead to degraded performance. Check the result of the [slow-clocksource] system check: run 'redis-server --check-system' to check if the system's clocksource isn't degrading performance.
redis_1      | 1:M 20 Sep 2022 08:56:56.484 * Ready to accept connections

Currently my editor is accessible via the internet, and test webhooks also work, which suggests the issue lies in either Redis or routing/load balancing (Traefik). Also, from my understanding, Cannot GET is a Node error meaning that no GET handler is registered for that endpoint in the application code.
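For context: "Cannot GET /path" is Express's stock 404 body (n8n's HTTP layer is Express-based), so the request is reaching a process that simply has no GET handler registered for that path. A toy Python sketch of that dispatch behaviour, with a hypothetical route table:

```python
# Toy model of Express's default 404: if no handler matches the method + path,
# the response body is literally "Cannot GET /path". Route names are made up.

routes = {
    ("GET", "/webhook-test/my-hook"): "test webhook handled",
    # note: no ("GET", "/webhook/my-hook") entry registered here
}

def dispatch(method: str, path: str) -> str:
    """Return the handler result, or an Express-style 404 body."""
    try:
        return routes[(method, path)]
    except KeyError:
        return f"Cannot {method} {path}"

print(dispatch("GET", "/webhook-test/my-hook"))  # test webhook handled
print(dispatch("GET", "/webhook/my-hook"))       # Cannot GET /webhook/my-hook
```

So the error itself only says "this process doesn't serve this path"; it doesn't say which container answered, which is why the routing layer is the prime suspect.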

Error and config details below.
Thanks for looking into this!
Michal

What is the error message (if any)?

Cannot GET /path_to_webhook

When running docker logs on the individual containers, I find some weird behavior:

  • webhook listener repeats these messages:
2022-09-16T20:48:30.996Z [Rudder] debug: in flush
2022-09-16T20:48:30.999Z [Rudder] debug: batch size is 1
2022-09-16T20:48:51.513Z [Rudder] debug: in flush
2022-09-16T20:48:51.513Z [Rudder] debug: cancelling existing timer...
2022-09-16T20:48:51.513Z [Rudder] debug: queue is empty, nothing to flush
2022-09-17T02:48:30.995Z [Rudder] debug: no existing flush timer, creating new one
  • worker process repeats these errors:
Starting n8n worker...
2022-09-16T21:16:19.302Z | debug    | No codex available for: N8nTrainingCustomerDatastore.node.js "{ file: 'LoadNodesAndCredentials.js', function: 'addCodex' }"
2022-09-16T21:16:19.307Z | debug    | No codex available for: N8nTrainingCustomerMessenger.node.js "{ file: 'LoadNodesAndCredentials.js', function: 'addCodex' }"

n8n worker is now ready
 * Version: 0.192.0
 * Concurrency: 10

2022-09-16T21:16:21.410Z | error    | Error from queue:
{
  command: {
    name: 'evalsha',
    args: [
      'ff9c18634832b0b4115a19b4de5f4788a7cfbd4e',
      '7',
      'bull:jobs:stalled',
      'bull:jobs:wait',
      'bull:jobs:active',
      'bull:jobs:failed',
      'bull:jobs:stalled-check',
      'bull:jobs:meta-paused',
      'bull:jobs:paused',
      '1',
      'bull:jobs:',
      '1663362981409',
      '30000'
    ]
  },
  file: 'worker.js'
}
2022-09-16T21:16:21.412Z | error    | Error from queue:  (same evalsha command as above)

/usr/local/lib/node_modules/n8n/node_modules/redis-parser/lib/parser.js:179
        return new ReplyError(string)
               ^

ReplyError: READONLY You can't write against a read only replica. script: ff9c18634832b0b4115a19b4de5f4788a7cfbd4e, on @user_script:30.
    at parseError (/usr/local/lib/node_modules/n8n/node_modules/redis-parser/lib/parser.js:179:12)
    at parseType (/usr/local/lib/node_modules/n8n/node_modules/redis-parser/lib/parser.js:302:14) {
  command: {
    name: 'evalsha',
    args: [
      'ff9c18634832b0b4115a19b4de5f4788a7cfbd4e',
      '7',
      'bull:jobs:stalled',
      'bull:jobs:wait',
      'bull:jobs:active',
      'bull:jobs:failed',
      'bull:jobs:stalled-check',
      'bull:jobs:meta-paused',
      'bull:jobs:paused',
      '1',
      'bull:jobs:',
      '1663362981409',
      '30000'
    ]
  }
}
  • traefik:
time="2022-09-16T14:48:05Z" level=info msg="Configuration loaded from flags."
time="2022-09-17T18:17:50Z" level=error msg="Error while Hello: EOF"
  • redis:
1:S 20 Sep 2022 08:53:42.873 * Connecting to MASTER 178.20.47.79:8886
1:S 20 Sep 2022 08:53:42.873 * MASTER <-> REPLICA sync started
1:S 20 Sep 2022 08:53:42.929 * Non blocking connect for SYNC fired the event.
1:S 20 Sep 2022 08:53:42.984 # Failed to read response from the server: Invalid argument
1:S 20 Sep 2022 08:53:42.984 # Master did not respond to command during SYNC handshake
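Those 1:S lines are notable: the S role means this Redis is running as a replica and is trying to sync from an external master, and a Redis replica rejects every write with exactly the READONLY error shown in the worker logs above. A toy Python sketch of that failure mode (toy classes, not the real Redis client):

```python
# Toy model: a Redis replica refuses write commands, as real Redis does.
# Class and key names are illustrative only.

class ReadOnlyError(Exception):
    pass

class ToyRedis:
    def __init__(self, role: str):
        self.role = role  # "master" or "replica"
        self.data = {}

    def set(self, key, value):
        if self.role == "replica":
            # Mirrors the error Bull surfaced in the n8n worker logs
            raise ReadOnlyError("READONLY You can't write against a read only replica.")
        self.data[key] = value

master = ToyRedis("master")
master.set("bull:jobs:stalled-check", "1663362981409")  # fine

replica = ToyRedis("replica")
try:
    replica.set("bull:jobs:stalled-check", "1663362981409")
except ReadOnlyError as err:
    print(err)  # READONLY You can't write against a read only replica.
```

If the container really is in replica mode, checking the mounted redis.conf for a `replicaof`/`slaveof` directive (and where 178.20.47.79:8886 comes from) would be a sensible next step; this is a guess based on the log lines, not something the thread confirms.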

Information on your n8n setup

  • n8n version: 0.192.0
  • Database you’re using: Postgres
  • Running n8n via Docker

What I did was take the whole config from here, alter it to match the rest of my setup, and then try to implement the fixes I found in these threads:
https://community.n8n.io/t/webhook-scaling-issues/10404
https://community.n8n.io/t/problems-with-the-n8n-queue-configuration/13697/6

Docker compose below

version: '3.8'

volumes:
  db_storage:
  n8n_storage:

services:
  traefik:
    image: "traefik"
    restart: always
    command:
      - "--api=true"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.mytlschallenge.acme.tlschallenge=true"
      - "--certificatesresolvers.mytlschallenge.acme.email=${SSL_EMAIL}"
      - "--certificatesresolvers.mytlschallenge.acme.storage=/letsencrypt/acme.json"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ${DATA_FOLDER}/letsencrypt:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock:ro

  postgres:
    image: postgres:11
    restart: always
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_NON_ROOT_USER=${POSTGRES_NON_ROOT_USER}
      - POSTGRES_NON_ROOT_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
    volumes:
      - db_storage:/var/lib/postgresql/data
      - ./init-data.sh:/docker-entrypoint-initdb.d/init-data.sh
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -h localhost -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 5s
      timeout: 5s
      retries: 10
    
    
  redis:
    image: redis:7.0-alpine
    restart: always
    # CHANGE PASSWORD
    # command: redis-server --requirepass ${REDIS_PASSWORD}
    # below is to fix a redis memory issue
    sysctls:
      net.core.somaxconn: 1024
    volumes:
      - ~/redis.conf:/home/redis/redis.conf
    ports:
      - 6379:6379
    environment:
      - REDIS_REPLICATION_MODE=master
    command: redis-server "../home/redis/redis.conf"


  n8n:
    image: n8nio/n8n
    restart: always
    environment:
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=${POSTGRES_DB}
      - DB_POSTGRESDB_USER=${POSTGRES_NON_ROOT_USER}
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=${N8N_BASIC_AUTH_USER}
      - N8N_BASIC_AUTH_PASSWORD=${N8N_BASIC_AUTH_PASSWORD}
      - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
      - N8N_PORT=5678
      - N8N_PROTOCOL=https
      - NODE_ENV=production
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
      - N8N_ENDPOINT_WEBHOOK=webhook
      - N8N_ENDPOINT_WEBHOOK_TEST=webhook-test
      - WEBHOOK_URL=https://${SUBDOMAIN}.${DOMAIN_NAME}/
      - GENERIC_TIMEZONE=${GENERIC_TIMEZONE}
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - N8N_DISABLE_PRODUCTION_MAIN_PROCESS=true
    ports:
      - 5678:5678
    labels:
      - traefik.enable=true
      - traefik.http.routers.n8n.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`)
      - traefik.http.routers.n8n.tls=true
      - traefik.http.routers.n8n.entrypoints=web,websecure
      - traefik.http.routers.n8n.tls.certresolver=mytlschallenge
      - traefik.http.middlewares.n8n.headers.SSLRedirect=true
      - traefik.http.middlewares.n8n.headers.STSSeconds=315360000
      - traefik.http.middlewares.n8n.headers.browserXSSFilter=true
      - traefik.http.middlewares.n8n.headers.contentTypeNosniff=true
      - traefik.http.middlewares.n8n.headers.forceSTSHeader=true
      - traefik.http.middlewares.n8n.headers.SSLHost=${DOMAIN_NAME}
      - traefik.http.middlewares.n8n.headers.STSIncludeSubdomains=true
      - traefik.http.middlewares.n8n.headers.STSPreload=true
      - traefik.http.middlewares.n8n-redirectregex.redirectregex.regex=/webhook/(.*)
      - traefik.http.middlewares.n8n-redirectregex.redirectregex.replacement=:5679/webhook/$$1
    depends_on: 
      - redis
      - postgres    
      

    volumes:
      - ~/.n8n:/home/node/.n8n
    # Wait 5 seconds to start n8n to make sure that PostgreSQL is ready
    # when n8n tries to connect to it
    command: /bin/sh -c "sleep 5; n8n start"


  n8n-queue:
    image: n8nio/n8n
    restart: always
    environment:
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=${POSTGRES_DB}
      - DB_POSTGRESDB_USER=${POSTGRES_NON_ROOT_USER}
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=${N8N_BASIC_AUTH_USER}
      - N8N_BASIC_AUTH_PASSWORD=${N8N_BASIC_AUTH_PASSWORD}
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
      - WEBHOOK_URL=https://${SUBDOMAIN}.${DOMAIN_NAME}/
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
      - NODE_FUNCTION_ALLOW_BUILTIN=*
      - NODE_FUNCTION_ALLOW_EXTERNAL=*
      - GENERIC_TIMEZONE=${GENERIC_TIMEZONE}
      - N8N_PORT=5680
      - N8N_LOG_LEVEL=debug  # error, warning, info, verbose, debug
      - N8N_PROTOCOL=https
    cpus: 1
    ports:
      - 5680:5678
    depends_on:
      - postgres
      - redis
      - n8n
    volumes:
      - ~/.n8n:/home/node/.n8n
    command: /bin/sh -c "n8n worker"

  n8n-wh:
        image: n8nio/n8n
        restart: always
        environment:
          - DB_TYPE=postgresdb
          - DB_POSTGRESDB_HOST=postgres
          - DB_POSTGRESDB_PORT=5432
          - DB_POSTGRESDB_DATABASE=${POSTGRES_DB}
          - DB_POSTGRESDB_USER=${POSTGRES_NON_ROOT_USER}
          - DB_POSTGRESDB_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
          - N8N_BASIC_AUTH_ACTIVE=true
          - N8N_BASIC_AUTH_USER=${N8N_BASIC_AUTH_USER}
          - N8N_BASIC_AUTH_PASSWORD=${N8N_BASIC_AUTH_PASSWORD}
          - EXECUTIONS_MODE=queue
          - QUEUE_BULL_REDIS_HOST=redis
          - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
          - QUEUE_BULL_REDIS_PORT=6379
          - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
          - WEBHOOK_URL=https://${SUBDOMAIN}.${DOMAIN_NAME}/
          - NODE_FUNCTION_ALLOW_BUILTIN=*
          - NODE_FUNCTION_ALLOW_EXTERNAL=*
          - GENERIC_TIMEZONE=${GENERIC_TIMEZONE}
          - N8N_LOG_LEVEL=debug  # error, warning, info, verbose, debug
          - N8N_PROTOCOL=https
          - N8N_PORT=5679
          - N8N_ENDPOINT_WEBHOOK=webhook
          - N8N_ENDPOINT_WEBHOOK_TEST=webhook-test
        cpus: 1
        labels:
        - traefik.enable=true
        - traefik.http.middlewares.n8n.headers.SSLRedirect=true
        - traefik.http.middlewares.n8n.headers.STSSeconds=315360000
        - traefik.http.middlewares.n8n.headers.browserXSSFilter=true
        - traefik.http.middlewares.n8n.headers.contentTypeNosniff=true
        - traefik.http.middlewares.n8n.headers.forceSTSHeader=true
        - traefik.http.middlewares.n8n.headers.SSLHost=${DOMAIN_NAME}
        - traefik.http.middlewares.n8n.headers.STSIncludeSubdomains=true
        - traefik.http.middlewares.n8n.headers.STSPreload=true
        - traefik.http.middlewares.n8n-redirectregex.redirectregex.regex=/webhook/(.*)
        - traefik.http.middlewares.n8n-redirectregex.redirectregex.replacement=:5679/webhook/$$1
        ports:
          - 5679:5678
        networks:
          - default
        depends_on: 
          - postgres
          - redis
          - n8n
          - n8n-queue
        volumes:
          - ~/.n8n:/home/node/.n8n 
        command: /bin/sh -c "n8n webhook"

Setup also includes

  • .env - contains only credentials, keys and domain names, so no point posting it here
  • init-data.sh - left as found on GitHub
  • redis.conf - left as found on GitHub

Hi @michal,
The first thing that pops out to me is that you are routing /webhook/ to your main n8n instance, but you only want /webhook-test/ to go to the main instance; /webhook/ should go to your webhook instance.
Can you try changing the routing rule on the n8n (n8nio) container to:

    - traefik.http.middlewares.n8n-redirectregex.redirectregex.regex=/webhook-test/(.*)
    - traefik.http.middlewares.n8n-redirectregex.redirectregex.replacement=:5679/webhook-test/$1

And see if that helps? I tend to use the npm version of n8n instead of Docker, so I may be misreading.

Hello and thanks for the quick response.
Sadly, when I did that it gave me an error saying I am trying to define redirectregex multiple times. When I tried to remove the redirectregex lines from the main process and leave them only on the webhook processor (with /webhook/(.*)), the error was gone but the behavior remained the same: Cannot GET on production, while test still works.

Ah I think I read the labels wrong!
Can you try changing

traefik.http.routers.n8n.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`)

to

traefik.http.routers.n8n.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`) && ( PathPrefix(`/webhook-test`) || PathPrefix(`/rest`)) || PathPrefix('/')

AND add this to n8n wh

traefik.http.routers.n8n.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`)  && (PathPrefix(`/webhook`)|| PathPrefix(`/webhook-wait`))

That may work? Effectively, Traefik is sending all your calls to the main n8n instance because you have no paths specified in the router rule.

Let me know if that works?
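The "all calls go to main" behaviour follows from how Traefik picks a router: among all routers whose rule matches a request, the highest priority wins, and the default priority is simply the rule string's length. A rough Python model of that selection (PathPrefix modelled as a plain prefix check; the host and router names are stand-ins for the ones in this thread):

```python
# Rough model of Traefik's default router selection: longest matching rule wins.
# Not real Traefik rule parsing - prefixes are checked with str.startswith().

HOST = "n8n.example.com"  # stand-in for ${SUBDOMAIN}.${DOMAIN_NAME}

def route(path, routers):
    """routers: list of (name, rule_string, path_prefixes); [] = any path."""
    matching = [(len(rule), name) for name, rule, prefixes in routers
                if not prefixes or any(path.startswith(p) for p in prefixes)]
    return max(matching)[1]  # longest matching rule wins

# Before: only the main instance has a router rule -> it receives every call.
before = [("n8n", "Host(`%s`)" % HOST, [])]
print(route("/webhook/abc", before))        # n8n

# After: the webhook instance gets its own, longer (= higher priority) rule.
after = before + [
    ("n8n_wh",
     "Host(`%s`) && (PathPrefix(`/webhook`) || PathPrefix(`/webhook-wait`))" % HOST,
     ["/webhook", "/webhook-wait"]),
]
print(route("/webhook/abc", after))         # n8n_wh
print(route("/rest/workflows", after))      # n8n
```

This is why adding PathPrefix terms to the webhook container's router can change which instance answers, even though the Host part of both rules is identical.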

Thanks for sacrificing your time for me.
Still the same. In addition to just copying this, I tried:

  • your rules, with and without the redirectregex middleware
  • altering the rules you provided in various ways
  • If you use traefik.http.routers.n8n.rule=Host… on both containers, it fails because such a router is already defined. I renamed the webhook listener’s router to n8n_wh to avoid this; I don’t know if that changes anything?

BTW, you have single quotes instead of backticks in the last part of the first router rule, which was causing parsing errors:

…|| PathPrefix('/')

Isn’t there perhaps something we are missing in the Traefik configuration itself?
The logs from the Traefik container give me this:

time="2022-09-20T11:37:35Z" level=info msg="Configuration loaded from flags."  
time="2022-09-20T11:49:45Z" level=error msg="accept tcp [::]:80: use of closed network connection" entryPointName=web
time="2022-09-20T11:49:45Z" level=error msg="Error while starting server: accept tcp [::]:80: use of closed network connection" entryPointName=web
time="2022-09-20T11:49:45Z" level=error msg="accept tcp [::]:443: use of closed network connection" entryPointName=websecure
time="2022-09-20T11:49:45Z" level=error msg="Error while starting server: accept tcp [::]:443: use of closed network connection" entryPointName=websecure
time="2022-09-20T11:49:45Z" level=error msg="accept tcp [::]:8080: use of closed network connection" entryPointName=traefik
time="2022-09-20T11:49:45Z" level=error msg="Error while starting server: accept tcp [::]:8080: use of closed network connection" entryPointName=traefik
2022/09/20 11:49:46 reverseproxy.go:502: httputil: ReverseProxy read error during body copy: unexpected EOF 

Now my (still defunct) compose file looks like this:

version: '3.8'

volumes:
  db_storage:
  n8n_storage:

services:
  traefik:
    image: "traefik"
    restart: always
    command:
      - "--api=true"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.mytlschallenge.acme.tlschallenge=true"
      - "--certificatesresolvers.mytlschallenge.acme.email=${SSL_EMAIL}"
      - "--certificatesresolvers.mytlschallenge.acme.storage=/letsencrypt/acme.json"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ${DATA_FOLDER}/letsencrypt:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock:ro

  postgres:
    image: postgres:11
    restart: always
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_NON_ROOT_USER=${POSTGRES_NON_ROOT_USER}
      - POSTGRES_NON_ROOT_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
    volumes:
      - db_storage:/var/lib/postgresql/data
      - ./init-data.sh:/docker-entrypoint-initdb.d/init-data.sh
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -h localhost -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 5s
      timeout: 5s
      retries: 10
    
    
  redis:
    image: redis:7.0-alpine
    restart: always
    # CHANGE PASSWORD
    #command: redis-server --requirepass ${REDIS_PASSWORD}
    sysctls:
      net.core.somaxconn: 1024
    volumes:
      - ~/redis.conf:/home/redis/redis.conf
    ports:
      - 6379:6379
    environment:
      - REDIS_REPLICATION_MODE=master
    command: redis-server "../home/redis/redis.conf"


  n8n:
    image: n8nio/n8n
    restart: always
    environment:
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=${POSTGRES_DB}
      - DB_POSTGRESDB_USER=${POSTGRES_NON_ROOT_USER}
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=${N8N_BASIC_AUTH_USER}
      - N8N_BASIC_AUTH_PASSWORD=${N8N_BASIC_AUTH_PASSWORD}
      - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
      - N8N_PORT=5678
      - N8N_PROTOCOL=https
      - NODE_ENV=production
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
      - N8N_ENDPOINT_WEBHOOK=webhook
      - N8N_ENDPOINT_WEBHOOK_TEST=webhook-test
      - WEBHOOK_URL=https://${SUBDOMAIN}.${DOMAIN_NAME}/
      - GENERIC_TIMEZONE=${GENERIC_TIMEZONE}
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - N8N_DISABLE_PRODUCTION_MAIN_PROCESS=true
    ports:
      - 5678:5678
    labels:
      - "traefik.enable=true"
      #- traefik.http.routers.n8n.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`) && ( PathPrefix(`/webhook-test`) || PathPrefix(`/rest`))
      - traefik.http.routers.n8n.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`) && ( PathPrefix(`/webhook-test`) || PathPrefix(`/rest`)) || PathPrefix(`/`)
      #- "traefik.http.routers.n8n.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`) || (Host(`${SUBDOMAIN}.${DOMAIN_NAME}`) && (PathPrefix(`/webhook-test`) || PathPrefix(`/rest`)))"
      #- "traefik.http.routers.n8n.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`)"
      - "traefik.http.routers.n8n.tls=true"
      - "traefik.http.routers.n8n.entrypoints=web,websecure"
      - "traefik.http.routers.n8n.tls.certresolver=mytlschallenge"
      - "traefik.http.middlewares.n8n.headers.SSLRedirect=true"
      - "traefik.http.middlewares.n8n.headers.STSSeconds=315360000"
      - "traefik.http.middlewares.n8n.headers.browserXSSFilter=true"
      - "traefik.http.middlewares.n8n.headers.contentTypeNosniff=true"
      - "traefik.http.middlewares.n8n.headers.forceSTSHeader=true"
      - "traefik.http.middlewares.n8n.headers.SSLHost=${DOMAIN_NAME}"
      - "traefik.http.middlewares.n8n.headers.STSIncludeSubdomains=true"
      - "traefik.http.middlewares.n8n.headers.STSPreload=true"
    depends_on: 
      - redis
      - postgres    
      

    volumes:
      - ~/.n8n:/home/node/.n8n
    # Wait 5 seconds to start n8n to make sure that PostgreSQL is ready
    # when n8n tries to connect to it
    command: /bin/sh -c "sleep 5; n8n start"


  n8n_queue:
    image: n8nio/n8n
    restart: always
    environment:
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=${POSTGRES_DB}
      - DB_POSTGRESDB_USER=${POSTGRES_NON_ROOT_USER}
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=${N8N_BASIC_AUTH_USER}
      - N8N_BASIC_AUTH_PASSWORD=${N8N_BASIC_AUTH_PASSWORD}
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
      - WEBHOOK_URL=https://${SUBDOMAIN}.${DOMAIN_NAME}/
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
      - NODE_FUNCTION_ALLOW_BUILTIN=*
      - NODE_FUNCTION_ALLOW_EXTERNAL=*
      - GENERIC_TIMEZONE=${GENERIC_TIMEZONE}
      - N8N_PORT=5680
      - N8N_LOG_LEVEL=debug  # error, warning, info, verbose, debug
      - N8N_PROTOCOL=https
    cpus: 1
    ports:
      - 5680:5678
    depends_on:
      - postgres
      - redis
      - n8n
    volumes:
      - ~/.n8n:/home/node/.n8n
    command: /bin/sh -c "n8n worker"

  n8n_wh:
        image: n8nio/n8n
        restart: always
        environment:
          - DB_TYPE=postgresdb
          - DB_POSTGRESDB_HOST=postgres
          - DB_POSTGRESDB_PORT=5432
          - DB_POSTGRESDB_DATABASE=${POSTGRES_DB}
          - DB_POSTGRESDB_USER=${POSTGRES_NON_ROOT_USER}
          - DB_POSTGRESDB_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
          - N8N_BASIC_AUTH_ACTIVE=true
          - N8N_BASIC_AUTH_USER=${N8N_BASIC_AUTH_USER}
          - N8N_BASIC_AUTH_PASSWORD=${N8N_BASIC_AUTH_PASSWORD}
          - EXECUTIONS_MODE=queue
          - QUEUE_BULL_REDIS_HOST=redis
          - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
          - QUEUE_BULL_REDIS_PORT=6379
          - N8N_HOST=${SUBDOMAIN}.${DOMAIN_NAME}
          - WEBHOOK_URL=https://${SUBDOMAIN}.${DOMAIN_NAME}/
          - NODE_FUNCTION_ALLOW_BUILTIN=*
          - NODE_FUNCTION_ALLOW_EXTERNAL=*
          - GENERIC_TIMEZONE=${GENERIC_TIMEZONE}
          - N8N_LOG_LEVEL=debug  # error, warning, info, verbose, debug
          - N8N_PROTOCOL=https
          - N8N_PORT=5679
          - N8N_ENDPOINT_WEBHOOK=webhook
          - N8N_ENDPOINT_WEBHOOK_TEST=webhook-test
        cpus: 1
        labels:
        - "traefik.enable=true"
        - "traefik.http.routers.n8n_wh.rule=Host(`${SUBDOMAIN}.${DOMAIN_NAME}`) && (PathPrefix(`/webhook`)|| PathPrefix(`/webhook-wait`))"
        - "traefik.http.middlewares.n8n_wh.headers.SSLRedirect=true"
        - "traefik.http.middlewares.n8n_wh.headers.STSSeconds=315360000"
        - "traefik.http.middlewares.n8n_wh.headers.browserXSSFilter=true"
        - "traefik.http.middlewares.n8n_wh.headers.contentTypeNosniff=true"
        - t"raefik.http.middlewares.n8n_wh.headers.forceSTSHeader=true"
        - "traefik.http.middlewares.n8n_wh.headers.SSLHost=${DOMAIN_NAME}"
        - "traefik.http.middlewares.n8n_wh.headers.STSIncludeSubdomains=true"
        - "traefik.http.middlewares.n8n_wh.headers.STSPreload=true"
        #- "traefik.http.middlewares.n8n_wh-redirectregex.redirectregex.regex=/webhook/(.*)"
        #- "traefik.http.middlewares.n8n_wh-redirectregex.redirectregex.replacement=:5679/webhook/$$1"
        ports:
          - 5679:5678
        networks:
          - default
        depends_on: 
          - postgres
          - redis
          - n8n
          - n8n_queue
        volumes:
          - ~/.n8n:/home/node/.n8n 
        command: /bin/sh -c "n8n webhook"

Sorry about the formatting, I copied it straight from my traefik-dynamic.toml.
I'm going to dump some of it here and see if you can translate that into the required parts for the docker container.

[http]

  [http.routers]

    [http.routers.n8n-router]
      rule = "Host(`api..x.yz`) && ( PathPrefix(`/api-test`) || PathPrefix(`/rest`))"
      service = "n8n"
      entryPoints = ["web-secure"]
    [http.routers.n8n-router.tls]
      certResolver = "cloudflare"
      [[http.routers.n8n-router.tls.domains]]
        main = "*.x.yz"


    [http.routers.n8n-webhook]
      rule = "Host(`api.x.yz`) && (PathPrefix(`/api`)|| PathPrefix(`/api-wait`))"
      service = "n8n-webhook"
      entryPoints = ["web-secure"]
    [http.routers.n8n-webhook.tls]
      certResolver = "cloudflare"
      [[http.routers.n8n-webhook.tls.domains]]
        main = "*.x.yz"

[http.services]

    [http.services.n8n-router.loadbalancer]
      [[http.services.n8n-router.loadbalancer.servers]]
        url = "http://x.x.x.x:5678"

    [http.services.n8n-webhook.loadbalancer]
      [[http.services.n8n-webhook.loadbalancer.servers]]
        url = "http://x.x.x.x:5678"
      [[http.services.n8n-webhook.loadbalancer.servers]]
        url = "http://x.x.x.x:5678"
      [http.services.n8n-webhook.loadBalancer.healthCheck]
        path = "/healthz"
        timeout = "3s"

The bit I can pull out is maybe the need to define a service?
If it's still an issue I can take a look at how to translate Traefik's TOML into Docker labels.
Another option I'd recommend is trying out Caddy, which I think the official docs cover.
A basic Caddyfile for this would be something like:

n8n.example.com {
        reverse_proxy /api/* x.x.x.x:5678 {
                health_uri /healthz
        }
        reverse_proxy /api-test/* x.x.x.x:5678
        reverse_proxy /rest/* x.x.x.x:5678
        reverse_proxy /api-wait/* x.x.x.x:5678
        reverse_proxy / x.x.x.x:5678
}

Hey @michal , sorry to hear you’re having those issues.

It seems to me that there is an issue with your Redis configuration, where it believes it is running in replica mode but is probably running standalone.

We can see this from the Redis init log, which states it is trying to connect to a master and is unable to, and from the worker process, which says it is trying to write data to Redis (which it must do once the execution finishes) and fails.

I saw that you have Redis replication set up along with some special Redis configuration. I think you should revisit those settings and make sure they are correct.

Hello @krynble, thanks for looking into this.

Could you or anyone point me towards a correct Redis config for n8n? The scaling docs don't cover it beyond "set up Redis", and in this video, How to scale your n8n instance 🗻 - YouTube, it seems to work out of the box (or they didn't show any changes to the default redis.conf).
So far I have tried running Redis in:

  • the default configuration, which didn't work, although the n8n instances at least weren't crashing
  • a redis.conf changing some replication parameters, such as bind 127.0.0.1 -::1 and protected-mode no, after which every n8n instance started logging Can't connect to Redis/Redis unavailable and exited with 1
  • a setup following this Getting started with Redis | Redis, after which the Redis container got into a restart loop saying nothing more than exited with code 0 (probably because of daemonize=yes)

In addition, no matter what I try, whenever Redis does start successfully the init log always ends with Running mode=standalone. Is there a need to define replicas as their own services, like they do here, in order for Redis to run as a replication master?
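(For anyone finding this later: n8n's queue mode only uses Redis as the Bull queue backend, so a single standalone instance is all that's needed, and the Running mode=standalone line in the startup log is expected rather than a problem. A minimal sketch of such a compose service, with no custom redis.conf and no port exposed outside the compose network, would be something like:)

```yaml
  redis:
    image: redis:7-alpine
    restart: always
    volumes:
      - redis_storage:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 10
    # no "ports:" mapping - only the n8n containers need to reach it,
    # via QUEUE_BULL_REDIS_HOST=redis on the compose network
```

No replication and no redis.conf changes should be required for this use case.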

Thanks

Hey @michal did you ever find a solution for your problem? Could you please share any additional findings?

I am looking into setting up on ECS myself, so any pointers ahead would be great.

Thanks.

Hello @MarceloSchmidt, yes I did, however I am not sure what exactly the source of this particular issue was in the end. We hired a freelancer who set it up, and some things he did differently from my setup were:

  • using a non-dockerized nginx instead of Traefik as the reverse proxy, because there were some problems with setting up the routing rules inside the container
  • Redis worked out of the box, to my knowledge
  • we then had a problem with the webhook processor not handling requests despite an otherwise correct setup (the main instance sent them directly to workers); he seemed certain that what solved this was upgrading the n8n Docker image to 0.196.0 - he tried various other versions and it only worked as it should on this one

docker compose we ended up with:

version: '3.8'

volumes:
  db_storage:
  n8n_storage:
  redis_storage:

x-shared: &shared
  restart: always
  environment:
    - DB_TYPE=postgresdb
    - DB_POSTGRESDB_HOST=postgres
    - DB_POSTGRESDB_PORT=5432
    - DB_POSTGRESDB_DATABASE=${POSTGRES_DB}
    - DB_POSTGRESDB_USER=${POSTGRES_NON_ROOT_USER}
    - DB_POSTGRESDB_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
    - EXECUTIONS_MODE=queue
    - QUEUE_BULL_REDIS_HOST=redis
    - QUEUE_HEALTH_CHECK_ACTIVE=true
    - N8N_BASIC_AUTH_ACTIVE=true
    - N8N_BASIC_AUTH_USER
    - N8N_BASIC_AUTH_PASSWORD
    - N8N_HOST=${DOMAIN_NAME}
    - N8N_PORT=5678
    - N8N_PROTOCOL=https
    - NODE_ENV=production
    - WEBHOOK_URL=https://${DOMAIN_NAME}
    - N8N_ENDPOINT_WEBHOOK=${SUB_HOOK}
    - N8N_ENDPOINT_WEBHOOK_TEST=${SUB_HOOK_TEST}
    - N8N_ENCRYPTION_KEY=${ENCRYPT_KEY}
    # - N8N_DISABLE_PRODUCTION_MAIN_PROCESS=true
  links:
    - postgres
    - redis
  volumes:
    - n8n_storage:/home/node/
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy


services:
  postgres:
    image: postgres:14.5
    restart: always
    environment:
      - POSTGRES_USER
      - POSTGRES_PASSWORD
      - POSTGRES_DB
      - POSTGRES_NON_ROOT_USER
      - POSTGRES_NON_ROOT_PASSWORD
    volumes:
      - db_storage:/var/lib/postgresql/data
      - ./init-data.sh:/docker-entrypoint-initdb.d/init-data.sh
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -h localhost -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 5s
      timeout: 5s
      retries: 10

  redis:
    image: redis:7.0.5-alpine
    restart: always
    volumes:
      - redis_storage:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 10

  n8n:
    <<: *shared
    image: n8nio/n8n:0.196.0
    #command: /bin/sh -c "n8n start --tunnel"
    command: /bin/sh -c "n8n start"
    ports:
      - '127.0.0.1:5678:5678'

  n8n-worker:
    <<: *shared
    image: n8nio/n8n:0.196.0
    command: /bin/sh -c "sleep 10; n8n worker --concurrency=50"
    depends_on:
      - n8n

  n8n-wh:
    <<: *shared
    image: n8nio/n8n:0.196.0
    command: /bin/sh -c "sleep 20; n8n webhook"
    ports:
      - '127.0.0.1:5677:5678'
    depends_on:
      - n8n

nginx template (change example.com):

server {

	root /var/www/html;

	# Add index.php to the list if you are using PHP
	index index.html index.htm index.nginx-debian.html;

	server_name www.example.com;

        location ~* "/webhook/(.*)" {
		        proxy_set_header Host $host;
        	    proxy_set_header X-Real-IP $remote_addr;
        	    proxy_pass http://127.0.0.1:5677$request_uri;
		        proxy_set_header Connection '';
                proxy_http_version 1.1;
                chunked_transfer_encoding off;
                proxy_buffering off;
                proxy_cache off;
    	}


        location ~* "/webhook-test/(.*)" {
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_pass http://127.0.0.1:5678$request_uri;
                proxy_set_header Connection '';
                proxy_http_version 1.1;
                chunked_transfer_encoding off;
                proxy_buffering off;
                proxy_cache off;
        }



	location / {
                # Everything else (editor UI, /rest API) falls back to the main n8n instance
                proxy_pass http://127.0.0.1:5678;
        	    proxy_set_header Connection '';
   		        proxy_http_version 1.1;
   		        chunked_transfer_encoding off;
   		        proxy_buffering off;
   		        proxy_cache off;
	}
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/www.example.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/www.example.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

}
server {
    if ($host = www.example.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot



	server_name www.example.com;
    listen 80;
    return 404; # managed by Certbot
}

And when running it, you need to decide on the number of workers suitable for your use case and pass that to docker compose. The command below starts n8n with 2 workers.

docker compose up --scale n8n-worker=2 -d

You will also need to set .env according to your needs, and don't forget init-data.sh, which I did not include here because it was left unchanged.
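A missing or empty variable in .env tends to fail silently (compose just substitutes an empty string), so a small sanity-check script can save some debugging time. This is a hypothetical helper, not part of n8n; the key list is taken from the compose file above:

```python
# check_env.py - verify a dotenv-style file defines every variable the
# compose file above references (hypothetical helper, not part of n8n)
REQUIRED = [
    "POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB",
    "POSTGRES_NON_ROOT_USER", "POSTGRES_NON_ROOT_PASSWORD",
    "N8N_BASIC_AUTH_USER", "N8N_BASIC_AUTH_PASSWORD",
    "DOMAIN_NAME", "SUB_HOOK", "SUB_HOOK_TEST", "ENCRYPT_KEY",
]

def missing_keys(env_text: str) -> list:
    """Return the required keys that are absent or empty in a dotenv string."""
    defined = {}
    for line in env_text.splitlines():
        line = line.strip()
        # skip blanks, comments, and malformed lines
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        defined[key.strip()] = value.strip()
    return [k for k in REQUIRED if not defined.get(k)]

if __name__ == "__main__":
    with open(".env") as f:
        for key in missing_keys(f.read()):
            print(f"missing or empty: {key}")
```

Run it from the directory containing .env before `docker compose up`; no output means nothing obvious is missing.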


Can anyone help with a solution for a docker-compose n8n setup with Traefik to get this working?

I am facing a problem with n8n queue mode: I can call production webhooks on activated workflows, but test webhooks fail on both activated and deactivated workflows when doing “Listen for test event”.

I did set up an additional subdomain for the n8n webhook service, which is reachable, but only for production webhooks.

####### - n8n Main Process
  n8n-main:
    <<: *shared
    command: /bin/sh -c "sleep 5; n8n start"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.n8n-main.rule=Host(`${N8N_UI_SUBDOMAIN}.${DOMAIN_NAME}`)"
      - "traefik.http.routers.n8n-main.entrypoints=websecure"
      - "traefik.http.routers.n8n-main.tls.certresolver=myresolver"
      - "traefik.http.middlewares.n8n-main.headers.SSLRedirect=true"
      - traefik.http.middlewares.n8n-main.headers.STSSeconds=315360000
      - traefik.http.middlewares.n8n-main.headers.browserXSSFilter=true
      - traefik.http.middlewares.n8n-main.headers.contentTypeNosniff=true
      - traefik.http.middlewares.n8n-main.headers.forceSTSHeader=true
      - traefik.http.middlewares.n8n-main.headers.SSLHost=${DOMAIN_NAME}
      - traefik.http.middlewares.n8n-main.headers.STSIncludeSubdomains=true
      - traefik.http.middlewares.n8n-main.headers.STSPreload=true

  n8n-webhook:
    <<: *shared
    command: /bin/sh -c "sleep 5; n8n webhook"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.n8n-webhook.rule=Host(`${N8N_WEBHOOK_SUBUDOMAIN}.${DOMAIN_NAME}`)"
      - "traefik.http.routers.n8n-webhook.entrypoints=websecure"
      - "traefik.http.routers.n8n-webhook.tls.certresolver=myresolver"
      - "traefik.http.middlewares.n8n-webhook.headers.SSLRedirect=true"
      - traefik.http.middlewares.n8n-webhook.headers.STSSeconds=315360000
      - traefik.http.middlewares.n8n-webhook.headers.browserXSSFilter=true
      - traefik.http.middlewares.n8n-webhook.headers.contentTypeNosniff=true
      - traefik.http.middlewares.n8n-webhook.headers.forceSTSHeader=true
      - traefik.http.middlewares.n8n-webhook.headers.SSLHost=${DOMAIN_NAME}
      - traefik.http.middlewares.n8n-webhook.headers.STSIncludeSubdomains=true
      - traefik.http.middlewares.n8n-webhook.headers.STSPreload=true

Any ideas?

Hey @prononext,

It might be worth opening a new thread and including the error you are seeing and what your load balancer config looks like.

My first thought is that maybe the webhook test URI is not being directed to the main instance and is instead going to the webhook workers.
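If that turns out to be the cause, one possible fix (a sketch, untested) is to add an extra router on the main container for the test-webhook path, since test webhooks are only registered on the main process, while production webhooks keep going to the webhook processor. Traefik's default router priority is the length of the rule, so the longer /webhook-test rule should win over the plain /webhook one:

```yaml
  n8n-main:
    labels:
      # test webhooks ("Listen for test event") are registered on the
      # main process, so route them there
      - "traefik.http.routers.n8n-test-wh.rule=Host(`${N8N_WEBHOOK_SUBUDOMAIN}.${DOMAIN_NAME}`) && PathPrefix(`/webhook-test`)"
      - "traefik.http.routers.n8n-test-wh.entrypoints=websecure"
      - "traefik.http.routers.n8n-test-wh.tls.certresolver=myresolver"

  n8n-webhook:
    labels:
      # production webhooks stay on the dedicated webhook processor
      - "traefik.http.routers.n8n-webhook.rule=Host(`${N8N_WEBHOOK_SUBUDOMAIN}.${DOMAIN_NAME}`) && PathPrefix(`/webhook`)"
      - "traefik.http.routers.n8n-webhook.entrypoints=websecure"
      - "traefik.http.routers.n8n-webhook.tls.certresolver=myresolver"
```

(Variable names are kept as in the config above; merge these labels into the existing services rather than duplicating them.)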