How to manage a huge amount of data passed to the Telegram node

Hi, I'm running a self-hosted n8n 2.3.6 in my lab, inside an LXC container on Proxmox 9. I regularly browse https://elhacker.info looking for new tutorials and all kinds of files. Since it's hard to tell which files have been newly uploaded, I decided to use n8n to take a snapshot of the site with a Python web scraper and compare it against the previous workflow run to detect new uploads. Because there are a lot of files, when the output reaches my Telegram node it fails with this error:

{
  "errorMessage": "Your request is invalid or could not be processed by the service",
  "errorDescription": "Request Entity Too Large",
  "errorDetails": {
    "rawErrorMessage": ["413 - {\"ok\":false,\"error_code\":413,\"description\":\"Request Entity Too Large\"}"],
    "httpCode": "413"
  },
  "n8nDetails": {
    "nodeName": "A7_Telegram",
    "nodeType": "n8n-nodes-base.telegram",
    "nodeVersion": 1,
    "resource": "message",
    "operation": "sendMessage",
    "time": "22/1/2026, 9:48:14",
    "n8nVersion": "2.3.6 (Self Hosted)",
    "binaryDataMode": "filesystem",
    "stackTrace": [
      "NodeApiError: Your request is invalid or could not be processed by the service",
      "    at ExecuteContext.apiRequest (/usr/lib/node_modules/n8n/node_modules/n8n-nodes-base/nodes/Telegram/GenericFunctions.ts:230:9)",
      "    at processTicksAndRejections (node:internal/process/task_queues:105:5)",
      "    at ExecuteContext.execute (/usr/lib/node_modules/n8n/node_modules/n8n-nodes-base/nodes/Telegram/Telegram.node.ts:2198:21)",
      "    at WorkflowExecute.executeNode (/usr/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:1045:8)",
      "    at WorkflowExecute.runNode (/usr/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:1226:11)",
      "    at /usr/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:1662:27",
      "    at /usr/lib/node_modules/n8n/node_modules/n8n-core/src/execution-engine/workflow-execute.ts:2297:11"
    ]
  }
}

One solution is to run the scraper script once outside n8n to seed the database, so that Telegram never receives the huge initial payload; after that, the workflow goes into production and any later file changes will arrive without problems. But, just for learning purposes, is there any other way to make it work from the very first run and avoid this error? Here is my workflow:

{
  "nodes": [
    {
      "parameters": {
        "rule": {
          "interval": [
            {
              "triggerAtHour": 4
            }
          ]
        }
      },
      "id": "8dc0845a-ac77-47bc-9839-b380ef919a97",
      "name": "A1_Trigger",
      "type": "n8n-nodes-base.scheduleTrigger",
      "typeVersion": 1.1,
      "position": [21728, 272]
    },
    {
      "parameters": {
        "command": "python3 /.n8n/scraper.py"
      },
      "id": "72106d9b-3bb9-44ac-8823-09a59cec960b",
      "name": "A2_Scrapper",
      "type": "n8n-nodes-base.executeCommand",
      "typeVersion": 1,
      "position": [21952, 272]
    },
    {
      "parameters": {
        "command": "python3 /root/.n8n-files/insertar.py"
      },
      "id": "6b3f423f-73ae-4014-96f8-d53f5c91124e",
      "name": "A3_Insertar_DB",
      "type": "n8n-nodes-base.executeCommand",
      "typeVersion": 1,
      "position": [22160, 272]
    },
    {
      "parameters": {
        "fileSelector": "/root/.n8n-files/novedades.json",
        "options": {}
      },
      "id": "3cd3e9dd-8c95-414e-9338-ba35caa8941a",
      "name": "A4_Leer_Novedades",
      "type": "n8n-nodes-base.readWriteFile",
      "typeVersion": 1,
      "position": [22384, 272]
    },
    {
      "parameters": {
        "operation": "fromJson",
        "options": {}
      },
      "id": "412f041d-ce03-4f86-b658-1172cd73b2eb",
      "name": "A5_Extraer_JSON",
      "type": "n8n-nodes-base.extractFromFile",
      "typeVersion": 1,
      "position": [22608, 272]
    },
    {
      "parameters": {
        "jsCode": "// Grab the first item arriving at the node\nconst inputItem = $input.all()[0];\n\n// Extract the file list stored under the \"data\" property\nconst listaArchivos = inputItem.json.data;\n\n// If the list is missing or empty, report it\nif (!listaArchivos || listaArchivos.length === 0) {\n  return { json: { texto_final: \"✅ Escaneo finalizado: No se han encontrado archivos nuevos en esta ejecución.\" } };\n}\n\nlet msj = \"🔔 ¡Nuevos archivos detectados! 🔔\\n\\n\";\n\n// Walk the inner file list\nlistaArchivos.forEach((archivo, i) => {\n  const nombre = archivo.archivo || \"Sin nombre\";\n  const enlace = archivo.enlace || \"#\";\n  const fecha = archivo.detectado || \"N/A\";\n  msj += `${i+1}. 📄 *${nombre}*\\n🔗 [Descargar](${enlace})\\n📅 _${fecha}_\\n\\n`;\n});\n\nreturn { json: { texto_final: msj } };"
      },
      "id": "b59b0d12-c147-4cb4-b8e1-5404e715c747",
      "name": "A6_Formatear",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [22832, 272]
    },
    {
      "parameters": {
        "chatId": "1418784730",
        "text": "={{ $json.texto_final }}",
        "additionalFields": {
          "parse_mode": "Markdown"
        }
      },
      "id": "2b843855-f8b2-4c39-84c3-6c8d76ebac13",
      "name": "A7_Telegram",
      "type": "n8n-nodes-base.telegram",
      "typeVersion": 1,
      "position": [23040, 272],
      "webhookId": "fc182bf0-d23d-4bb1-a7d4-4cea8356dd62",
      "credentials": {
        "telegramApi": {
          "id": "lRLGefHU25m40wt7",
          "name": "Telegram ElHacker.net"
        }
      }
    }
  ],
  "connections": {
    "A1_Trigger": {
      "main": [[{ "node": "A2_Scrapper", "type": "main", "index": 0 }]]
    },
    "A2_Scrapper": {
      "main": [[{ "node": "A3_Insertar_DB", "type": "main", "index": 0 }]]
    },
    "A3_Insertar_DB": {
      "main": [[{ "node": "A4_Leer_Novedades", "type": "main", "index": 0 }]]
    },
    "A4_Leer_Novedades": {
      "main": [[{ "node": "A5_Extraer_JSON", "type": "main", "index": 0 }]]
    },
    "A5_Extraer_JSON": {
      "main": [[{ "node": "A6_Formatear", "type": "main", "index": 0 }]]
    },
    "A6_Formatear": {
      "main": [[{ "node": "A7_Telegram", "type": "main", "index": 0 }]]
    }
  },
  "pinData": {},
  "meta": {
    "instanceId": "36b3c3d7b960dcf8aecfa143cdfe54b57bdd67b81fa18ba665b02da3f105d9f8"
  }
}

The scraper code looks like this:

import requests
from bs4 import BeautifulSoup
import json
from urllib.parse import urljoin
import re
import os

# NOTE: the forum replaced the original URLs with the pages' titles
# ("elhacker.INFO - Descargas Cursos, Manuales, Tutoriales y Libros");
# they were directory-listing URLs under https://elhacker.info
BASE_URLS = [
    'https://elhacker.info/',  # placeholder: substitute the actual listing URLs
]

# Unified list: original extensions + video + music
EXTENSIONES_PERMITIDAS = (
    # Archives, documents and executables
    '.zip', '.rar', '.7z', '.pdf', '.iso', '.exe',
    '.tar', '.gz', '.lst', '.txt', '.epub',

    # Video
    '.mp4', '.avi', '.mpeg', '.mpg', '.mkv',
    '.mov', '.wmv', '.flv', '.webm', '.m4v', '.3gp',

    # Music
    '.mp3', '.wav', '.flac', '.aac', '.ogg',
    '.wma', '.m4a', '.aiff', '.alac', '.opus',
)

def scrape_elhacker():
    all_files = []
    queue = list(BASE_URLS)
    visited_urls = set()

    # Use a session to speed up repeated requests
    session = requests.Session()
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) n8n-monitor'
    })

    while queue:
        current_url = queue.pop(0)

        if current_url in visited_urls:
            continue
        visited_urls.add(current_url)

        try:
            response = session.get(current_url, timeout=15)
            if response.status_code != 200:
                continue

            response.encoding = response.apparent_encoding
            soup = BeautifulSoup(response.text, 'html.parser')

            for link in soup.find_all('a'):
                href = link.get('href')

                # Navigation filters
                if not href or '?' in href or 'Parent Directory' in link.text or href.startswith('/'):
                    continue

                full_path = urljoin(current_url, href)

                # If it is a FOLDER, add it to the queue
                if href.endswith('/'):
                    if full_path not in visited_urls:
                        queue.append(full_path)
                    continue

                # If it is an allowed FILE, process it
                if href.lower().endswith(EXTENSIONES_PERMITIDAS):
                    content_after = link.find_next_sibling(string=True)
                    date = "N/A"
                    if content_after:
                        match = re.search(r'(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2})', str(content_after))
                        if match:
                            date = match.group(1)

                    all_files.append({
                        "path": full_path,
                        "nombre": href.replace('%20', ' ').lstrip('/'),
                        "fecha_modificacion": date
                    })

        except Exception:
            continue

    # IMPORTANT: return the list of objects (not a JSON string)
    return all_files

if __name__ == "__main__":
    # 1. Run the scraping
    lista_archivos = scrape_elhacker()

    # 2. Define the allowed path inside the LXC
    output_dir = '/root/.n8n-files'
    output_file = os.path.join(output_dir, 'resultados_scraper.json')

    # 3. Make sure the folder exists
    os.makedirs(output_dir, exist_ok=True)

    try:
        # 4. Save the data to the file
        # json.dump writes the list directly as JSON
        with open(output_file, 'w', encoding='utf-8') as f:
            json.dump(lista_archivos, f, ensure_ascii=False, indent=2)

        # 5. Print a minimal status to avoid a buffer error in n8n
        print(json.dumps({
            "status": "success",
            "count": len(lista_archivos),
            "file": output_file
        }))

    except Exception as e:
        print(json.dumps({
            "status": "error",
            "message": str(e)
        }))
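The comparison against the previous run lives in insertar.py, which isn't shown in the post. As a sketch of what that "detect new files since the last snapshot" step might look like (detect_new_files and the snapshot paths are hypothetical names; it keys on the "path" field the scraper emits):

```python
import json

def detect_new_files(previous_snapshot: str, current_snapshot: str) -> list[dict]:
    """Return entries present in the current snapshot but not in the
    previous one, keyed on the file's full URL (the "path" field)."""
    try:
        with open(previous_snapshot, encoding="utf-8") as f:
            seen = {entry["path"] for entry in json.load(f)}
    except FileNotFoundError:
        seen = set()  # first run: every file counts as new
    with open(current_snapshot, encoding="utf-8") as f:
        return [entry for entry in json.load(f) if entry["path"] not in seen]
```

On the very first run `seen` is empty, so every scraped file is "new" — which is exactly why the first Telegram message blows past the size limit.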

Sorry, I wasn't able to format this whole post properly. You can find a cleaner version of it on Pastebin.

hi, @Conkernel !

The issue is not with n8n or your scraper, but with Telegram's message size limits. On the first run your workflow builds one enormous message containing every detected file; the Bot API caps message text at 4096 characters, and a request body this large gets rejected outright with HTTP 413 "Request Entity Too Large".

The usual fix is either to split the output into smaller chunks and send several Telegram messages, or to send a short summary message and attach the full result as a file. Either approach avoids Telegram's limits and is the recommended production pattern.
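For the chunking route, a minimal Python sketch (split_for_telegram is a hypothetical helper; 4096 characters is Telegram's documented cap on message text). Splitting on the blank lines between entries keeps each file listing in one piece:

```python
def split_for_telegram(text: str, limit: int = 4096) -> list[str]:
    """Split a long message into pieces that fit under Telegram's
    4096-character cap, breaking only on the blank lines between entries."""
    chunks, current = [], ""
    for block in text.split("\n\n"):
        piece = block + "\n\n"
        # Flush the current chunk before it would overflow.
        # (A single entry longer than `limit` would still overflow;
        # entries here are short file listings, so that case is ignored.)
        if current and len(current) + len(piece) > limit:
            chunks.append(current.rstrip())
            current = ""
        current += piece
    if current.strip():
        chunks.append(current.rstrip())
    return chunks
```

In n8n itself you would do the equivalent inside the Code node and return one item per chunk; the Telegram node then sends one message per incoming item.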


Hi @tamy.santos, thanks for your reply.
Sending the Telegram message as a file containing all the items is a good idea.
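For reference, the file route can also be done with a direct Bot API call: sendDocument accepts multipart uploads of up to 50 MB. A sketch, assuming a helper that just assembles the request (build_send_document_request is a hypothetical name; the token is a placeholder):

```python
def build_send_document_request(token: str, chat_id: str) -> tuple[str, dict]:
    """Assemble the URL and form fields for Telegram's sendDocument method."""
    url = f"https://api.telegram.org/bot{token}/sendDocument"
    data = {"chat_id": chat_id, "caption": "Snapshot completo del escaneo"}
    return url, data

# Actual upload (needs the `requests` package and a real bot token):
# url, data = build_send_document_request("<TOKEN>", "1418784730")
# with open("/root/.n8n-files/novedades.json", "rb") as f:
#     requests.post(url, data=data, files={"document": f}, timeout=60)
```

Inside n8n, the Telegram node's Send Document operation fed with the binary data read by A4_Leer_Novedades should achieve the same without custom code.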

Thanks a lot for your help

Regards


Run the scraper script the very first time outside n8n to create the new database, so Telegram never receives such a huge amount of data.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.