Hi everyone,
I’m building a RAG-based chatbot assistant for a website, self-hosted with n8n on a VPS. My current stack: AI Agent node + OpenAI (gpt-4o-mini) + a Pinecone vector store.
I’m struggling with high latency: it often takes 16-18 seconds to get a response after a user sends a message.
I’m looking to optimize the user experience and would love your advice on three points:
- Reducing Response Time: What are the best practices for speeding up the n8n AI Agent and Pinecone queries? I’m already using gpt-4o-mini. Would shortening the system prompt, lowering the topK value in Pinecone, or switching from the AI Agent to a dedicated chain (like Retrieval QA) make a significant difference?
- Custom Preloader/Status Messages: The default n8n chat widget only shows the three typing dots (typing indicator) while waiting. Is there a way to display custom status messages instead (e.g., “Searching database…”, “Formulating response…”) to keep the user engaged?
- Streaming: Does anyone have experience enabling “Stream Responses” in this setup? Can it effectively mask the latency by showing characters as they are generated?
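On the first point, before changing anything I’m planning to time each stage separately so I know whether the delay is mostly retrieval or mostly generation. A rough sketch of the instrumentation (the sleep calls are stand-ins for my real embedding/Pinecone/OpenAI calls, and the stage names are placeholders, not n8n node names):

```javascript
// Wrap an async stage and record how long it took, so the
// Pinecone query time can be compared against the LLM call time.
async function timeStage(timings, name, fn) {
  const start = Date.now();
  const result = await fn();
  timings[name] = Date.now() - start; // duration in ms
  return result;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulated pipeline: replace each sleep with the real call.
async function run() {
  const timings = {};
  await timeStage(timings, "embed", () => sleep(20));
  await timeStage(timings, "pinecone", () => sleep(30));
  await timeStage(timings, "llm", () => sleep(50));
  return timings; // e.g. { embed: ..., pinecone: ..., llm: ... }
}
```

My thinking is that if almost all of the 16–18 s turns out to be the LLM call, tweaking topK won’t help much, and vice versa.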
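On the second point, this is roughly the front-end behavior I’m after, assuming I replace or wrap the stock widget’s typing indicator (the message texts and the interval are just examples, not anything the widget provides):

```javascript
// Cycle through custom status messages while waiting for a reply.
const STATUS_MESSAGES = [
  "Searching database…",
  "Reading the most relevant pages…",
  "Formulating response…",
];

function startStatusCycle(render, intervalMs = 2500) {
  let i = 0;
  render(STATUS_MESSAGES[0]); // show the first message immediately
  const timer = setInterval(() => {
    // advance, but stay on the last message rather than looping
    i = Math.min(i + 1, STATUS_MESSAGES.length - 1);
    render(STATUS_MESSAGES[i]);
  }, intervalMs);
  return () => clearInterval(timer); // call this when the reply arrives
}
```

The idea is to call `startStatusCycle` when the message is sent and invoke the returned stop function when the webhook response comes back.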
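On the streaming point, this is the client-side consumption pattern I have in mind; the `fakeStream` generator below is a stand-in for the real SSE/fetch reader, since I haven’t wired that up yet:

```javascript
// Append each chunk to the visible text as soon as it arrives,
// so the user sees output long before the full reply is done.
async function renderStream(chunks, onToken) {
  let full = "";
  for await (const chunk of chunks) {
    full += chunk;
    onToken(full); // update the chat bubble with the text so far
  }
  return full;
}

// Simulated token stream (stand-in for the real streaming response):
async function* fakeStream() {
  yield "Hello";
  yield ", ";
  yield "world!";
}
```

Even if total time stays the same, I’m hoping time-to-first-token drops to a second or two, which is what the user actually perceives.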
Server specs: Hostinger VPS, 4 vCPU, 16 GB RAM
I’d appreciate any insights or tips on how to bring this delay down to a few seconds. Thanks in advance!
Information on your n8n setup
- n8n version:
- Database (default: SQLite):
- n8n EXECUTIONS_PROCESS setting (default: own, main):
- Running n8n via (Docker, npm, n8n cloud, desktop app):
- Operating system:
