System setup:
- n8n: v1.45.1 (Docker, orchestrating the pipeline and running agents)
- LlamaIndex: 0.12.42 (in Python virtual environment)
- llama-index-vector-stores-chroma: 0.4.2
- chromadb: tried both 0.4.24 and 1.0.12 (the latest, and the version recommended by the plugin)
- numpy: 1.26.4 (downgraded from 2.x, where the np.float_ alias was removed)
- Python: 3.12.x
- Ollama: 0.1.34 (for local LLMs, used for embedding and completion)
- Streamlit: 1.35.0 (for UI testing, but the bug occurs even without Streamlit)
- Platform: Ubuntu 22.04 LTS (minipc)
- ChromaDB server: Running in Docker, port 8000 published, confirmed reachable
Problem description:
I’m building a local research database agent for document RAG (retrieval-augmented generation), orchestrated via n8n, using LlamaIndex and ChromaDB as the vector store (with the ChromaDB server running in Docker on port 8000).
No matter what combination of package versions, parameters, or environment variables I try, the following issue occurs:
- When initializing ChromaVectorStore (from llama-index-vector-stores-chroma),
- and supplying port=8000 (tried both as an integer and as a string),
- LlamaIndex always ends up passing port=None to the ChromaDB HttpClient.
Resulting error:
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
- The value for port is correct in all kwargs and debug prints, but by the time LlamaIndex calls chromadb.HttpClient, it is always None.
- This happens with all reasonable permutations of server config (REST, embedded, with and without caching, etc.).
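The traceback bottoms out in a plain int() coercion, which is easy to reproduce in isolation. Here, coerce_port is a stand-in for the client-side settings validation (an assumption on my part, not a real chromadb function):

```python
def coerce_port(port):
    # Stand-in for the Chroma client's settings validation (assumption:
    # the configured port is ultimately run through int()).
    return int(port)

print(coerce_port(8000))    # an int port is fine
print(coerce_port("8000"))  # a string port is fine too
try:
    coerce_port(None)       # what LlamaIndex apparently ends up passing
except TypeError as e:
    print(e)  # int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
```

So both of the values I pass would survive the coercion; the failure only happens because the port has already been replaced by None somewhere between my kwargs and the chromadb.HttpClient call.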
What I’ve tried:
- Upgrading/downgrading all of:
- llama-index
- llama-index-vector-stores-chroma
- chromadb (0.4.24 to 1.0.12)
- numpy (1.26.x, 2.x)
- Passing port as both integer and string
- Setting environment variables (CHROMA_PORT, etc.)
- Setting chroma_api_impl="rest" explicitly
- Removing Streamlit and all caching
- Using both embedded and REST/Server ChromaDB configurations
- Starting from a clean venv with only the required minimal packages
- Verified the ChromaDB server is running and reachable on port 8000 from the Python environment
- Checked all open GitHub issues for LlamaIndex and ChromaDB
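For reference, the reachability check mentioned above was nothing fancier than a TCP probe from the same venv (host and port reflect my local setup):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The Chroma container answers here, so the server side is not the problem.
print(port_reachable("localhost", 8000))
```

This returns True on my machine, which is why I'm confident the failure is purely in how the port kwarg is handed from LlamaIndex to the chromadb client.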
Observed:
- This “port=None” bug is confirmed in multiple community threads and GitHub issues, but no working workaround seems to exist for LlamaIndex (>=0.12.x) + ChromaDB server (>=1.x) as of June 2025.
- FAISS and Qdrant vector stores work fine, but ChromaDB server integration is broken due to this port handling bug.
My goal:
I need a stable, production-ready vector database with robust Python client support for automated research pipelines (n8n, Streamlit, LlamaIndex, etc.), including document ingestion, filtering, and RAG-based QA via local LLMs.
Questions:
- Is there any way to get LlamaIndex (>=0.12.x) working with a recent ChromaDB server (>=1.x) without hitting the port=None bug?
- If not, what is the recommended stack for a robust local research agent that can scale for serious research use?
Any real-world solutions, minimal working examples, or package version combos would be greatly appreciated!