Hey all — we’re a small AI consulting firm in Spain running inference on our own GPU server (RTX 5090 on the way, currently on RunPod A40). No OpenAI in production, everything local: Llama 3.1 / Qwen 2.5 via Ollama, Qdrant for RAG, N8N for orchestration.
We need someone to take technical ownership. The business side is covered — we know exactly what each agent needs to do and have a full technical spec and manual ready to hand over on day one. We just need a dev who’s actually done this stack before and can run with it.
What we’re building:
- Customer support agent (text + WhatsApp)
- PDF quotation generator
- CV screener with scoring
- Full RAG virtual assistant with ticketing
- Inbound voice agent (faster-whisper STT + Kokoro TTS)
Part-time, remote, ongoing. Fixed-price milestone to start, then a monthly retainer.
If you’ve built N8N workflows calling local LLMs and set up RAG pipelines with a vector DB, drop a reply or DM. Bonus if you’ve touched Twilio or WhatsApp Business API.
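To give a feel for the kind of integration work involved, here is a minimal sketch of a chat-style call to a local Ollama instance, the sort of request our N8N workflows fire via HTTP nodes. The model name, prompt, and default port (11434) are illustrative assumptions; the function only builds the request so nothing needs to be running.

```python
import json
import urllib.request

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build (but don't send) a request to Ollama's /api/chat endpoint.

    Assumes a local Ollama instance on the default port 11434;
    model and message below are purely illustrative.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # single JSON response instead of a token stream
    }
    return urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# When Ollama is up, send with: urllib.request.urlopen(req)
req = build_chat_request("llama3.1", "Summarise this support ticket: ...")
```

In production these calls come from N8N HTTP Request nodes rather than Python, but the payload shape is the same.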
Before reaching out, three quick questions:
1. Have you built N8N + Ollama pipelines before? One example is enough.
2. Which vector DB have you used for RAG, and which embedding model?
3. What's the difference between /api/chat and /api/generate in Ollama?