[Hiring] Building a sovereign AI agentic platform on bare metal — N8N + Ollama + Qdrant, looking for someone to own the tech

Hey all — we’re a small AI consulting firm in Spain running inference on our own GPU server (RTX 5090 on the way, currently on RunPod A40). No OpenAI in production, everything local: Llama 3.1 / Qwen 2.5 via Ollama, Qdrant for RAG, N8N for orchestration.

We need someone to take technical ownership. The business side is covered — we know exactly what each agent needs to do and have a full technical spec and manual ready to hand over on day one. We just need a dev who’s actually done this stack before and can run with it.

What we’re building:

  • Customer support agent (text + WhatsApp)

  • PDF quotation generator

  • CV screener with scoring

  • Full RAG virtual assistant with ticketing

  • Inbound voice agent (faster-whisper + Kokoro TTS)

Part-time, remote, ongoing. Fixed milestone to start, monthly retainer after that.

If you’ve built N8N workflows calling local LLMs and set up RAG pipelines with a vector DB, drop a reply or DM. Bonus if you’ve touched Twilio or WhatsApp Business API.

Before reaching out, three quick questions:

1. Have you built N8N + Ollama pipelines before? One example is enough.

2. Which vector DB have you used for RAG, and which embedding model?

3. Difference between /api/chat and /api/generate in Ollama?


Hi Consultoriadeventas Team, welcome to the community!

This is exactly the kind of platform I’ve been building.

I help teams set up fully local AI agentic systems, including n8n orchestration pipelines, Ollama LLM inference, and RAG pipelines with vector databases like Qdrant. I’ve built workflows that handle document ingestion → embedding → query → multi-channel outputs (text, Slack, WhatsApp), and integrated AI outputs into automated ticketing and PDF generation pipelines.

I’d be happy to discuss your current setup and show how I’d take ownership of the stack, ensuring each agent runs reliably on your GPU infrastructure.

Best,
Folafoluwa Stephen

Email: [email protected]
Book a call: Schedule a walkthrough call session here.

Hi, this looks closely aligned with the kind of systems I work on.

I’ve worked on automation and AI workflows using n8n, self-hosted models, RAG pipelines, and API-based integrations. I’m comfortable taking technical ownership, working from an existing spec, and turning it into a reliable production setup.

I have experience building agent workflows, integrating LLMs into business processes, and designing systems that are scalable and maintainable. I’d be happy to review your spec and see how I can help move this forward.

For your 3 questions:

  1. Yes, I’ve worked with n8n-based AI pipelines and local/self-hosted model orchestration.

  2. I’ve worked with vector databases such as Qdrant and Pinecone, with embedding models depending on the use case and infrastructure constraints.

  3. In Ollama, /api/generate is mainly for single prompt-completion generation, while /api/chat is built for multi-message conversational interactions using role-based messages.

:telephone_receiver: Book a quick call: Calendly - Automaxion

Hi @Consultoriadeventas Team,

This is exactly the type of stack I specialize in. I’ve built fully local AI agentic systems using n8n orchestration, Ollama LLM inference, and RAG pipelines with Qdrant. My workflows handle document ingestion → embedding → querying → multi-channel outputs (WhatsApp, Slack, PDF generation, ticketing).

I’d be happy to take full technical ownership of your platform, working directly from your spec to ensure reliable, production-ready performance on your GPU infrastructure.

Let’s discuss the project details and milestones — feel free to DM me or share the spec, and we can get started right away.

Best,
Muhammad Bin Zohaib
:open_mailbox_with_raised_flag: [email protected]
:link: Portfolio / Projects

I have 3+ years of experience building automation and AI workflows using n8n, including API-based orchestration, multi-step workflows, and integration with AI models. I’ve also worked on AI-driven systems involving document processing, lead automation, and structured data pipelines.

Regarding your questions:

  1. Yes, I’ve built n8n workflows integrating with self-hosted and API-based LLM setups, handling multi-step logic and data pipelines.
  2. I’ve worked with vector databases like Qdrant and handled embeddings for retrieval-based workflows.
  3. /api/chat is used for conversational context-based interactions (maintains chat history), while /api/generate is more for single prompt-response generation without conversation memory.

I’m also a top supporter of the n8n community and focus on building scalable, production-ready systems.

Would be great to connect, discuss your current architecture, and schedule a short meeting.

Hey @sediality,

I got you. I’ve been building all kinds of automations for the past 2 years and have built hundreds of flows for my clients. I’ve worked with all sorts of companies and generated tens of thousands in revenue or savings through strategic flows. If you decide to work with me, I’ll not only build this flow out but also give you a free consultation, like I have for all my clients, which is what led to those revenue jumps.

I’ve built a similar workflow for one of my clients. I can share it, and also show how you can streamline processes in your company for faster operations. All of this with no strings attached on our first call.

Here, have a look at my website and you can book a call with me there!

Talk soon!

Questions:
1- Yes, I’ve built n8n + Ollama pipelines for local document Q&A — webhook triggers a request to Mistral 7B running locally via Ollama, processes the response, and routes it downstream, all air-gapped with zero data leaving the machine.

2- For RAG I’ve used Qdrant as the vector store with nomic-embed-text embeddings pulled locally through Ollama, with 512 token chunks and 50 token overlap for clean retrieval.

3- The key difference between /api/chat and /api/generate is conversation state — generate is stateless single-turn inference with a raw prompt string, while chat accepts a structured messages array with roles and maintains context across turns, making it the right choice for anything agent or chatbot related.
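To make that distinction concrete, here’s a minimal Python sketch of the two request payloads against a local Ollama instance (the model name and prompts are placeholders; the actual HTTP call at the bottom needs a running server, so it’s kept inside a function):

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"  # Ollama's default port

def generate_payload(prompt: str) -> dict:
    # /api/generate: stateless, raw prompt string in, completion out
    return {"model": "llama3.1", "prompt": prompt, "stream": False}

def chat_payload(history: list[dict], user_msg: str) -> dict:
    # /api/chat: structured messages array; the caller carries the history
    return {
        "model": "llama3.1",
        "messages": history + [{"role": "user", "content": user_msg}],
        "stream": False,
    }

def call(endpoint: str, payload: dict) -> dict:
    # Only run against a live Ollama server
    req = urllib.request.Request(
        f"{OLLAMA}{endpoint}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

history = [{"role": "system", "content": "You are a support agent."}]
payload = chat_payload(history, "Where is my order?")
```

With /api/chat the workflow appends each assistant reply back onto `history` before the next turn; with /api/generate you’d have to splice prior turns into the prompt string yourself.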

This one genuinely interests me — bare metal sovereign AI with n8n + Ollama + Qdrant is the direction I’ve been watching closely.

I’ve been building multi-agent systems on Claude API with n8n orchestration for a year. The move to self-hosted models and owning the full stack is something I’ve been thinking through. I’d like to own the tech on something like this.

Background: live algo trading system (86% win rate), multi-agent CommandCenter architecture, SMB automation clients. 20 years building businesses before going all-in on AI.

More about me in my own words: David Necaise — AI Automation Architect

Let’s talk. What does the timeline look like?

Hi,

I’ve built several production N8N workflows with local LLM integrations and RAG pipelines, which aligns well with what you’re looking for. Your stack (Ollama, Qdrant, N8N) is exactly what I’ve been working with, and I’m interested in taking technical ownership of this platform.

To answer your three questions directly:

1. I’ve built N8N workflows calling Ollama for a call transcription and analysis system that processes audio, extracts insights, and scores leads based on conversation content.

2. For RAG, I’ve used Qdrant with the all-MiniLM-L6-v2 embedding model in a lead qualification chatbot that retrieves context from company documents.

3. On the Ollama API difference: /api/chat is the multi-turn conversation endpoint that maintains message history and context, while /api/generate is the single completion endpoint for one-off prompts without conversation state.

What’s particularly relevant is my lead qualification chatbot currently runs against local LLMs and integrates document retrieval, so I’ve solved the exact pattern you need for the RAG virtual assistant and CV screener. I’m also familiar with Twilio integrations from the SMS outreach pipeline, so WhatsApp Business API would be a straightforward extension.

Hi @Consultoriadeventas — this stack is exactly what I work with day-to-day.

To your three questions directly:

1. n8n + Ollama pipelines? Yes. Built a customer service agent where a webhook triggers an n8n workflow, calls Ollama’s REST API with a structured prompt, and routes the response based on intent — escalate to human, send WhatsApp reply, or log to CRM. Very similar to your customer support agent use case.

2. Vector DB + embedding model? Qdrant for storage, with nomic-embed-text pulled locally via Ollama for embeddings — fully sovereign, no external API calls. For multilingual content I’ve also used mxbai-embed-large.

3. /api/chat vs /api/generate? /api/generate is stateless — single prompt in, completion out, no memory. /api/chat maintains conversation history via a messages array (system/user/assistant turns), which is what you need for your support agent and RAG assistant to hold context across turns.
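As a rough illustration of the fully sovereign embedding path in (2), a sketch using Ollama’s embeddings endpoint and Qdrant’s REST API over plain stdlib HTTP — the collection name and ports are assumptions (Ollama and Qdrant defaults), and the network helpers need both services running, so only the payload builders run at import:

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"
QDRANT = "http://localhost:6333"
COLLECTION = "docs"  # hypothetical collection name

def embed(text: str) -> list[float]:
    # Ollama's embeddings endpoint; assumes `ollama pull nomic-embed-text`
    body = json.dumps({"model": "nomic-embed-text", "prompt": text}).encode()
    req = urllib.request.Request(
        f"{OLLAMA}/api/embeddings", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def upsert_points(chunks: list[str], vectors: list[list[float]]) -> dict:
    # Body for Qdrant's PUT /collections/{name}/points
    return {
        "points": [
            {"id": i, "vector": v, "payload": {"text": c}}
            for i, (c, v) in enumerate(zip(chunks, vectors))
        ]
    }

def search_body(vector: list[float], limit: int = 5) -> dict:
    # Body for Qdrant's POST /collections/{name}/points/search
    return {"vector": vector, "limit": limit, "with_payload": True}
```

In n8n the same calls map onto HTTP Request nodes, with the payload shaping done in a Code node between them.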

Your spec sounds well-scoped. The CV screener, PDF quotation generator, and RAG virtual assistant are all straightforward on this stack. The voice agent (faster-whisper → Ollama → Kokoro TTS) is the trickiest piece — latency tuning between STT/LLM/TTS is where most implementations stumble — but very doable with the right buffering strategy in n8n.

Part-time retainer works perfectly for me. Happy to review your technical spec and discuss milestones. Feel free to DM.

Hi! I’ve built exactly this stack multiple times. I don’t just connect nodes; I build production-ready backends with proper error handling and state management.

To answer your technical check:

  1. n8n + Ollama: Yes, I’ve built several pipelines where n8n orchestrates local LLMs for content generation and lead scoring.

  2. RAG & Vector DB: I usually work with Supabase (pgvector) or PostgreSQL as a vector store. For embeddings, I use OpenAI (text-embedding-3-small) or local models like mxbai-embed-large via Ollama to keep everything on-prem.

  3. /api/chat vs /api/generate: /api/generate is for a single prompt/response (completion), while /api/chat is designed for conversational state, accepting a structured array of messages (system, user, assistant) to keep the context.

Relevant experience for your project:

  • WhatsApp & Voice: I’ve built automated agents for WhatsApp. Here is a video of my own WhatsApp API in action: https://drive.google.com/file/d/1ZTD4LjgGcgIxrndxO6tDPUZ6Wb6KIudc/view?usp=drive_link

  • n8n + Supabase RAG: I’ve implemented RAG pipelines where n8n handles the chunking, embedding, and upsert logic directly into Supabase.

  • Live Tech: You can check my portfolio and even test my automated booking system (n8n + Calendar) on my site.

My Contacts:

I’m ready to take technical ownership and run with your specs.

Best regards,

Mikhail

I run Evara AI, incubated at IIT Bhubaneswar, and I have built production systems across several of the exact modules you listed – WhatsApp customer support agents, inbound voice agents with Vapi, and RAG-based virtual assistants with vector search. On the local/sovereign side, I work with Ollama-hosted models and have experience wiring them into N8N workflows for structured task execution.

For your screening questions: I have built Ollama pipelines that handle document ingestion, chunked embedding with vector stores, and retrieval-augmented generation for Q&A – the key architectural decision is whether to use Qdrant’s filtering capabilities for multi-tenant isolation or handle it at the application layer.

The WhatsApp + voice + PDF generation stack you described maps directly to systems I have already shipped. I can carve out part-time hours on a retainer basis – DM me or email [email protected] to discuss specifics.

Hi, I build production AI agent systems and MCP servers at Pyfio. Multi-agent orchestration, RAG pipelines, and security auditing are core to what I do.

Recent work: an 8-stage autonomous pipeline (https://agentfeed.pyfio.com) and an MCP security scanner (https://audit.pyfio.com) with 50+ servers scanned.

Your stack (N8N + Ollama + Qdrant on bare metal) fits well with what I build. Happy to discuss scope.

More at Services - Pyfio or [email protected].

Hi,

This is well-aligned with what I build. Specifically:

  • Voice agent: I’ve deployed a production voice AI system for a healthcare client — handles real inbound calls, appointment scheduling, FAQ handling, and human escalation. Used Twilio for telephony.
  • WhatsApp bot: Built WhatsApp-based AI assistants with conversation memory and structured response handling.
  • RAG assistant: Production multi-agent system with Supabase/PostgreSQL as the context layer. Claude API handles the reasoning, custom Python orchestrator routes queries to specialist agents. Portfolio with details: priyanshukumar.co
  • n8n: Extensive experience with self-hosted n8n — workflow scheduling, error handling, retry logic, API orchestration.

The milestone + retainer structure works well — I typically deliver in phases so you can validate each component before moving to the next.

Available for a part-time remote engagement. Happy to discuss scope and milestones.

Priyanshu Kumar
AI & Automation Engineer
priyanshukumar.co

I’m very interested. I have over 5 years of experience in software engineering and have built several n8n projects with multiple AI agents.

Hey,

Three screening questions first, since that’s the right way to start.

1. n8n + Ollama: I build these locally. n8n handles the HTTP calls to Ollama’s API, pulls context from a vector DB, constructs the prompt, routes the response. No external LLM dependency, data stays on your hardware.

2. Vector DB: For sovereign setups I use Qdrant with nomic-embed-text via Ollama. For cloud projects, Supabase pgvector with OpenAI embeddings. Chunk size and overlap strategy matters more than which DB you pick — I tune it by document type. PDFs are a different problem than structured records.

3. /api/chat vs /api/generate: /api/generate is stateless, single-turn. /api/chat takes a messages array and handles conversation context natively. For your support agent and voice agent where context has to carry between turns, /api/chat is the right call. I always route agent workflows through it in n8n.
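The chunk size and overlap strategy mentioned in (2) boils down to a sliding window over the token stream; a minimal sketch (tokenizer-agnostic, sizes are just the common defaults):

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split a token list into fixed-size windows that overlap,
    so retrieval doesn't lose context at chunk boundaries."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

Each chunk then gets embedded and upserted with its source text as payload; the overlap means a sentence straddling a boundary still appears whole in at least one chunk.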

Your five systems, my read:

Customer support (text + WhatsApp): Webhook classifies intent, Qdrant retrieves, Ollama generates, routes through Twilio or WhatsApp Business API. Standard build, 1-2 days.

PDF quotation generator: Variable extraction from form or trigger, template fill, PDF render. Done this for e-commerce. Fast.

CV screener: Structured extraction with scoring rubric, Airtable output. Straightforward.

RAG virtual assistant with ticketing: This is the one I’d want to talk through before committing to a timeline. The retrieval is fine — the ticket routing depends on your knowledge base structure. I’d need to see that first.

Inbound voice agent: Kokoro TTS, faster-whisper, Ollama, n8n. Most complex of the five by a margin. Latency between components is the real problem, not the individual pieces. Solvable, but architecture decisions upfront determine whether it’s smooth or a mess six months in.
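One common buffering strategy for that latency problem is to flush the streaming LLM output to TTS at sentence boundaries instead of waiting for the full reply; a minimal sketch, where the `tts.speak` hookup in the usage comment is hypothetical:

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")

def sentence_buffer(token_stream: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed LLM tokens and yield complete sentences,
    so TTS can start speaking before the full reply is generated."""
    buf = ""
    for token in token_stream:
        buf += token
        stripped = buf.rstrip()
        if stripped.endswith(SENTENCE_END):
            yield stripped
            buf = ""
    if buf.strip():  # flush any trailing partial sentence
        yield buf.strip()

# Hypothetical usage:
# for sentence in sentence_buffer(ollama_stream()):
#     tts.speak(sentence)
```

Time-to-first-audio then depends on the first sentence, not the whole completion, which is usually the difference between a natural call and an awkward one.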

I work async, fully remote. Milestones in 1-2 days, full documentation on handover so your team can maintain it without me. If something breaks at 2am I want to know before you do.

Retainer path is what I’m after. Start with one system, see the quality, go from there.

DM or reply here.

Daemon