[Hiring] Building a sovereign AI agentic platform on bare metal — N8N + Ollama + Qdrant, looking for someone to own the tech

Hey all — we’re a small AI consulting firm in Spain running inference on our own GPU server (RTX 5090 on the way, currently on RunPod A40). No OpenAI in production, everything local: Llama 3.1 / Qwen 2.5 via Ollama, Qdrant for RAG, N8N for orchestration.

We need someone to take technical ownership. The business side is covered — we know exactly what each agent needs to do and have a full technical spec and manual ready to hand over on day one. We just need a dev who’s actually done this stack before and can run with it.

What we’re building:

  • Customer support agent (text + WhatsApp)

  • PDF quotation generator

  • CV screener with scoring

  • Full RAG virtual assistant with ticketing

  • Inbound voice agent (faster-whisper + Kokoro TTS)

Part-time, remote, ongoing. Fixed milestone to start, monthly retainer after that.

If you’ve built N8N workflows calling local LLMs and set up RAG pipelines with a vector DB, drop a reply or DM. Bonus if you’ve touched Twilio or WhatsApp Business API.

Before reaching out, three quick questions:

1. Have you built N8N + Ollama pipelines before? One example is enough.

2. Which vector DB have you used for RAG, and which embedding model?

3. Difference between /api/chat and /api/generate in Ollama?


Hi Consultoriadeventas Team, welcome to the community!

This is exactly the kind of platform I’ve been building.

I help teams set up fully local AI agentic systems, including n8n orchestration pipelines, Ollama LLM inference, and RAG pipelines with vector databases like Qdrant. I’ve built workflows that handle document ingestion → embedding → query → multi-channel outputs (text, Slack, WhatsApp), and integrated AI outputs into automated ticketing and PDF generation pipelines.

I’d be happy to discuss your current setup and show how I’d take ownership of the stack, ensuring each agent runs reliably on your GPU infrastructure.

Best,
Folafoluwa Stephen

Email: [email protected]
Book a call: Schedule a walkthrough call session here.

Hi, this looks closely aligned with the kind of systems I work on.

I’ve worked on automation and AI workflows using n8n, self-hosted models, RAG pipelines, and API-based integrations. I’m comfortable taking technical ownership, working from an existing spec, and turning it into a reliable production setup.

I have experience building agent workflows, integrating LLMs into business processes, and designing systems that are scalable and maintainable. I’d be happy to review your spec and see how I can help move this forward.

For your 3 questions:

  1. Yes, I’ve worked with n8n-based AI pipelines and local/self-hosted model orchestration.

  2. I’ve worked with vector databases such as Qdrant and Pinecone, with embedding models depending on the use case and infrastructure constraints.

  3. In Ollama, /api/generate is mainly for single prompt-completion generation, while /api/chat is built for multi-message conversational interactions using role-based messages.

:telephone_receiver: Book a quick call: Calendly - Automaxion

Hi @Consultoriadeventas Team,

This is exactly the type of stack I specialize in. I’ve built fully local AI agentic systems using n8n orchestration, Ollama LLM inference, and RAG pipelines with Qdrant. My workflows handle document ingestion → embedding → querying → multi-channel outputs (WhatsApp, Slack, PDF generation, ticketing).

I’d be happy to take full technical ownership of your platform, working directly from your spec to ensure reliable, production-ready performance on your GPU infrastructure.

Let’s discuss the project details and milestones — feel free to DM me or share the spec, and we can get started right away.

Best,
Muhammad Bin Zohaib
:open_mailbox_with_raised_flag: [email protected]
:link: Portfolio / Projects

I have 3+ years of experience building automation and AI workflows using n8n, including API-based orchestration, multi-step workflows, and integration with AI models. I’ve also worked on AI-driven systems involving document processing, lead automation, and structured data pipelines.

Regarding your questions:

  1. Yes, I’ve built n8n workflows integrating with self-hosted and API-based LLM setups, handling multi-step logic and data pipelines.
  2. I’ve worked with vector databases like Qdrant and handled embeddings for retrieval-based workflows.
  3. /api/chat is used for conversational context-based interactions (maintains chat history), while /api/generate is more for single prompt-response generation without conversation memory.

I’m also a top supporter of the n8n community and focus on building scalable, production-ready systems.

Would be great to connect, discuss your current architecture, and schedule a short meeting.

Hey @sediality,

I got you. I’ve been building all kinds of automations for the past 2 years and have built hundreds of flows for my clients. I’ve worked with all sorts of companies and generated tens of thousands in revenue or savings through strategic flows. If you decide to work with me, I’ll not only build this flow out but also give you a free consultation, like I have for all my clients, which is what led to those revenue jumps.

I’ve built a similar workflow for one of my clients. I can share it, and also show how you can streamline processes in your company for faster operations. All of this with no strings attached on our first call.

Here, have a look at my website and you can book a call with me there!

Talk soon!

Questions:
1- Yes, I’ve built n8n + Ollama pipelines for local document Q&A — webhook triggers a request to Mistral 7B running locally via Ollama, processes the response, and routes it downstream, all air-gapped with zero data leaving the machine.

2- For RAG I’ve used Qdrant as the vector store with nomic-embed-text embeddings pulled locally through Ollama, with 512 token chunks and 50 token overlap for clean retrieval.

3- The key difference between /api/chat and /api/generate is conversation state — generate is stateless single-turn inference with a raw prompt string, while chat accepts a structured messages array with roles and maintains context across turns, making it the right choice for anything agent or chatbot related.
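To make that distinction concrete, here’s a minimal Python sketch of the two request payloads against a local Ollama instance (the model name and prompts are placeholders; the actual HTTP call at the bottom needs a running server, so it’s kept inside a function):

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"  # Ollama's default port

def generate_payload(prompt: str) -> dict:
    # /api/generate: stateless, raw prompt string in, completion out
    return {"model": "llama3.1", "prompt": prompt, "stream": False}

def chat_payload(history: list[dict], user_msg: str) -> dict:
    # /api/chat: structured messages array; the caller carries the history
    return {
        "model": "llama3.1",
        "messages": history + [{"role": "user", "content": user_msg}],
        "stream": False,
    }

def call(endpoint: str, payload: dict) -> dict:
    # Only run against a live Ollama server
    req = urllib.request.Request(
        f"{OLLAMA}{endpoint}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

history = [{"role": "system", "content": "You are a support agent."}]
payload = chat_payload(history, "Where is my order?")
```

With /api/chat the workflow appends each assistant reply back onto `history` before the next turn; with /api/generate you’d have to splice prior turns into the prompt string yourself.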

This one genuinely interests me — bare metal sovereign AI with n8n + Ollama + Qdrant is the direction I’ve been watching closely.

I’ve been building multi-agent systems on Claude API with n8n orchestration for a year. The move to self-hosted models and owning the full stack is something I’ve been thinking through. I’d like to own the tech on something like this.

Background: live algo trading system (86% win rate), multi-agent CommandCenter architecture, SMB automation clients. 20 years building businesses before going all-in on AI.

More about me in my own words: David Necaise — AI Automation Architect

Let’s talk. What does the timeline look like?

Hi,

I’ve built several production N8N workflows with local LLM integrations and RAG pipelines, which aligns well with what you’re looking for. Your stack (Ollama, Qdrant, N8N) is exactly what I’ve been working with, and I’m interested in taking technical ownership of this platform.

To answer your three questions directly:

1. I’ve built N8N workflows calling Ollama for a call transcription and analysis system that processes audio, extracts insights, and scores leads based on conversation content.

2. For RAG, I’ve used Qdrant with the all-MiniLM-L6-v2 embedding model in a lead qualification chatbot that retrieves context from company documents.

3. On the Ollama API difference: /api/chat is the multi-turn conversation endpoint that maintains message history and context, while /api/generate is the single completion endpoint for one-off prompts without conversation state.

What’s particularly relevant is my lead qualification chatbot currently runs against local LLMs and integrates document retrieval, so I’ve solved the exact pattern you need for the RAG virtual assistant and CV screener. I’m also familiar with Twilio integrations from the SMS outreach pipeline, so WhatsApp Business API would be a straightforward extension.

Hi @Consultoriadeventas — this stack is exactly what I work with day-to-day.

To your three questions directly:

1. n8n + Ollama pipelines? Yes. Built a customer service agent where a webhook triggers an n8n workflow, calls Ollama’s REST API with a structured prompt, and routes the response based on intent — escalate to human, send WhatsApp reply, or log to CRM. Very similar to your customer support agent use case.

2. Vector DB + embedding model? Qdrant for storage, with nomic-embed-text pulled locally via Ollama for embeddings — fully sovereign, no external API calls. For multilingual content I’ve also used mxbai-embed-large.

3. /api/chat vs /api/generate? /api/generate is stateless — single prompt in, completion out, no memory. /api/chat maintains conversation history via a messages array (system/user/assistant turns), which is what you need for your support agent and RAG assistant to hold context across turns.
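As a rough illustration of the fully sovereign embedding path in (2), a sketch using Ollama’s embeddings endpoint and Qdrant’s REST API over plain stdlib HTTP — the collection name and ports are assumptions (Ollama and Qdrant defaults), and the network helpers need both services running, so only the payload builders run at import:

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"
QDRANT = "http://localhost:6333"
COLLECTION = "docs"  # hypothetical collection name

def embed(text: str) -> list[float]:
    # Ollama's embeddings endpoint; assumes `ollama pull nomic-embed-text`
    body = json.dumps({"model": "nomic-embed-text", "prompt": text}).encode()
    req = urllib.request.Request(
        f"{OLLAMA}/api/embeddings", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def upsert_points(chunks: list[str], vectors: list[list[float]]) -> dict:
    # Body for Qdrant's PUT /collections/{name}/points
    return {
        "points": [
            {"id": i, "vector": v, "payload": {"text": c}}
            for i, (c, v) in enumerate(zip(chunks, vectors))
        ]
    }

def search_body(vector: list[float], limit: int = 5) -> dict:
    # Body for Qdrant's POST /collections/{name}/points/search
    return {"vector": vector, "limit": limit, "with_payload": True}
```

In n8n the same calls map onto HTTP Request nodes, with the payload shaping done in a Code node between them.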

Your spec sounds well-scoped. The CV screener, PDF quotation generator, and RAG virtual assistant are all straightforward on this stack. The voice agent (faster-whisper → Ollama → Kokoro TTS) is the trickiest piece — latency tuning between STT/LLM/TTS is where most implementations stumble — but very doable with the right buffering strategy in n8n.

Part-time retainer works perfectly for me. Happy to review your technical spec and discuss milestones. Feel free to DM.

Hi! I’ve built exactly this stack multiple times. I don’t just connect nodes; I build production-ready backends with proper error handling and state management.

To answer your technical check:

  1. n8n + Ollama: Yes, I’ve built several pipelines where n8n orchestrates local LLMs for content generation and lead scoring.

  2. RAG & Vector DB: I usually work with Supabase (pgvector) or PostgreSQL as a vector store. For embeddings, I use OpenAI (text-embedding-3-small) or local models like mxbai-embed-large via Ollama to keep everything on-prem.

  3. /api/chat vs /api/generate: /api/generate is for a single prompt/response (completion), while /api/chat is designed for conversational state, accepting a structured array of messages (system, user, assistant) to keep the context.

Relevant experience for your project:

  • WhatsApp & Voice: I’ve built automated agents for WhatsApp. Here is a video of my own WhatsApp API in action: https://drive.google.com/file/d/1ZTD4LjgGcgIxrndxO6tDPUZ6Wb6KIudc/view?usp=drive_link

  • n8n + Supabase RAG: I’ve implemented RAG pipelines where n8n handles the chunking, embedding, and upsert logic directly into Supabase.

  • Live Tech: You can check my portfolio and even test my automated booking system (n8n + Calendar) on my site.

My Contacts:

I’m ready to take technical ownership and run with your specs.

Best regards,

Mikhail

I run Evara AI, incubated at IIT Bhubaneswar, and I have built production systems across several of the exact modules you listed – WhatsApp customer support agents, inbound voice agents with Vapi, and RAG-based virtual assistants with vector search. On the local/sovereign side, I work with Ollama-hosted models and have experience wiring them into N8N workflows for structured task execution.

For your screening questions: I have built Ollama pipelines that handle document ingestion, chunked embedding with vector stores, and retrieval-augmented generation for Q&A – the key architectural decision is whether to use Qdrant’s filtering capabilities for multi-tenant isolation or handle it at the application layer.

The WhatsApp + voice + PDF generation stack you described maps directly to systems I have already shipped. I can carve out part-time hours on a retainer basis – DM me or email [email protected] to discuss specifics.

Hi, I build production AI agent systems and MCP servers at Pyfio. Multi-agent orchestration, RAG pipelines, and security auditing are core to what I do.

Recent work: an 8-stage autonomous pipeline (https://agentfeed.pyfio.com) and an MCP security scanner (https://audit.pyfio.com) with 50+ servers scanned.

Your stack (N8N + Ollama + Qdrant on bare metal) fits well with what I build. Happy to discuss scope.

More at Services - Pyfio or [email protected].

Hi,

This is well-aligned with what I build. Specifically:

  • Voice agent: I’ve deployed a production voice AI system for a healthcare client — handles real inbound calls, appointment scheduling, FAQ handling, and human escalation. Used Twilio for telephony.
  • WhatsApp bot: Built WhatsApp-based AI assistants with conversation memory and structured response handling.
  • RAG assistant: Production multi-agent system with Supabase/PostgreSQL as the context layer. Claude API handles the reasoning, custom Python orchestrator routes queries to specialist agents. Portfolio with details: priyanshukumar.co
  • n8n: Extensive experience with self-hosted n8n — workflow scheduling, error handling, retry logic, API orchestration.

The milestone + retainer structure works well — I typically deliver in phases so you can validate each component before moving to the next.

Available for a part-time remote engagement. Happy to discuss scope and milestones.

Priyanshu Kumar
AI & Automation Engineer
priyanshukumar.co

I’m very interested. I have over 5 years of experience in software engineering and have built several n8n projects with multiple AI agents.

Hey,

Three screening questions first, since that’s the right way to start.

1. n8n + Ollama: I build these locally. n8n handles the HTTP calls to Ollama’s API, pulls context from a vector DB, constructs the prompt, routes the response. No external LLM dependency, data stays on your hardware.

2. Vector DB: For sovereign setups I use Qdrant with nomic-embed-text via Ollama. For cloud projects, Supabase pgvector with OpenAI embeddings. Chunk size and overlap strategy matters more than which DB you pick — I tune it by document type. PDFs are a different problem than structured records.

3. /api/chat vs /api/generate: /api/generate is stateless, single-turn. /api/chat takes a messages array and handles conversation context natively. For your support agent and voice agent where context has to carry between turns, /api/chat is the right call. I always route agent workflows through it in n8n.
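The chunk size and overlap strategy mentioned in (2) boils down to a sliding window over the token stream; a minimal sketch (tokenizer-agnostic, sizes are just the common defaults):

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split a token list into fixed-size windows that overlap,
    so retrieval doesn't lose context at chunk boundaries."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

Each chunk then gets embedded and upserted with its source text as payload; the overlap means a sentence straddling a boundary still appears whole in at least one chunk.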

Your five systems, my read:

Customer support (text + WhatsApp): Webhook classifies intent, Qdrant retrieves, Ollama generates, routes through Twilio or WhatsApp Business API. Standard build, 1-2 days.

PDF quotation generator: Variable extraction from form or trigger, template fill, PDF render. Done this for e-commerce. Fast.

CV screener: Structured extraction with scoring rubric, Airtable output. Straightforward.

RAG virtual assistant with ticketing: This is the one I’d want to talk through before committing to a timeline. The retrieval is fine — the ticket routing depends on your knowledge base structure. I’d need to see that first.

Inbound voice agent: Kokoro TTS, faster-whisper, Ollama, n8n. Most complex of the five by a margin. Latency between components is the real problem, not the individual pieces. Solvable, but architecture decisions upfront determine whether it’s smooth or a mess six months in.
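One common buffering strategy for that latency problem is to flush the streaming LLM output to TTS at sentence boundaries instead of waiting for the full reply; a minimal sketch, where the `tts.speak` hookup in the usage comment is hypothetical:

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")

def sentence_buffer(token_stream: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed LLM tokens and yield complete sentences,
    so TTS can start speaking before the full reply is generated."""
    buf = ""
    for token in token_stream:
        buf += token
        stripped = buf.rstrip()
        if stripped.endswith(SENTENCE_END):
            yield stripped
            buf = ""
    if buf.strip():  # flush any trailing partial sentence
        yield buf.strip()

# Hypothetical usage:
# for sentence in sentence_buffer(ollama_stream()):
#     tts.speak(sentence)
```

Time-to-first-audio then depends on the first sentence, not the whole completion, which is usually the difference between a natural call and an awkward one.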

I work async, fully remote. Milestones in 1-2 days, full documentation on handover so your team can maintain it without me. If something breaks at 2am I want to know before you do.

Retainer path is what I’m after. Start with one system, see the quality, go from there.

DM or reply here.

Daemon