MCP vLLM Qwen 3.5 model

Hi,

I recently migrated Qwen 3.5 9B to a vLLM Docker instance. Previously, I implemented an MCP server that worked perfectly with OpenWebUI.

However, when I try to use this MCP directly in n8n (configuring the “OpenAI Chat Model” to point to my vLLM instance), it fails. Interestingly, the same setup works fine when hosting the model via Ollama or directly through OpenWebUI.

The error output I receive is:

```
<tool_call>\n<function=sub_rechercher_employe>\n<parameter=Nom>\nDoe\n\n<parameter=Prenom>\nJohn\n\n\n</tool_call>
```

Do you have any ideas on how to properly configure n8n to handle MCP and vLLM models?

Thanks for your help!

Hey @Cedric19,

The error output you’re seeing is the key clue here. The model is returning raw XML-style `<tool_call>` tags instead of proper OpenAI-compatible tool-call JSON. This means vLLM is not applying the correct chat template for tool/function calling, so n8n’s tool-use agent receives the raw text instead of a structured tool call it can parse.

Here’s what’s happening and how to fix it:

**1. Enable tool calling in vLLM with the correct chat template**

When you launch vLLM, you need to explicitly enable tool calling support. Qwen 3.5 uses Hermes-style tool calling, so you should add these flags:


```bash
--enable-auto-tool-choice --tool-call-parser hermes
```

Full example:
```bash
vllm serve Qwen/Qwen3.5-9B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --host 0.0.0.0 --port 8000
```

Without these flags, the model falls back to its raw trained behavior (the XML `<tool_call>` format you're seeing), which OpenWebUI can parse on its own but n8n cannot — n8n expects the OpenAI-standard tool call format from the API response.

**2. Verify the chat template supports tools**

If the above flags alone don't fix it, you may need to pass a custom `--chat-template` that includes tool definitions in the Jinja template. Check vLLM's docs on [tool calling](https://docs.vllm.ai/en/latest/features/tool_calling.html) for templates compatible with Qwen models.
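
For reference, passing a custom template just adds one flag to the launch command. This is a sketch: `qwen_tool_template.jinja` is a placeholder filename, not an official file — use a tool-enabled template appropriate for your model:

```bash
# Same launch as above, plus a custom Jinja chat template.
# "./qwen_tool_template.jinja" is a placeholder path for illustration.
vllm serve Qwen/Qwen3.5-9B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --chat-template ./qwen_tool_template.jinja \
  --host 0.0.0.0 --port 8000
```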

You can test whether it's working correctly by hitting the vLLM endpoint directly with curl:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.5-9B",
    "messages": [{"role": "user", "content": "who is john doe?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "sub_rechercher_employe",
        "parameters": {
          "type": "object",
          "properties": {
            "Nom": {"type": "string"},
            "Prenom": {"type": "string"}
          }
        }
      }
    }]
  }'
```

If the response contains `"tool_calls"` in the message object (JSON format), it's working. If you still see raw `<tool_call>` XML in the `content` field, the template isn't being applied.
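
For comparison, a correctly parsed response looks roughly like this — the field values are illustrative, but the shape is the standard OpenAI tool-call format n8n expects:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "sub_rechercher_employe",
          "arguments": "{\"Nom\": \"Doe\", \"Prenom\": \"John\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}
```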

**3. Why it works in OpenWebUI but not n8n**

OpenWebUI has its own built-in parser that can detect and extract those raw `<tool_call>` XML tags from model output. n8n's OpenAI Chat Model node does not — it relies on the API returning proper OpenAI-format `tool_calls` in the response JSON. That's why the same vLLM instance works with OpenWebUI but breaks in n8n.
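
To make that concrete, here is a rough sketch (not OpenWebUI's actual code) of the kind of client-side extraction a frontend has to do when the tool call arrives as plain text:

```bash
# Illustrative only: pulling the function name out of the raw text.
# Requires GNU grep for -P (Perl-compatible regex).
content='<tool_call><function=sub_rechercher_employe><parameter=Nom>Doe</tool_call>'
echo "$content" | grep -oP '(?<=<function=)[^>]+'
# prints: sub_rechercher_employe
```

n8n's node skips this kind of text scraping entirely and only reads the structured `tool_calls` field.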

**4. n8n credential config**

Make sure your OpenAI credential in n8n has:
- **Base URL**: `http://<your-vllm-host>:8000/v1`
- **API Key**: any non-empty string (vLLM doesn't validate it by default, but n8n requires the field)
- **Model**: must match exactly what vLLM loaded (e.g. `Qwen/Qwen3.5-9B`)
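
To double-check the Base URL and the exact model ID, you can hit vLLM's models endpoint (part of its OpenAI-compatible API):

```bash
# The "id" field in the response is the exact string to use
# as the Model value in n8n.
curl http://<your-vllm-host>:8000/v1/models
```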

Hope this helps — the tool parser flag should be the main fix!

Thanks for your feedback.

Should I install something for `--tool-call-parser hermes`? What is hermes?

@unstableentity, I’ve added it to the vLLM Docker launch command, but the issue persists.

Hi @Cedric19!

I think the core issue is probably upstream of n8n. Since the model is returning raw `<tool_call>` text instead of OpenAI-style structured `tool_calls`, I’d verify the vLLM response format first with a direct `/v1/chat/completions` test using the same tools payload. If that response still comes back as plain text, then n8n won’t be able to execute the tool call reliably, and the fix is on the vLLM side (the model’s tool-calling parser or chat template) rather than in the n8n workflow itself. A good next step would be to confirm whether your vLLM endpoint is actually returning `tool_calls` in the API response JSON.
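
Assuming you have `jq` available, one quick way to check is to pipe the curl test from the earlier post through it:

```bash
# "tools_payload.json" stands in for the same request body used in the
# earlier curl example. null => raw text; an array => structured tool calls.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @tools_payload.json | jq '.choices[0].message.tool_calls'
```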

Yeah, sure:

*(screenshot of the curl test response)*

What is the parameter highlighted in red?

That means the model does not support strict tool calling. So n8n cannot fully rely on it to return proper structured tool_calls, which matches the behavior you’re seeing.

The tool-calling support in Qwen 3.5 is the blocker. If you need a local model that's reliable for agents, I'd try Mistral or other models with explicit tool-calling support. Otherwise, Claude/OpenAI are the baseline that just works.

That's not clear from my point of view. So why does it work fine when I use the model and MCP in OpenWebUI? :thinking:

OpenWebUI has its own built-in parser that handles the raw `<tool_call>` XML format your model outputs; it bridges the gap on the client side. n8n doesn’t have that: it strictly expects OpenAI-format `tool_calls` in the API response JSON.

So OpenWebUI works despite the missing vLLM flag, while n8n requires proper structured output from the API itself. That’s why the same model behaves differently in the two tools.

@Cedric19, did it all work out?