Tool calling chain with local Ollama models (7b/14b) - 2nd tool never executed

Describe the problem/error/question

Tool calling chain with local Ollama models (7b/14b) - 2nd tool never executed.
I’m building a multi-agent workflow in n8n with an AI Agent node using Ollama
(tested with qwen2.5:7b, qwen2.5:14b, qwen2.5:32b, llama3.1:8b, mistral:7b, mistral-small:24b).
The AI agent has 4 tools: agent_bdd, agent_api, agent_redacteur, synthese_vocale.
I tested each tool individually and they all work perfectly.
I also tested chaining all 4 tools without the AI Agent, using sequential Execute Workflow nodes, and that works perfectly too.
But when I use the AI Agent to orchestrate the system (with 7b/8b/14b and 24b models):

  • Tool 1 (agent_bdd) is called correctly and its output is correct.
  • Tool 2 (agent_api) is never executed; the agent writes “Calling agent_api with input: {…}” as plain text in its response.

Notes:

  • Temperature is set to 0.0.
  • The tools are properly configured (Call n8n Workflow, published, correct inputs).
  • The issue persists regardless of prompt length or structure.
  • qwen2.5:32b is the only model that executes all 4 tools correctly, but it is too slow for our use case. I want to use a 7b or 8b model. The structure is pretty basic, so I don’t understand why I need a 32b model just to call 4 tools…

Has anyone successfully chained 3+ tool calls with a 7b/14b local model in n8n?
Any prompt engineering tips or model recommendations?
Thank you

n8n version: 2.10.3 (self-hosted)
Models tested: qwen2.5:7b, qwen2.5:14b, qwen2.5:32b, llama3.1:8b, mistral:7b, mistral-small:24b


Hi @Bapt

This isn’t an n8n bug; it’s a model capability limitation. When the agent writes “Calling agent_api with input: {…}” as plain text instead of actually firing the tool, the model is describing the action instead of generating the structured tool-call format that n8n expects. Smaller models tend to lose track of that format after the first tool result comes back.
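To make the distinction concrete, here is a minimal sketch of the difference, using the shape of an Ollama `/api/chat` response (field names follow Ollama's chat API; the tool name `agent_api` is taken from the workflow above, and the argument value is illustrative):

```python
# A genuine tool call: the model emits structured data in message.tool_calls,
# which n8n can parse and execute.
real_call = {
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "agent_api", "arguments": {"query": "..."}}}
        ],
    }
}

# A hallucinated call: the model merely describes the action in plain text.
# message.tool_calls is absent, so n8n has nothing to execute.
hallucinated = {
    "message": {
        "role": "assistant",
        "content": "Calling agent_api with input: {...}",
    }
}

def has_tool_call(response: dict) -> bool:
    """True only if the response carries a structured tool call."""
    return bool(response["message"].get("tool_calls"))

print(has_tool_call(real_call))     # True
print(has_tool_call(hallucinated))  # False
```

The second shape is exactly what you're seeing: valid-looking text, but nothing for the agent runtime to act on.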

A couple of things to check and try:

First, make sure you’re using the Ollama Chat Model node (not the regular Ollama Model node). The non-chat version doesn’t support tool calling at all according to n8n docs.

Second, try qwen3:8b instead of qwen2.5. Ollama’s own tool calling docs use qwen3 as the reference model, and the qwen3 family improved structured output handling significantly. You can check which models are officially tagged for tool support here: https://ollama.com/search?c=tools

Third, 4 tools on a 7b model is a lot to juggle. If qwen3:8b still drops the second call, consider splitting the orchestration: one lightweight agent picks which sub-agent to call, and each sub-agent has fewer tools. This reduces the cognitive load on the model at each step.

Bottom line: multi-step tool chaining with 4 tools genuinely needs more model capacity than most 7b models can handle reliably.

Hi houda_ben !

Thank you so much for your detailed and helpful response!

I tried qwen3:8b as you suggested and it works perfectly — all 4 tools are called correctly and in the right order. This solves my problem!

I was indeed using the Ollama Chat Model node, so that wasn’t the issue. The key was switching from the qwen2.5 family to qwen3, which clearly handles tool calling much better.

I’ll also keep in mind your suggestion about splitting the orchestration into smaller agents as a fallback for future cases.

Thanks again, this saved me a lot of time!

This is a known limitation with smaller models and multi-step tool calling. Here’s what’s happening and how to work around it:

## Why smaller models fail after the first tool call

When the AI Agent calls Tool 1 and gets a response, the context window now contains the system prompt + user message + Tool 1 call + Tool 1 result. At this point, smaller models (7B/8B) often lose track of the “I need to make another structured tool call” pattern and instead **hallucinate** the tool call as plain text — they write “Calling agent_api…” instead of emitting the proper function-call JSON.

This happens because:

1. **Context pressure** — after Tool 1’s output, the model has more tokens to track, and 7B models struggle to maintain the tool-calling format consistently across multiple turns

2. **Tool-calling fine-tuning depth** — 32B models like qwen2.5:32b have more robust instruction-following for structured outputs across multiple rounds
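The context-growth point can be sketched in a few lines (the message contents are hypothetical; the structure mirrors a typical agent loop):

```python
# How the conversation a model must track grows with every tool round.
history = [
    {"role": "system", "content": "You are an orchestrator with 4 tools."},
    {"role": "user", "content": "Run the full pipeline."},
]

def tool_round(history: list, name: str, result: str) -> None:
    """Append one tool call and its result, as an agent loop would."""
    history.append(
        {"role": "assistant",
         "tool_calls": [{"function": {"name": name, "arguments": {}}}]}
    )
    history.append({"role": "tool", "content": result})

tool_round(history, "agent_bdd", "...large database extract...")

# By the time the model must decide to call agent_api, it is already
# reasoning over twice as many messages -- and the tool result may be huge.
print(len(history))  # 4
```

Each subsequent round adds two more messages, so the decision to emit the *format* correctly gets harder exactly when it matters most.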

## Practical workarounds

### 1. Break into sequential sub-agents (most reliable)

Since your tools work perfectly in a chain of Execute Workflow nodes, **keep that architecture** but have each sub-agent handle only 1 tool:

`Agent 1 (7b) → calls agent_bdd → Agent 2 → calls agent_api → Agent 3 → calls agent_redacteur → Agent 4 → calls synthese_vocale`

Each agent only sees one tool, so it always calls it correctly.
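As a rough sketch of this architecture, with stub functions standing in for the n8n sub-workflows (the tool names come from the thread; the stubs and their return values are entirely hypothetical):

```python
def agent_bdd(query: str) -> str:
    return f"db-result({query})"

def agent_api(data: str) -> str:
    return f"api-result({data})"

def agent_redacteur(data: str) -> str:
    return f"draft({data})"

def synthese_vocale(text: str) -> str:
    return f"audio({text})"

def pipeline(query: str) -> str:
    """Each step is one agent with exactly one tool, so a 7b model never
    has to choose between tools or re-emit the call format mid-chain."""
    result = query
    for step in (agent_bdd, agent_api, agent_redacteur, synthese_vocale):
        result = step(result)
    return result

print(pipeline("clients 2024"))
# audio(draft(api-result(db-result(clients 2024))))
```

In n8n terms, each `step` would be a small AI Agent node (or a plain Execute Workflow node) wired in sequence, rather than one agent juggling all four tools.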

### 2. Use models fine-tuned for tool calling

Some models handle multi-tool chains much better at small sizes:

- **qwen2.5-coder:14b** — the coder variant is better at structured output

- **mistral-small:22b** — Mistral’s tool-calling fine-tuning is stronger

- **command-r7b** (Cohere) — specifically trained for multi-step tool use

### 3. Simplify the system prompt

With smaller models, every token counts:

- Keep tool descriptions minimal

- Add explicit instructions: “You MUST call each tool using the function call format. Never write tool calls as text.”

- Add step-by-step: “Step 1: Call agent_bdd. Step 2: Call agent_api with the result. Step 3: Call agent_redacteur. Step 4: Call synthese_vocale.”
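Putting those tips together, a compact system prompt could be assembled like this (the wording is illustrative, not a tested template; the tool names are the ones from the thread):

```python
TOOLS = ["agent_bdd", "agent_api", "agent_redacteur", "synthese_vocale"]

# Explicit numbered steps, one short line per tool.
steps = "\n".join(
    f"Step {i}: Call {name}." for i, name in enumerate(TOOLS, start=1)
)

system_prompt = (
    "You MUST call each tool using the function call format. "
    "Never write tool calls as text.\n" + steps
)
print(system_prompt)
```

Keeping the prompt this terse leaves more of the model's effective context for the tool results themselves.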

### 4. Reduce Tool 1 output size

If agent_bdd returns a large result, it pushes the model past its effective working memory. Try truncating or summarizing tool output before it goes back to the agent.
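A minimal truncation helper along these lines (the 1000-character budget is an arbitrary assumption; tune it to your model's context):

```python
def truncate_tool_output(text: str, max_chars: int = 1000) -> str:
    """Cap a tool result before it re-enters the agent's context."""
    if len(text) <= max_chars:
        return text
    # Keep the head and flag the cut so the model knows data was dropped.
    return text[:max_chars] + " ...[truncated]"

big_result = "x" * 5000
print(len(truncate_tool_output(big_result)))  # 1015
```

In n8n this could live in a Code node between the tool sub-workflow and the agent; summarizing with a second cheap LLM call is the heavier-weight alternative.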

## My recommendation

Option 1 (sequential sub-agents) is the most reliable. You already confirmed the sequential chain works — just wrap each step in a lightweight agent node that handles a single tool. You get reliability with the flexibility of AI-driven orchestration at each step.

Hope this helps!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.