Locally hosted LLM is not able to call tools

Hi,

My company and I are very happy to be testing n8n so we can make an educated decision on whether it is the tool we want to move forward with into the era of agentic AI. As part of that, we are trying to run our local Llama 3.3 Super 49B with the n8n AI Agent and ran into a problem:

Describe the problem/error/question

I have an on-prem hosted NVIDIA NIM container running Llama 3.3 Super:

I set everything up, connected it all to an AI Agent, and tried to chat. Worked like a charm.
The next step is to let the AI Agent use tools, so I changed the agent type to "Tools Agent" and added a simple Webex tool that sends a message to a room.

THE PROBLEM: The LLM correctly identifies the tool and tries to use it, but the tool never actually gets called. Instead, the tool call is output as chat text:

This is also visible in the executed nodes, which show that the Webex tool is never run:

Please share your workflow

Share the output returned by the last node

The AI Agent then outputs:
[{
"output": "[{"name": "Create_a_message_in_Webex_by_Cisco", "arguments": {"Text": "Hi Test"}}]"
}]

Expected output

We would expect the local LLM to be able to use the tools attached to the n8n Tools Agent and to get back to the user with the output once the tool calls are finished.

Just FYI: with GPT models (via the API), for example, this exact workflow works.
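
My understanding of the difference: with the GPT models the tool call comes back in the structured tool_calls field of the chat completion (which is what the agent then executes), while our local model only writes the call into the plain text content. Roughly like this (a sketch for illustration, not copied from a real response):

```python
# What an OpenAI-compatible backend returns when tool calling works:
# the call sits in a structured tool_calls field, content stays empty.
working_assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {
            "name": "Create_a_message_in_Webex_by_Cisco",
            "arguments": '{"Text": "Hi Test"}',
        },
    }],
}

# What our local model returns instead: the call only as plain chat text,
# so there is nothing structured for the agent to execute.
local_assistant_message = {
    "role": "assistant",
    "content": '[{"name": "Create_a_message_in_Webex_by_Cisco", "arguments": {"Text": "Hi Test"}}]',
}
```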

Looking forward to reading your ideas to get this going,
Flo

Information on your n8n setup

  • n8n version: 1.100.0
  • Database (default: SQLite): default
  • n8n EXECUTIONS_PROCESS setting (default: own, main): default
  • Running n8n via (Docker, npm, n8n cloud, desktop app): docker-compose self-hosted
  • Operating system:

Hey @fwasmeier,

Part of the problem could be that a lot of local LLMs are not "smart" enough to know when they need to call a tool, or don't support tool calling at all.

Have you checked whether the model you are using actually supports tool use?
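
One quick way to check, outside of n8n, is to send a request with a tools definition straight to the NIM container's OpenAI-compatible endpoint and see whether the reply contains a structured tool_calls entry or only plain text. A minimal sketch with the openai Python client (the base URL is a placeholder, and the model name is an assumption taken from your setup):

```python
# Minimal check for structured tool calling on an OpenAI-compatible endpoint.
# Assumptions: the NIM container exposes /v1/chat/completions at this base URL
# (placeholder) and the model name matches the one configured in n8n.
from openai import OpenAI

client = OpenAI(base_url="http://your-nim-host:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "send_webex_message",  # hypothetical tool, for the test only
        "description": "Send a message to a Webex room",
        "parameters": {
            "type": "object",
            "properties": {"Text": {"type": "string"}},
            "required": ["Text"],
        },
    },
}]

response = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1",
    messages=[{"role": "user", "content": "Send 'Hi Test' to the room."}],
    tools=tools,
)

message = response.choices[0].message
# If tool calling works end to end, message.tool_calls is populated;
# if the model only supports it "in text", the call shows up in content instead.
print("tool_calls:", message.tool_calls)
print("content:", message.content)
```

If tool_calls stays empty here as well, the issue is on the model/serving side rather than in the n8n agent.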


See the link below for Llama models that support tool calls.

https://www.perplexity.ai/search/which-llama-models-support-too-AdeJi20YR1GXDzwxIMOwwg

Llama Models That Support Tool Calls

Several Llama models support tool calling (sometimes referred to as function calling), allowing them to interact with external APIs, functions, or services. Here’s a breakdown of which Llama models offer this capability and the types of tool calling they support:

Official Meta Llama Models

  • Llama 3.1
    • Supports JSON-based tool calling natively.
    • Widely implemented in platforms like Ollama and Groq, enabling agentic automation and integration with external tools or APIs [1][2][3][4].
    • Available in various parameter sizes (e.g., 8B, 70B, 405B).
  • Llama 3.2
    • Extends upon 3.1 with continued support for JSON-based tool calling.
    • Introduces "pythonic" tool calling, a more flexible and Python-friendly format [1].
  • Llama 4
    • Supports both JSON-based and the new pythonic tool calling format.
    • Recommended to use the pythonic tool parser for best results.
    • Supports parallel tool calls, a feature not available in Llama 3.x [1].

Community and Fine-Tuned Models

  • Fine-tuned Llama 3 Models
    • Community projects have fine-tuned Llama 3 (e.g., Llama3-8b-instruct) for enhanced function/tool calling, including LoRA adapters and quantized versions for efficient local deployment [5][6].
    • These fine-tuned models are trained on datasets specifically designed for function calling tasks and are available in different formats (16-bit, 4-bit, GGUF for llama.cpp, etc.).
  • TinyLlama
    • A smaller, fine-tuned variant with tool/function calling support, suitable for resource-constrained environments [6].

Comparison Table

| Model | Tool Calling Support | Format(s) Supported | Notable Features |
| --- | --- | --- | --- |
| Llama 3.1 | Yes | JSON-based | Native support, broad adoption |
| Llama 3.2 | Yes | JSON, Pythonic | Adds pythonic tool calling |
| Llama 4 | Yes | JSON, Pythonic | Parallel tool calls supported |
| Llama3-8b-instruct* | Yes (fine-tuned) | JSON-based | Community fine-tune, local use |
| TinyLlama* | Yes (fine-tuned) | JSON-based | Small, efficient, fine-tuned |

*Community fine-tuned models, not official Meta releases.

Key Points

  • Llama 3.1, 3.2, and 4 all support tool calling, with increasing capabilities and flexibility in newer versions [1][2][3].
  • JSON-based tool calling is the standard across all, while pythonic tool calling is introduced in 3.2 and recommended for Llama 4 [1].
  • Parallel tool calls are only supported in Llama 4 [1].
  • Fine-tuned models such as those from the “unclecode” repository extend tool calling to smaller or more specialized Llama variants [5][6].

In summary, if you need tool calling support, choose Llama 3.1 or newer. For advanced features like pythonic tool calling and parallel execution, Llama 4 is recommended. Fine-tuned community models are also available for specific use cases or lightweight deployments.
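
To make the difference between the two formats concrete, the raw text a model emits looks roughly like this (a sketch with a made-up function; the exact delimiters depend on the serving stack):

```python
# JSON-based tool calling (Llama 3.1+): the model emits a JSON object that the
# serving layer (vLLM, Ollama, NIM, ...) parses into a structured tool call.
json_style_output = '{"name": "get_weather", "parameters": {"city": "Berlin"}}'

# "Pythonic" tool calling (introduced with Llama 3.2, recommended for Llama 4):
# the model emits a Python-like call expression instead.
pythonic_style_output = '[get_weather(city="Berlin")]'
```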

  1. Tool Calling - vLLM
  2. Tool support · Ollama Blog
  3. Tool Calling in Llama 3: A Step-by-step Guide To Build Agents - Composio
  4. https://www.reddit.com/r/LocalLLaMA/comments/1eaztwv/quick_review_of_llama_31_tool_calling/
  5. unclecode/llama3-function-call-lora-adapter-240424 · Hugging Face
  6. unclecode/tinyllama-function-call-Q4_K_M_GGFU-250424 · Hugging Face
  7. unclecode/llama3-function-call-Q4_K_M_GGFU-240424 · Hugging Face
  8. https://llama.developer.meta.com/docs/guides/tool-guide/
  9. Tool calls in LLaMa 3.1 - Docs - Braintrust
  10. Llama 4 | Model Cards and Prompt formats
  11. Llama 3.3 | Model Cards and Prompt formats
  12. Tools - LlamaIndex
  13. okamototk/llama-swallow
  14. Function Calling — NVIDIA NIM for Large Language Models (LLMs)

Hi @Jon,

thank you for reaching out quickly.

If you take a look at the model card from NVIDIA, it specifically states that the model is trained for tool calling.

I also tested whether it was smart enough by changing the agent type to "OpenAI Functions Agent". With that agent type I get a 400 (no body) error response from n8n.

This is the console log when using the model in the "OpenAI Functions Agent":

2025-06-27T11:08:45.467Z | error | 400 status code (no body) {"file":"error-reporter.js","function":"defaultReport"}
2025-06-27T11:08:45.467Z | debug | Running node "AI Agent" finished with error {"node":"AI Agent","workflowId":"jOqu92akylxZQm06","file":"logger-proxy.js","function":"exports.debug"}
2025-06-27T11:08:45.467Z | debug | Executing hook on node "AI Agent" (hookFunctionsPush) {"executionId":"6809","pushRef":"wvad9rmsml","workflowId":"jOqu92akylxZQm06","file":"execution-lifecycle-hooks.js"}
2025-06-27T11:08:45.468Z | debug | Pushed to frontend: nodeExecuteAfter {"dataType":"nodeExecuteAfter","pushRefs":"wvad9rmsml","file":"abstract.push.js","function":"sendTo"}
2025-06-27T11:08:45.468Z | debug | Workflow execution finished with error {"error":{"level":"warning","tags":{},"context":{},"functionality":"configuration-node","name":"NodeApiError","timestamp":1751022525464,"node":{"parameters":{"notice":"","model":{"__rl":true,"value":"nvidia/llama-3.3-nemotron-super-49b-v1","mode":"list","cachedResultName":"nvidia/llama-3.3-nemotron-super-49b-v1"},"options":{}},"type":"@n8n/n8n-nodes-langchain.lmChatOpenAi","typeVersion":1.2,"position":[-840,-320],"id":"cec34fcd-ddfd-4bcb-b4bd-b97031e8ee17","name":"Local","notesInFlow":true,"credentials":{"openAiApi":{"id":"dX1EaCNOPnvtPwDG","name":"Local Reasoning Model"}}},"messages":["400 status code (no body)"],"httpCode":"400","description":"400 status code (no body)","message":"Bad request - please check your parameters","stack":"NodeApiError: Bad request - please check your parameters\n    at Object.onFailedAttempt (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+n8n-nodes-langchain@file+packages+@n8n+nodes-langchain_9ca6f82764a6c40719e9f8a538948cbd/node_modules/@n8n/n8n-nodes-langchain/nodes/llms/n8nLlmFailedAttemptHandler.ts:26:21)\n    at RetryOperation._fn (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/[email protected]/node_modules/p-retry/index.js:67:20)\n    at processTicksAndRejections (node:internal/process/task_queues:105:5)"},"workflowId":"jOqu92akylxZQm06","file":"logger-proxy.js","function":"exports.debug"}

!!! IMPORTANT
The part that is making me curious: when I use the Plan & Execute Agent with the model and attach a Wikipedia tool, it is able to use the tool and come back to the user.

The tests above tell me that the model is in principle capable of calling tools, but there seems to be a problem somewhere else that I am not able to see.

Questions that came up from this:

  • Is there a difference in how tools are called between the Plan & Execute Agent, the OpenAI Functions Agent, and the Tools Agent?
  • Which agent would be the correct one for this case (the Llama 3.3 NIM uses the OpenAI API standard)?

If I can help with more information or debugging logs, please let me know; I am happy to assist.
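
If it is useful, I could also try to reproduce the failing request outside of n8n, roughly like this (a sketch; the base URL is a placeholder for our internal NIM endpoint, and the assumption that the OpenAI Functions Agent sends a legacy functions-style payload while the Tools Agent uses the newer tools field is mine, not confirmed):

```python
# Sketch to narrow down which request style the NIM endpoint rejects with a 400.
# Placeholder base URL; model name taken from the error log above.
import requests

BASE_URL = "http://your-nim-host:8000/v1"
MODEL = "nvidia/llama-3.3-nemotron-super-49b-v1"

function_def = {
    "name": "wikipedia_lookup",  # hypothetical tool, for the test only
    "description": "Look up a topic on Wikipedia",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

payloads = {
    # current OpenAI "tools" style
    "tools": {
        "model": MODEL,
        "messages": [{"role": "user", "content": "Look up n8n on Wikipedia"}],
        "tools": [{"type": "function", "function": function_def}],
    },
    # legacy "functions" style
    "functions": {
        "model": MODEL,
        "messages": [{"role": "user", "content": "Look up n8n on Wikipedia"}],
        "functions": [function_def],
    },
}

for name, payload in payloads.items():
    r = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
    print(name, r.status_code, r.text[:200])
```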

Thank you @Wouter_Nigrini for the information. According to the model card, the model is suitable for tool calling, and testing with the Plan & Execute Agent type already resulted in a successful use of the Wikipedia tool. The problems only appear with the Tools Agent or the OpenAI Functions Agent.
