n8n tool calling sucks

n8n doesn’t work well with tools. Sometimes it simply doesn’t call them, for some obscure reason. Testing the same system prompt in OpenRouter, for example, the function call appears regardless of the LLM, but n8n ignores it after a few loops. If you have 10 questions to ask, and each answer triggers a function call to write to a spreadsheet, by the third or fourth answer it no longer calls the tool. It seems to get lost. I’ve done EVERYTHING: I’ve changed the prompt several times and redone it with the help of several LLMs, and I’ve come to the conclusion that the problem is in n8n. I tested Dify and it worked first time with all calls and iterations. n8n was not made for this, very weak.

Other failed attempts:

I have already tried creating an MCP server, describing the tool in detail, modifying the structure of the agent’s output, and injecting hints into the user input.

Are you using a memory node?
I sometimes find when the context window becomes too big, the agent starts to become unreliable with tool calls.
You can check in the logs if it’s building up a massive input.
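If you want a quick way to eyeball that buildup, here is a minimal sketch. It uses the common ~4 characters per token heuristic rather than a real tokenizer, and the log strings are hypothetical placeholders; for exact counts you would run the same text through tiktoken.

```python
# Rough sketch: estimate how fast the agent's input grows per loop.
# The ~4 chars/token ratio is a rule of thumb for English text;
# a real check would use a tokenizer like tiktoken for exact counts.

def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4 characters/token heuristic."""
    return max(1, len(text) // 4)

def input_growth(loop_inputs: list[str]) -> list[int]:
    """Token estimate of the full prompt sent to the model on each loop."""
    return [estimate_tokens(text) for text in loop_inputs]

# Hypothetical inputs copied from n8n's execution log, one per agent loop
logs = [
    "system prompt + question 1",
    "system prompt + question 1 + answer 1 + question 2",
    "system prompt + question 1 + answer 1 + question 2 + answer 2 + question 3",
]
print(input_growth(logs))  # steadily increasing numbers = context building up
```

If the numbers roughly double every loop, the memory node is almost certainly the thing degrading tool calls.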

Yes, I am using memory, but by my 5th or 6th response it no longer calls the tool. What bothers me is not being able to see what the LLM actually returned; it isn’t visible. And a large memory shouldn’t affect this. Is this a bug in n8n?

Thanks for the reply.

I’ve found the same issue. GPT-4.1 frequently fails, Sonnet-4 is more reliable, and the new Kimi K2, which is supposed to be one of the best tool-calling models in the world, works maybe 1 time in 10. These models work nearly 100% of the time in Cursor. It’s clearly an n8n issue, not a model or context window issue.

I’m having the same issue. It just seems unreliable, that’s the best way I can put it.
Some of my workflows work perfectly, while others don’t. I’ve even copied and pasted entire n8n agent nodes over, and they don’t work on certain workflows. It’s very confusing. I’ve been trying everything to problem solve this.

I feel like it shouldn’t be that hard to call a tool, especially when some of my prompts are literally “only call this tool”.

Would love to hear if anyone else is having issues lately, or if this is an n8n issue right now.

Totally understand the frustration. When an agent chain silently drops a tool call mid-loop, it feels like you’re debugging in the dark. In our testing, the main culprit is usually context-window bloat combined with n8n’s default 90-second node execution ceiling. Each time your agent writes a new row to the spreadsheet it expands the conversation buffer, so by the fourth or fifth question the prompt has doubled in size and the HTTP Request node that hosts the AgentExecutor starts brushing up against the timeout. n8n retries, the agent sees a partial conversation history, and the function call never fires.

A quick mitigation we’ve had success with is inserting a tiny “Gatekeeper” sub-workflow just before the AgentExecutor call. The Gatekeeper does two things: 1) uses tiktoken to count the tokens in conversationHistory and trims the oldest overflow messages, and 2) checks the agent-call latency budget stored in an n8n static data key. If either limit is exceeded, it short-circuits with a structured JSON response (e.g., {"status":"deferred","reason":"token_budget"}) that you can handle downstream, so there are no more silent failures.
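Here is a rough sketch of what that Gatekeeper check could look like as a standalone function (e.g., inside an n8n Code node). The budget values, message shape, and field names are assumptions, and it uses a chars/4 heuristic instead of tiktoken so it runs with no dependencies; a real version would count tokens exactly.

```python
import json

# Hypothetical Gatekeeper sketch: trim conversation history to a token budget,
# or defer with a structured reason instead of failing silently.
# Budgets and field names are illustrative assumptions, not n8n defaults.

TOKEN_BUDGET = 6000          # max tokens allowed in the conversation buffer
LATENCY_BUDGET_MS = 60_000   # stay well under the node execution ceiling

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)   # ~4 chars/token heuristic; swap in tiktoken

def gatekeeper(history: list[dict], elapsed_ms: int) -> dict:
    """Trim oldest messages to fit the budget, or short-circuit with a reason."""
    if elapsed_ms > LATENCY_BUDGET_MS:
        return {"status": "deferred", "reason": "latency_budget"}

    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)

    # Drop the oldest non-system messages (keep index 0, the system prompt).
    trimmed = list(history)
    while len(trimmed) > 2 and total(trimmed) > TOKEN_BUDGET:
        trimmed.pop(1)

    if total(trimmed) > TOKEN_BUDGET:
        return {"status": "deferred", "reason": "token_budget"}
    return {"status": "ok", "history": trimmed}

history = [{"role": "system", "content": "You are a survey agent."}] + [
    {"role": "user", "content": "answer " * 500} for _ in range(10)
]
print(json.dumps(gatekeeper(history, elapsed_ms=1200))[:40])
```

The key design point is that the downstream workflow always receives a parseable status object, so a deferred call shows up in your logs instead of vanishing.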

Long-term, you might consider flipping the topology: let a stateless agent live outside of n8n (LangChain’s Runnable Agent works well) and have it POST only the necessary side effects back into n8n via a Webhook node. That keeps the conversational context inside the agent runtime and leaves n8n to do what it’s great at: reliable IO and retries.

Curious: have you tried measuring how many tokens your conversation accumulates per loop, and if so, did trimming history improve tool-call consistency?