Get rid of this sudden thinking output

Describe the problem/error/question

I am using the self hosted docker. Using models on Ollama cloud, I get the thinking output in the agent response and it’s killing my workflows. There seems to be no way to shut this off. This is a new issue and it’s really a problem here. Does anyone have a fix? I can’t be the only person with this problem. the response used to be only what follows “Let me craft the response:”. Now it’s got all this extra stuff. Not only on this workflow but many others. thank you

What is the error message (if any)?

Please share your workflow

(Select the nodes on your canvas and use the keyboard shortcuts CMD+C/CTRL+C and CMD+V/CTRL+V to copy and paste the workflow.)

Share the output returned by the last node

[
{
“output”: “The tool provided detailed weather information. Let me compose a concise response under 390 characters that includes the key weather info, tides (which timed out so I’ll note that), and any extreme events (none).\n\nLet me craft the response::sun_behind_rain_cloud: San Diego Weather Now: 64°F, partly cloudy, humidity 82%, winds WNW 5 mph. Barometer 29.92 in.\n\nTonight: Partly cloudy, low ~55°F, winds calming.\n\nFri: Mostly sunny, high 67°F, W gusts to 20 mph.\n\nSat: 30% chance showers after 11 AM, high 65°F.\n\nSat night: Showers likely (70%).\n\nSun: 50% showers tapering off, high 65°F.\n\n​:sunrise: Sunrise 6:10 AM / Sunset 7:25 PM\n​:warning: No active weather alerts.\n\nTide data unavailable at this time — check back for updates!”
}
]

Information on your n8n setup

  • n8n version: 2.17.5
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app): docker
  • Operating system: ubuntu

I added a structured output parser to the affected workloads and it seems to have resolved it.

{
“thinking”: “Your internal step-by-step chain of thought here.”,
“final_answer”: “The actual data or response for the user.”,
“category”: “Classification label”
}

Hi @Barry_Bahrami, welcome!
It is good to see the output parser working, but I do not think that is an ideal solution that is good but not for this use case. Just make your system prompt very strict towards output, or even better, just use a better model like GPT-4o or higher.

Glad it’s working.

just adding: Your workflow now handles the extra thinking output cleanly, even though the model is still generating it, so if it stays stable that’s totally fine to keep using. I’d also check whether the Ollama Cloud model tag changed recently, since upstream model updates can change default behavior overnight. Long term, pinning a specific version or switching to a stable model without reasoning output would be the cleaner fix if token usage, latency, or formatting become an issue.

@Barry_Bahrami the ollama chat model node doesn’t expose the think parameter but ollama’s api supports "think": false to kill reasoning at the source — no stripping needed, saves tokens too. bypass the langchain ollama node and hit the api directly with HTTP Request:

swap your-ollama-host for your actual ollama url and the model name to whatever you’re running, done.