I have two memory nodes in testing. One local (docker, synology) and one on Supabase. The tables are freshly created. Prompt and chat model are exactly the same. The memory node settings are the same, except for the database connection. On Supabase I only get the German response (which suits me just fine), but with the local model (and also with the Easy Memory from n8n, by the way) I always get the English thoughts included, even though I explicitly forbade them in the prompt. What could be the reason for this? The messages I don’t like look something like this:
… works, though he’s more famous for his orchestral and choral works. Let me respond in a friendly way with the correct information. Johannes Brahms starb …
Did I set something wrong? A bug? Thanks for any help, best regards Thomas
Hi @musicanera Welcome!
I think for now you should upgrade @langchain/ollama to v1.0.3 inside your Docker container. If it still shows the same behavior after that, we can move on to other debugging steps.
welcome to the n8n community @musicanera
I’d start by looking at your Session Key and the actual history being pulled from memory. When the same prompt acts differently between Supabase and a local Postgres, it usually means the model is getting a different context in each case.
If I were you, I’d try a few things: use a brand-new Session Key for each test, double-check that your tables are actually empty, and make sure the Context Window Length is identical. Also, just verify that the Chat Trigger and Agent are definitely hooked up to the same memory node.
The fact that you’re seeing that English snippet is usually a red flag that some previous context is leaking in or that the model’s raw output is hitting the final node. I’d suggest inspecting the JSON output of the last node—that should show you exactly what’s being passed along.
Are you using a shared database for both environments, or are they completely isolated?
This is actually a pretty interesting issue. The English thoughts showing up with your local memory are likely coming from the stored conversation history in Postgres, not from the current prompt. The memory node retrieves past messages and includes them in the context, so if earlier conversations had English content, it will show up again regardless of the language instruction in your prompt.
A few things to check:
Clear the memory table for your local node and start fresh to see if the English stops appearing.
Check the “Session ID” setting on both memory nodes. If they share the same session or table but use different connection strings, they might be pulling from overlapping data.
The Supabase node may be stricter about session isolation by default
Basically the model is not ignoring your German-only instruction, it’s just echoing back what was already stored. Flushing the table should confirm this.
It has actually turned out that the database is not responsible for the behavior, but rather the query of the Chat-LLM-Node. With Ollama, the problem of “multilingualism in the output” occurs reproducibly; with OpenRouter and the same LLM, the output is flawless. I have tried running VACUUM ANALYZE on the memory_table and clearing the workflow cache – success was achieved once, but not reproducibly. The rest of the workflow remains the same; the database is either local on the machine or on Supabase in the cloud. So I strongly suspect that Ollama in combination with n8n is causing the trouble. Unfortunately, I’m not making any progress with the troubleshooting. It’s also irritating that there is a workflow that generates a flawless chat output with the help of Ollama (with comparable settings to the other workflows).
@musicanera so this is an ollama-specific thing, the model is leaking its chain-of-thought into the output. add "num_predict": -1 and "raw": true in your Ollama model node under Additional Options to stop it from dumping internal reasoning into the response, the openrouter endpoint strips that automatically which is why it works fine there.
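One way to check whether the reasoning text comes from Ollama itself or from the n8n node is to call Ollama directly, outside the workflow. The following is only a sketch under assumptions: it uses Ollama's /api/chat endpoint with the think flag, which only applies to models that support thinking and needs a reasonably recent Ollama version. The helper names buildChatRequest and askOllama, the base URL and the model name are placeholders, not something taken from this thread.

```javascript
// Hypothetical helper: builds a request body for Ollama's /api/chat endpoint.
// Assumption: the model supports the "think" toggle; older Ollama versions
// or non-thinking models may ignore or reject this field.
function buildChatRequest(model, systemPrompt, userMessage) {
  return {
    model,
    stream: false, // ask for one complete JSON response instead of a stream
    think: false,  // request that the model does not emit reasoning
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userMessage },
    ],
  };
}

// Hypothetical helper: sends the body to a running Ollama instance and
// returns the assistant message object from the response.
async function askOllama(baseUrl, body) {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.message;
}

// Example call (needs a running Ollama; URL and model name are placeholders):
// askOllama('http://localhost:11434',
//   buildChatRequest('your-model', 'Antworte nur auf Deutsch.', 'Wann starb Johannes Brahms?'))
//   .then((message) => console.log(message.content));
```

If the direct call returns clean German text and the workflow still shows English reasoning, that would point at how the n8n node builds the request or handles the response rather than at the model.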
The num_predict: -1 and raw: true fix for Ollama is a good find. I ran into similar behavior when building a FB Messenger chatbot with n8n + Ollama. The chain-of-thought leaking into responses caused real problems in production before we switched the LLM setup. For anyone hitting this issue, also worth checking if your system prompt explicitly tells the model not to include internal reasoning in responses. That helped us stabilize output across sessions.
Thank you for the hint. Unfortunately, I can’t find a way to set num_predict: -1 and raw: true in the Ollama Chat Model settings. I don’t see anything in the “Parameters” tab. Perhaps I could define something like that in the “Settings” tab instead. Would setting “Max Tokens to generate” to -1 be an equivalent replacement?
@musicanera yeah max tokens -1 won’t do the same thing, that just controls output length. drop a Code node right before your memory node and strip the thinking manually with something like $json.output.replace(/<think>[\s\S]*?<\/think>/g, '').trim(). reasoning models on ollama wrap their reasoning in <think> tags and the langchain node doesn’t clean that up before storing it.
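For reference, the stripping step could look roughly like this as plain JavaScript. It is a minimal sketch: stripThinking and cleanItems are made-up names, and the output field name is taken from the expression above, so adjust it if your agent writes its answer to a different field.

```javascript
// Removes every <think>...</think> block from a string and trims the result.
// Non-string input is returned unchanged so missing fields do not throw.
function stripThinking(text) {
  if (typeof text !== 'string') {
    return text;
  }
  // [\s\S]*? matches across line breaks and stops at the first closing tag,
  // so several separate think blocks are each removed on their own.
  return text.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}

// Applies stripThinking to the "output" field of n8n-style items,
// i.e. an array of objects shaped like { json: { output: '...' } }.
function cleanItems(items) {
  return items.map((item) => ({
    ...item,
    json: { ...item.json, output: stripThinking(item.json.output) },
  }));
}

// Inside an n8n Code node set to run once for all items, the last line
// would typically be (assumption about your node setup):
// return cleanItems($input.all());
```

Note that this only helps when the reasoning is actually wrapped in <think> tags. If the English text arrives without tags, as in the snippet in the first post, the regex leaves it untouched, so it is worth inspecting the raw output first.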