Feature Request: Add MCP return control to avoid returning binary/base64 in workflow execution results

Describe the problem/error/question

I am using the new Instance-level MCP integration in n8n together with OpenAI’s Developer Mode (Tools / MCP host).

  • When I call text-only workflows through MCP, everything works perfectly.

  • The problem appears when I call a workflow that includes an HTTP Request node calling Google Gemini’s Nano Banana image model, which returns base64-encoded image data.

Inside my n8n workflow, I:

  1. Receive the base64 image result from Gemini,

  2. Process it (convert/upload it to an image hosting service),

  3. And ensure that the final node output is only a small JSON with an image URL, for example:

{
  "image_url": "https://…/generated.png",
  "prompt": "some christmas card prompt"
}

From the workflow designer’s perspective, there should be no large binary data left in the final output.
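For illustration only, the three steps can be sketched like this in Python (the Gemini field name, the upload helper, and the host URL are placeholders, not the actual workflow nodes):

```python
import base64
import json

def upload_to_host(data: bytes) -> str:
    # Stub standing in for the image-hosting upload step.
    return "https://img.example/generated.png"

def finalize_output(gemini_result: dict, prompt: str) -> dict:
    """Mirrors the workflow's last step: drop the binary, keep a tiny JSON.
    `b64_image` is an assumed field name for the HTTP Request node output."""
    raw = base64.b64decode(gemini_result["b64_image"])
    image_url = upload_to_host(raw)
    return {"image_url": image_url, "prompt": prompt}

# Fake Gemini response carrying a few KB of base64 "image" data.
fake = {"b64_image": base64.b64encode(b"\x89PNG fake image bytes " * 500).decode()}
out = finalize_output(fake, "some christmas card prompt")
print(json.dumps(out))  # small JSON, no base64 left in the final output
```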

However, when this workflow is executed via Instance-level MCP from OpenAI Developer Mode, I observe:

  • The OpenAI MCP tool call response becomes extremely slow,

  • The conversation context in OpenAI is effectively “blown up”,

  • The LLM quickly hits context limits and begins producing hallucinated answers.

By contrast, the same MCP setup with text-only workflows does not show this behavior.

Because I don’t have direct visibility into the exact JSON that n8n returns to the MCP host, I can’t definitively prove it, but:

Based on the behavior, my strong hypothesis is that the MCP execution response currently includes the full workflow run data, including intermediate nodes with base64/binary content (e.g. the raw Gemini image result), even though the last node returns only a small JSON object.

For workflows that involve image generation or any binary payload, this behavior makes the MCP integration practically unusable with LLMs, because the host (OpenAI) tries to load all of that internal data into the model’s context window.

What is the error message (if any)?

There is no explicit error from n8n.

On the OpenAI side, the symptoms are:

  • Tool responses that appear too large,

  • Context being exhausted very quickly after a single tool call,

  • In some cases, behavior consistent with truncated responses or the model silently dropping parts of the tool output.

So this is not a runtime crash in n8n, but rather a payload-size / context-limit issue on the LLM host side, very likely caused by the size of the MCP response.

Please share your workflow

Share the output returned by the last node

Information on your n8n setup

  • n8n version: 1.121.3
  • Database (default: SQLite): SQLite
  • n8n EXECUTIONS_PROCESS setting (default: own, main): main
  • Running n8n via (Docker, npm, n8n cloud, desktop app): Hostinger, docker
  • Operating system: Ubuntu 24

Just a short clarification for the request:

When using Instance-level MCP, workflows that include image-generation or other binary steps (like Gemini Nano Banana) cause the MCP response to include the entire binary payload from intermediate nodes — even if the final workflow output is only a small JSON value (such as an image URL).

This makes the response extremely large and overflows the MCP host's context window (e.g., OpenAI).

So the suggestion is:

Binary data generated inside the workflow should not be returned through MCP unless explicitly requested.

A clean mode like “return only final node output” or “strip binary from runData before returning” would solve this.
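A “strip binary from runData before returning” mode could, in principle, be a recursive filter applied before the run data is serialized into the MCP response. This is a sketch over a simplified run-data shape, not n8n's actual internal types:

```python
import json

def strip_binary(value, max_str=1024):
    """Drop `binary` sections and replace very long strings (likely base64)
    with a short stub. Illustrative only; real runData is richer than this."""
    if isinstance(value, dict):
        return {k: strip_binary(v, max_str) for k, v in value.items()
                if k != "binary"}
    if isinstance(value, list):
        return [strip_binary(v, max_str) for v in value]
    if isinstance(value, str) and len(value) > max_str:
        return f"[omitted {len(value)} chars]"
    return value

run_data = {
    "Gemini": [{"json": {"b64": "A" * 500_000}, "binary": {"data": "…"}}],
    "Final": [{"json": {"image_url": "https://example.com/generated.png"}}],
}
slim = strip_binary(run_data)
print(len(json.dumps(slim)) < 1_000)  # prints True: the 500 KB blob is gone
```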

If my understanding is wrong, don't hesitate to correct me.

Thanks :folded_hands:

Unfortunately, what you said is true! I also use the new instance-level MCP functionality.

Using the tool “chatbox”, I am able to see exactly what is transferred from n8n. And that is a lot of stuff! Not only binary data, but also secrets used in HTTP requests!

So: when an MCP client uses an n8n MCP server, it receives the entire internal data of the workflow. Every step. Every parameter…

How can this behavior be prevented?

P.S. I have not yet tried the old-school MCP Server node variant of n8n…

Update: I’m now using the “old-school” MCP Server node and have replaced the instance-level MCP functionality with one workflow that acts as the central MCP server instance.

Advantage: the old MCP Server node does not show the behavior described above. No additional or internal information is published.

+1 — this is a major blocker for using instance-level MCP with AI coding assistants.

Running n8n 2.6.3 on PostgreSQL 18 + pgvector. I have a Context Store (RAG pipeline) with 8 workflows exposed via the instance-level MCP server (/mcp-server/http). The primary workflow does hybrid semantic + fulltext search, then synthesizes an answer via a local LLM (Ollama).

The problem in numbers:

  • Webhook response (via Respond to Webhook node): ~600 chars — just the answer, sources, and metadata
  • MCP execute_workflow response: ~200,000+ tokens — full resultData including every intermediate node’s output (SQL queries, raw search results, embedding vectors, Ollama HTTP request/response bodies, etc.)

That’s a bloat factor of roughly 300x. When Claude (claude.ai) calls this tool via MCP, those ~200K tokens consume the context window for no reason. The answer is 600 chars; the other 99.7% is internal execution data the AI client never needs.

What I had to do as a workaround:
For my CLI tool (Claude Code), I bypass MCP entirely and call the webhook URLs directly via curl. This returns only the curated Respond to Webhook output. But Claude Chat (claude.ai) can only use MCP — there’s no way to make it call a webhook directly. So my only options are:

  1. Build MCP Server Trigger node wrappers for each workflow (adds complexity, duplicates routing)
  2. Build a custom external MCP server that proxies to my webhooks
  3. Wait for this feature
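The core of option 2 is small: forward the tool call to the workflow's webhook and hand back only the curated body. A sketch (the URL, payload shape, and injectable `fetch` are assumptions; the MCP tool-registration layer around it is omitted):

```python
import json
from urllib import request

def call_curated_webhook(url: str, payload: dict, fetch=None) -> dict:
    """Forward a tool call to an n8n webhook; return only its response body.
    `fetch` is injectable so the sketch can be exercised without a network."""
    body = json.dumps(payload).encode()
    if fetch is not None:
        return fetch(url, body)
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:  # real path: POST to the webhook
        return json.load(resp)

# Stub standing in for the curated Respond to Webhook output.
stub = lambda url, body: {"answer": "…", "sources": [], "meta": {"ms": 42}}
print(call_curated_webhook("https://n8n.example/webhook/context-search",
                           {"query": "hello"}, fetch=stub))
```

The point of the sketch is that whatever the workflow did internally, only the webhook's final response ever reaches the MCP client.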

Suggestion:
The instance-level MCP should respect the Respond to Webhook node (or an equivalent Respond to MCP node) the same way the Webhook trigger does with responseMode: “responseNode”. If a workflow has a designated response node, only return that node’s output via MCP — not the entire resultData. This would make instance-level MCP actually usable for production AI tool integrations.
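The selection logic this asks for could look roughly like the following (the dict shape loosely mimics runData; it is not n8n's real type, and the node names are invented):

```python
def response_node_output(result_data: dict, response_node: str):
    """If the workflow designates a response node, return only that node's
    items; otherwise fall back to the current full-resultData behavior."""
    run_data = result_data["runData"]
    if response_node not in run_data:
        return result_data  # backwards-compatible default
    last_run = run_data[response_node][-1]
    return [item["json"] for item in last_run["data"]["main"][0]]

result_data = {"runData": {
    "Hybrid Search": [{"data": {"main": [[
        {"json": {"sql": "SELECT …", "embedding": [0.1] * 768}}]]}}],
    "Respond": [{"data": {"main": [[
        {"json": {"answer": "…", "sources": []}}]]}}],
}}
print(response_node_output(result_data, "Respond"))
```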

For workflows without a response node, the current behavior (full resultData) could remain as the default for backwards compatibility.