Hello everyone,
I am currently testing the Plan and Execute Agent with two models, Claude 3.5 Haiku and GPT-4o. My workflow is relatively simple: a chat message supplies the user input as the prompt, which the agent then processes with whichever model is attached.
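For context, my understanding is that the node wraps LangChain's plan-and-execute agent under the hood, so my setup is roughly equivalent to the Python sketch below. This is only an approximation of my workflow, not the node's actual internals; the `word_count` tool and the exact model IDs are placeholders I picked for illustration.

```python
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_experimental.plan_and_execute import (
    PlanAndExecute,
    load_agent_executor,
    load_chat_planner,
)

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text (stand-in for my real tools)."""
    return len(text.split())

# Swap between the two models I am comparing.
llm = ChatAnthropic(model="claude-3-5-haiku-latest", temperature=0)
# llm = ChatOpenAI(model="gpt-4o", temperature=0)

planner = load_chat_planner(llm)                                  # drafts the step-by-step plan
executor = load_agent_executor(llm, [word_count], verbose=True)   # runs each planned step
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)

result = agent.invoke({
    "input": "Summarise the plot of Hamlet in two sentences, then count the words in your summary."
})
print(result)
```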
Across tests with both basic and more complex prompts, I have noticed a significant difference in their performance. On simpler tasks the two models perform comparably, but on more advanced prompts Claude 3.5 Haiku tends to struggle.
Specifically, Haiku seems to drift off-topic, using substantially more tokens than GPT-4o while still failing to complete the assigned task properly. I wanted to ask if there is a known reason for this behavior. Is this simply a characteristic of the Claude model, or could it be related to how the Plan and Execute Agent operates internally? Could there be some underlying issue with how n8n interacts with Claude models?
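To explain why I suspect the agent's structure matters: my rough mental model of a plan-and-execute agent is the simplified sketch below (not the actual n8n implementation; `call_model` is a placeholder for whichever chat model is attached). One planning call produces a list of steps, and every step then becomes its own executor call that carries the accumulated context.

```python
from typing import Callable, List

def plan_and_execute(task: str, call_model: Callable[[str], str]) -> str:
    """Simplified plan-and-execute loop: one planning call, then one
    executor call per planned step. call_model stands in for whatever
    chat model the agent is wired to."""
    # 1. Planning call: ask the model for a numbered list of steps.
    plan_text = call_model(f"Break this task into numbered steps:\n{task}")
    steps: List[str] = [line for line in plan_text.splitlines() if line.strip()]

    # 2. Execution calls: every step is a separate model call, and each
    #    call carries the task plus the results gathered so far.
    results: List[str] = []
    for step in steps:
        context = "\n".join(results)
        results.append(call_model(
            f"Task: {task}\nPrevious results:\n{context}\nCurrent step: {step}"
        ))

    # 3. Final call to combine everything into one answer.
    return call_model(f"Task: {task}\nStep results:\n" + "\n".join(results))
```

If the planner drifts and produces, say, eight loosely related steps instead of three focused ones, the executor still runs all of them with a growing context each time, which would line up with the token blow-up and off-topic answers I am seeing from Haiku.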
Aside from refining the prompt or adjusting model parameters, are there any potential fixes for this? Of course, one option is to use GPT-4o instead, but my personal preference leans towards Claude.
Thanks for any ideas!