I have an AI Agent in a subworkflow that has several AI Agent tools, each of which it should be calling every time it gets called. However, it’s not calling them reliably - sometimes it does, mostly it doesn’t.
Here’s my subworkflow:
And here’s my system prompt:
You are a software program manager, working with a team of agents. Your job is NOT to create anything yourself, but to use your tools and team to create actionable documentation that a team of developers and designers can use to build an MVP.
Use your team to:
Determine the features and requirements
Determine the tech to be used
Write a specifications document
Create visual mockups
Save all the assets created - documentation, designs, and anything else - in Google Drive and/or Docs.
When you are done, provide an explanation for EACH TOOL of why you did or did not use it.
TOOLS / TEAM
Product Manager - Use this to write out clear user stories and requirements.
Software architect - Use this to consider the requirements and decide on the tech stack and other services / software that best suits the requirements.
Engineer - Use this to write detailed specs after the user stories and the tech stack have been documented.
Designer - Use this to create visual mockups for the app after the requirements, tech stack, and specs have been defined.
File Manager - Use this to organize and store all assets created in Google Drive
I’ve also prompted it to provide an explanation of why it does or doesn’t use each tool, and it doesn’t output that at all.
I’m pretty new at this so any help is appreciated. Thanks!
Hey there! It sounds like your agent might be suffering from “decision fatigue” or unclear instructions. To help narrow down the cause, here are a few strategies and questions to help you troubleshoot:
1. Refine the “Why” and “When”
AI Agents rely heavily on the Description field of each tool to decide if it’s relevant.
- Are your tool descriptions distinct? If two tools have overlapping descriptions, the AI might get “lazy” and pick the first one it sees.
- Try this: Explicitly state in the tool description: “Use this tool ALWAYS when [Scenario X] occurs.”
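As a rough sketch of that idea (the exact wording below is made up, based on the tool list in your system prompt, not tested n8n descriptions), distinct trigger-based descriptions might look like:

```python
# Hypothetical tool descriptions with explicit, non-overlapping triggers.
# The tool names mirror the original system prompt; the "ALWAYS" phrasing
# is the suggestion from point 1 above, applied to each tool.
TOOL_DESCRIPTIONS = {
    "Product Manager": (
        "ALWAYS call this FIRST, before any other tool, to turn the user's "
        "idea into user stories and requirements."
    ),
    "Software Architect": (
        "ALWAYS call this SECOND, after the Product Manager has produced "
        "requirements, to choose the tech stack."
    ),
    "Engineer": (
        "ALWAYS call this THIRD, after requirements and tech stack exist, "
        "to write the detailed specification document."
    ),
    "Designer": (
        "ALWAYS call this FOURTH, after the spec is written, to create "
        "visual mockups."
    ),
    "File Manager": (
        "ALWAYS call this LAST, to save every asset produced by the other "
        "tools to Google Drive."
    ),
}

# Sanity check: no two descriptions are identical, so the model has a
# distinct trigger and a distinct position in the sequence for each tool.
assert len(set(TOOL_DESCRIPTIONS.values())) == len(TOOL_DESCRIPTIONS)
```

The ordering words (FIRST, SECOND, …) matter here: because your pipeline is sequential, each description can tell the model not just when to call the tool, but where it sits in the chain.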
2. Implement Few-Shot Prompting
Sometimes telling the AI what to do isn’t enough; you need to show it.
- Have you tried adding examples to your System Prompt? Providing 2-3 “Golden Examples” of a conversation where the agent correctly calls every tool can significantly improve reliability.
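For instance, a compressed “golden example” transcript (the role/name message shape below is just a generic chat-API-style illustration; in n8n you would paste the equivalent text into your system prompt, and the tool outputs are truncated to one line each so the example stays small):

```python
# A minimal "golden example" showing the agent calling every tool in
# order. Tool outputs are stubbed to one line each; the point is to show
# the *sequence* of calls, not full documents.
golden_example = [
    {"role": "user", "content": "Build an MVP plan for a recipe-sharing app."},
    {"role": "assistant", "content": "Calling Product Manager for user stories."},
    {"role": "tool", "name": "Product Manager", "content": "[user stories...]"},
    {"role": "assistant", "content": "Calling Software Architect for the stack."},
    {"role": "tool", "name": "Software Architect", "content": "[tech stack...]"},
    {"role": "assistant", "content": "Calling Engineer for the spec."},
    {"role": "tool", "name": "Engineer", "content": "[spec document...]"},
    {"role": "assistant", "content": "Calling Designer for mockups."},
    {"role": "tool", "name": "Designer", "content": "[mockup links...]"},
    {"role": "assistant", "content": "Calling File Manager to save everything."},
    {"role": "tool", "name": "File Manager", "content": "[Drive folder link]"},
]

# Every one of the five tools appears exactly once, in pipeline order.
tools_called = [m["name"] for m in golden_example if m["role"] == "tool"]
assert tools_called == [
    "Product Manager", "Software Architect", "Engineer",
    "Designer", "File Manager",
]
```

Stubbing the outputs like this is also the answer to the “my outputs are too big for examples” problem later in the thread: the example only needs to demonstrate tool selection, not reproduce full documents.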
3. Scenario Mapping
It’s helpful to create a simple logic table for yourself to ensure there isn’t a “dead end” in your workflow logic.
- The Test: Create a list of 5–10 varied prompts. Does the agent have a clear path to a tool for every single one? If the logic feels fuzzy to you, it will definitely be fuzzy to the LLM.
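A throwaway logic table for that test could be as simple as a prompt-to-tool mapping (the prompts and mappings below are invented for illustration):

```python
# Scratch logic table: for each test prompt, which tool should the agent
# reach for first? Any row you can't fill in confidently is a "fuzzy"
# spot where the system prompt or tool descriptions need sharpening.
scenario_map = {
    "Build an MVP plan for a recipe app": "Product Manager",
    "We already have requirements; pick the stack": "Software Architect",
    "Requirements and stack are set; write the spec": "Engineer",
    "Spec is done; mock up the screens": "Designer",
    "Save the finished docs to Drive": "File Manager",
}

# No dead ends: every scenario resolves to exactly one named tool.
assert all(tool for tool in scenario_map.values())
```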
4. Optimize with n8n Evaluations
If you’ve done the above and it’s still hit-or-miss, it’s time for data-driven refining.
- Use the n8n Eval node or the built-in evaluation features. By running your scenarios through an evaluation pipeline, you can tweak your prompts and immediately see if the “success rate” of tool calls goes up or down.
Thank you for the help!
I asked the agent to list each tool and explain why it did or did not use it, but it doesn’t do that either.
Examples sound like a good idea, but since the output I’m expecting is so large, I’m not sure I could fit three of them in the system prompt. The same issue applies to evals: I read the documentation you linked to, and it seems like this works for short, known expected outputs, but not for the extended documentation I’m trying to create. Are there any resources you could point me to on that?
For this type of agent use case, do you recommend a thinking model or not? It’s just orchestrating, so I have it on GPT-4.1 mini, but I also attached a Thinking node. I’m not sure whether it’s doing too much thinking, so it decides it doesn’t need the tools, or not enough thinking, to the point where it’s unaware of them.
You only need the examples and evals for the tool selection part. I don’t think that’s a big output; it’s just the tool name.
You don’t need to give the full response from every tool, just the first part of the flow, i.e. the selection.
You know what I mean?
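Concretely, the eval only has to compare which tools were called against which ones you expected, not the documents they produced. A minimal scorer sketch (this helper is hypothetical, not an n8n node; in n8n you would compute the same ratio inside an evaluation workflow):

```python
# Hypothetical tool-selection scorer: compare the tools the agent
# actually called against the tools we expected it to call.
def tool_call_score(expected: list[str], actual: list[str]) -> float:
    """Fraction of expected tools that were actually called."""
    if not expected:
        return 1.0
    hits = sum(1 for tool in expected if tool in actual)
    return hits / len(expected)

expected = ["Product Manager", "Software Architect", "Engineer",
            "Designer", "File Manager"]

# A run where the agent skipped the Designer and the File Manager:
actual = ["Product Manager", "Software Architect", "Engineer"]

assert tool_call_score(expected, actual) == 0.6   # 3 of 5 tools called
assert tool_call_score(expected, expected) == 1.0  # all tools called
```

Tracking that single number across prompt tweaks tells you whether reliability is going up or down, without ever having to judge the long-form documents themselves.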