I’m currently developing the invoicing module for one of my applications, but the AI agent takes an unreasonably long time to complete its task, typically between 2 and 5 minutes. In my opinion, the task itself isn’t very complex, which makes me wonder whether there are far more efficient ways to design or implement this process.
The process works as follows: The invoice is retrieved from our database, converted into Base64, and sent to Gemini 2.5 Flash for a quick validation check to determine whether it’s a genuine invoice. The result is then passed on to AI Agent1, whose task is to extract the remaining details and produce the output in JSON format.
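Below is a minimal Python sketch of the two-stage flow described above. It assumes the extraction step also receives the Base64 string. The callables `is_invoice` and `extract` are hypothetical stand-ins for the Gemini 2.5 Flash check and AI Agent1, and no real model API is called. The sketch shows the cheap validation acting as a gate, so the slower extraction step only runs for genuine invoices.

```python
import base64
import json
from typing import Callable, Optional

# Hypothetical stand-ins for the two model calls (Gemini check, AI Agent1).
Validator = Callable[[str], bool]
Extractor = Callable[[str], str]


def process_invoice(
    file_bytes: bytes, is_invoice: Validator, extract: Extractor
) -> Optional[dict]:
    """Validate first, and extract only if the file looks like an invoice."""
    # Encode the raw file once and reuse the string for both steps.
    encoded = base64.b64encode(file_bytes).decode("ascii")

    # Cheap validation gate: stop early so the extraction agent never runs
    # on documents that are not invoices.
    if not is_invoice(encoded):
        return None

    # The extraction step is expected to return a JSON string.
    return json.loads(extract(encoded))
```

A rejected document returns `None` without ever touching the extraction step.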
I’ve noticed that GPT-5 Nano isn’t actually as fast as expected; in fact, I’ve been getting faster results with GPT-4.1 Mini. However, the pricing difference between them is significant, and Mini doesn’t always adhere to the required JSON output format, which often leads to errors.
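One way to handle occasionally malformed output is to validate it in code and re-ask the model only when parsing fails, instead of letting a bad response break the rest of the workflow. Here is a rough Python sketch. `REQUIRED_KEYS` and `call_model` are hypothetical placeholders for the actual invoice schema and model call.

```python
import json
from typing import Callable, Optional

# Hypothetical field list; replace with the keys your invoice JSON requires.
REQUIRED_KEYS = {"invoice_number", "date", "total", "services"}


def parse_invoice_json(raw: str) -> dict:
    """Parse model output, raising ValueError if it is not valid invoice JSON."""
    text = raw.strip()
    if text.startswith("```"):
        # Models sometimes wrap JSON in a ```json fence; drop both fence lines.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit("```", 1)[0]
    data = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def call_with_retries(call_model: Callable[[], str], max_attempts: int = 3) -> dict:
    """Re-ask the model until it returns valid JSON or attempts run out."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return parse_invoice_json(call_model())
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Each retry costs another model call. The trade-off is a little extra latency on bad responses in exchange for fewer downstream errors.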
I’m really curious to hear how you’d approach this and whether you’ve encountered a similar problem before. Even a 50% reduction in time would be a huge improvement. Would memory help in this context, or is it designed mainly for chats?
I’ve noticed that every time it updates something, it seems to run the full system prompt and everything else again. Is that normal, and is there a way to stop or work around this?
The problem isn’t that the model itself takes a long time to think or act; it’s that it repeats its actions several times. In the image you provided, every tool is called around 5 to 9 times, which is unnecessary and definitely not normal (unless that is what you intended). Either way, that is a huge amount of token usage for a single execution.
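Some rough numbers show why repeated tool calls are expensive. Chat-style model APIs are stateless, so an agent loop typically resends the system prompt and all earlier steps on every model call. That would also explain the full system prompt appearing again on each update. The Python estimate below assumes a fixed token cost per step, which is a simplification, and the figures in the usage note are made up for illustration.

```python
def loop_input_tokens(system_tokens: int, step_tokens: int, iterations: int) -> int:
    """Estimate total input tokens for an agent loop that resends its history.

    Assumes every model call carries the system prompt plus all earlier
    steps (tool call + tool result), each costing roughly step_tokens.
    """
    total = 0
    for i in range(iterations):
        # Call i sees the system prompt plus the i steps that came before it.
        total += system_tokens + i * step_tokens
    return total
```

Take a hypothetical 2,000-token system prompt and 300 tokens per step. `loop_input_tokens(2000, 300, 1)` gives 2000, while `loop_input_tokens(2000, 300, 9)` gives 28800, roughly fourteen times the input for the same invoice.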
This is likely caused by an unclear system prompt or message, so making it more specific could solve it.
Hey @Mookie_Lian, thanks for your reply! That’s interesting. Here’s what I think is happening: the invoice contains multiple services, and each service is being checked by another AI agent. It seems like the agent is making a separate call for each service, so if there are four services on the invoice, that results in four calls.
Maybe I could avoid this by using a prompt like: “When calling AI Agent X, only make one call containing all the items you need answers for”?
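The difference between the current behaviour and the single-call idea comes down to how the sub-agent input is built, as this hypothetical Python illustration shows. The prompt wording and the `name` field are made up. Whether the agent actually sticks to one call still depends on the model following the instruction.

```python
import json
from typing import Dict, List

Service = Dict[str, str]  # hypothetical shape of one invoice service line


def per_service_prompts(services: List[Service]) -> List[str]:
    """Current behaviour: one sub-agent call per service line."""
    return [f"Look up this service: {json.dumps(s)}" for s in services]


def batched_prompt(services: List[Service]) -> str:
    """Alternative: a single call that carries every service at once."""
    return (
        "Look up ALL of the following services in one pass and answer with a "
        "JSON array, one object per service, in the same order:\n"
        + json.dumps(services, indent=2)
    )


def parse_batched_answer(services: List[Service], raw: str) -> List[dict]:
    """Check that the single reply contains exactly one answer per service."""
    answers = json.loads(raw)
    if not isinstance(answers, list) or len(answers) != len(services):
        raise ValueError("expected a JSON array with one entry per service")
    return answers
```

For four services, `per_service_prompts` yields four separate inputs (four calls), while `batched_prompt` yields one. `parse_batched_answer` then rejects a reply that does not cover every service.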
I was also thinking of adding one AI Agent beforehand that formats the JSON, then a split node that divides the tasks into separate lists and sends them to three different Execute Workflow nodes set to “run once for each item”, to spread the searches out a bit. Would that help?
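Here is a rough Python sketch of the split-then-fan-out idea. It assumes the formatting agent tags each task with a hypothetical `type` field that maps to one of three hypothetical routes. `ThreadPoolExecutor` only stands in for running each item through its sub-workflow. Whether the Execute Workflow nodes actually run in parallel depends on the workflow platform and is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Task = Dict[str, str]
# Hypothetical routes, one per downstream sub-workflow.
ROUTES = ("supplier", "product", "tax")


def split_tasks(tasks: List[Task]) -> Dict[str, List[Task]]:
    """Group formatted tasks into one list per sub-workflow."""
    buckets: Dict[str, List[Task]] = {route: [] for route in ROUTES}
    for task in tasks:
        route = task.get("type")
        if route not in buckets:
            raise ValueError(f"unknown task type: {route!r}")
        buckets[route].append(task)
    return buckets


def run_all(
    buckets: Dict[str, List[Task]],
    handlers: Dict[str, Callable[[Task], str]],
) -> Dict[str, List[str]]:
    """Run every task through its route's handler, one call per item."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            route: [pool.submit(handlers[route], task) for task in items]
            for route, items in buckets.items()
        }
        # Collect results in the original order within each route.
        return {route: [f.result() for f in fs] for route, fs in futures.items()}
```

Splitting this way keeps each sub-workflow's input small. The total number of model calls still grows with the number of items, so it pairs naturally with batching inside each list.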