I am using conversational agents with the in-memory store to keep track of conversations, and it's working well, until I tried to add the OpenAI Vision API to the mix.
There is currently no way to get vision working by uploading a base64 image or an image URL, which is what OpenAI requires in order to see and process the image.
So now I have an HTTP node that makes the API call and gets a reply, but I can't get this reply added to the memory store since it hasn't come via a conversational agent. And if you feed the output from the HTTP node into the conversational agent as input, the agent tries to reply to it, and we end up with double usage of the API for the same question.
There must be a way to get this HTTP OpenAI Vision output into the conversational agent.
I don't fully understand what you are trying to do here. Are you trying to add the output of a normal node to a memory store so you can use it? Are you calling your HTTP node as a tool, maybe in another n8n workflow?
This is the payload, but it can't be used in the conversational agent's inputs, so I end up having to use an HTTP node, even though there is an option for GPT Vision in the model options attached to the AI Agent.
{
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image? If it is food, extract potential calories."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/webp;base64,{{ $json.base64Image }}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}
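For context, the reply comes back from the HTTP node in the standard chat completions shape, so the text I actually want to push into memory sits at choices[0].message.content. Roughly, trimmed to the relevant fields (the content string is just an illustrative example):

{
  "object": "chat.completion",
  "model": "gpt-4-vision-preview",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The image shows a bowl of pasta, roughly 500-700 kcal."
      },
      "finish_reason": "stop"
    }
  ]
}

So in an n8n expression the answer text can be referenced from the HTTP node's output as {{ $json.choices[0].message.content }}.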
It takes its output as input and tries to answer its own answer, which is not useful.
I'll have a look at the subworkflow approach and see how I get on; it might be a workaround until the conversational agent can be updated to accept image inputs.
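If that route works, the idea (untested on my side, so just a sketch) would be for the subworkflow to do the HTTP Vision call and hand back only the extracted text, e.g. a final field along the lines of (the "response" field name is purely illustrative):

{
  "response": "{{ $json.choices[0].message.content }}"
}

That way the agent in the main workflow only ever sees plain text from the tool, rather than the raw vision payload, and there is no second call to the API for the same question.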