AI Conversation Agents and Memory Store

I am using conversational agents with the memory store to keep track of conversations, and it's working well. That was until I tried to add the OpenAI Vision API to the mix.

There is currently no way to get vision working by uploading a base64 string or an image URL, which is what OpenAI requires in order to see and process the image.

So now I have an HTTP Request node that makes the API call and gets a reply, but I can't get that reply added to the memory store because it hasn't come via a conversational agent. And if you take the output from the HTTP node and feed it as input to the conversational agent, the agent tries to reply to itself, and we end up with double usage of the API for the same question.

There must be a way to get this HTTP OpenAI Vision output into the conversational agent's memory.
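Conceptually, what I'm after is something like the sketch below (plain Python, not n8n's actual API; all names here are illustrative): the external vision result would be written into the chat history as an already-answered assistant turn, so the agent treats it as context instead of a fresh question to reply to.

```python
# Simple chat-history memory: a list of {"role", "content"} dicts,
# standing in for the agent's memory store.
memory = []

def add_user_turn(question: str) -> None:
    memory.append({"role": "user", "content": question})

def add_external_result(result: str) -> None:
    # Record the vision API reply as the assistant's answer, so a later
    # agent call sees it as resolved context rather than a new question.
    memory.append({"role": "assistant", "content": result})

add_user_turn("What's in this image?")
add_external_result("The image shows a bowl of pasta, roughly 600 kcal.")
```

A later agent invocation would then receive the full history, including the vision result, without re-answering it.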

Vision - OpenAI API

payload = {
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What’s in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}
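For context, here is roughly how that payload gets built in plain Python before it is POSTed to the chat completions endpoint. This is a sketch: `build_vision_payload` is a hypothetical helper, and the actual request would go to `https://api.openai.com/v1/chat/completions` with an `Authorization: Bearer` header.

```python
import base64

def build_vision_payload(image_bytes: bytes, question: str) -> dict:
    # Encode the raw image bytes as base64 and embed them in a data URL,
    # which is the form the Vision API accepts inside "image_url".
    base64_image = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }

# Tiny stand-in for real JPEG bytes, just to show the shape of the output.
payload = build_vision_payload(b"\xff\xd8\xff", "What's in this image?")
```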

It looks like your topic is missing some important information. Could you provide the following, if applicable?

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

Hey @RedPacketSec,

I don't fully understand what you are trying to do here. Are you trying to add the output of a normal node to a memory store so you can use it? Or are you calling your HTTP node as a tool, maybe in another n8n workflow?

Maybe this will help?

HTTP GPT VISION API CALL

This is the payload, but it can't be used in the conversational agent's inputs, so I end up having to use an HTTP Request node, even though there is a GPT Vision option in the model options attached to the AI agent.

{
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What’s in this image? If it is food extract potential calories."
        },
        {
          "type": "image_url",
          "image_url": {
"url": "data:image/webp;base64,{{ $json.base64Image }}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}

Does that make a little more sense?

That makes it more confusing. You are not going through the conversational agent, so I wouldn't expect the memory to work.

What happens if you use the output of the HTTP Request node in the conversational agent as an input? :thinking:

Or what happens if you put that HTTP Request node in a workflow and then call it as a tool from the agent? Does that work?

It takes its own output as input and tries to answer its own answer, which is not useful.

I'll have a look at the sub-workflow approach and see how I get on. That might be a workaround until the conversational agent can be updated to accept image inputs.
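If it helps anyone following along, the sub-workflow-as-tool idea would work roughly like this sketch (plain Python, with hypothetical names, not n8n's actual API): the agent invokes the vision call as a tool once, and the tool's reply enters the history as a tool result rather than a new user question, so nothing gets re-answered.

```python
def vision_tool(image_ref: str) -> str:
    # Stand-in for the sub-workflow wrapping the HTTP Request node
    # that calls the Vision API.
    return f"Description of {image_ref}"

TOOLS = {"vision": vision_tool}
history = []

def agent_turn(user_msg: str, image_ref: str = "") -> str:
    history.append({"role": "user", "content": user_msg})
    if image_ref:
        # The agent calls the tool exactly once; the result is recorded
        # as a tool message, not fed back in as a fresh user question.
        tool_result = TOOLS["vision"](image_ref)
        history.append({"role": "tool", "content": tool_result})
        answer = tool_result  # the real agent would summarise this
    else:
        answer = "(normal conversational reply)"
    history.append({"role": "assistant", "content": answer})
    return answer

agent_turn("What's in this photo?", image_ref="lunch.jpg")
```

One vision API call per question, and the memory stays consistent because every turn flows through the agent.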