Hi everyone,
I’m building an agent in n8n that needs to process both text and image inputs.
I have image URLs that I want to pass to the agent so the LLM can fetch and analyze the image content and return a verdict based on my instructions.
However, when I pass the image URLs directly, the LLM doesn’t seem to fetch or interpret the actual image. Instead, it reads the URL as plain text and generates a response based only on that, ignoring the image content entirely.
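For context, here is roughly what the model receives today (a simplified sketch; the prompt text and URL are placeholders, but the shape matches what my workflow produces):

```typescript
// What the agent effectively sends now: the image URL is just part of the
// prompt string, so the model only ever sees it as text.
const currentMessage = {
  role: "user",
  content:
    "Follow the instructions and give a verdict on this image: " +
    "https://example.com/some-image.jpg", // placeholder URL
};
```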
Currently I’m using Claude 3.5 Sonnet through AWS Bedrock credentials, but I have the flexibility to switch to GPT-4o/GPT-4o-mini through Azure OpenAI services.
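From reading the provider docs, I suspect the image needs to go in as a structured content block rather than as inline text. This is a rough sketch of what I think each raw API expects (typed up by hand from the docs, so please correct me if the shapes are off, or if n8n handles this differently):

```typescript
// Placeholder: in practice I'd fetch the image from my URL first (e.g. with
// an HTTP Request node) and base64-encode the bytes.
const imageAsBase64 = "...base64-encoded image data...";

// Anthropic (Claude 3.5 Sonnet on Bedrock): images appear to go in as
// base64 content blocks, so the URL alone isn't enough.
const claudeMessage = {
  role: "user",
  content: [
    { type: "text", text: "Analyze this image and give a verdict." },
    {
      type: "image",
      source: {
        type: "base64",
        media_type: "image/jpeg", // assuming JPEG; would need to match the real file
        data: imageAsBase64,
      },
    },
  ],
};

// OpenAI (GPT-4o / GPT-4o-mini on Azure): as far as I can tell, the API can
// take the URL directly via an image_url content part.
const gpt4oMessage = {
  role: "user",
  content: [
    { type: "text", text: "Analyze this image and give a verdict." },
    { type: "image_url", image_url: { url: "https://example.com/some-image.jpg" } },
  ],
};
```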
Is there a correct way to pass image URLs (or image data) so that the LLM actually processes the image? Any guidance would be greatly appreciated!