I'm using an HTTP Request node to make an API request to Gemini with text and an image, but the output that comes back is not what I expect. I used an Extract From File node to convert the image to a base64 string so I can use it in the Gemini request.
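For context, the request body the HTTP Request node sends is roughly along these lines (a sketch of the standard Gemini generateContent REST format, posted to https://generativelanguage.googleapis.com/v1beta/models/<model>:generateContent; the prompt text, MIME type, and the {{ $json.data }} expression holding the base64 string are placeholders, not my exact values):

{
  "contents": [
    {
      "parts": [
        { "text": "Describe this image in detail." },
        {
          "inline_data": {
            "mime_type": "image/png",
            "data": "{{ $json.data }}"
          }
        }
      ]
    }
  ]
}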
Honestly, I am reading the prompt for the second LLM Chain node and I am rather confused…
I've generated an image description. I want you to take that image description
and then give me five variants of that image based on the following input.
Input:
Bright orange background, cartoonish character in the middle, text
"Upgrade your systems today", Leftclick logo bottom right hand corner
(mouse pointer icon).
Return your output in the following JSON format:
{
  "variants": [
    "First modified image description",
    "Second modified image description",
    "Etc"
  ]
}
Rules:
-Your task is to generate new descriptions based on the changes that we
will later feed into an image generation model.
-So take the original input, modify the original description, and then generate
four other variants with slight modifications to things like color, style,
copy, etc.
-Do not change company names.
-Do not make large changes (Only on colors)
It is not clear whether you want the output to be images or descriptions (both are mentioned):
“give me five variants of that image”
“First modified image description”, etc.
“generate new descriptions”
“generate four other variants” (of an image)
The following sentence is pure confusion: “Leftclick logo bottom right hand corner
(mouse pointer icon).”
I apologize. Apparently I needed another cup of coffee. That last sentence was from another node I was configuring; it ended up there because I copied and pasted. I also realized I was wasting input tokens on some of the rules, so I omitted them. Anyway, here's the new input for the node you asked about:
I’ve generated an image description. I want you to take that image description and then give me five variants of that image based on the following input.
Rules:
-So take the original input, modify the original description, and then generate four other description variants with slight modifications to things like color, style, copy, etc.
-Do not change company names.
-Do not make large changes (Only on colors)
BUT the main problem is with the last node (the one that is supposed to output an image).
Hi @jotyrojo,
I'm facing a very similar problem. Thought I'd share it here in case it helps you, or maybe it leads to a solution for both of us.
I followed a slightly different approach:
-Added an HTTP Request node that calls Google's Imagen 3.
-It returns a bytesBase64Encoded string (I validated the response with an online tool and I can see the image).
-I then pass it through a Code node to convert the string to binary (roughly the sketch below).
It “apparently” generates the image well, but I cannot see it.
The only thing I can think of is that, because I'm self-hosting n8n, some filesystem permissions might not be set correctly…
My problem is: I cannot get the image to post on LinkedIn or X, etc.
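In case it helps to compare, the Code node does roughly this (a sketch: the predictions[0].bytesBase64Encoded path matches what I see in the Imagen response, but your property names and file type may differ, so adjust them to your actual HTTP node output):

// Grab the base64 string returned by the Imagen 3 HTTP Request node.
const base64 = $input.first().json.predictions[0].bytesBase64Encoded;

// Decode it and register it as n8n binary data so downstream nodes
// (LinkedIn, X, etc.) can pick it up from the "data" binary property.
const binary = await this.helpers.prepareBinaryData(
  Buffer.from(base64, 'base64'),
  'imagen-output.png',
  'image/png'
);

return [{ json: {}, binary: { data: binary } }];

The conversion itself seems to run fine; since I still can't see the image, I suspect the problem is on the filesystem/permissions side of my self-hosted setup rather than in the Code node.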
I've decided to start using OpenAI, since Gemini started giving me a lot of trouble, saying that an image of a cute bear is against policy. When I kept trying, it kept responding with things like:
“I am not an image generator model.”
“The image is against policy.”
When I tried to generate an image of bees, it even gave me the excuse that the image might be mistaken for something sexual in nature.
Also, when I asked for an image of E. coli (a bacterium), it basically said the image was medical in nature and could be used to spread false information.
I even tried asking it for an image-generation prompt that wouldn't violate the policy… and when I used that prompt, same thing.
Anyway, I'm tired of Gemini's errors and difficulty compared to OpenAI. I learned a lot with Gemini, though. Maybe I'll keep it for individual cases. Thanks for the help, everyone.