It would help if there was a node for:
Google Gemini multimodal (Vertex AI)
My use case:
TL;DR: Cheaper, seemingly faster, and possibly better than OpenAI’s AnalyzeImage node.
n8n already has a node for the OpenAI GPT-4 Vision API, called “OpenAI - Analyze Image”. It was released recently, possibly following a request in Please add support of the new OpenAI features [done] - #26 by tomtom
I ran a few comparisons between OpenAI and Google on the same multimodal use case: image + prompt. Gemini didn’t do badly at all. It seemed faster (I compared the Google console with the n8n node, so not really a fair comparison) and gave better creative results (my own impression, so not a fair comparison either).
The biggest difference is the pricing: for an image + prompt request, Gemini is 4x cheaper (based on an image of roughly 600x600 pixels). Google’s pricing is flat per image, while OpenAI’s pricing is proportional to the image size.
I therefore think that the Google-based node could be more popular than the OpenAI-based node. The UI and parameters for the Google node (prompt + image URL) could be the same as for the OpenAI node.
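To illustrate how the same two parameters could map onto a Gemini call, here is a minimal sketch in TypeScript. It is not n8n code: `AnalyzeImageParams`, `GeminiRequest`, and `buildGeminiRequest` are hypothetical names. The endpoint pattern and the `contents` / `parts` / `inlineData` body shape reflect my understanding of the Vertex AI REST API and should be checked against the current docs. I also assume the image has already been downloaded and base64-encoded, and I leave out authentication (an OAuth bearer token).

```typescript
// Hypothetical parameter shape, mirroring the "prompt + image" idea above.
interface AnalyzeImageParams {
  projectId: string;   // Google Cloud project ID (assumption)
  location: string;    // region, e.g. "us-central1" (assumption)
  model: string;       // e.g. "gemini-1.0-pro-vision-001"
  prompt: string;      // text prompt sent alongside the image
  imageBase64: string; // image bytes, already base64-encoded
  mimeType: string;    // e.g. "image/jpeg" or "image/png"
}

// One part of a message: either text or inline image data.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface GeminiRequest {
  url: string;
  body: { contents: Array<{ role: string; parts: Part[] }> };
}

// Builds the URL and JSON body for a single image + prompt request.
// It does not send anything; the caller would POST `body` to `url`.
function buildGeminiRequest(p: AnalyzeImageParams): GeminiRequest {
  const url =
    `https://${p.location}-aiplatform.googleapis.com/v1/projects/${p.projectId}` +
    `/locations/${p.location}/publishers/google/models/${p.model}:generateContent`;

  return {
    url,
    body: {
      contents: [
        {
          role: "user",
          parts: [
            { text: p.prompt },
            { inlineData: { mimeType: p.mimeType, data: p.imageBase64 } },
          ],
        },
      ],
    },
  };
}
```

A node built this way would only need to fetch the image from the URL parameter, base64-encode it, and pass it in together with the prompt, so the user-facing fields could stay identical to those of the OpenAI node.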
Any resources to support this?
Vertex AI has a sandbox in the Google Cloud console
API docs are at Google Cloud console
Pricing at Pricing | Generative AI on Vertex AI | Google Cloud
As I understand it, Vertex AI is the name of the GenAI multimodal API, while PaLM only handles text for both input and output. The model I was able to test inside Vertex AI is called “gemini-1.0-pro-vision-001”.
I decline any responsibility for the fact that Google could, at any time, rename their models and products in the most confusing way possible.
Are you willing to work on this?
I can fork my workflow and help test the requested node against the currently available “OpenAI - Analyze Image” node.