I’ve recently seen on Twitter that OpenAI’s new O3 model has impressive capabilities when working with images, particularly for tasks like cropping. I’m currently using N8N for automation workflows and would love to integrate the O3 model into my setup. However, after trying to implement this myself, I’m struggling to figure out how exactly to utilize the O3 model specifically for image cropping within an N8N workflow. Alternatively, if the O3 model isn’t yet available for integration, I understand the GPT-4 model can also effectively crop images, which would work perfectly for my needs.
Could anyone explain or provide guidance on how to correctly integrate and use OpenAI’s O3 or GPT-4 model in N8N for cropping images?
Without knowing the context it is a bit hard to know what your input / output for this workflow looks like.
E.g. if you just want to crop an image, you can do that with the “Edit Image” node in n8n; no need for AI at all. Keep in mind that this assumes you already know the cropping area.
Alternatively, I could imagine using AI to some extent to find the cropping area in the first place and then applying it.
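To make the "find the area, then apply it" idea a bit more concrete, here is a rough Python sketch of the second half: turning a subject's bounding box into a square crop region that stays inside the image. The `Box` type, the `square_crop_box` name and the `margin` parameter are made up for illustration and are not part of n8n or OpenAI; the sketch assumes an earlier step has already produced the subject's bounding box in pixels.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int


def square_crop_box(img_w: int, img_h: int, subject: Box, margin: float = 0.2) -> Box:
    """Return a square crop region centred on `subject`, kept inside the image."""
    # Side length: the subject's longer edge plus a margin,
    # capped by the shorter image edge so the square always fits.
    side = round(max(subject.width, subject.height) * (1 + margin))
    side = max(1, min(side, img_w, img_h))

    # Centre the square on the subject's centre.
    cx = subject.x + subject.width / 2
    cy = subject.y + subject.height / 2
    x = round(cx - side / 2)
    y = round(cy - side / 2)

    # Shift the square back inside the image if it overhangs an edge.
    x = max(0, min(x, img_w - side))
    y = max(0, min(y, img_h - side))
    return Box(x, y, side, side)


if __name__ == "__main__":
    # Hypothetical 1920x1080 image with the subject near the right edge.
    print(square_crop_box(1920, 1080, Box(1500, 200, 300, 400)))
```

The resulting x, y, width and height are the kind of values a crop step needs; scaling the square to a fixed size such as 800x800 would be a separate resize step afterwards.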
Maybe you can provide a bit more context for your case?
What I’m trying to achieve is to specifically use OpenAI’s O3 model (or 4o as an alternative) to interact with images directly via N8N. The workflow I envision goes like this:
• The workflow is triggered.
• The image is passed to OpenAI, along with instructions—for example, “Crop this image to 800x800 pixels for an Instagram post, specifically focusing on Fiona from Shrek.”
• OpenAI processes the image intelligently, cropping it according to my specific instructions, and returns the cropped image as a response.
I’m able to perform this exact process manually in the ChatGPT desktop app, where I upload an image and request specific crops with detailed instructions. Previously, I tried detecting coordinates and then cropping using built-in tools, but leveraging OpenAI directly seems quicker, easier, and more precise.
The main gap is understanding how to connect and interact with the OpenAI models via N8N specifically for image-based tasks like this.
AFAIK the image features provided in the ChatGPT desktop application are not yet available via the API; see OpenAI's announcement about 4o image generation (section "Access and availability"): https://openai.com/index/introducing-4o-image-generation/
It might take several weeks for this to become usable in third-party tools and platforms like n8n. Until then, you’ll have to find other ways to implement this in a workflow.
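One such "other way" is the coordinate-detection route mentioned earlier in the thread: ask a vision-capable model to describe where the subject is, then crop with regular tools. Below is a rough Python sketch of the glue step, assuming you have prompted the model to answer with a flat JSON object containing `x`, `y`, `width` and `height` in pixels. That reply format and the `parse_bounding_box` helper are assumptions for illustration, not something OpenAI or n8n define.

```python
import json
import re


def parse_bounding_box(reply: str) -> dict:
    """Extract {"x", "y", "width", "height"} from a model reply that contains JSON."""
    # Replies are often wrapped in prose or code fences, so take the first
    # flat {...} span (nested objects are not handled by this sketch).
    match = re.search(r"\{.*?\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    data = json.loads(match.group(0))

    box = {}
    for key in ("x", "y", "width", "height"):
        value = data.get(key)
        # Reject missing fields, strings, and booleans (bool is a subclass of int).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"missing or non-numeric field: {key}")
        box[key] = int(round(value))
    if box["width"] <= 0 or box["height"] <= 0:
        raise ValueError("width and height must be positive")
    return box


if __name__ == "__main__":
    # Hypothetical reply text; a real reply would come from the model.
    reply = 'The subject is here: {"x": 120, "y": 40.6, "width": 300, "height": 420}'
    print(parse_bounding_box(reply))
```

Validating the reply before cropping matters because the model's coordinates are only estimates and the reply format is not guaranteed; failing early is easier to debug than a crop step receiving bad values.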