Best approach for AI-powered content-aware cropping of 7000+ wallpaper images?

I’m building an automated workflow to intelligently crop decorative wallpaper images based on customer wall dimensions (width × height in cm). The catalog has 7000+ images with very different subjects (flowers, leaves, geometric patterns, figures, etc.).

My current approach:

  1. AI Agent with OpenAI Vision analyzes the image and returns crop coordinates as percentages (left, right, focal_y)

  2. A JavaScript node converts percentages to pixels based on image dimensions and wall size

  3. A Python/Pillow node executes the actual crop
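To make steps 2 and 3 concrete, here is a minimal sketch of the percentage-to-pixel conversion in Python. The function name `compute_crop_box` and the exact meaning of the fields are assumptions: `left` and `right` are taken as percentages of the image width, `focal_y` as a percentage of the image height on which the crop is centred, and the wall dimensions are used only for the aspect ratio. It is not the actual node code.

```python
def compute_crop_box(img_w, img_h, left_pct, right_pct, focal_y_pct,
                     wall_w_cm, wall_h_cm):
    """Return (left, upper, right, lower) in pixels with the wall's aspect ratio.

    Assumed semantics (hypothetical, not taken from the real workflow):
    left/right are percentages of image width, focal_y is a percentage of
    image height marking the vertical centre of the crop.
    """
    if not (0 <= left_pct < right_pct <= 100):
        raise ValueError("expected 0 <= left < right <= 100")
    target_ratio = wall_w_cm / wall_h_cm  # width divided by height

    left = img_w * left_pct / 100
    crop_w = img_w * right_pct / 100 - left
    crop_h = crop_w / target_ratio

    if crop_h > img_h:
        # The requested width would need more height than the image has:
        # use the full height and narrow the crop around its horizontal centre.
        centre_x = left + crop_w / 2
        crop_h = img_h
        crop_w = crop_h * target_ratio
        left = min(max(centre_x - crop_w / 2, 0), img_w - crop_w)

    # Centre vertically on the focal point, then clamp inside the image.
    top = img_h * focal_y_pct / 100 - crop_h / 2
    top = min(max(top, 0), img_h - crop_h)

    return (round(left), round(top), round(left + crop_w), round(top + crop_h))
```

The returned tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the Pillow step can pass it through unchanged.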

The problem is that GPT Vision estimates coordinates visually and is sometimes imprecise — subjects like leaves or flowers end up slightly cut at the edges.

What I already tried:

  • Setting temperature: 0 for determinism ✅

  • Detailed prompt with design principles and examples ✅

  • focal_y for vertical positioning ✅

My question: What would be the best approach to improve precision on a large and varied catalog?

  • Refine the prompt further?

  • Add OpenCV in the Python node to detect precise object boundaries after GPT gives the approximate area?

  • Something else entirely?
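For the OpenCV option, a rough sketch of the refinement step, assuming object bounding boxes are already available as (x, y, w, h) tuples (the format `cv2.boundingRect` returns). The helper name `snap_edges` and the "keep the object if at least half of it is inside" rule are assumptions for illustration. The detection itself is left out, and this is a single pass, so an edge moved for one box is not rechecked against the others.

```python
def snap_edges(left, right, boxes, img_w):
    """Shift the left/right crop edges (pixels) so they do not slice an object.

    boxes: iterable of (x, y, w, h) tuples, e.g. from cv2.boundingRect.
    Hypothetical rule: an object cut by an edge is kept whole if at least
    half of its width was inside the crop, otherwise it is excluded.
    """
    new_left, new_right = left, right
    for x, _y, w, _h in boxes:
        x0, x1 = x, x + w
        if x0 < new_left < x1:
            # Left edge cuts this object: widen to include it or move past it.
            new_left = x0 if (x1 - new_left) >= (new_left - x0) else x1
        if x0 < new_right < x1:
            # Right edge cuts this object: same rule, mirrored.
            new_right = x1 if (new_right - x0) >= (x1 - new_right) else x0

    new_left = max(0, new_left)
    new_right = min(img_w, new_right)
    if new_left >= new_right:
        # Degenerate result: fall back to the original estimate.
        return left, right
    return new_left, new_right
```

For example, `snap_edges(100, 500, [(80, 0, 60, 60), (480, 0, 100, 50)], 1000)` widens the left edge to 80 to keep the first object whole and pulls the right edge back to 480 to leave the second one out. The adjusted edges would then go through the aspect-ratio step before the Pillow crop.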

  • n8n version: 2.12.2 (Self Hosted)

  • Running via: Docker

  • OS: Windows

  • Executions process: External task runner (Python)

Hey @Alessio_Jeshili, welcome!
I think using this:

would improve your project.

Also, prompts are everything in AI agents, so you have to check your prompts here:

Your project idea is awesome, but processing 7k images is a real concern. Instead, why not let AI generate the image based on the user's requirements? I think that would be cheaper than picking an image from such a huge catalogue.