I’m building an automated workflow to intelligently crop decorative wallpaper images based on customer wall dimensions (width × height in cm). The catalog has 7000+ images with very different subjects (flowers, leaves, geometric patterns, figures, etc.).
My current approach:
- An AI Agent with OpenAI Vision analyzes the image and returns crop coordinates as percentages (`left`, `right`, `focal_y`)
- A JavaScript node converts the percentages to pixels based on the image dimensions and wall size
- A Python/Pillow node executes the actual crop
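For context, here is a minimal sketch of what the last two steps combined look like in the Python node. The field names (`left_pct`, `right_pct`, `focal_y_pct`) mirror the percentages described above; the function assumes the crop height is derived from the wall's aspect ratio and that the crop is centered on the focal point where possible:

```python
from PIL import Image

def crop_to_wall(img_path, left_pct, right_pct, focal_y_pct,
                 wall_w_cm, wall_h_cm, out_path):
    """Crop the horizontal band [left_pct, right_pct] to the wall's
    aspect ratio, centered vertically on focal_y where possible."""
    img = Image.open(img_path)
    w, h = img.size

    # Horizontal bounds from the model's percentage estimates
    x0 = int(w * left_pct / 100)
    x1 = int(w * right_pct / 100)
    crop_w = x1 - x0

    # Crop height required to match the wall's aspect ratio
    target_ratio = wall_w_cm / wall_h_cm
    crop_h = min(h, int(crop_w / target_ratio))

    # Center the crop on the focal point, clamped to the image bounds
    cy = int(h * focal_y_pct / 100)
    y0 = max(0, min(cy - crop_h // 2, h - crop_h))

    img.crop((x0, y0, x0 + crop_w, y0 + crop_h)).save(out_path)
```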
The problem is that GPT Vision estimates coordinates visually and is sometimes imprecise, so subjects like leaves or flowers end up slightly cut at the edges.
What I already tried:
- Setting `temperature: 0` for determinism
- A detailed prompt with design principles and examples
- `focal_y` for vertical positioning
My question: What would be the best approach to improve precision on a large and varied catalog?
- Refine the prompt further?
- Add OpenCV in the Python node to detect precise object boundaries after GPT gives the approximate area?
- Something else entirely?
n8n version: 2.12.2 (Self Hosted)
Running via: Docker
OS: Windows
Executions process: External task runner (Python)