What I’m Building
I’m building a conversational AI agent for creative professionals (starting with surface designers). Core goals:
- Understand who the user is (brand, style, use cases)
- Understand what they’re making (project goal, resolution, aspect ratio)
- Generate images conversationally — no prompt engineering
- Iterate naturally — “make it more vibrant” uses previous image
- Adapt output based on user role (pattern vs mockup)
Architecture
Stack: n8n, Google Gemini, AWS S3, MongoDB
Main Workflow Flow: Chat Trigger → Load Project Settings → Load Long-term Memory → AI Agent → Image Tool → Save to MongoDB
Image Sub-workflow:
- Receives: userId, projectId, projectSettings, userSettings, content
- Enriches prompt with brand context
- Calls Gemini 3.1 Flash Image Preview
- Uploads to S3 → Saves to MongoDB
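For concreteness, the enrichment step above could be sketched like this. The top-level field names (projectSettings, userSettings, content) come from the inputs listed above; the fields inside them (brand, style, goal) are assumptions for illustration:

```javascript
// Sketch of the sub-workflow's brand-context enrichment step.
// Assumes userSettings.brand, projectSettings.style, and
// projectSettings.goal exist; real field names may differ.
function enrichPrompt(content, projectSettings, userSettings) {
  const parts = [content.prompt];
  if (userSettings.brand) parts.push(`Brand: ${userSettings.brand}.`);
  if (projectSettings.style) parts.push(`Style: ${projectSettings.style}.`);
  if (projectSettings.goal) parts.push(`Project goal: ${projectSettings.goal}.`);
  return parts.join(" ");
}
```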
What I Need Feedback On
1. System Message (Main Workflow)
# ROLE
You are a creative partner for {{ $('Load Long-term Memory').item.json.name }}.
Keep responses short and conversational unless the user asks for more.
# CONTEXT
## USER SETTINGS
Use the following data to tailor tone, preferences, and decisions:
{{ $('Load Long-term Memory').item.json.userSettings.toJsonString() }}
## PROJECT SETTINGS
Align all outputs with the current project’s goals, style, and constraints:
{{ $('Load Project Settings').item.json.projectSettings.toJsonString() }}
# TOOL
## ImageTool
Call ImageTool whenever the user requests anything visual. Do not ask for confirmation.
When calling, content.prompt must be a complete brief — synthesize the user's request
with their brand, style, and project goal. Never pass raw user words alone.
New image → content.prompt only.
Iteration → content.input_image_s3_key from the last tool result in memory
+ content.prompt describing what to change and what to preserve.
After ImageTool returns, reply in 1-2 sentences and offer one next step.
Question: Is this too much instruction? Too little? How do you balance guidance without hardcoding behavior?
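The two call shapes the system message describes (new image vs iteration) can be sketched as follows. The content.prompt and content.input_image_s3_key fields come from the instructions above; the brandContext shape and the prompt-synthesis format are illustrative assumptions:

```javascript
// Sketch of the two ImageTool call shapes: new image (prompt only) vs
// iteration (prompt + reference to the previous image's S3 key).
function buildImageToolCall({ userPrompt, brandContext, lastS3Key = null }) {
  const content = {
    // Never pass raw user words alone: fold in brand/project context.
    prompt: `${userPrompt}. Style: ${brandContext.style}. Goal: ${brandContext.goal}.`,
  };
  // Iteration: reference the previous image via its S3 key.
  if (lastS3Key) content.input_image_s3_key = lastS3Key;
  return { content };
}
```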
2. Iteration Logic
Current flow:
- User: “make it more vibrant”
- Agent finds last assistant message with attachment in short-term memory
- Extracts s3_key → calls ImageTool with content.input_image_s3_key
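The lookup step in the flow above could be sketched like this. The message shape (role, attachment.s3_key) is an assumption about how short-term memory stores attachments:

```javascript
// Sketch of the iteration lookup: scan short-term memory newest-first for
// the last assistant message carrying an image attachment, and pull its
// s3_key so ImageTool can use it as the reference image.
function findLastImageS3Key(messages) {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "assistant" && m.attachment && m.attachment.s3_key) {
      return m.attachment.s3_key;
    }
  }
  return null; // no previous image: treat the request as a new image
}
```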
Question: Is this the right pattern? How do you handle “use this uploaded image AND make it like that previous one” (multiple references)?
3. User Role Adaptation
I want different outputs based on user role:
- Surface designer → flat patterns, no product mockups
- Marketer → lifestyle images, mockups
Currently this is handled in the sub-workflow’s prompt enrichment step.
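As it stands in the sub-workflow, the role branching amounts to something like this sketch. The role names and directive strings are assumptions based on the two examples above:

```javascript
// Sketch of role-based prompt enrichment: append a role-specific output
// directive to the already-synthesized prompt.
const ROLE_DIRECTIVES = {
  "surface designer": "Render a flat, seamless pattern. No product mockups.",
  "marketer": "Render a lifestyle scene or product mockup.",
};

function enrichPromptForRole(prompt, role) {
  const directive = ROLE_DIRECTIVES[(role || "").toLowerCase()] ?? "";
  return directive ? `${prompt} ${directive}` : prompt;
}
```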
Question: Should this logic live in the system message or the tool? Where’s the right place?
4. Multi-Image Support (Not yet implemented)
Question: How should multiple user-uploaded images be passed to the tool? Should the sub-workflow support multiple reference images? How does the agent decide primary reference?
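One possible shape for the tool payload, purely as a strawman since this is not implemented yet: an ordered array of S3 keys where index 0 is the primary reference, so the agent encodes priority through ordering. The input_image_s3_keys field name is an invented extension of the existing input_image_s3_key:

```javascript
// Strawman for multi-reference input: ordered array of S3 keys,
// index 0 = primary reference. Not part of the current workflow.
function buildMultiRefContent(prompt, s3Keys) {
  if (!Array.isArray(s3Keys) || s3Keys.length === 0) {
    return { prompt }; // no references: plain new-image request
  }
  return {
    prompt,
    input_image_s3_keys: s3Keys, // [primary, ...secondary]
  };
}
```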
5. Hardcoded vs Dynamic
A core principle from my team: “We don’t want hard-coded instructions. This is not programming.”
But I find myself adding more structure for consistency.
Question: Where’s the line between “guidance” and “hardcoding”? How do you strike the balance?