[Workflow Included] I built an n8n pipeline that turns messy supplier docs into publish-ready store content

👋 Hey n8n Community,

A friend of mine runs an online store, and for every new product they get supplier inputs in whatever format the supplier feels like: spec PDFs, Excel sheets, a few photos, some loose notes. Someone then hand-writes the title, descriptions, specs and SEO fields. I built them a pipeline that does it end to end, and I’m sharing all four workflows.

What it does: intake form → extract specs → analyse photos → generate content → poll status. Drop in the files and notes, get back review-ready content (title, descriptions, meta fields, features, tags, attributes).

The four workflows

  • WF1 – Intake & spec extraction. Saves files to Drive, routes each by type (PDFs/images → easybits Extractor, Excel → Code node), merges into one spec object, resolves brand, hands off to WF2.

  • WF2 – Image analysis. Runs each photo through an Extractor pipeline to capture what’s visible (colour, features, angle), then passes it to WF3.

  • WF3 – Content generation. Builds context from spec + image data + notes and has Gemini write the full content set. Hard rule: only features that are in the spec or visible in the images, no inventing.

  • WF4 – Status polling. A small webhook the frontend polls for progress and the finished draft.
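To make the WF1 routing and merge step concrete, here is a minimal Python sketch of the idea. It is not the code from the shared workflows: the function names, the route labels, the extension lists (including `.csv`) and the "first non-empty value wins" merge rule are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed extension sets; the post only says PDFs/images go to the Extractor
# and Excel goes to a Code node.
EXTRACTOR_TYPES = {".pdf", ".png", ".jpg", ".jpeg", ".webp"}
SPREADSHEET_TYPES = {".xlsx", ".xls", ".csv"}


def route_file(filename: str) -> str:
    """Return a hypothetical route label based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in EXTRACTOR_TYPES:
        return "extractor"
    if suffix in SPREADSHEET_TYPES:
        return "code_node"
    return "unsupported"


def merge_specs(partials: list[dict]) -> dict:
    """Merge per-file results into one spec object.

    Assumed rule: the first non-empty value for a key wins, and empty
    values never overwrite or block a later non-empty one.
    """
    merged: dict = {}
    for partial in partials:
        for key, value in partial.items():
            if value in (None, "", []):
                continue
            merged.setdefault(key, value)
    return merged
```

With this rule, the order in which files are merged decides which source is trusted when two suppliers' documents disagree, so it may be worth merging the most reliable file type first.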

Extractor setup

  • n8n Cloud: verified node, just search easybits Extractor in the node panel. No install.

  • Self-hosted: Settings → Community Nodes → Install → @easybits/n8n-nodes-extractor.

Then create a pipeline at easybits, define your fields, and paste the Pipeline ID + API key into the node. It reads the binary straight from the previous node.

Workflows (all four, sanitized): GitHub repo felix-sattler-easybits/n8n-workflows, folder easybits-product-content-creation-workflow, at commit e3103344d9b3358402dc38a3a862d510bb4e7c5e.

Cross-workflow calls use placeholder IDs you re-point after import, plus your own Google + Extractor credentials.

How do you handle brand-voice consistency in generated content? I went with a per-brand profile the model reads from, curious if others template it harder.
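For anyone wondering what a per-brand profile could look like, here is a rough Python sketch. The `BrandProfile` fields and the prompt wording are hypothetical and not taken from the shared workflows; only the "no invented features" rule comes from WF3.

```python
from dataclasses import dataclass, field


@dataclass
class BrandProfile:
    # All fields are illustrative assumptions, not the workflow's real schema.
    name: str
    tone: str
    banned_phrases: list[str] = field(default_factory=list)


def build_system_prompt(profile: BrandProfile) -> str:
    """Turn a brand profile into instructions the model reads before writing."""
    lines = [
        f"You write product content for {profile.name}.",
        f"Tone of voice: {profile.tone}.",
        # Mirrors the hard rule from WF3.
        "Only mention features that appear in the spec or are visible in the images.",
    ]
    if profile.banned_phrases:
        lines.append(
            "Never use these phrases: " + ", ".join(profile.banned_phrases) + "."
        )
    return "\n".join(lines)
```

Keeping the profile as data rather than hard-coding it in the prompt means a new brand is one more record, not a new workflow.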

Best,
Felix

The four-workflow split is the right call here: keeping intake/extraction, image analysis, content gen, and polling as separate workflows makes each piece testable independently and lets you retry a failed WF3 without reprocessing all the files.

One thing worth adding to WF3: a post-generation validation step that checks the output against the original spec object. Since you’re constraining Gemini to not invent facts, a Code node that verifies key extracted values (e.g., dimensions, material, model number) actually appear in the generated text catches hallucinations before the draft goes to review. Simple string match or fuzzy match, depending on how strict you need it.
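A minimal sketch of the simple string-match variant, written in Python for readability (an n8n Code node would more typically use JavaScript). The function names, the spec keys and the normalization rule are assumptions for illustration only.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def missing_spec_values(spec: dict, generated_text: str, keys: list[str]) -> list[str]:
    """Return the spec keys whose values do not appear in the generated text."""
    # Pad with spaces so matches only happen on whole-token boundaries.
    haystack = f" {normalize(generated_text)} "
    missing = []
    for key in keys:
        value = spec.get(key)
        if value is None:
            continue  # nothing to verify for this key
        needle = normalize(str(value))
        if needle and f" {needle} " not in haystack:
            missing.append(key)
    return missing
```

For example, with a spec of `{"model": "XR-200", "width": "45 cm"}` and a draft that mentions "the XR 200" but no width, the function returns `["width"]`. Note that this only flags spec values that were dropped or altered; catching features the model added on its own would need a separate check.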

Hey @nguyenthieutoan, thank you so much for the kind words! One of the biggest advantages of splitting the workflow ties back to what you mentioned. If my friend wants to add new functionality later on, I don’t have to dive back into one huge workflow and modify everything. Instead, I can simply update the relevant subworkflow or add another one without affecting the existing logic. It makes the whole setup much easier to maintain and extend over time.

The post-generation validation step was actually already on my roadmap after the first round of testing. The only reason I haven’t implemented it yet is that I wanted it to validate more than just the key values (which I completely agree is important). I also want it to check whether the generated content matches the desired tone of voice.

The challenge is that my friend is still figuring out what tone of voice he wants for the shop. So we agreed to ship the first version, let him generate content for a while, and then use that experience to define a consistent style. Once that’s clear, I’ll add the validation step to enforce both factual correctness and the desired writing style.

I completely agree with you, though. A validation layer is what separates a reliable production workflow from a generic content generator that occasionally hallucinates.