Three production n8n workflows for live AI demos - email, meeting, and agentic research (importable JSON)

Sharing three workflows I built for a live AI workshop. They fire
automatically at configured slide numbers during a presentation,
but they also work as standalone pipelines.

Email Pipeline
Classifies incoming emails via Claude, drafts a reply, routes
escalations to the right person, logs everything to Sheets.

Key nodes: Gmail Trigger → Claude (Haiku) → Code (parse JSON) →
IF (escalation check) → Gmail → Google Sheets
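
If you're curious what the Code node is doing, here's the gist in
Python (a sketch only; the category/escalate/draft field names are
placeholders, not necessarily the workflow's schema):

```python
import json
import re

FALLBACK = {"category": "unknown", "escalate": True, "draft": ""}

def parse_classification(raw_reply: str) -> dict:
    """Extract the JSON object from Claude's reply, tolerating
    markdown fences or prose around it."""
    match = re.search(r"\{.*\}", raw_reply, re.DOTALL)
    if not match:
        # Fail safe: escalate to a human rather than dropping the email
        return FALLBACK
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return FALLBACK
    return {
        "category": data.get("category", "unknown"),
        "escalate": bool(data.get("escalate", False)),  # feeds the IF node
        "draft": data.get("draft", ""),
    }
```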

Meeting Pipeline
Takes any meeting transcript, extracts action items + decisions +
biggest risk, builds a follow-up email, sends to all attendees
pulled from a Sheets roster.

Key nodes: Form Trigger + Webhook → Claude → Parse →
Get Attendees (Sheets) → Gmail + Slack + Sheets log
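
The follow-up assembly is simple string-building. A rough Python
sketch, with illustrative field names rather than the workflow's
exact schema:

```python
def build_followup(extraction: dict, roster: list) -> dict:
    """Turn Claude's extraction plus the Sheets roster into a
    ready-to-send email payload. Field names are illustrative."""
    attendees = [row["email"] for row in roster]
    lines = ["Action items:"]
    lines += [f"- {item}" for item in extraction["action_items"]]
    lines += ["", "Decisions:"]
    lines += [f"- {d}" for d in extraction["decisions"]]
    lines += ["", f"Biggest risk: {extraction['biggest_risk']}"]
    return {
        "to": attendees,
        "subject": "Meeting follow-up",
        "body": "\n".join(lines),
    }
```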

Evidence Intelligence Engine
This one’s more involved. A research question goes in, a
structured evidence brief comes out.

Claude first decomposes the question into a search plan.
Perplexity runs web + academic search in parallel. Claude evaluates
evidence quality — if insufficient, it refines the queries and runs
again (max 2 iterations). Final synthesis written to a Google Doc
and posted to Slack.

Key nodes: Form Trigger → Claude (Opus) → Perplexity ×2 →
Merge → Claude (evaluator) → IF (quality gate) → Claude (writer) →
Google Docs → Slack + Sheets
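
If it helps to see the shape of it, the control flow in plain
Python looks roughly like this. It's a sketch, not the workflow:
each callable stands in for one of the nodes above, and in n8n the
two Perplexity searches actually run in parallel.

```python
from typing import Callable

MAX_ITERATIONS = 2  # hard cap on search/evaluate rounds

def run_engine(
    question: str,
    decompose: Callable,   # Claude (Opus): question -> sub-queries
    search: Callable,      # Perplexity x2 + Merge: queries -> results
    evaluate: Callable,    # Claude: -> {"sufficient": bool, "reason": str}
    refine: Callable,      # Claude: (queries, reason) -> tighter queries
    synthesize: Callable,  # Claude (writer): -> final brief
):
    queries = decompose(question)
    results, verdict = [], {"sufficient": False, "reason": ""}
    for _ in range(MAX_ITERATIONS):
        results = search(queries)
        verdict = evaluate(question, queries, results)
        if verdict["sufficient"]:
            break
        # The evaluator's own diagnosis steers the rewrite
        queries = refine(queries, verdict["reason"])
    # Synthesis always runs, even when the cap was hit
    return synthesize(question, results, verdict)
```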

All three JSON files are in the repo under /n8n.
Import → reconnect credentials (Anthropic, Gmail, Sheets,
Slack, Perplexity) → toggle Active → done.

Full repo (includes the Python orchestrator that triggers these
during a live presentation):

Let me know if you have questions on the Evidence Engine loop —
the quality gate logic took a few iterations to get right.

Great question, and honestly the part of the system I iterated on
the most. The evaluator is entirely prompt-based, no structured
scoring. Claude receives the aggregated search results from both
Perplexity endpoints and evaluates against three things: source
diversity (are we pulling from more than one type of evidence),
coverage (does this actually address the sub-queries that were
decomposed, not just the surface question), and confidence (are
the sources making claims or hedging everything).
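
In prompt form the gate looks something like this (an illustrative
reconstruction, not the exact prompt from the workflow):

```python
# Illustrative evaluator prompt; meant for str.format, hence the
# doubled braces around the JSON example.
EVALUATOR_PROMPT = """You are reviewing aggregated search results \
for a research question.

Question: {question}
Sub-queries: {sub_queries}
Results: {results}

Judge the evidence on three criteria:
1. Source diversity: is there more than one type of evidence?
2. Coverage: is each sub-query addressed, not just the surface question?
3. Confidence: are the sources making claims, or hedging everything?

Return only JSON:
{{"sufficient": true or false, "reason": "<one sentence on what is missing>"}}"""
```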

It returns structured JSON: sufficient: true/false plus a brief
reason string. The reason string is what drives the query rewrite
if it comes back false. Rather than rewriting blindly, Claude uses
its own diagnosis of what’s missing to generate tighter
sub-queries the second time.

On the 2-iteration cap: it gets hit less often than you’d expect,
maybe 1 in 5 runs. Most research questions either get sufficient
evidence on the first pass or the second rewrite pulls enough to
cross the threshold. The forced synthesis at the cap is mostly a
safety valve for genuinely underserved topics where the evidence
ceiling is a domain problem, not a query problem. In those cases
the brief still gets written, but Claude flags the evidence gap
explicitly in the output.
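
The flag is just extra context handed to the writer when the loop
exits at the cap. A sketch, with names that are mine rather than
the workflow's:

```python
def writer_context(results, verdict: dict, hit_cap: bool) -> dict:
    """Build the writer node's input; attach an explicit gap note
    when the loop exited at the cap without sufficient evidence."""
    ctx = {"results": results}
    if hit_cap and not verdict["sufficient"]:
        ctx["evidence_gap"] = (
            "Evidence remained insufficient after 2 search iterations: "
            + verdict["reason"]
        )
    return ctx
```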

The honest limitation: “sufficient” is still Claude’s judgment,
not a calibrated metric. I’ve thought about adding a
citation-count floor or a source-credibility score as a secondary
gate but haven’t needed to yet. It would be the first thing I’d
add if this were running on higher-stakes research questions.