Been using this one daily for about six weeks. Sharing the full JSON and the architecture reasoning because there’s one decision in here that makes a significant difference and I don’t see it discussed much.
What it does
POST any article URL to a webhook as a JSON body like {"url": "https://example.com/article"}. You can send it from a bookmarklet, a browser extension, or anything else that can make an HTTP request. The workflow fetches the page, strips it down to readable text, sends that to OpenAI, and saves a structured note to Notion automatically.
Node chain: Webhook → HTTP Request → Code Node → OpenAI → Notion
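Any client works as long as it sends a JSON body with a `url` field, because the Code node later reads `body.url` from the Webhook node's output. Below is a minimal JavaScript sketch. `WEBHOOK_URL` is a placeholder for your own production webhook URL. `buildSaveRequest` and `saveArticle` are names I made up; they are not part of the attached workflow.

```javascript
// Hypothetical placeholder: replace with your own n8n production webhook URL.
const WEBHOOK_URL = "https://your-n8n-host/webhook/save-article";

// Build the fetch() options for saving one article.
// The workflow reads the article URL from the Webhook node's json.body.url.
function buildSaveRequest(articleUrl) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: articleUrl }),
  };
}

// Send the article URL to the workflow.
// Needs a runtime with global fetch (a browser, or Node 18+).
async function saveArticle(articleUrl, webhookUrl = WEBHOOK_URL) {
  const res = await fetch(webhookUrl, buildSaveRequest(articleUrl));
  if (!res.ok) {
    throw new Error(`Webhook returned HTTP ${res.status}`);
  }
  return res;
}
```

As a bookmarklet, the same request fits on one line: `javascript:fetch('https://your-n8n-host/webhook/save-article',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({url:location.href})})`. The webhook has to accept cross-origin requests from the pages you bookmark. Some sites' Content-Security-Policy can also block a request made from the page, so a browser extension is the more reliable option there.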
The part most people skip
The naive version is HTTP Request → OpenAI → done.
The problem: the raw HTML of a typical article page runs 30,000–80,000 tokens. Most of that is navigation markup, script bundles, and CSS that comes before any real content. You burn API budget feeding noise to the model, and the summaries suffer for it.
The Code node sits between the fetch and the AI call and does five things in sequence. It strips script blocks, strips style blocks, strips all remaining HTML tags, collapses whitespace, and truncates to 8,000 characters:
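// Code node, mode "Run Once for All Items": one fetched page in, one cleaned item out
const input = $input.first().json;
// HTTP Request typically stores a text response in `data`; fall back to other common fields, never to the whole object
const html = input.data || input.body || input.html || '';
const text = String(html)
  .replace(/<script[\s\S]*?<\/script>/gi, ' ') // remove <script> blocks and their contents
  .replace(/<style[\s\S]*?<\/style>/gi, ' ')   // remove <style> blocks and their contents
  .replace(/<[^>]+>/g, ' ')                    // remove every remaining tag, keeping the text between tags
  .replace(/\s+/g, ' ')                        // collapse runs of whitespace
  .trim()
  .substring(0, 8000);                         // cap at 8,000 characters
// Read the original URL from the Webhook node; .first() works in all-items mode, .item does not
return [{ json: { text, url: $('Webhook').first().json.body.url } }];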
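In my runs this takes a page from ~60K tokens to under 4K. By the usual rule of thumb of about 4 characters per English token, 8,000 characters is roughly 2,000 tokens. The model sees mostly article text: scripts and styles are gone, though nav and footer link text still comes through.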
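The Code node relies on n8n globals (`$input`, `$('Webhook')`), which makes it awkward to test on its own. Below is a sketch of the same script/style/tag/whitespace/truncate chain as a plain function you can run in Node. It also includes a rough token estimate based on the ~4-characters-per-token rule of thumb, which is an approximation and not OpenAI's tokenizer. `cleanHtml` and `estimateTokens` are hypothetical names for this example.

```javascript
// Same regex chain as the Code node, as a standalone function.
function cleanHtml(html, maxChars = 8000) {
  return String(html)
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // script blocks
    .replace(/<style[\s\S]*?<\/style>/gi, " ")   // style blocks
    .replace(/<[^>]+>/g, " ")                    // all remaining tags
    .replace(/\s+/g, " ")                        // collapse whitespace
    .trim()
    .substring(0, maxChars);                     // hard character cap
}

// Rough heuristic: about 4 characters per token for English prose.
// Use a real tokenizer if you need exact counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

const sample =
  "<html><head><style>body{color:red}</style><script>var x = 1;</script></head>" +
  "<body><nav>Home</nav><article><h1>Title</h1><p>Hello   world.</p></article></body></html>";

const cleaned = cleanHtml(sample);
console.log(cleaned);                 // Home Title Hello world.
console.log(estimateTokens(cleaned)); // 6
```

The leftover "Home" shows that tag stripping keeps nav link text. Only script and style contents are removed outright.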
The AI prompt
Summarize this article as a research note. Return JSON:
{"title":"","summary":"","key_insights":[],"tags":[]}
Text: {{ $json.text }}
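JSON output mode is on in the OpenAI node settings, so no parsing is needed downstream. OpenAI's JSON mode also requires the word "JSON" to appear somewhere in the prompt, and the "Return JSON" line covers that.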
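JSON mode guarantees syntactically valid JSON. It does not guarantee that every key is present or has the right type. If you want a guard before the Notion node, a small Code node can normalize the shape. Below is a minimal sketch. `normalizeNote` is a hypothetical helper, and the "Untitled" fallback is my own choice. Where the parsed object sits in the OpenAI node's output depends on the node version, so check your execution data before wiring this in.

```javascript
// Coerce the model's reply into the four fields the Notion mapping expects.
// Accepts either an already-parsed object or a JSON string.
function normalizeNote(raw) {
  const obj = typeof raw === "string" ? JSON.parse(raw) : raw ?? {};
  const asString = (v) => (typeof v === "string" ? v.trim() : "");
  const asStringArray = (v) =>
    Array.isArray(v)
      ? v.filter((x) => typeof x === "string" && x.trim() !== "").map((x) => x.trim())
      : [];
  return {
    title: asString(obj.title) || "Untitled", // hypothetical fallback title
    summary: asString(obj.summary),
    key_insights: asStringArray(obj.key_insights),
    tags: asStringArray(obj.tags),
  };
}

// Missing keys become empty values; non-string tags are dropped.
const note = normalizeNote('{"title":" Example ","tags":["ai", 42, ""]}');
console.log(JSON.stringify(note));
// {"title":"Example","summary":"","key_insights":[],"tags":["ai"]}
```

Inside n8n you would return `[{ json: normalizeNote(...) }]`, passing in whatever field holds the model's reply.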
Notion setup: 7 fields
The node writes: Title (title), Summary (rich text), Key Insights (rich text, bullet formatted), Tags (multi-select), URL (url), Date Saved (date), Source (rich text, hardcoded “Article”).
Create these properties in your database, with matching names and types, before importing. Also share the database with your n8n integration. Skipping that step is the most common reason the workflow fails on its first run.
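The n8n Notion node builds the API request for you. If you ever swap it for a plain HTTP Request to Notion's create-page endpoint, or just want to see what each property type expects, the sketch below maps a note onto the seven properties as a Notion API `properties` object. `toNotionProperties` is my own helper, and the property names must match your database exactly. The 2,000-character slice reflects Notion's limit on a single rich-text object's content. Joining insights with "• " is one way to get bullet formatting, not necessarily what the attached workflow does.

```javascript
// Notion caps a single rich-text object's content at 2,000 characters.
const NOTION_TEXT_LIMIT = 2000;

// Build a one-element rich-text array, truncated to Notion's limit.
const text = (content) => [{ text: { content: String(content).slice(0, NOTION_TEXT_LIMIT) } }];

// Map a normalized note onto the seven database properties.
function toNotionProperties(note, url, savedAt = new Date()) {
  const insights = note.key_insights.map((s) => `• ${s}`).join("\n");
  return {
    "Title": { title: text(note.title) },
    "Summary": { rich_text: text(note.summary) },
    "Key Insights": { rich_text: text(insights) },
    "Tags": { multi_select: note.tags.map((name) => ({ name })) },
    "URL": { url },
    // ISO date in UTC (YYYY-MM-DD)
    "Date Saved": { date: { start: savedAt.toISOString().slice(0, 10) } },
    "Source": { rich_text: text("Article") }, // hardcoded, as in the workflow
  };
}

const props = toNotionProperties(
  { title: "Example", summary: "Short summary.", key_insights: ["One", "Two"], tags: ["ai"] },
  "https://example.com/post",
  new Date("2024-05-01T12:00:00Z"),
);
console.log(JSON.stringify(props["Date Saved"])); // {"date":{"start":"2024-05-01"}}
```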
Two things worth knowing
Add User-Agent: Mozilla/5.0 as a header on the HTTP Request node — some sites return 403 without it.
JavaScript-rendered SPAs come back empty because the HTTP Request node doesn't run JavaScript. It receives the initial shell HTML, not the rendered page. For those, pre-process through a read-later service first.
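You can catch the SPA case before it wastes an OpenAI call. The sketch below uses a rough heuristic. If the cleaned text is very short and the raw HTML contains a typical client-side mount point, it treats the page as a JavaScript shell, and you can route it down a different branch (for example with an IF node). The 200-character threshold and the marker list are my own guesses, not anything from the workflow, so tune them against pages you actually save. `looksLikeSpaShell` is a hypothetical name.

```javascript
// Common client-side app mount points and "enable JavaScript" notices (heuristic, not exhaustive).
const SPA_MARKERS = [
  /<div[^>]+id=["'](root|app|__next|__nuxt)["']/i,
  /<noscript>[^<]*enable javascript/i,
];

// Returns true when the page looks like an unrendered JavaScript shell.
function looksLikeSpaShell(rawHtml, cleanedText, minChars = 200) {
  if (cleanedText.length >= minChars) return false; // enough real text: treat as an article
  return SPA_MARKERS.some((re) => re.test(rawHtml));
}

const shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>';
console.log(looksLikeSpaShell(shell, "")); // true
```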
JSON attached. One thing to update before running: replace the Notion database ID in the Notion node with your own — get it from your database URL.
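The database ID is the 32-character hex string at the end of the database URL's path, just before the ?v= view parameter. If you'd rather not pick it out by eye, the sketch below extracts it. `notionDatabaseId` is a hypothetical helper. It also accepts the dashed UUID form, in case you copied the ID from somewhere that formats it that way.

```javascript
// Extract a Notion database ID from a URL such as
// https://www.notion.so/workspace/My-Notes-<32 hex chars>?v=<view id>
function notionDatabaseId(databaseUrl) {
  // pathname already excludes the ?v=... query; also drop any trailing slash
  const path = new URL(databaseUrl).pathname.replace(/\/+$/, "");
  // Remove dashes so both the title slug form and the dashed UUID form end in 32 hex chars
  const compact = path.replace(/-/g, "");
  const match = compact.match(/[0-9a-f]{32}$/i);
  if (!match) throw new Error(`No 32-character ID found in ${databaseUrl}`);
  return match[0].toLowerCase();
}

const id = notionDatabaseId(
  "https://www.notion.so/acme/Research-Notes-0123456789abcdef0123456789abcdef?v=fedcba98765432100123456789abcdef"
);
console.log(id); // 0123456789abcdef0123456789abcdef
```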
Happy to answer questions on the Code node logic or the Notion field mapping.
Tags: show-and-tell knowledge-management notion openai webhook code-node productivity