How I built a personal knowledge base that fills itself — webhook + HTTP + Code node + AI + Notion

Been using this one daily for about six weeks. Sharing the full JSON and the architecture reasoning because there’s one decision in here that makes a significant difference and I don’t see it discussed much.


What it does

POST any article URL to a webhook — from a bookmarklet, a browser extension, anything that can make an HTTP request. The workflow fetches the page, strips it to readable content, sends it to AI, and saves a structured note to Notion automatically.
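
For anyone wiring up the sending side, here is a minimal sketch of what a bookmarklet or extension button could call. The webhook URL is a placeholder, and `buildSaveRequest` and `saveArticle` are hypothetical names. The only thing taken from the workflow is that the JSON body carries a `url` field, which the Code node later reads as `body.url`.

```javascript
// Placeholder: replace with your own n8n webhook URL (assumption, not from the workflow JSON).
const WEBHOOK_URL = 'https://n8n.example.com/webhook/save-article';

// Build the fetch options for saving one article URL.
function buildSaveRequest(articleUrl) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: articleUrl }),
  };
}

// Send the request; works in browsers and in Node 18+ where fetch is global.
async function saveArticle(articleUrl) {
  const res = await fetch(WEBHOOK_URL, buildSaveRequest(articleUrl));
  if (!res.ok) throw new Error(`Webhook returned ${res.status}`);
  return res;
}
```

From a bookmarklet the call would be `saveArticle(location.href)`. Depending on your n8n setup, the browser may also need the webhook to allow cross-origin requests.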

Node chain: Webhook → HTTP Request → Code Node → OpenAI → Notion


The part most people skip

The naive version is HTTP Request → OpenAI → done.

The problem: a typical webpage is 30,000–80,000 tokens of navigation markup, script bundles, and CSS before any real content. You burn API budget feeding noise to the model and the summaries suffer for it.

The Code node sits between the fetch and the AI call. In sequence, it strips script blocks, strips style blocks, removes all remaining HTML tags, collapses whitespace, and truncates to 8,000 characters:

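// Output of the HTTP Request node; the HTML can land under different keys
// depending on the node's response settings.
const input = $input.first().json;
// Fall back to an empty string rather than stringifying an object to "[object Object]".
const html = input.data || input.body || input.html || (typeof input === 'string' ? input : '');

const text = String(html)
  .replace(/<script[\s\S]*?<\/script>/gi, ' ') // drop script blocks
  .replace(/<style[\s\S]*?<\/style>/gi, ' ')   // drop CSS blocks
  .replace(/<[^>]+>/g, ' ')                    // drop every remaining tag
  .replace(/\s+/g, ' ')                        // collapse whitespace runs
  .trim()
  .substring(0, 8000);                         // cap the prompt size

// Carry the original URL forward so the Notion node can store it.
return [{ json: { text, url: $('Webhook').item.json.body.url } }];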

Drops from ~60K tokens to under 4K (8,000 characters is roughly 2K tokens of English text, plus the prompt). The AI only sees actual content.
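
If you want to sanity-check the stripping outside n8n, the same regex chain can be wrapped in a plain function and run in Node. This is only a test harness sketch: `htmlToText` and the sample HTML are made up for illustration.

```javascript
// Same regex chain as the Code node, wrapped in a plain function for testing.
function htmlToText(html, maxChars = 8000) {
  return String(html)
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
    .substring(0, maxChars);
}

// Made-up page with a style block, a script, a nav link, and an article.
const sample = `
<html><head><style>body { color: red; }</style>
<script>console.log("tracking");</script></head>
<body><nav><a href="/">Home</a></nav>
<article><h1>Title</h1><p>First   paragraph.</p></article></body></html>`;

console.log(htmlToText(sample)); // prints "Home Title First paragraph."
```

Note that the nav link text ("Home") survives: the chain removes markup, scripts, and styles, not the visible text inside navigation elements.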


The AI prompt

Summarize this article as a research note. Return JSON:
{"title":"","summary":"","key_insights":[],"tags":[]}

Text: {{ $json.text }}

JSON output mode is on in the node settings — no parsing needed downstream.
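
With JSON mode on this has not been needed in my setup, but if you run the prompt without it, or the model drops a field, a small guard in a Code node can keep the Notion step from failing. A minimal sketch, assuming the four field names from the prompt above; `normalizeNote` is a hypothetical helper, not part of the attached workflow.

```javascript
// Hypothetical helper: coerce the model's reply into the four expected fields.
function normalizeNote(raw) {
  let note = raw;
  if (typeof raw === 'string') {
    try {
      note = JSON.parse(raw);
    } catch {
      note = {}; // not valid JSON; fall through to defaults
    }
  }
  if (note === null || typeof note !== 'object') note = {};

  // Anything that is not an array becomes an empty list.
  const toStringArray = (v) => (Array.isArray(v) ? v.map(String) : []);

  return {
    title: typeof note.title === 'string' ? note.title : '',
    summary: typeof note.summary === 'string' ? note.summary : '',
    key_insights: toStringArray(note.key_insights),
    tags: toStringArray(note.tags),
  };
}
```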


Notion setup — 7 fields

The node writes: Title (title), Summary (rich text), Key Insights (rich text, bullet formatted), Tags (multi-select), URL (url), Date Saved (date), Source (rich text, hardcoded “Article”).

Create these properties in your database before importing. Share the database with your n8n integration — this is the most common reason it fails on first run.
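
The n8n Notion node builds the request for you, but it can help to see roughly what the equivalent raw Notion API `properties` object looks like for the fields listed above. This is a sketch, not the node's actual internals: `buildNotionProperties` is a hypothetical name, and the comma replacement and 2,000-character cap are defensive choices based on Notion's documented limits.

```javascript
// Notion caps a single rich text object's content at 2,000 characters.
const MAX_TEXT = 2000;
const richText = (s) => [{ text: { content: String(s).slice(0, MAX_TEXT) } }];

// note: { title, summary, key_insights: string[], tags: string[] }
function buildNotionProperties(note, url, savedAt = new Date()) {
  return {
    'Title': { title: richText(note.title) },
    'Summary': { rich_text: richText(note.summary) },
    // One bullet per insight, joined into a single rich text value.
    'Key Insights': {
      rich_text: richText(note.key_insights.map((i) => `• ${i}`).join('\n')),
    },
    // Multi-select option names cannot contain commas, so replace them.
    'Tags': {
      multi_select: note.tags.map((t) => ({ name: String(t).replace(/,/g, ' ') })),
    },
    'URL': { url },
    // Date only (YYYY-MM-DD), taken from the UTC timestamp.
    'Date Saved': { date: { start: savedAt.toISOString().slice(0, 10) } },
    'Source': { rich_text: richText('Article') },
  };
}
```

The property names must match the database columns exactly, including capitalization and spaces.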


Two things worth knowing

Add User-Agent: Mozilla/5.0 as a header on the HTTP Request node — some sites return 403 without it.

JavaScript-rendered SPAs return empty content because HTTP Request gets the shell HTML before the page renders. For those, pre-process through a read-later service first.
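
One way to catch the SPA case automatically is to flag suspiciously short extractions before they reach the model. A minimal sketch: the 500-character threshold is an arbitrary assumption to tune, and `looksLikeEmptyShell`, `tagExtraction`, and the `needsFallback` field are hypothetical names, not part of the attached workflow.

```javascript
// Assumed threshold; tune it for the sites you save.
const MIN_CHARS = 500;

// True when the stripped text is too short to be a real article.
function looksLikeEmptyShell(text, minChars = MIN_CHARS) {
  return String(text).trim().length < minChars;
}

// Attach a flag that a later branching step can route on.
function tagExtraction(text, url) {
  return { text, url, needsFallback: looksLikeEmptyShell(text) };
}
```

In the Code node, returning `[{ json: tagExtraction(text, url) }]` would let an IF node send flagged items down a separate path instead of summarizing an empty page.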


JSON attached. One thing to update before running: replace the Notion database ID in the Notion node with your own — get it from your database URL.
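
If you are unsure which part of the URL is the ID, this sketch pulls it out. It assumes the common URL shape where the database ID is a 32-character hex string in the path and the `?v=` query value is the view ID; `extractNotionDatabaseId` is a hypothetical helper and the URL in the example is made up.

```javascript
// Return the 32-character hex ID from the URL path, or null if none is found.
// Only the path is searched, so the view ID in "?v=..." is ignored.
function extractNotionDatabaseId(databaseUrl) {
  const { pathname } = new URL(databaseUrl);
  const match = pathname.match(/[0-9a-f]{32}/i);
  return match ? match[0] : null;
}

const exampleUrl =
  'https://www.notion.so/myworkspace/a8aec43384f447ed84390e8e42c2e089?v=0123456789abcdef0123456789abcdef';
console.log(extractNotionDatabaseId(exampleUrl)); // prints "a8aec43384f447ed84390e8e42c2e089"
```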

Happy to answer questions on the Code node logic or the Notion field mapping.


Tags: show-and-tell knowledge-management notion openai webhook code-node productivity


Welcome @udayshankar to our community! I’m Jay and I am an n8n verified creator.

The section on HTML stripping before sending to the model is exactly the kind of practical detail most workflow tutorials skip. Going from 60K tokens of nav markup to under 4K is a big deal for both accuracy and cost. Six weeks of daily use is also a real signal that the architecture actually holds up - good share.