From my understanding, I need to give my AI agent an HTTP Request tool node connected to some web scraping service so that the AI agent can read the whole content of that URL. The problem is that when the AI reads all that content, all of it counts as input tokens. Is there a cost-effective way for the AI agent to read the content without incurring a high input token cost?
Hey @ezraluandre, hope all is well.
My understanding is that your question is about feeding the content of the website to the model as raw HTML, which includes all the markup, instead of feeding only the actual textual content.
Generally speaking, if you wish to save some tokens, you could move the portion of your workflow where you fetch the page either into a separate workflow and call it from an n8n Workflow tool, OR (if it fits your flow) move it before the AI Agent node. Both approaches let you prepend an HTML node and extract the text before returning the result to the agent or feeding the agent the content.
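If it helps to see what that extraction step does outside of n8n, here is a rough Python sketch of the same idea. The URL is a placeholder, and requests/BeautifulSoup are just my stand-ins for what the HTTP Request and HTML nodes do for you:

```python
# Fetch a page, then strip the markup before handing text to the model.
# Rough equivalent of an HTTP Request node followed by an HTML node.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/article"  # placeholder URL
html = requests.get(url, timeout=30).text

soup = BeautifulSoup(html, "html.parser")
# Drop tags that carry no readable content before extracting text.
for tag in soup(["script", "style", "noscript"]):
    tag.decompose()

text = soup.get_text(separator="\n", strip=True)
print(f"{len(html)} chars of HTML -> {len(text)} chars of plain text")
```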
Looking at the page you mentioned, it has about 180k chars of HTML, which will be around 50-60k tokens, and could cost anywhere from half a buck with GPT-4.1 down to about 10 cents with GPT-4.1-mini.
With just the text (no markup) the same page is still surprisingly lengthy, around 40k chars, which is significantly cheaper to tokenize: 7 cents with 4.1 and 1 cent with mini.
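For anyone who wants to redo that math: the rule of thumb is roughly 4 characters per token for English text, and the input cost is just tokens times the per-token price. A quick sketch (the prices below are illustrative placeholders; check the current OpenAI pricing page before relying on them):

```python
# Back-of-the-envelope input-token cost math.
CHARS_PER_TOKEN = 4          # rough heuristic, not exact
PRICE_PER_1M_INPUT = {       # assumed/illustrative USD prices
    "gpt-4.1": 2.00,
    "gpt-4.1-mini": 0.40,
}

def estimated_cost(num_chars: int, model: str) -> float:
    tokens = num_chars / CHARS_PER_TOKEN
    return tokens / 1_000_000 * PRICE_PER_1M_INPUT[model]

for chars in (180_000, 40_000):   # full HTML vs. extracted text
    for model in PRICE_PER_1M_INPUT:
        print(f"{chars:>7} chars, {model}: ${estimated_cost(chars, model):.3f}")
```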
Hey @Jabbson, thanks for your reply.
What I really want is for my AI agent to understand the content of the website. I tried a simple test with only a trigger and an AI Agent node connected to gpt-4o or gpt-4o-mini, without any tool node. It returned that the AI chat model doesn't have the capability to read the content of the URL. I think gpt-4o can only read websites through the native ChatGPT website, not through the API.
Because of that, the AI agent needs a tool node (web scraping) to read the website. My understanding is that if I add web scraping, whatever that tool node returns becomes input tokens; but what I really want isn't the whole article, it's for my AI agent to understand the content of the article.
Let's say, for example, there's a new policy like the one in this article, and I want my AI agent to give me a summary of or insight into it. From my understanding, my AI agent needs to read the article through the web scraping tool node before it can summarize it or give me any insight. This is where my confusion stems from: if the AI agent needs to read all the content, and all those chars become input tokens, then it could cost me a lot, since I need my AI agent to read many articles.
So: does giving the AI agent the ability to read the page with a web scraping tool mean the content gets charged as input tokens? And if the answer is yes, is there a cost-effective way for the AI agent to understand the content of an article?
Yes. If you wish your agent to be able to act on the data in the article, you will need to feed the article to the agent first, or have the model discover it itself (for instance with the web_search type of response from OpenAI models, which right now can only be called through the API). The cost is for you to decide: choosing a smaller model or using a self-hosted model are your options for saving some money. If you want to avoid paying for model inference entirely, you can run the model locally, which will most often produce slower responses.
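For the web_search route specifically, a minimal sketch against the OpenAI Responses API might look like the snippet below. The tool name and model choice are assumptions based on the API as I know it, and the URL is a placeholder; double-check against the current OpenAI docs:

```python
# Minimal sketch: let the model fetch and summarize the page itself via
# OpenAI's hosted web search tool, so you never feed it the raw HTML.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4.1-mini",  # a smaller model keeps the cost down
    tools=[{"type": "web_search_preview"}],  # verify tool name in the docs
    input="Summarize the key points of the new policy described at "
          "https://example.com/article",  # placeholder URL
)

print(response.output_text)
```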