Using HTTP Request tool for AI agents to scrape websites

ptford · September 14, 2024, 2:18am

Describe the problem/error/question

I tried searching, but could only find examples that call APIs. I’m trying to use the HTTP Request tool with AI agents to simply GET (i.e. visit) a website and read data from it. I tried using a normal HTTP Request node in the workflow to extract the raw HTML, but in most cases the HTML contains too many tokens for the AI to process, so I’m trying to see whether a tool can do this more easily.

What is the error message (if any)?

When using the HTTP Request tool, I only ever get the error message “Tool response was not in the expected format”.

Please share your workflow

Information on your n8n setup

n8n version: 1.56.2
Database (default: SQLite): SQLite
n8n EXECUTIONS_PROCESS setting (default: own, main): main
Running n8n via (Docker, npm, n8n cloud, desktop app): Docker Compose
Operating system: Ubuntu

KHarv · September 14, 2024, 2:41pm

Hello and welcome!
It’s standard practice in scraping to preprocess HTML before sending it to the AI. I would suggest using a Code node to parse your input and extract only what you need. Beyond that, scraping is a very deep subject and there are many ways to go about it, so it’s hard to say more without knowing the exact application.
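As a rough illustration of that preprocessing step, here is a minimal sketch in plain Python using only the standard library’s `html.parser`. It is not n8n-specific code, and the names `html_to_text` and `SKIP_TAGS` and the 8000-character budget are illustrative assumptions. It drops the contents of script/style-type tags, keeps the visible text, collapses whitespace, and truncates the result so it fits in the model’s context.

```python
from html.parser import HTMLParser

# Illustrative assumption: tags whose contents are never visible page text
# and would only waste tokens if sent to the model.
SKIP_TAGS = {"script", "style", "noscript", "template"}


class _TextExtractor(HTMLParser):
    """Collects text nodes that are not inside a SKIP_TAGS element."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_text(html: str, max_chars: int = 8000) -> str:
    """Hypothetical helper: reduce raw HTML to whitespace-collapsed visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join all text nodes, then collapse any run of whitespace to one space.
    text = " ".join(" ".join(parser.chunks).split())
    # Hard character cap as a crude stand-in for a token budget.
    return text[:max_chars]
```

A character cap is only an approximation of a token limit; a tokenizer-based cut, or extracting specific elements instead of all text, would be more precise, depending on the site.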

ptford · September 15, 2024, 12:37am

Got it, yes, more context would definitely help. I want to be able to send the AI a link as part of a prompt and have it visit the link and return information about the company on that website. This is easy to do in the ChatGPT UI, but the OpenAI API does not provide a way to let the AI access the internet, so I need to provide a tool for it to browse the web. “Scraping” is probably not the right word here. I suppose it could also take a screenshot of the link and use image recognition to gather the info, although that would only capture what is visible without scrolling the page.
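For the “give the agent a link, get company info back” use case, one option is a tool whose body fetches the page itself and returns a short text summary instead of raw HTML. The sketch below is plain Python with only the standard library. The function name `fetch_page_summary`, the User-Agent string, and the choice of fields (title, meta description, h1/h2 headings) are all illustrative assumptions, and how to register such a function as an agent tool in n8n is not shown here. It also does not execute JavaScript, so pages that render their content client-side will come back mostly empty.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _SummaryParser(HTMLParser):
    """Collects the <title>, the meta description, and h1/h2 heading text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.description = ""
        self.headings = []
        self._current = None  # tag whose text is currently being captured
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() == "description":
                self.description = (a.get("content") or "").strip()
        elif tag in ("title", "h1", "h2"):
            self._current, self._buf = tag, []

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buf).split())
            if tag == "title":
                self.title = text
            elif text:
                self.headings.append(text)
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)


def fetch_page_summary(url: str, max_headings: int = 10, timeout: float = 10.0) -> str:
    """Hypothetical tool body: fetch a URL and return a compact text summary."""
    # Arbitrary User-Agent (assumption); some sites reject Python's default one.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (compatible; page-summary-sketch)"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    parser = _SummaryParser()
    parser.feed(html)
    parser.close()
    lines = [f"Title: {parser.title}", f"Description: {parser.description}"]
    lines += [f"Heading: {h}" for h in parser.headings[:max_headings]]
    return "\n".join(lines)
```

Returning a few labelled lines keeps the tool output small and predictable. If the agent needs more detail, the same fetch could return truncated body text instead.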

system · December 14, 2024, 12:38am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.