Web spider / crawler to create JSONL for OpenAI's fine tuning?

Hi all,

I want to create a web crawler to scan part of our website, then feed that data to ChatGPT and ask it to create JSONL out of each page.

I am a little stumped as to where to start. Is there a guide on spiders or something similar I could use?

It looks like your topic is missing some important information. Could you provide the following, if applicable?

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

Hey @privateuserguy,

We don’t have a guide for that, but a good starting point would be to load your sitemap XML if you have one, use each item in an HTTP Request node to fetch the page content, then pass that content to OpenAI to process.
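If you'd rather prototype the pipeline outside n8n first (or in a Code node), the same steps can be sketched in plain Python. This is a minimal, hedged sketch, not a complete crawler: it assumes a standard sitemap.xml, does naive text extraction, and uses the chat-style `messages` JSONL format that OpenAI fine-tuning expects. The sample inputs and the prompt wording are made up for illustration; in practice you would fetch the sitemap and pages over HTTP (e.g. with `urllib.request.urlopen`).

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Standard sitemap namespace (from the sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Extract every <loc> URL from a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""
    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def page_text(html: str) -> str:
    """Very naive HTML-to-text conversion; fine for a first pass."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

def jsonl_record(url: str, text: str) -> str:
    """One fine-tuning example per page; the prompt wording is just a placeholder."""
    return json.dumps({
        "messages": [
            {"role": "user", "content": f"What does the page at {url} say?"},
            {"role": "assistant", "content": text},
        ]
    })

if __name__ == "__main__":
    # Stand-ins for what you would actually download from your site.
    sample_sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/page1</loc></url>"
        "<url><loc>https://example.com/page2</loc></url>"
        "</urlset>"
    )
    sample_html = "<html><body><h1>Hello</h1><p>Some page content.</p></body></html>"

    with open("training.jsonl", "w") as out:
        for url in sitemap_urls(sample_sitemap):
            # In a real crawler you would fetch `url` here instead of reusing sample_html.
            out.write(jsonl_record(url, page_text(sample_html)) + "\n")
```

From there, each line of `training.jsonl` can be reviewed (or rewritten by the model, as you describe) before uploading it as a fine-tuning file. The n8n version of this is the same shape: XML node to parse the sitemap, HTTP Request node per URL, then an OpenAI node.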

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.