How to format web scraped data to google sheets?

Super new to n8n and I was hoping I could get some eyes on this workflow I’ve been working on. The site that I’m scraping has a page with a list of 212 programs and their program links. Clicking on each program link takes you to a program information page with data that I’d like to extract. For each link on the page I would like the workflow to:

  1. Go to that program’s page
  2. Extract the HTML of the program’s information
  3. Create a sheet for the program using the headers of the program’s page as column names and the text under each header as the value for the column.

The below workflow does mostly that, but I’m looking to improve upon it and get help on a few areas:

  1. Right now I limited my last HTTP Request note to execute once. But what the best way to set it up to handle iterating over 212 program links?
  2. Am I using the Set node correctly? Because of the way the data is formatted, I ended splitting it to get the headings that will become the column names, and their values and then merging them.
  3. The last step is to write the information to a GoogleSheet. The help I need here is that I’d like the headings to be the column names instead of rows.

I appreciate any help or improvements from the community!

Hi @kenepas, I looked at your workflow but I am not sure I fully understand the problem. So let’s take this step by step.

  1. Right now I limited my last HTTP Request note to execute once. But what the best way to set it up to handle iterating over 212 program links?

The node would do this by default if you don’t limit it. Like most n8n nodes, the HTTP Request node would run once for each incoming item. So if you pass on 212 items to the node, it would make 212 separate requests.

  1. Am I using the Set node correctly? Because of the way the data is formatted, I ended splitting it to get the headings that will become the column names, and their values and then merging them.

So n8n wouldn’t limit how exactly you use a node. Meaning if this structure works for you that’s fine, no need to change anything. But if you’re asking for opinions tbh I would have expected the Set node to be used instead of the Function node to create the full URL :slight_smile:

  1. The last step is to write the information to a GoogleSheet. The help I need here is that I’d like the headings to be the column names instead of rows.

This is the part I don’t understand unfortunately. So the data structure returned by your Set node isn’t what you would like to see in your Google Sheet? Can you confirm how you want your sheet to like instead?

1 Like