HTTP / HTML extract loop problem

martin_sabo · January 12, 2022, 12:08pm

Hello community.
I'm building a parser with a loop.
The workflow looks like this:

HTTP Request: the starting URL, domain.com/startUrl.
HTML Extract: in this step I extract the URL of a JSON file (/something/ID/something.json).
(something.json has dozens of pages, and each page/JSON file contains 0-5 links to profiles. Some profiles are private and so have no link. The links are what I need to save to a Google Sheet.)
Set: here I add the domain in front and the paging parameter at the end: https://domain.com{{$json["data"]}}?page=
HTTP Request: the URL is {{$node["Set"].json["review.json"]}}{{$runIndex+1}}. $runIndex starts at 0, and page=0 and page=1 return the same data, so +1 works for me.
HTML Extract: here I extract 1) the links to profiles and 2) the profile names (used in the next step).
IF: checks whether the page (e.g. .json?page=1) contains a profile name. If it doesn't, the workflow stops; if it does, it goes on to the next step.
IF: checks whether any profile links exist. If not, it returns to the last HTTP Request; if there are links, it continues to the
Function node, where the links are split into one URL per line in the JSON (the links come without the domain, like /profileURL).
Set: adds the domain in front of each link.
Google Sheets: writes the links to the sheet. This first run works fine.
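
For reference, the Function and Set steps above can be sketched in plain JavaScript roughly as follows. This is only an illustration of the splitting logic, not the actual node code: the field name `links`, the helper name `splitLinks` and the sample values are assumptions, and only the `{ json: ... }` item shape follows what n8n nodes pass around.

```javascript
// Hypothetical input: one item whose "links" field holds several relative URLs.
// Output: one item per link, with the domain added in front (the Set step).
function splitLinks(item, domain) {
  return item.json.links.map((link) => ({ json: { url: domain + link } }));
}

// Sample data standing in for the HTML Extract output.
const extracted = { json: { links: ["/profileA", "/profileB"] } };
const rowsForSheet = splitLinks(extracted, "https://domain.com");

console.log(rowsForSheet);
```

Each returned item would then become one row in the Google Sheet.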

As I mentioned, there are many JSON pages (.json?page=1, .json?page=2, etc.), so I return to the HTTP Request node to fetch the next page. {{$node["Set"].json["review.json"]}}{{$runIndex+1}} should now return page=2, but instead there is an error, and it looks like it forgot the start URL.

Error when starting LOOP is:

Workflow:

What I need is to loop over the JSON pages until a page contains no names and no links. At the moment this one has about 240 pages, but a (half-empty) JSON file is still returned even when I request page=999.
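
Outside of n8n, the stop condition I'm after could be sketched like this in plain JavaScript. It is a rough model of the two IF nodes, not workflow code: `collectLinks`, `getPage`, the page shape `{ names, links }` and the stubbed data are all assumptions, and a real version would fetch each page over HTTP instead of reading from a stub.

```javascript
// Hypothetical page shape: { names: string[], links: string[] }.
// Mirrors the two IF nodes: no names -> stop; names but no links -> next page.
function collectLinks(getPage, maxPages) {
  const collected = [];
  for (let page = 1; page <= maxPages; page++) {
    const { names, links } = getPage(page);
    if (names.length === 0) break;    // first IF: nothing left, stop the loop
    if (links.length === 0) continue; // second IF: only private profiles here
    collected.push(...links);
  }
  return collected;
}

// Stubbed data standing in for the HTTP Request + HTML Extract nodes.
const fakePages = {
  1: { names: ["a", "b"], links: ["/profileA", "/profileB"] },
  2: { names: ["c"], links: [] },
  3: { names: ["d"], links: ["/profileD"] },
};
// Any page beyond the stub is "half empty": no names, no links.
const getFakePage = (n) => fakePages[n] || { names: [], links: [] };

console.log(collectLinks(getFakePage, 999));
```

The `maxPages` cap is only a safety net, since the server keeps answering for page numbers that don't really exist.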

In the near future I will also need a hint on how to read a Google Sheet with many URLs: one URL per workflow execution, then read the next URL. And what about $runIndex in that case? How can I reset it for the next ID/something.json file? For each new URL I need to start from zero again, and I plan to make this a loop too.

Information on your n8n setup

n8n version: 0.158.0
Database you're using (default: SQLite): PS
Running n8n with the execution process [own(default), main]:
Running n8n via [Docker, npm, n8n.cloud, desktop app]: docker

MutedJam · January 13, 2022, 8:45am

Hi @martin_sabo, welcome to the community

It looks like in your node HTTP Request8 you are using an expression like {{$node["Set"].json["review.json"]}}{{$runIndex+1}}.

Based on your description it sounds to me like this is what is happening: When this node processes its first item, it would read the review.json property of the first item of your Set node. Once HTTP Request8 processes its second item, it would try reading the second item of your Set node which doesn’t exist.

To always read the first item of your Set node, you can put $item(0) in front of your existing expression as described here.
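
To illustrate the item-matching behaviour described above, here is a toy model in plain JavaScript. It is not how n8n is implemented internally: the helper names and the sample URL are made up for this sketch. As I read the suggestion, the adjusted expression would look something like {{$item(0).$node["Set"].json["review.json"]}}{{$runIndex+1}}, but please treat that as an assumption and check it against your own workflow.

```javascript
// Toy model only. The Set node produced a single item.
const setItems = [
  { json: { "review.json": "https://domain.com/something/ID/something.json?page=" } },
];

// Default lookup: item N of the current node reads item N of "Set".
function readMatchingItem(items, itemIndex) {
  const item = items[itemIndex];
  return item ? item.json["review.json"] : undefined;
}

// Pinned lookup: always read the first item, whatever the current index is.
function readFirstItem(items) {
  return items[0].json["review.json"];
}

// Appends the page number the same way the expression does ($runIndex + 1).
function buildUrl(base, runIndex) {
  return `${base}${runIndex + 1}`;
}

console.log(buildUrl(readMatchingItem(setItems, 0), 0)); // first item: URL resolves
console.log(readMatchingItem(setItems, 1));              // second item: nothing to read
console.log(buildUrl(readFirstItem(setItems), 1));       // pinned: URL resolves again
```

The second lookup coming back empty is the "forgot the start URL" symptom from the question.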

Hope this helps

For your second question:

a hint on how to read a Google Sheet with many URLs: one URL per workflow execution, then read the next URL. And what about $runIndex in that case? How can I reset it for the next ID/something.json file? For each new URL I need to start from zero again, and I plan to make this a loop too.

May I ask why you want to execute a workflow many times? This seems like it complicates things a lot, because almost all n8n nodes already run once for each item they receive. So when reading a Google Sheet with three rows of URLs, n8n would already make three individual HTTP requests using a simple workflow.

So purely based on the description it sounds to me like you wouldn’t need loops (and the headache that sometimes comes with them).
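
As a rough illustration of the "once per item" behaviour, here is a small sketch in plain JavaScript. It only models the idea and is not n8n code: `runOncePerItem`, the `url` column name and the sample rows are assumptions.

```javascript
// Toy model: a "node" is a function that gets applied to every incoming item.
function runOncePerItem(items, nodeFn) {
  return items.map(nodeFn);
}

// Hypothetical rows as they might come out of a Google Sheets read.
const sheetRows = [
  { json: { url: "https://domain.com/a" } },
  { json: { url: "https://domain.com/b" } },
  { json: { url: "https://domain.com/c" } },
];

// Stand-in for the HTTP Request node: one request description per row.
const requests = runOncePerItem(sheetRows, (item) => ({
  json: { method: "GET", url: item.json.url },
}));

console.log(requests.length); // one request per sheet row
```

Three rows in, three requests out, without any explicit loop in the workflow.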

martin_sabo · January 13, 2022, 9:17pm

Thank you for the quick answer. After some testing, I will probably go with the cron approach.

MutedJam · January 14, 2022, 8:24am

Hope my answer helped, enjoy your automation journey!