My team and I created a workflow to scrape webpage data using the ScrapingBee API. The issue we’re facing is that the website we’re scraping uses a bot detection mechanism. To bypass this, we use cookies.
Here’s the problem: for the first URL, we provide our own cookie, and the scraping is successful. In the output, we receive a cookie from ScrapingBee. For the next URL, the workflow should use the cookie from the previous output instead of our original cookie.
In simple terms:
Get a cookie from the browser for the first request.
Use the cookie returned from the first request in your second request.
Use the cookie from the second request in the third one, and so on.
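The chaining described above can be sketched in plain JavaScript. This is just an illustration of the control flow, not the workflow itself: `scrapeUrl` is a stand-in that simulates the ScrapingBee call by echoing back a new cookie, and `INITIAL_COOKIE` is a placeholder for the cookie you provide yourself.

```javascript
// Stand-in for the real ScrapingBee request: here it just echoes
// back a fresh cookie so the chaining is visible.
function scrapeUrl(url, cookie) {
  return { body: `scraped ${url}`, cookie: `session-for-${url}` };
}

const urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c'];

// The first request uses our own cookie; every later request reuses
// the cookie returned by the previous response.
let cookie = 'INITIAL_COOKIE';
const usedCookies = [];
for (const url of urls) {
  usedCookies.push(cookie);
  const response = scrapeUrl(url, cookie);
  cookie = response.cookie; // carry the fresh cookie into the next iteration
}
```

Each iteration sends the cookie received in the previous one, which is exactly the behaviour the workflow needs to reproduce.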
Hey @Prem7, try adding an additional “hanging” node after the HTTP Request node (say, called “Cookie”) to collect the returned cookie to be used in the next iteration. Then, if it is not the first run, you can reference that cookie with an expression like this: {{ $runIndex ? $('Cookie').first(0, $runIndex - 1).json.cookie : 'INITIAL_COOKIE' }}.
Here’s a visual to demonstrate the idea. You can run it to observe how the session cookie gets updated, taking its value from the “Cookie” node stored in the previous iteration.
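As a rough plain-JavaScript equivalent of what that expression does (the `cookieNodeRuns` array is a hypothetical stand-in for the data the “Cookie” node has stored across runs):

```javascript
// Mirrors: {{ $runIndex ? $('Cookie').first(0, $runIndex - 1).json.cookie : 'INITIAL_COOKIE' }}
function cookieForRun(runIndex, cookieNodeRuns, initialCookie) {
  // Run 0 has no previous output, so fall back to our own cookie.
  if (runIndex === 0) return initialCookie;
  // Otherwise take the cookie saved during the previous iteration.
  return cookieNodeRuns[runIndex - 1].cookie;
}
```

For example, `cookieForRun(0, [], 'INITIAL_COOKIE')` returns `'INITIAL_COOKIE'`, while on run 2 it returns whatever cookie the “Cookie” node stored on run 1.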
In the output, the cookies appear at a different index each time, so I used the name of the cookie to find it irrespective of its position, but it’s returning null/undefined. What should I do here?
Referenced node is unexecuted
An expression references the node ‘Scrapingbee Cookie’, but it hasn’t been executed yet. Either change the expression, or re-wire your workflow to make sure that node executes first.
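Since the cookie can appear at a different index in each response, one robust way to look it up is by name with `Array.prototype.find` instead of hard-coding a position. A minimal sketch, assuming each cookie object has `name` and `value` fields (adapt the field names to your actual response shape):

```javascript
// Look up a cookie by name rather than by index, since its position
// in the response array can change between runs.
function findCookie(cookies, name) {
  const match = (cookies || []).find((c) => c.name === name);
  // Return undefined (instead of throwing) when the cookie is absent
  // or the array itself is missing.
  return match ? match.value : undefined;
}
```

If this still yields undefined, the array is likely empty on that run (e.g. the referenced node has not executed yet, as the error above suggests), not a lookup problem.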
The solution I offered was based on your statement “for the first URL, we provide our own cookie”. That is what 'INITIAL_COOKIE' is for. Your screenshot does not correspond to my solution with the “hanging” node: for some reason you looped it back to the Set node. Moreover, it is important that the hanging node is the top branch.