HTTP Extract - Dynamic Content

I am looking to automate data collection. Currently I do it manually: I visit the website and follow each link to download the PDFs.

Here is the website: Enforcement action | ICO

I can request the page with the HTTP Request node, and I can use HTML Extract to pull data out. The issue is that the website is dynamic: the HTTP Request node doesn't return the page after all content has loaded. Instead, it only returns the template without the dynamic content.

I want to…

  • Pass the website to HTTP Request
  • Use HTML Extract to access each entry in the list
  • Download PDF from each list item

Can it be done?

The content is retrieved via an API call made with JavaScript. You can see this in the Developer Tools of browsers like Chrome (in the Network tab). Copy that request (using "Copy as cURL") and import it into an HTTP Request node in your workflow.
After that, just add a Split Out node and you're done.
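Outside of n8n, the same idea looks roughly like this in Python. A minimal sketch: the endpoint URL and the JSON field names (`items`, `pdfUrl`) are assumptions; the real ones must be read from the request you copied out of the browser's Network tab.

```python
import json

# Assumption: placeholder endpoint, not the real ICO API URL.
API_URL = "https://ico.org.uk/api/enforcement-actions"

def extract_pdf_links(payload: dict) -> list:
    """Pull the PDF URL out of each list entry (field names are assumptions)."""
    return [item["pdfUrl"] for item in payload.get("items", []) if "pdfUrl" in item]

# With the `requests` library installed, fetching and downloading would look like:
# resp = requests.get(API_URL, params={"page": 1})
# for url in extract_pdf_links(resp.json()):
#     pdf_bytes = requests.get(url).content
#     ...

# Quick check against a payload shaped like the assumed response:
sample = {"items": [{"title": "Action 1", "pdfUrl": "https://example.org/a.pdf"},
                    {"title": "Action 2"}]}
print(extract_pdf_links(sample))  # ['https://example.org/a.pdf']
```

The Split Out node in n8n plays the role of the list comprehension here: it turns the single JSON response into one item per list entry.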

However, a friendly reminder: hitting this likely undocumented API too aggressively can quickly get your IP address blocked.


Thank you, @Franz ! I learned a lot from this solution. It did help! I am now trying to add a loop to go over all pages :smiley:
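For what it's worth, the paging loop amounts to something like this. A minimal sketch: the page parameter, the empty-page stopping condition, and the delay are assumptions about how the API behaves, and `fetch_page` is a hypothetical stand-in for the actual request.

```python
import time

def fetch_all_pages(fetch_page, delay=1.0, max_pages=100):
    """Collect items from every page until an empty page comes back.

    `fetch_page(page_number)` is a hypothetical callable that returns one
    page of results as a list; `max_pages` guards against an endless loop.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # assumption: an empty page signals the end
            break
        items.extend(batch)
        time.sleep(delay)  # be gentle with the undocumented API
    return items

# Demo with a fake three-page API:
pages = {1: ["a.pdf"], 2: ["b.pdf", "c.pdf"], 3: []}
result = fetch_all_pages(lambda p: pages.get(p, []), delay=0)
print(result)  # ['a.pdf', 'b.pdf', 'c.pdf']
```

In an n8n workflow the same effect comes from a loop over an incrementing page number with an IF node checking for an empty response.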
