HTTP Extract - Dynamic Content

I am looking to automate data collection. Currently I do it manually: I visit the website and follow each link to download the PDFs.

Here is the website: Enforcement action | ICO

I can request the page with the HTTP Request node, and I can use HTML Extract to pull data out. The issue is that the website is dynamic: the HTTP Request node doesn't return the page after all content has loaded. Instead, it only returns the template without the dynamic content.

I want to…

  • Pass the website to HTTP Request
  • Use HTML Extract to access each entry in the list
  • Download PDF from each list item

Can it be done?

The content is retrieved via an API call made with JavaScript. You can see this in the Developer Tools of browsers like Chrome (in the Network tab). Copy that request (using "Copy as cURL") and import it into an HTTP Request node in your workflow.
After that, just add a Split Out node and you're done.
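Outside of n8n, the same idea looks roughly like this in Python. A minimal sketch: the endpoint URL and the JSON field names (`items`, `pdfUrl`) are assumptions; the real ones must be read from the request you copied out of the browser's Network tab.

```python
import json

# Assumption: placeholder endpoint, not the real ICO API URL.
API_URL = "https://ico.org.uk/api/enforcement-actions"

def extract_pdf_links(payload: dict) -> list:
    """Pull the PDF URL out of each list entry (field names are assumptions)."""
    return [item["pdfUrl"] for item in payload.get("items", []) if "pdfUrl" in item]

# With the `requests` library installed, fetching and downloading would look like:
# resp = requests.get(API_URL, params={"page": 1})
# for url in extract_pdf_links(resp.json()):
#     pdf_bytes = requests.get(url).content
#     ...

# Quick check against a payload shaped like the assumed response:
sample = {"items": [{"title": "Action 1", "pdfUrl": "https://example.org/a.pdf"},
                    {"title": "Action 2"}]}
print(extract_pdf_links(sample))  # ['https://example.org/a.pdf']
```

The Split Out node in n8n plays the role of the list comprehension here: it turns the single JSON response into one item per list entry.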

However, a friendly reminder: hitting this likely undocumented API too aggressively can quickly get your IP address blocked.


Thank you, @Franz ! I learned a lot from this solution. It did help! I am now trying to add a loop to go over all pages :smiley:
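For what it's worth, the paging loop amounts to something like this. A minimal sketch: the page parameter, the empty-page stopping condition, and the delay are assumptions about how the API behaves, and `fetch_page` is a hypothetical stand-in for the actual request.

```python
import time

def fetch_all_pages(fetch_page, delay=1.0, max_pages=100):
    """Collect items from every page until an empty page comes back.

    `fetch_page(page_number)` is a hypothetical callable that returns one
    page of results as a list; `max_pages` guards against an endless loop.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # assumption: an empty page signals the end
            break
        items.extend(batch)
        time.sleep(delay)  # be gentle with the undocumented API
    return items

# Demo with a fake three-page API:
pages = {1: ["a.pdf"], 2: ["b.pdf", "c.pdf"], 3: []}
result = fetch_all_pages(lambda p: pages.get(p, []), delay=0)
print(result)  # ['a.pdf', 'b.pdf', 'c.pdf']
```

In an n8n workflow the same effect comes from a loop over an incrementing page number with an IF node checking for an empty response.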
