HTTP node to get a variable from a website

Hello all, I think my HTTP node issue is so basic that I can't find a solution in the forum, so please bear with me:

  • Goal: get the email of a company from HK’s finance watchdog’s database

Sorry for the noob question, but would appreciate any pointers!

Hi @fengelh, Welcome!

You can use a regular expression to extract the value from var emailData =

If you ask your preferred AI to write that expression and then use it in a Set node, you will get the emails.

Here is an example:
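
As a rough sketch of the regex approach (not the workflow from the original reply), the snippet below shows the idea in Python. The page fragment, the array shape of `emailData`, and the helper name `extract_emails` are all assumptions made for illustration; the real page may format the variable differently, and in n8n the same pattern would go into an expression or a Code node.

```python
import re

# Hypothetical page fragment: the real markup and the exact shape of
# `emailData` are not shown in the thread, so this layout is an assumption.
html = """
<html><body>
<script>
var emailData = ["info@example.com", "compliance@example.com"];
</script>
</body></html>
"""

# Step 1: capture whatever is assigned in `var emailData = ...;`
ASSIGNMENT = re.compile(r"var\s+emailData\s*=\s*(.*?);", re.DOTALL)
# Step 2: pick email-shaped strings out of the captured value
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page):
    """Return the emails found in the emailData assignment, or [] if absent."""
    match = ASSIGNMENT.search(page)
    if match is None:
        return []
    return EMAIL.findall(match.group(1))


print(extract_emails(html))
```

Splitting the work into two patterns keeps the match anchored to the `emailData` assignment, so email-like strings elsewhere on the page (footers, contact links) are not picked up by accident.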

Thanks Mohamed, and thanks for the warm welcome! I am not trying to be difficult, but is there a way to extract the emails in the HTTP node already? I am doing this up to 40k times across companies and individuals, so if I go with your proposed solution, the process will take exponentially longer, as it's extracting the entire HTML via the HTTP node and then doing a match across the entire HTML. (Again, sorry, not trying to be ungrateful, but hoping for a more efficient solution, if that exists.)

Thanks, no problem!

I actually inspected the website's network traffic before replying to see if there was a hidden API or a background XHR request loading the emails, but unfortunately I didn't find any.

It appears that the emails are hardcoded directly into the HTML source, so you'll need to make an HTTP request to get the HTML and then scrape the data you need.

Regarding performance: I don't think extracting text from the HTML will make it noticeably slower.
The extraction itself takes only about 1 ms, so across 40k calls that adds up to roughly 40 seconds.
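
To sanity-check that estimate, here is a small timing sketch in Python. The page is synthetic (a few hundred KB of filler with one `emailData` line in the middle), and both its size and the helper name `total_seconds` are assumptions, so the measured figure will differ from what n8n reports. The point is that the cost is per page and therefore grows linearly with the number of calls.

```python
import re
import time

EMAIL_VAR = re.compile(r"var\s+emailData\s*=\s*(.*?);", re.DOTALL)

# Synthetic page: size and layout are assumptions, not taken from the real site.
filler = "<div class='row'>lorem ipsum</div>\n" * 6000
page = filler + 'var emailData = ["info@example.com"];\n' + filler


def total_seconds(per_page_seconds, pages):
    """Linear projection: per-page cost multiplied by the number of pages."""
    return per_page_seconds * pages


# Time the regex search over the whole page, averaged over several runs.
runs = 100
start = time.perf_counter()
for _ in range(runs):
    EMAIL_VAR.search(page)
per_page_s = (time.perf_counter() - start) / runs

print(f"measured: {per_page_s * 1000:.3f} ms per page")
print(f"projected for 40,000 pages: {total_seconds(per_page_s, 40_000):.1f} s")
print(f"at 1 ms per page: {total_seconds(0.001, 40_000):.0f} s")
```

In practice the HTTP round trips are likely to take far longer than the matching step, so the extraction is unlikely to be the bottleneck.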

Heya! Thanks for looking under the hood at the network traffic beforehand, and for the great explanation! I will go with the solution that you kindly provided. Thanks a lot again!
