Seeking Assistance with Data Crawling from Websites

Hello everyone,

I hope you are all doing well. I’m currently working on a project that requires collecting procurement and bidding information from the following websites:

I’m looking for a reliable and efficient way to crawl or scrape data from these two sources. The data includes (but is not limited to) procurement notices, bidding documents, contractor information, and any other relevant public records. Once collected, this data will be used for research and analysis purposes (e.g., understanding market trends, evaluating bidding activities, etc.).

If anyone has experience working with data extraction from these websites, please share your expertise or any available tools. I would really appreciate insights on best practices, potential pitfalls, or sample code that could point me in the right direction.

Thank you in advance for your time and assistance, and I look forward to any advice you can provide!

I’m not a scraping expert, just trying to help.

You need to analyze the HTTP requests the page makes while it loads. In Chrome, open the developer tools (F12) and look at the Network tab, which lists every request. Sometimes the data comes from a simple GET request with a few parameters.

This is an example:
https://cdn.dauthau.asia/datafomo/arr_data_2.json?t=2025041018
Here you get the JSON directly. In some cases that could be all you need.
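If you want to try the same request outside the browser, here is a minimal Python sketch using only the standard library. The URL is the one from the example above. The helper names (`build_url`, `fetch_json`) are just for illustration, and the idea that `t` is an hourly stamp (YYYYMMDDHH) is my guess from its shape, not something the site documents.

```python
import json
import urllib.request
from datetime import datetime
from urllib.parse import urlencode

# Endpoint copied from the Network tab example above.
BASE_URL = "https://cdn.dauthau.asia/datafomo/arr_data_2.json"


def build_url(when: datetime) -> str:
    """Build the request URL for a given point in time."""
    # Assumption: "t" is an hourly stamp (YYYYMMDDHH), probably a cache-buster.
    return f"{BASE_URL}?{urlencode({'t': when.strftime('%Y%m%d%H')})}"


def fetch_json(url: str, timeout: float = 30.0):
    """Download a URL and decode the response body as JSON."""
    # Some servers reject the default Python user agent, so send a browser-like one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(build_url(datetime.now()))` should give you the decoded payload, if the endpoint still answers the way it did in the browser. I don't know the structure of that JSON, so inspect it before writing any further parsing.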

Another approach is to inspect the HTML output and find where the content you need lives. For example, the content table is at this path:
#siteContent > div.col-main.col-xs-24.col-sm-16.col-md-18.col-lg-18 > div > table

Those two things are where I would personally start:
add an HTTP Request node and decode the JSON (example 1), or use an HTTP Request node plus an HTML Extract node pointed at the .bidding-table selector.
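For the HTML route, this is roughly what the extract step does, sketched in plain Python with the standard library `html.parser`. It is not the node's actual implementation, just the same idea: find every table whose class list contains `bidding-table` and collect the text of its cells. The class and function names (`TableRows`, `extract_rows`) are made up for this example.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect cell text from every <table> whose class list contains css_class."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.depth = 0     # > 0 while inside a matching table (counts nested tables)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth or self.css_class in classes:
                self.depth += 1
        elif self.depth and tag == "tr":
            self._row = []
        elif self.depth and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html: str, css_class: str = "bidding-table"):
    """Return the rows of all matching tables as lists of cell strings."""
    parser = TableRows(css_class)
    parser.feed(html)
    parser.close()
    return parser.rows
```

You would pass it the page source as a string, for example the body of the HTTP response. If the table is filled in by JavaScript after the page loads, the rows will not be in that source and the result will be empty.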

Some web pages are rendered with JavaScript, so you need a different approach. Some have a CAPTCHA, which makes the job more difficult. And some have no consistent, established structure, so… again, it’s a pain.