Scrape a webpage that runs JavaScript

Describe the problem/error/question

I am trying to scrape a webpage containing a list of events and their corresponding locations, filter for a specific location, and then notify users that employee parking may be unavailable on those dates due to the event.

The webpage in question is: https://www.vancouverconventioncentre.com/events.
Using CSS selectors, I am able to extract the dates (div.event-date) and event names (div.event-details h2) into an array.
However, the event’s location appears to be generated by a calcBuilding function, so while the page as rendered in a browser contains an event entry like:

<div class="event-container">
   <div class="event-ctn">
      <div class="event-date">
         <div class="day flex flex-col items-center relative">
            <span>20</span>
            <span class="month text-sm absolute uppercase -bottom-2">Mar</span>
         </div>
         <div> › </div>
         <div class="day flex flex-col items-center relative">
            <span>24</span>
            <span class="month text-sm absolute uppercase -bottom-2">Mar</span>
         </div>
      </div>
      <div class="event-details">
         <h2 class="font-helvetica text-lg font-[600]">Vancouver International Auto Show</h2>
         <p class="event-location">
            <span class="relative">
               <svg class="fill-current h-4" xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 0 24 24" width="24px" fill="#000000">
                  <path d="M0 0h24v24H0z" fill="none"></path>
                  <path d="M12 2C8.13 2 5 5.13 5 9c0 5.25 7 13 7 13s7-7.75 7-13c0-3.87-3.13-7-7-7zm0 9.5c-1.38 0-2.5-1.12-2.5-2.5s1.12-2.5 2.5-2.5 2.5 1.12 2.5 2.5-1.12 2.5-2.5 2.5z"></path>
               </svg>
            </span>
            <span>West Building</span>
         </p>
      </div>
   </div>
</div>

performing an HTTP GET request on the URL returns the following HTML for p.event-location:

<span class="relative">
   <svg class="fill-current h-4 " xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 0 24 24" width="24px" fill="#000000">
      <path d="M0 0h24v24H0z" fill="none"></path><path d="M12 2C8.13 2 5 5.13 5 9c0 5.25 7 13 7 13s7-7.75 7-13c0-3.87-3.13-7-7-7zm0 9.5c-1.38 0-2.5-1.12-2.5-2.5s1.12-2.5 2.5-2.5 2.5 1.12 2.5 2.5-1.12 2.5-2.5 2.5z"></path>
   </svg>
</span>
<span v-html="data.calcBuilding([{"360_view_embed":null,"amp_url":null,"api_url":null,"blueprint":{"title":"Spaces","handle":"spaces"},"building":{"value":"west","label":"West Building","key":"west"},"capacities":null,"capacities_adendum":false,"ceiling_height":null,"collection":{"title":"Spaces","handle":"spaces"},"combination_short_name":null,"combined_space":false,"combined_spaces":{},"date":"2023-05-08T19:43:49.000000Z","dimensions":null,"edit_url":"https:\/\/www.vancouverconventioncentre.com\/cp\/collections\/spaces\/entries\/3b96c79f-c48e-4384-9f14-80bd303067f8","gallery_frames":[],"id":"3b96c79f-c48e-4384-9f14-80bd303067f8","is_entry":true,"last_modified":"2023-05-08T19:43:49.000000Z","locale":"default","map_link":null,"mount":null,"order":null,"origin_id":null,"permalink":null,"private":false,"published":true,"sample_floor_plan":null,"slug":"west-building","status":"published","title":"West Building","type":{"value":null,"label":null,"key":null},"updated_at":"2023-05-08T19:43:49.000000Z","updated_by":{"api_url":null,"avatar":null,"edit_url":"https:\/\/www.vancouverconventioncentre.com\/cp\/users\/50caed37-66f7-48e7-8592-b7cf63dde159\/edit","email":"[email protected]","groups":[],"id":"50caed37-66f7-48e7-8592-b7cf63dde159","initials":"WA","is_admin":false,"is_publisher":false,"is_user":true,"is_writer":false,"last_login":"2024-02-02T19:58:59.000000Z","name":"Will Aesoph","preferred_locale":"en","roles":[],"super":true,"title":"[email protected]"},"uri":null,"url":null,"usable_area":null}])">
</span>

What is the error message (if any)?

Because the event location has not been calculated yet at the time of the scrape, the extracted text comes back as [empty].
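
For what it's worth, the raw response quoted above already seems to carry the building label inside the v-html attribute (building.label in the JSON passed to calcBuilding), so one option might be to parse that attribute directly instead of executing the page's JavaScript. Below is a minimal sketch in plain JavaScript, for example for an n8n Code node. The helper names (decodeEntities, sliceJsonArray, extractBuildingLabels) are hypothetical, and it assumes the raw HTML encodes the attribute's quotes as &quot; and that every event has a calcBuilding([...]) call shaped like the one above.

```javascript
// Decode the few HTML entities likely to appear inside an attribute value.
// Assumption: the raw response encodes quotes as &quot; (the post shows them decoded).
function decodeEntities(text) {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&#0?39;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&'); // last, so "&amp;quot;" is not decoded twice
}

// Return the JSON array literal starting at `start`, or null if none is found.
// Tracks bracket depth and skips string contents so brackets inside strings are ignored.
function sliceJsonArray(text, start) {
  if (text[start] !== '[') return null;
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === '\\') i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '[') {
      depth++;
    } else if (ch === ']') {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // unbalanced input
}

// For each calcBuilding(...) call in the HTML, collect the distinct building labels.
function extractBuildingLabels(html) {
  const marker = 'data.calcBuilding(';
  const decoded = decodeEntities(html);
  const results = [];
  let pos = decoded.indexOf(marker);
  while (pos !== -1) {
    let labels = [];
    try {
      const json = sliceJsonArray(decoded, pos + marker.length);
      const spaces = json ? JSON.parse(json) : [];
      labels = [...new Set(spaces.map((s) => s?.building?.label).filter(Boolean))];
    } catch (err) {
      labels = []; // malformed JSON: report no labels for this event
    }
    results.push(labels);
    pos = decoded.indexOf(marker, pos + marker.length);
  }
  return results;
}
```

The result is one array of labels per calcBuilding call, in page order, so it could be lined up with the dates and event names already extracted. How the response body reaches the function depends on how the HTTP Request node is configured.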

Please share your workflow

Is there any way to let the page's JavaScript functions run, so that I can scrape the page as it is presented in a browser? Or would I have to look into a headless browser solution to load the page and then scrape that result?

Information on your n8n setup

  • n8n version: 1.31.2
  • Running n8n via: docker-compose
  • Operating system: Ubuntu Server 22.04.4 LTS

Since you’re self-hosted, you can try the Puppeteer community node.

I haven’t used it myself yet, but it looks promising.

It uses the Puppeteer JavaScript library, which drives a headless browser so that the page's JavaScript runs before you scrape it.
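
To give a rough idea of what the underlying library does, here is a minimal sketch using Puppeteer directly (not the community node's own configuration, which I haven't checked). The function names scrapeRenderedEvents and filterByLocation are made up for illustration, the selectors are taken from the markup in the question, and it assumes puppeteer is installed and that waiting for the network to go idle is enough for the locations to render.

```javascript
// Keep only events whose location contains the wanted building name (case-insensitive).
function filterByLocation(events, building) {
  const wanted = building.toLowerCase();
  return events.filter((e) => (e.location || '').toLowerCase().includes(wanted));
}

// Render the page in a headless browser and read each event card after scripts have run.
async function scrapeRenderedEvents(url) {
  // Loaded lazily so the helper above works without the dependency (npm install puppeteer).
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Assumption: the location text is filled in by the time the network is idle.
    await page.goto(url, { waitUntil: 'networkidle0' });
    return await page.$$eval('div.event-ctn', (cards) =>
      cards.map((card) => ({
        name: card.querySelector('div.event-details h2')?.textContent.trim() ?? '',
        dates: card.querySelector('div.event-date')?.textContent.replace(/\s+/g, ' ').trim() ?? '',
        location: card.querySelector('p.event-location')?.textContent.trim() ?? '',
      }))
    );
  } finally {
    await browser.close();
  }
}

// Example usage (needs network access and a browser that Puppeteer can launch):
// scrapeRenderedEvents('https://www.vancouverconventioncentre.com/events')
//   .then((events) => console.log(filterByLocation(events, 'West Building')));
```

Inside a Docker container the browser may need extra launch options or system libraries, so treat this as a starting point rather than a drop-in solution.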
