I’m building a Shopify order status automation that uses an AI voice agent to provide customers with an ETA for their delivery. I’m currently stuck on the web-scraping step where I fetch data from Swiship (Amazon).
The Workflow:
Webhook: Receives a phone number.
Shopify Find Customer: Searches for the customer by phone.
Shopify Get Orders: Retrieves the most recent order and extracts the tracking_url (e.g., Track your package...).
Browserless/HTTP Request: This is where the issue is. I need to scrape the page to find the delivery date.
Code Node: Uses Regex to extract the status and date from the HTML.
The Problem: My HTTP Request node (and previously the Browserless node) loads indefinitely when trying to fetch the tracking page. Even after clicking “Execute Node,” it never finishes, and I eventually have to stop it manually or it times out.
What I’ve Tried:
Switching Nodes: I moved from the Browserless “Smart Scrape” node to a direct HTTP Request node calling the Browserless /content API.
Stealth Mode: I’ve added stealth=true as a query parameter to bypass bot detection.
Wait Conditions: I’ve changed waitUntil from networkidle2 to load and domcontentloaded to speed it up.
Removing Selectors: I removed the waitFor selector (.tracking-status-date) to ensure it wasn’t just stuck waiting for an element that wasn’t loading, but it still hangs.
Timeouts: I’ve added a timeout parameter (20s-30s), but the node just spins until n8n hits its own internal limit.
Question: Has anyone successfully scraped Swiship/Amazon tracking pages recently? Is there a specific “Stealth” header or proxy setting I’m missing to prevent the browser from getting stuck in a “tarpit” or CAPTCHA loop?
From what I understand, an easier way would be to use Amazon’s official API, and since it’s for your own use, I imagine there’s no cost (but please confirm). You would call the SP-API’s order tracking endpoint, which returns the ETA directly in JSON, with no scraping, CAPTCHA, or timeouts to deal with. I recommend replacing the Browserless node with an authenticated HTTP Request node calling the SP-API. The Code Node you already have for extracting the status and date can stay; only its parsing needs to change to match the JSON structure it will receive.
@Akhil_Madhav the Swiship page has anti-bot protection in front of it, so a plain HTTP Request node hangs forever — it gets a challenge page back instead of the tracking HTML and n8n has no way to solve the challenge. Browserless can sometimes get through but it’s hit-or-miss against Amazon’s detection.
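A quick way to confirm this is to check the fetched HTML before the regex step runs. The sketch below is only a heuristic: the marker strings are assumptions about what a challenge page might contain, not confirmed Swiship output, and `looks_like_challenge` is a hypothetical helper name. It assumes an n8n Code node running in Python mode, or any script you use to debug the fetch.

```python
# Assumed marker strings; adjust them after inspecting a real blocked response.
CHALLENGE_MARKERS = ("captcha", "robot check", "automated access")


def looks_like_challenge(html: str) -> bool:
    """Return True if the HTML looks like a bot challenge rather than tracking data."""
    text = html.lower()
    return any(marker in text for marker in CHALLENGE_MARKERS)


# Made-up example bodies, for illustration only.
blocked = "<html><body>Please solve this CAPTCHA to continue</body></html>"
normal = "<html><body><div class='tracking-status-date'>Arriving soon</div></body></html>"

print(looks_like_challenge(blocked))
print(looks_like_challenge(normal))
```

Failing fast on a challenge page gives the voice agent a clear "tracking unavailable" branch instead of a regex that silently matches nothing.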
building on what tamy mentioned — SP-API is the official path but it requires being a registered Amazon seller AND going through developer registration which can take a few weeks of back-and-forth with Amazon. if ur scraping tracking pages u probably arent on Amazon as a seller, so SP-API isnt actually available.
cleaner approach for ur exact use case is to skip the scrape entirely. Shopify’s order fulfillment object already has the carrier and tracking number in structured form:
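A minimal sketch of reading those fields, assuming the order JSON follows the Shopify REST Admin API shape (`fulfillments`, each with `tracking_company`, `tracking_number`, `tracking_url`). The sample order is made up, `latest_tracking` is a hypothetical helper, and the code assumes fulfillments are listed oldest first, which is worth checking against real data.

```python
from typing import Optional


def latest_tracking(order: dict) -> Optional[dict]:
    """Return carrier, number and URL from the last fulfillment that has tracking."""
    # Assumption: fulfillments are ordered oldest first, so walk them in reverse.
    for fulfillment in reversed(order.get("fulfillments") or []):
        number = fulfillment.get("tracking_number")
        if number:
            return {
                "carrier": fulfillment.get("tracking_company"),
                "tracking_number": number,
                "tracking_url": fulfillment.get("tracking_url"),
            }
    return None  # nothing shipped yet, or no tracking attached


# Made-up sample payload, for illustration only.
sample_order = {
    "id": 1001,
    "fulfillments": [
        {
            "tracking_company": "Amazon Logistics",
            "tracking_number": "TBA000000000000",
            "tracking_url": "https://example.com/track/TBA000000000000",
        }
    ],
}

print(latest_tracking(sample_order))
```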
if tracking_company is UPS / USPS / FedEx, hit their official tracking APIs directly — all three return ETA in JSON, no scraping needed. for Amazon Logistics shipments (which is what Swiship pages represent), the data isnt cleanly available via Shopify alone and Amazon doesnt expose a public tracking API for non-sellers.
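The routing described above could look roughly like this. The returned labels (`"ups"`, `"usps"`, `"fedex"`, `"aggregator"`) are hypothetical names for branches in the workflow, not real API identifiers, and the matching assumes `tracking_company` holds a human-readable carrier name such as "UPS" or "Amazon Logistics".

```python
from typing import Optional

# Keyword -> branch label. Labels are placeholders for whichever node handles each carrier.
CARRIER_KEYWORDS = (
    ("usps", "usps"),
    ("ups", "ups"),
    ("fedex", "fedex"),
)


def pick_tracking_source(tracking_company: Optional[str]) -> str:
    """Choose a lookup branch from Shopify's tracking_company value."""
    name = (tracking_company or "").strip().lower()
    for keyword, branch in CARRIER_KEYWORDS:
        if keyword in name:
            return branch
    # Amazon Logistics and anything unrecognised fall through to an aggregator.
    return "aggregator"


for carrier in ("UPS", "USPS", "FedEx", "Amazon Logistics", None):
    print(carrier, "->", pick_tracking_source(carrier))
```

In n8n this maps naturally onto a Switch node: one output per carrier API, with the fallback output going to the aggregator request.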
for that last case, the cleanest path is a tracking aggregator API like 17track or AfterShip. they have official feeds for Amazon Logistics + 500+ carriers, return ETA as structured JSON, and run about $0.01-0.05 per lookup. one HTTP request node, no anti-bot challenge or captcha to fight. since ur using a voice agent for the response, having structured JSON back is way easier to work with than regexing HTML out of a JS-rendered page anyway.