Is it possible to scrape the sources from the responses to specific prompts launched in ChatGPT?

V2_Child Workflow.json (39.5 KB)
I’d be grateful for some help on how to scrape the sources cited by ChatGPT in its responses to specific prompts.
Expected behaviour:

  1. Parent workflow sends a prompt to the Child workflow.

  2. Child workflow launches the prompt in ChatGPT.

  3. In the Child, I’m using an HTTP Request node (“Fetch UI Sources”) that calls Playwright as a browser automation worker, which then:

  4. Opens ChatGPT in a logged-in browser session, submits the prompt & waits until the response finishes rendering

  5. Clicks the Sources button under the answer

  6. Extracts the URL and title from every visible source card.

    The HTTP Request is failing to achieve its objective. Has anyone achieved the same in a different way?

    Any help would be grand. Thanks!
    Adam x

Thanks for your help with this, Benjamin. I’ve tried OpenAI’s Responses API with the web_search_preview tool (see the attached Ask a Model node), which has proved very unreliable: it returned 4 sources, none of which appeared among the 9 that surfaced in a manual search in ChatGPT.

If I’m missing something, please feel free to suggest a fix.
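To make the mismatch between the two source lists easier to quantify, here is a minimal Python sketch that compares the URLs returned by the API with the ones copied from the ChatGPT UI. The function names `normalize_url` and `compare_sources` are hypothetical, and the normalization rules (lowercasing the host, dropping `utm_*` parameters, fragments, and trailing slashes) are assumptions about what should count as "the same source", not something taken from the workflow.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Normalize a URL so cosmetic differences do not hide a match.

    Assumption: tracking parameters (utm_*), fragments, and a trailing
    slash do not change which source is being cited.
    """
    parts = urlsplit(url.strip())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit(
        (
            parts.scheme.lower(),
            parts.netloc.lower(),
            parts.path.rstrip("/"),
            urlencode(query),
            "",  # drop the fragment
        )
    )


def compare_sources(api_urls, ui_urls):
    """Return the URLs shared by both lists and the ones unique to each."""
    api = {normalize_url(u) for u in api_urls}
    ui = {normalize_url(u) for u in ui_urls}
    return {
        "shared": sorted(api & ui),
        "api_only": sorted(api - ui),
        "ui_only": sorted(ui - api),
    }
```

Running `compare_sources` over several repeats of the same prompt would show whether the overlap is consistently zero or just varies from run to run.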

Re Perplexity: I hadn’t considered it as an alternative, and I’ll certainly look into it. Thank you!

Child Workflow.json (36.3 KB)

Hi @Adam1

Thanks for sharing this. I think I understand the goal now.

If the target is the ChatGPT web UI, then Selenium or Playwright can work, but only as a temporary workaround.

In my opinion, it is not a strong long-term business approach, because it depends on a UI that changes often, on login/session stability, and on timing behavior that is hard to keep reliable.

For a production workflow, I would treat UI scraping as the last option, not the main one.

A more stable path is to use an API that returns citations directly, and keep browser automation only for special cases where you really need the exact ChatGPT UI output.
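As a rough illustration of the API path, the sketch below pulls cited URLs and titles out of a response payload that has already been parsed into a Python dict. It assumes the payload follows the Responses API shape as I understand it: an `output` list containing `message` items whose `content` parts carry `annotations` of type `url_citation`. The function name `extract_citations` is hypothetical, and the payload shape should be checked against a real response before relying on it.

```python
def extract_citations(response: dict) -> list[dict]:
    """Collect unique {url, title} pairs from url_citation annotations.

    Assumption: `response` is the parsed JSON body of a web-search-enabled
    API call, shaped as output -> message -> content -> annotations.
    """
    seen = set()
    citations = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool-call items such as the web search call
        for part in item.get("content", []):
            for annotation in part.get("annotations") or []:
                if annotation.get("type") != "url_citation":
                    continue
                url = annotation.get("url")
                if url and url not in seen:
                    seen.add(url)
                    citations.append(
                        {"url": url, "title": annotation.get("title", "")}
                    )
    return citations
```

In an n8n workflow, the same logic could live in a Code node placed after the HTTP Request node, so the Parent workflow receives a clean list of sources.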

So, I would not build the business around that as the core solution.

Thank you for the advice. I feared this wouldn’t be possible to scrape. I appreciate your input here. Thanks again.