HTTP Request node hangs indefinitely during web scraping – timeout not applied

Describe the problem/error/question

Hello everyone,

I’m having an issue with the “HTTP Request” node for web scraping. I’m running n8n on a server with the latest self-hosted version. After around 15 executions, the node starts running indefinitely. The timeout doesn’t seem to be applied, and as a result, it blocks the proper execution of the rest of my workflow. I’ve tried several solutions, but none of them worked.

Thanks in advance for any help or suggestions!

Please share your workflow

Share the output returned by the last node

absolute
Talented People Group, le Recruteur Expert - L'expert en recrutement
https://www.talentedpeoplegroup.com//creditjob.fr
https://www.talentedpeoplegroup.com//businesspeople.fr
https://www.talentedpeoplegroup.com//financepeople.fr
https://www.talentedpeoplegroup.com//rhpeople.fr
https://talentedpeoplegroup.com/wp-content/uploads/2023/09/terrasse.png
https://talentedpeoplegroup.com/wp-content/uploads/2023/09/tape.png
https://talentedpeoplegroup.com/wp-content/uploads/2023/09/nathan-mathilde.png
https://talentedpeoplegroup.com/wp-content/uploads/2023/09/maurine.png
https://talentedpeoplegroup.com/wp-content/uploads/2023/09/jonathan.png
https://talentedpeoplegroup.com/wp-content/uploads/2023/09/gabrielle.png
https://talentedpeoplegroup.com/wp-content/uploads/2022/08/reunion-terrasse.jpg
https://talentedpeoplegroup.com/wp-content/uploads/2022/08/salle-1.jpg
https://talentedpeoplegroup.com/wp-content/uploads/2022/08/reunion-interieur-e1661526664394.jpg
https://talentedpeoplegroup.com/wp-content/uploads/2022/08/collation-e1661526175137.jpg

Information on your n8n setup

  • n8n version: latest
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • **Running n8n via (Docker, npm, n8n cloud, desktop app):** Docker
  • **Operating system:** Linux

Hey @senda, hope all is good.

Which URL does the workflow try to process when it hangs?

hey @jabbson,
This URL: “Talented People Group, le Recruteur Expert - L'expert en recrutement”

Doesn’t look like it does in this short test:

Does it work for you?

Individually, everything works fine. The problem occurs after about 15 loops, and then it runs indefinitely. It happens with many different URLs; this is just the last example where it crashed.

Can you provide the list of pages to run sequentially to trigger the issue?
Or is it the list you mentioned earlier? If so, the list contains bad URLs and images; is that expected?

The list that triggered it contains between 250 and 300 links, and it’s not always the same one that causes the error.

Does this mean that you have many links with images? Do you need to download all these images?

What I would try:

  • [more probable] exclude binary files from the list (such as links to images)
  • [less probable] space out the requests in time, to make sure you are not hitting some sort of rate limiting.
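The first suggestion above could be implemented as a small pre-filter before the HTTP Request node. This is a hedged sketch: the extension list and the function names (`isBinaryUrl`, `filterHtmlLinks`) are my own, not part of any n8n API, and in an actual Code node you would map over `$input.all()` and return items rather than call a plain function.

```javascript
// Hypothetical pre-filter: drop links that point to binary assets
// (images, archives, etc.) before they reach the HTTP Request node.
const BINARY_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif', '.webp', '.svg', '.pdf', '.zip'];

function isBinaryUrl(url) {
  // Strip any query string, then check the path's extension.
  const path = url.split('?')[0].toLowerCase();
  return BINARY_EXTENSIONS.some((ext) => path.endsWith(ext));
}

function filterHtmlLinks(urls) {
  return urls.filter((url) => !isBinaryUrl(url));
}

const links = [
  'https://talentedpeoplegroup.com/wp-content/uploads/2023/09/terrasse.png',
  'https://www.talentedpeoplegroup.com//creditjob.fr',
];
console.log(filterHtmlLinks(links));
// → only the non-image URL remains
```

An extension check like this is heuristic; a stricter variant could issue a HEAD request and inspect the `Content-Type` header instead.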

I just tested my workflow with only the bad URLs and only the image URLs, and it worked perfectly fine.
Could the problem be related to cache or memory?

I assume the issue is due to the system getting overwhelmed by the images stored in memory.
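If memory pressure is indeed the cause, one mitigation is to drop the heavy payloads from each item as soon as the needed fields have been extracted, so they do not accumulate across ~300 loop iterations. The sketch below is an assumption about your item shape (`json.body` holding the raw HTML, `binary` holding downloaded files), shown as a plain function; in n8n it would live in a Code node inside the loop.

```javascript
// Hypothetical cleanup step: after extracting what you need from a response,
// discard the binary payload and the raw HTML body so they can be
// garbage-collected instead of piling up across loop iterations.
function stripHeavyFields(items) {
  return items.map((item) => {
    const { binary, ...rest } = item; // discard the binary payload entirely
    if (rest.json && rest.json.body !== undefined) {
      rest.json = { ...rest.json, body: undefined }; // drop the raw HTML body too
    }
    return rest;
  });
}
```

Keeping only small extracted fields (title, links, status code) per item is usually enough for scraping workflows and keeps the execution data small.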

If my answer helped you, kindly consider marking it as the solution. Thank you.

Cheers!

How can I clear the memory?

Depending on your circumstances and the solution you need, you might have to rethink this. The problem you’re facing is essentially overloaded memory. Just like RAM in a computer, or short-term memory in our brains, the data needs to be intelligently re-sorted, organized, and prioritized for longer-term use (see the Memory Manager node).

You might consider adding a knowledge base/RAG tool and iteratively store the data you need to the knowledge base to reduce the burden on the memory.

EP

You don’t need to, most temporary things in memory are auto-cleaned, when not needed anymore.

Thanks! I’ll check that!


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.