Working on a recursive crawler. It’s working correctly now, but on medium-to-large e-commerce sites it can freeze.
Help me identify optimization points.
I understand that if I move parts of the workflow into sub-workflows, I can free up memory. However, with around 100k items in the flow, it can still freeze at some point in the process. For example, with the test site, warhammer.com, I get 50 XMLs and it freezes around the 35th–37th.
I tried to build in a batcher, expecting that the objects would be released at the end of each cycle, so that I could keep the workflow’s size down by controlling the incoming data in the task loop. However, it still clogs up at around 100k.
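To show what I mean by the batcher, here’s a rough Python sketch of the idea (not my actual workflow, just the shape of it; `crawl_batch` and the batch size are placeholders):

```python
from itertools import islice
from typing import Iterable, Iterator, List

BATCH_SIZE = 500  # placeholder: whatever the batcher is set to

def batches(urls: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield fixed-size chunks so only one chunk's worth of items is held at once."""
    it = iter(urls)
    while chunk := list(islice(it, size)):
        yield chunk

def crawl_batch(chunk: List[str]) -> None:
    """Placeholder: fetch, parse and store one batch, keeping nothing afterwards."""
    for url in chunk:
        pass  # fetch + parse + insert, then let the objects go

def crawl(all_urls: Iterable[str]) -> None:
    for chunk in batches(all_urls, BATCH_SIZE):
        crawl_batch(chunk)
        # nothing from this chunk is referenced after this point, so it can be
        # garbage-collected -- my expectation was that the workflow would behave similarly
```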
Is this a hardware issue? I’m on a small DigitalOcean droplet. Or is it a frontend issue, where my mediocre laptop runs out of resources? I don’t think so, though, even if the ‘test workflow’ approach uses local resources.
This is a very interesting case.
Would you mind explaining the chain of thought that led you to this solution?
I mean… what made you think this logic would be better optimized?
I’m eager to understand how you made this work.
EDIT: Oh! I actually see it now. It’s almost like you divided the required memory in half: before, you were waiting until everything finished for each item, and now you process items in parallel. Nice!
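Just to check my own understanding, the difference I’m picturing is roughly this (plain Python, purely illustrative; the helpers are stand-ins):

```python
from typing import Iterable, List

def fetch_and_parse(url: str) -> dict:
    """Stand-in for the real fetch + parse step."""
    return {"url": url}

def store(item: dict) -> None:
    """Stand-in for the database insert."""
    pass

def crawl_then_store(urls: Iterable[str]) -> None:
    """What I thought was happening before: everything is held until the whole run
    finishes, so peak memory grows with the total item count."""
    results: List[dict] = [fetch_and_parse(u) for u in urls]
    for item in results:
        store(item)

def crawl_and_stream(urls: Iterable[str]) -> None:
    """What the new layout effectively does: each item is handed off as soon as it is
    produced, so peak memory stays roughly flat regardless of the total count."""
    for u in urls:
        store(fetch_and_parse(u))
```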
Yeah, I could have been a bit more precise and highlighted the sub-workflow better, but all I did was move the active task completely into a sub-workflow.
An interesting lesson: I was unable to make it work when there were two layers of sub-workflows, i.e. when I left the database-insert sub-workflow inside the sub-workflow of the task itself.
But simply flattening it into one layer, and boom, it’s fine.
This way the batcher gives me pretty good control over the object count, and within the task itself the majority of the runtime objects are released at the end of the sub-workflow.
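If it helps to picture it outside n8n, here’s a rough Python analogy of what the one-layer setup ends up doing (not actual n8n code, just the shape of it; the helpers and the batch size are placeholders):

```python
import multiprocessing as mp
from typing import List

def process_batch(urls: List[str]) -> None:
    """Runs in its own process: fetch, parse and insert one batch.
    When the process exits, the OS reclaims everything it allocated, which mirrors
    a sub-workflow dropping its runtime data when it finishes."""
    for url in urls:
        pass  # fetch + parse + database insert would go here

def run(all_urls: List[str], batch_size: int = 500) -> None:
    for i in range(0, len(all_urls), batch_size):
        chunk = all_urls[i:i + batch_size]
        # one short-lived worker per batch: the batcher bounds what goes in,
        # and the worker's exit guarantees that what it built up goes away
        worker = mp.Process(target=process_batch, args=(chunk,))
        worker.start()
        worker.join()

if __name__ == "__main__":
    run([f"https://example.com/page-{i}" for i in range(100_000)])
```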