Problem: Ensuring Continuous 'Split in Batches' Execution for Full Baserow Data Processing

I am using a workflow in n8n that involves several nodes for processing a large dataset. Initially, data from a Baserow node is transferred to a ‘Split in Batches’ node, which is designed to handle the data in chunks of 100 items. This batched data is then sent to an OpenAI node for processing, and subsequently, the results are passed to a ‘Baserow1’ node, which updates the original data. My challenge is with the workflow’s continuity: I have approximately 100,000 rows to process, and while the first batch of 100 items is processed successfully, the workflow does not automatically continue to process the next batch of 100 items. Could you advise on how to modify the workflow so that it automatically processes the next batch of 100 items after completing a batch, thereby ensuring continuous processing of all 100,000 rows?

Please share your workflow

Information on your n8n setup

  • n8n version: Version 1.29.1
  • Database (default: SQLite): Default
  • n8n EXECUTIONS_PROCESS setting (default: own, main): main
  • Running n8n via (Docker, npm, n8n cloud, desktop app): cloud
  • Operating system: Windows

Hey @Primus,

The issue here is that on the Baserow node you are telling it to only get 100 items. You would need to tell it to get everything, then use the Split in Batches / Loop node, which will take care of the chunking.
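As a rough sketch (the node type and parameter name below are taken from a typical exported workflow, so treat them as illustrative), the Split in Batches node configured for chunks of 100 would look something like this. On recent versions its 'loop' output feeds the OpenAI → Baserow1 chain and the last node in that chain is wired back into the Split in Batches node, while the 'done' output fires once everything has been processed:

{
  "parameters": {
    "batchSize": 100,
    "options": {}
  },
  "name": "Split in Batches",
  "type": "n8n-nodes-base.splitInBatches"
}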

If you only wanted to pull 100 items from Baserow at a time, it looks like we don't have a way in the node to skip the first X results, so you would need to use an HTTP Request node to get the data from the API and then manually build out a loop to work with that as well.
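For context, the Baserow list rows endpoint (GET /api/database/rows/table/{table_id}/) is paginated and returns count, next and previous alongside the rows, so a manual loop can keep requesting the next URL until it comes back null. A trimmed example response (the values and table ID are placeholders):

{
  "count": 100000,
  "next": "https://api.baserow.io/api/database/rows/table/<table_id>/?page=2&size=100",
  "previous": null,
  "results": [
    { "id": 1, "field_1": "..." }
  ]
}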

We should probably add an option to the nodes to allow for manual looping of larger data sets.

I tried getting the whole dataset but ran into the error “Problem in node ‘Baserow’. There might not be enough memory to finish the execution.” Even when I filtered the Baserow node to output only the required field, I still ran into the same error.

Let me get more coffee, and dive into creating an http request loop. Phew!

Hey @Jon, what plan do I have to be on to get enough memory to execute about 100k rows from Baserow, and then use the Split in Batches node to loop them 100 at a time into the OpenAI node and then update the Baserow table again? The Enterprise plan?

@Jon

Can you help me with the HTTP Request node, please?

I connected to Baserow using Basic Auth and the URL “https://api.baserow.io”. However, I can’t quite get the Query Parameters to pass correctly:

{   "parameters": {     "url": "https://api.baserow.io/api/database/rows/table/258869/",     "query": {       "page": "{{$json[\"currentPage\"]}}",       "size": 100     }   }   }

Here’s the workflow

Thank you.

Hey @Primus,

The plan for 100k rows depends on how much data is actually there; 100k rows with 1 field is going to need a lot less memory than 100k rows with 50 fields.

I am not sure where you are getting those query parameters from, as they don't seem to match the Baserow API docs, but I would probably use something like the below, which will get the data in batches of 100 from Baserow.
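Purely as an illustrative sketch (not the exact workflow referred to above), the HTTP Request node in such a loop might look roughly like this: it requests the next URL returned by the previous page and falls back to the first page on the initial run. The table ID is the one from earlier in the thread, the authentication parameter names are as they appear in a recent exported HTTP Request node, and a header-auth credential carrying Baserow's "Authorization: Token ..." value is assumed:

{
  "parameters": {
    "url": "={{ $json.next || 'https://api.baserow.io/api/database/rows/table/258869/?size=100' }}",
    "authentication": "genericCredentialType",
    "genericAuthType": "httpHeaderAuth",
    "options": {}
  },
  "name": "Get Baserow page",
  "type": "n8n-nodes-base.httpRequest"
}

An IF node after the processing chain can then check whether next on the HTTP response is empty and, if it is not, loop back into this node to fetch the following page of 100.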

To improve this further, I would probably put the nodes in the loop into a subworkflow and return just the nextURL, which can be used on the next loop iteration so that memory is freed up as it goes.
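In that pattern the subworkflow would do the OpenAI and Baserow update work for one page and hand back only a single small item, roughly like this (the URL value is illustrative):

[
  {
    "json": {
      "nextURL": "https://api.baserow.io/api/database/rows/table/258869/?page=3&size=100"
    }
  }
]

That way the parent workflow only ever holds one URL in memory between iterations instead of the full page of rows.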

Thank you, @Jon.

As a workaround for now, I’ve split the database into ten tables while I process the data. When it’s time to push the data out to WordPress, I’ll use this workflow you’ve created for me.

Once again, thank you!
