Combining JSON from multiple runs into one item

Describe the problem/error/question

I'm attempting to resolve a problem similar to this one: How to combine different items from different runs into one run?

I have a paginated response (600 items per run) that currently loops until it hits the end at 26 runs, at which point no more items are returned.

I’ve got the downstream pipeline working to format things correctly for eventual insertion/update of MySQL records.

However, to do a proper data comparison, I need to compare both datasets in their entirety, not in batches.

What is the error message (if any)?

No error message, just an incorrect data comparison due to incomplete inputs on one side.

Share the output returned by the last node

[screenshot of the last node's output]

So ideally, I'm looking for some code that takes all the items from all those runs and combines them into one large JSON array containing every item.

Information on your n8n setup

  • n8n version: 1.5.1
  • Database (default: SQLite): SQLite
  • n8n EXECUTIONS_PROCESS setting (default: own, main): own
  • Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
  • Operating system: Ubuntu 22.04

I’m chiming in here as this would be a fantastic feature. Essentially, it would be a node that takes the JSON from all the runs of the previous node, concatenates it into one list, and then hands that to the next node as a single run.

It sounds memory expensive, but it would solve at least one issue I’m having right now. (In another post.)

-Russ

It’s either that, or seeing if I can export similar output to CSV. This seems like a really frustrating oversight so far.

The “compare datasets” node doesn’t seem to take multiple runs into account if one input is the full dataset and the other is batched output like what I have above. It compares the first run correctly, but for every other run I’d have to re-import the second input to get a correct comparison.

Hi @bdinsmore & @russellkg, check out this example workflow. It uses a Code node, but it should do pretty much what you have in mind:

Just make sure to replace NoOp in the Code node (which runs after the loop has finished) with the name of the node producing your items over multiple runs.
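For reference, the merging logic boils down to a few lines in a Code node. Here's a minimal sketch, assuming the node producing items over multiple runs is called NoOp and that reading past the last run with the documented $("&lt;node name&gt;").all(branchIndex, runIndex) method throws an error (which is what ends the loop):

```js
// Code node, "Run Once for All Items" mode.
// Collect the items from every run of the "NoOp" node into one list.
const allItems = [];
let runIndex = 0;

while (true) {
  try {
    // .all(branchIndex, runIndex) returns the items of one specific run
    allItems.push(...$("NoOp").all(0, runIndex));
    runIndex++;
  } catch (error) {
    // Assumption: an out-of-range runIndex throws, meaning every run was read
    break;
  }
}

// Hand everything to the next node as a single run
return allItems;
```

After this node you'd have one run containing every item, which a downstream comparison node can consume in one go.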

This does not work if you’re not using the Split In Batches node.

My batches/runs come from a web call, so it doesn’t apply. I already tried this one.

This should still work, @bdinsmore; you’d just need to update your IF node so it only continues on to the Code node once you’re done looping.

I’ve tried this and done exactly what you describe.

From the documentation here: Built in methods and variables reference | n8n Docs

I have it working with a couple of nested workflows. I have a master workflow that kicks everything off. It currently looks like this:

The two workflows it calls make API calls to the source for my data, massage it and send it back to the Master.

Get All Hudu Devices

Get all Mosyle Devices

I did it this way so I have all the data. I can then compare, contrast, and resolve between the systems. This should allow me to limit the number of API calls.

So far the data looks good.

-Russ


So passing 13k items (in 23 runs) to the Execute Workflow node is going to run the workflow 13k (or even 23) times, isn’t it? Rather than inserting all 13k items as one run?

Not sure I follow, but that’s not unusual. :smiley:

The Master workflow runs, then it calls each of the worker workflows.

I can see if the systems exist and if they need updating. The worker workflows only pull the data and prepare it for returning.

I may have other workflows that can be executed to post data back to the source services.

Right now I am attempting to figure out how to search a list from one platform for a serial number from the other. I have both lists in the Master workflow; one is 91 items long and the other is 156.

-Russ

Hi @bdinsmore, that variable will only work with the Split In Batches node, but the Code node merging your runs will work just fine without it.

You’d only need to adjust your IF node to match your specific looping logic (for example, by checking whether you have reached the last page of your paginated data, or whether the response from the HTTP Request node contains no more results).
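As a purely hypothetical example of that check (the results field name is an assumption about your API's response shape), the IF node's Boolean condition could be an expression like:

```js
// Hypothetical IF node condition:
// keep looping only while the HTTP Request node still returns items.
// "results" is an assumed field name; adjust it to your API response.
{{ $json.results.length > 0 }}
```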

Afterwards, merging the different runs should work as expected and, in your example, would leave you with 1 run of 13,800 items after the Code node (instead of 23 runs with 600 items each).

Hope this makes sense!

Hi @russellkg, purely based on the description I suspect you could use the Merge node in “Combine” mode for this. The exact usage will of course depend on your data structure, but here’s a quick example:

This example will produce two dummy items on the Hudu branch and four dummy items on the Mosyle branch. It’ll then Merge the two Hudu items with the corresponding Mosyle items, while leaving the remaining Mosyle items untouched.
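If you'd prefer to do the same matching in a Code node instead, here's a rough sketch; the node names and the serialNumber field are assumptions based on this thread, so adjust them to your data:

```js
// Code node: match Hudu devices to Mosyle devices by serial number.
// Node names and the "serialNumber" field are assumptions; adjust to your data.
const hudu = $("Get All Hudu Devices").all();
const mosyle = $("Get all Mosyle Devices").all();

return mosyle.map((device) => {
  // Find the Hudu record with the same serial number, if any
  const match = hudu.find(
    (h) => h.json.serialNumber === device.json.serialNumber
  );

  return {
    json: {
      ...device.json,
      // Attach the matching Hudu data; null marks an orphan
      hudu: match ? match.json : null,
    },
  };
});
```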

Hi @MutedJam,

I solved this with a different workflow.

I have two similar workflows that clean up the data. A future optimization would be to create a single workflow that can process both streams. But let’s “Keep It Simple Stupid” right now. :slight_smile:

Once that returns, I use the combine node to find orphans and matches. I'm not done yet, but that's where I am.

