Accumulating / Gathering / Aggregation

As far as I can tell there’s a gap around accumulating/gathering/aggregation. Let me illustrate with a real example:

Here, I’m getting an HTML page and extracting the rows from a table. But it’s paginated, so I also grab the next-page parameter and loop back around to get the next page (and so on). The “If” node is used to stop the loop once we’ve reached the last page. Now, the HTML Extract Rows node will pass n items (one for each page) to “Accumulate”, which has the following code:

const rows = []

// Walk through every run of the "HTML Extract Rows" node and collect its rows.
// $items(nodeName, outputIndex, runIndex) throws once the run index is past the
// last run, which is what ends the loop.
let counter = 0;
do {
  try {
    const theseRows = $items("HTML Extract Rows", 0, counter).map(item => item.json.rows);
    // Each item's json.rows is an array, so flatten before appending.
    rows.push(...theseRows.flat());
  } catch (error) {
    // No more runs: emit a single item holding everything collected so far.
    return [{json: {rows}}];
  }

  counter++;
} while(true);

However, the “Accumulate” node returns n items, with only the last one containing the fully accumulated results – even if “Execute Once” is enabled, and even though “Function” nodes are documented to run Once, not once per item as the “Function Item” node does. Am I misunderstanding the documentation?

I resolved the problem by connecting the “Accumulate” node to the “If” fail output:

However, this is less intuitive to me. I expect the logic to read: “If there are no more pages, stop getting the next page,” but here we’re saying “If there are no more pages, stop the next-page loop and trigger accumulation of the results in a node that is completely disconnected from the Extract Rows node.”

As a user, I expect one or several of the following features:

  1. The “Function” node runs once and has access to the complete item stream of all the previous nodes.
    • Presumably, the “Function Item” node will run once per item and have access to the current stream of items
  2. A “Last” node that returns the last item from a previous node
  3. A more generic “Get Item” node that returns the nth item from the previous node (in which case the last item would be index -1; a rough Function node sketch of this follows the list)
  4. A more flexible “Aggregate” node that has modes like count/sum/average/append/etc… with a group by field to control the aggregation level(s).
    • If the target is an array, append would do the sensible thing.
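
For items 2 and 3, here is a minimal Function node sketch of the kind of thing I have to do today to grab the nth (or last) item from a previous node; the node name and the index are just placeholders from my example:

// Hypothetical "Get Item (n)" behaviour as a Function node sketch;
// n = -1 means "last item", mirroring the proposed node.
const n = -1;
const previous = $items("HTML Extract Rows");
const index = n < 0 ? previous.length + n : n;
return [previous[index]];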

Let me know if I’m completely off-base, or if this is a feature we should vote on!

However, the “Accumulate” node returns n items, with only the last one containing the fully accumulated results – even if “Execute Once” is enabled, and even though “Function” nodes are documented to run Once, not once per item as the “Function Item” node does. Am I misunderstanding the documentation?

Yes, that is probably not totally clear or easy to understand in the beginning. By default, nodes execute once per item. So let’s say you have 10 items, each containing a URL. If you send that data into an HTTP Request node, it would make 10 requests, one for each item. If you set “Execute Once”, it would make only one request, to the very first URL.
So that option is only about the items a node processes and how often it executes for them. The code of the Function node runs only once no matter how many items there are; the code of the Function Item node runs for every item.
The small number on the nodes shows how often the node “ran”. In the URL example above, the node “ran” only once but processed 10 items.
In probably 95% of cases, nodes run only once in n8n, but if you have loops, or you connect multiple nodes to a single one, they run multiple times. That is when the number in the corner gets larger than “1”.
So in your example above, the HTTP Request node ran twice: first because it got data from the Start node, and then a second time because it got data from the IF node.
The HTML Extract Rows node also ran twice, because the node before it produced data twice, so it executed twice as well.
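
That run index is also why the $items(nodeName, outputIndex, runIndex) calls in your Function node take a third argument. As a rough sketch (the node name and run count are just taken from your example), you can read the items of each run separately:

// Rough sketch: read the items of each run of a node that ran 2x.
// The third argument of $items is the 0-based run index.
const firstRun = $items("HTML Extract Rows", 0, 0);   // items from run 1
const secondRun = $items("HTML Extract Rows", 0, 1);  // items from run 2
return [...firstRun, ...secondRun];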

Totally agree that the current solution, collecting the data with a Function node, is less than optimal. It is honestly very bad and nearly impossible for new n8n users to create. It is, however, just a temporary workaround until we come up with a proper solution. That is sadly the case with many things in n8n right now. Another example: splitting an array into multiple items currently requires a Function node, and for that we also have to come up with an easy node that people can just plug in. Right now, however, we want to make sure that we do not just push out some half-baked solution; we would rather do proper research and invest time in thinking about how it could be handled properly in the long term. That obviously takes longer, but I think it is best for the product in the long run.
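
For context, splitting an array into multiple items with a Function node currently looks roughly like this; the field name “rows” is just an assumed example:

// Rough sketch: turn one incoming item whose json.rows is an array of objects
// into one output item per array element.
return items[0].json.rows.map(row => ({json: row}));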


OK, thanks for the clarification. I know this is a new project, so there’s more to come later… I see how I was confused by the fact that I’m looping back around to that HTTP Request node.

That said, I still think this is confusing. From my perspective:

  • passing 10 items to “HTTP Request” and then looping around to do that again should produce 20 items – not two batches of 10 items each
    • We have the “Split in Batches” node to handle batching.
  • passing 10 items to “HTTP Request” with “Execute Once” and then looping around to do that again should produce 2 items.
    • (I personally can’t think of a good use case for “Execute Once”, hidden away in the node Settings, that wouldn’t be better handled by having an explicit “First/Last/Get Item (n)” node to make it visible in the workflow.)

In my example, I would look for an “Aggregate” node and set:

  • field: rows
  • operation: append

…expecting one item to be emitted with rows set to the concatenated rows from all the incoming items.
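
In Function node terms, what I imagine that hypothetical node doing is roughly this, assuming every incoming item carries an array under json.rows:

// Rough sketch of the imagined "Aggregate" node with field: rows, operation: append.
const rows = [];
for (const item of items) {
  rows.push(...item.json.rows);
}
return [{json: {rows}}];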

Oh, and that little number on the nodes in the UI should be the number of items produced, not the number of executions.

This is just my perspective as an end user — I hope I’m not being too opinionated!


Hey @philipsd6! I work on product design at n8n, and I just wanted to say that opinions and feedback like yours are exactly the type of insight we need to make n8n better. So thank YOU, and know that I’m noting down your feedback in my user notes.

Transform nodes are definitely on our agenda, and understanding how you conceptualise a generic transform (“Aggregate”) is most insightful. If you have any other ideas for generic/abstracted transforms that you’d use on a regular basis (and currently would have to build with a Function node), please do let me know!


Any updates here? E.g. if I have split something into batches to upload and use an IF node to proceed once all the items have been processed, I would love to accumulate all the items from all the executions into one stream again.

As a workaround, I can suggest appending the results from each batch somewhere (a local file, an external database, you name it).
After all pages are processed, the complete list can be read back in one go.

@wallinex have you checked out the Item Lists node? It’s a helper node with various operations related to handling items of data in n8n, including aggregating them.

Hi Max, I ended up writing two functions instead.

const allData = []

// Collect and pair up the items from every run of the "docs" and
// "SplitInBatches" nodes. $items throws once the run index is past
// the last run, which ends the loop.
let counter = 0;
do {
  try {
    const items = $items("docs", 0, counter).map(item => item.json);
    const items2 = $items("SplitInBatches", 0, counter).map(item => item.json);
    // Merge the nth item of each node into a single object.
    const combineData = items.map((el, i) => ({...el, ...items2[i]}));
    allData.push(...combineData);
  } catch (error) {
    // No more runs: emit one item with everything collected so far.
    return [{json: {allData}}];
  }

  counter++;
} while(true);

Then I merge all those items outside of the loop:

// Grab everything that came out of the "Merge Data" node and wrap it
// into a single item.
const allItems = $items("Merge Data")

return [{json: {allItems}}];


@wallinex Would you be able to share an example of this in action?

I would need to recreate the whole thing to make an example without sensitive data.

Here is how the GUI looks:

Where “Merge Items” is:

const allData = []

// Collect the items from every run of the "extract" node. $items throws
// once the run index is past the last run, which ends the loop.
let counter = 0;
do {
  try {
    const items = $items("extract", 0, counter).map(item => item.json);
    allData.push(...items);
  } catch (error) {
    // No more runs: emit one item with everything collected so far.
    return [{json: {allData}}];
  }

  counter++;
} while(true);

Then outside of the loop in the “All Items” function:

// Wrap everything that came out of "Merge Items" into a single item.
const allItems = $items("Merge Items")

return [{json: {allItems}}];

This works on all types of items.