HTTP response too large - loop over in batches

I am trying to build an automation that takes time tracking data from a specific period and spits out a report. The issue I’m running into is the size of that data. For a whole month we can have up to 5k entries. Getting all this data at once crashes my n8n instance.

Luckily for me there are two things

  • API supports getting data in batches
  • I only need maybe 15% of the data returned via the API, and by the time I clean it up the data size becomes manageable

I can’t figure out a way to do this ‘cleanup’ process gradually as the data comes in. The ideal process would be

  • Fetch part of the data
  • Delete extras, clean it up
  • Process data and further reduce size
  • Store processed data

The flow below is close to what I need, but I’m not sure how to combine the Loop Over Items node with HTTP pagination.

Welcome to the n8n community @terraxus!

I’ve built a similar workflow recently which includes an HTTP Request node that does paging. I solved it by putting the HTTP node directly before the Loop Over Items node. In your case you may not even need to use a “Loop Over Items” - have you tried it without?

Did you test if the paging works properly? In your workflow the “Next URL” field shows red, so maybe that’s another issue.

For filtering down items you can also consider using the Filter node, which helps reduce the number of items you have to process.

Hey @haimich thanks for following up! Please find my feedback below

I solved it by putting the HTTP node directly before the Loop Over Items node. In your case you may not even need to use a “Loop Over Items” - have you tried it without?

Did you test if the paging works properly? In your workflow the “Next URL” field shows red, so maybe that’s another issue.

Yes, I tried it without, and yes, the paging works properly. The setup below works perfectly in terms of getting all the data I need, but (continued below)

For filtering down items you can also consider using the Filter node, which helps reduce the number of items you have to process.

Thanks for the suggestion, but the problem is that I need ALL of the entries that come back from that HTTP request; it’s just that I don’t need all of the data from each entry.

For example, one time entry might have 15 key/value pairs with a ton of information, and I only need 3: name, time, project. Multiply that by the number of employees, the number of projects, and multiple entries per day, and over a period of about a month we’re looking at roughly 4-5k entries. Processing that amount of data crashes my instance every time.
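To illustrate, the cleanup step I have in mind is basically a Code node that whitelists those three fields on every item. A rough sketch (the actual key names in the API response will differ):

```javascript
// n8n Code node (Run Once for All Items) — rough sketch only.
// Keeps just the fields I actually need from each time entry and drops
// the rest; "name", "time" and "project" are placeholders for the real
// key names in the API response.
const KEEP = ['name', 'time', 'project'];

return $input.all().map((item) => {
  const slim = {};
  for (const key of KEEP) {
    slim[key] = item.json[key];
  }
  return { json: slim };
});
```

Running something like this right after the data comes in (or inside the loop) should mean only a fraction of each entry stays in memory before anything else touches it.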

My other thought was, instead of splitting the data processing, to split the entered date range: e.g. if the requested period is 30 days, split it into 5 batches of 6 days of data each. I still have to test this out.
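If I go the date-splitting route, the splitting itself could be a small Code node along these lines. Just a sketch - I’m assuming here that the form provides startDate and endDate as ISO date strings:

```javascript
// n8n Code node (Run Once for All Items) — rough sketch only.
// Splits the requested period into chunks of 6 days so each chunk can be
// fetched with its own HTTP request. Assumes the form provides
// startDate and endDate as ISO date strings (e.g. "2024-05-01");
// adjust the field names to whatever the form actually sends.
const CHUNK_DAYS = 6;
const { startDate, endDate } = $input.first().json;

const start = new Date(startDate);
const end = new Date(endDate);

const chunks = [];
let from = new Date(start);
while (from <= end) {
  const to = new Date(from);
  to.setDate(to.getDate() + CHUNK_DAYS - 1);
  if (to > end) to.setTime(end.getTime());
  chunks.push({
    json: {
      from: from.toISOString().slice(0, 10),
      to: to.toISOString().slice(0, 10),
    },
  });
  from = new Date(to);
  from.setDate(from.getDate() + 1);
}

// One item per date range — downstream nodes can loop over these.
return chunks;
```

Each resulting item could then drive its own request, so only one 6-day slice of data would be in memory at a time.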

I understand what you mean. One solution I can see is to store data that doesn’t change very often in a data table. For example, you could have one workflow that periodically fetches workload data and only stores the relevant information in a workloads table.

When the form is used, you can query the data table instead of the workload API from tempo.io. You could also, for example, store the mapping of authorAccountID to author name in a separate user table (before you fetch a mapping, check whether the entry already exists in your data table; if not, fetch it and store it there).
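The lookup logic would be roughly this - a plain JavaScript sketch of the idea only, where the Map stands in for the user data table and fetchUser is a placeholder rather than an actual n8n or tempo.io call:

```javascript
// Sketch of "check the table first, otherwise fetch once and store".
// The Map stands in for the user data table; fetchUser is a placeholder
// for the real HTTP request to the API.
const userCache = new Map(); // authorAccountID -> author name

async function resolveAuthorName(authorAccountID, fetchUser) {
  // 1. Check the local table first.
  if (userCache.has(authorAccountID)) {
    return userCache.get(authorAccountID);
  }
  // 2. Not stored yet: fetch it once from the API...
  const user = await fetchUser(authorAccountID);
  // 3. ...and store it so later runs can skip the request.
  userCache.set(authorAccountID, user.name);
  return user.name;
}
```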

Whether this is a suitable approach for you depends on the amount of data (data tables have some size restrictions) and possible data privacy concerns.

I would also double check why your n8n instance crashes - I assume that you reach a memory limit. Maybe you can increase that and try again?

Thanks for the suggestions! That’s definitely in the plan for the future; it doesn’t make sense to send out all those requests every time.

Unfortunately that doesn’t help me in this case as the workflow fails/instance crashes immediately on the 2nd node, the first “HTTP request” one. That’s where the bulk of the data comes in.

I would also double check why your n8n instance crashes - I assume that you reach a memory limit. Maybe you can increase that and try again?

Is this possible on the cloud instance? I might have missed that somewhere - it would be ideal if I could do that.

Unfortunately that doesn’t help me in this case as the workflow fails/instance crashes immediately on the 2nd node, the first “HTTP request” one. That’s where the bulk of the data comes in.

Can you share the error message that you receive? Maybe you need to increase the “Interval Between Requests (ms)” to a higher value, e.g. 500 ms.

Is this possible on the cloud instance? I might have missed that somewhere - it would be ideal if I could do that.

I wasn’t aware that you were using the cloud version - you’re right, it’s probably not possible there.