How to retrieve only new records?

General question, although specific info may be needed to answer it.

I use an HTTP node to query a database to look for entries to a race. I am 99% finished building the flow to manipulate the data, do some checks, and inject it into Mautic.

I am assuming each cron job initiated query of the source database will return ALL the entries to date. I would only want to retrieve, or process, entries since the last time the cron ran.

What would be the approach to setting that up?


Hey @Robm, in many cases you can use a polling approach described here:

This would require a unique identifier for each individual dataset.

If the data source you are querying has timestamps you could also use these instead (and run your workflow once per hour fetching the entries for the last hour).
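For the timestamp variant, a minimal sketch of computing the “one hour ago” lower bound, e.g. in a Function node feeding the HTTP Request node (the `since` field name is made up for illustration):

```javascript
// Lower bound for the query: one hour before "now", as an ISO string.
const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000).toISOString();

// In an n8n Function node you'd hand it on as item data, e.g. to use
// as a query parameter in the next HTTP Request node.
const payload = [{ json: { since: oneHourAgo } }];
```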


Thanks for that link! I was wondering about this too. This is such a common pattern that I think n8n should have a node to handle this out of the box.


Quite a few trigger nodes actually use polling behind the scenes. E.g. the Google Calendar Trigger regularly checks whether new events have been created between the previous polling run and the current one.

But I suspect you were more thinking of a generic approach, right? This would be a great feature request! Maybe you can put it in a separate post so it doesn’t get lost under this question?


Exactly - I made the request, thanks!


To make this generic, and because you can’t control the returned data, you’d need to use workflow static data.

In this example, we get the last comment ID lastId from static data (or set it to 0), the IF node then compares the IDs from the HTTP request to see if they’re greater than lastId and acts accordingly. Finally it updates the static data lastId. For demo’s sake it uses a _limit parameter to limit how many results are returned, which is always lastId + 5 so we get new data each run. This way, when you set the workflow to active, you’ll see how it functions with new data. Note: static data does not work in testing mode; it only works on active workflows.
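The lastId pattern can be sketched like this. Inside n8n the store would come from getWorkflowStaticData('global') in a Function node; here it is a plain object so the sketch runs anywhere, and the numeric `id` field is an assumption for illustration:

```javascript
// Keep only items whose id is above the lastId remembered from the
// previous run, then remember the new high-water mark.
function pollNewItems(items, staticData) {
  const lastId = staticData.lastId || 0; // 0 on the very first run

  // What the IF node does: ids greater than lastId are "new".
  const fresh = items.filter(item => item.json.id > lastId);

  // Remember the highest id seen so the next run skips old items.
  if (fresh.length) {
    staticData.lastId = Math.max(...fresh.map(i => i.json.id));
  }
  return fresh;
}
```

On an active workflow, the object returned by getWorkflowStaticData persists between executions, which is what makes this work.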

If you’d like this to be easier to access, like in expressions then pop a vote on this issue - Make workflow static data accessible

Second run:

8th run:

Workflow JSON.


Trying to get this to work.

I am working with an orderNumber, ORD00028340 for example, which is a string.

I imagine I need a function to change this to a number, i.e. remove ORD, in order for the polling flow to work.

Googling and trying to learn JavaScript is proving tough; my function keeps getting errors. I am struggling to define the item.

Here is my function, blindly and mostly dumbly copied from Stack Overflow:

var str = item.orderNumber;

var chars = str.slice(0, str.search(/\d/));
var numbs = str.replace(chars, '');

console.log(chars, numbs);

return item;
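For what it’s worth, a minimal working sketch of that extraction in plain JavaScript (in an n8n Function node the incoming data lives on items[n].json rather than a bare item, which is likely why item was undefined):

```javascript
// Strip the non-digit prefix ("ORD") and parse the rest as a number.
function orderNumberToInt(orderNumber) {
  return parseInt(orderNumber.replace(/^\D+/, ''), 10); // "ORD00028340" -> 28340
}
```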

Here are the results from the previous node:


[
{
"k1": {
"name": "CATHY",
"csaId": 4810,
"cellNumber": "redacted",
"emailAddress": "redacted"
},
"k2": {
"name": "GREG",
"csaId": 4475,
"cellNumber": "redacted",
"emailAddress": "redacted"
},
"k3": {
"name": "RYAN",
"csaId": 4405,
"cellNumber": "redacted",
"emailAddress": "redacted"
},
"k4": null,
"entryDate": "2021-09-24",
"entryTime": "15:13:14",
"raceName": "2021 PREFISH EVENT",
"orderNumber": "ORD00028340"
},

It looks like your orderNumber is a zero-padded string, so you can just compare them as strings.

Try this, it basically compares item.orderNumber to the staticData.lastID within an expression in the IF node.
The workflow generates 10 order numbers based on the current time + X seconds.
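A quick sketch of why comparing the raw strings works: zero padding to a fixed width keeps lexicographic order aligned with numeric order, so there is no need to strip the “ORD” prefix first.

```javascript
// lastId as it would sit in workflow static data.
const lastId = 'ORD00028340';

// Equal-length, zero-padded strings sort the same way their numeric
// parts do, so plain > comparison is enough for the IF node.
const isNew = (orderNumber) => orderNumber > lastId;

console.log(isNew('ORD00028341')); // true  (newer order)
console.log(isNew('ORD00028339')); // false (already processed)
```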

If you have a GitHub account, throw a thumbs up on this PR. It will allow you to do the same thing in a single node - ⚡ Add FilterNew node that reports new or unseen items by fahhem · Pull Request #2310 · n8n-io/n8n · GitHub

Or these I’ve just seen:


Voted, and I’ll try your suggestions, thanks @pemontto

Thanks @pemontto

I have tested your suggestion, as is, without success. I tested with the workflow activated as I believe that is required for polling to work.

It’s my understanding that on first running the workflow, all 10 entries will go down the true leg of the IF node and the static data lastID will be set.

Then the next time the workflow runs, all 10 items should go down the false arm of the IF node as they were already processed in the 1st run? Am I testing correctly? Is my understanding correct?

All 10 go down the true path on the first run. But the next run (depending on timing) maybe 5-8 should go down the false path. This is because I generate order numbers based on a timestamp so they increase on every run which might’ve been confusing. It was just an example to show that old data won’t get re-processed, but new data should.

Look at one of your past executions and check the output of the Get LastID node. Now compare that number to the orderNumber column of the Query DB node. Any of those orderNumbers that are larger than the Get LastID will be processed.

Great, I see that now.

Hey @Robm
I built n8n-nodes-updates, which enables you to split the items into 3 groups (based on the previous execution):

  • new items
  • updated items
  • old items

Good Luck


Looks great, keen to play with it. I don’t see it included in the latest version of n8n, though. Will it be a part of future releases?

I am not CLI proficient enough to install it myself without step-by-step instructions I can follow dumbly.

My n8n is installed on DO using the n8n tutorial - so Docker.


Love the Node!

Any way you can tune it to check for new entries in the past ‘X’ hours?
e.g. check for new entries within the past 24 hours?

For my 2 pennies’ worth:
If the web service (or whatever you’re calling) for the data you need doesn’t provide a “recent changes” request, a “last updated” from/to filter, or similar, then it will have to send all the data you request anyway. It’s just “hiding” the comparisons from the n8n end user.

Most of the time I just request from the web service and dump to a database table (truncating the table first).
For data that I know will never change I extract that separately.
Example 1.) Zendesk tickets.
Tickets have a closed status, which can never be altered, or a not-closed status (i.e. open, solved, pending, on-hold), which can.
I have a closed-tickets database table and a not-closed database table.

So I request from the web service all closed tickets from the date of the last ticket in my closed-tickets table up to yesterday (typically fewer than 500).
I then request all not-closed tickets (typically fewer than 1,000) and just union the 2 tables together for my reporting.

Example 2.)
There is this application we have at work, called LMB (Lundi Matin Business), which only provides manual extracts of data to CSV.
We have 150,000 records in one of the extracts and there is no way to filter only records with data changed since yesterday, so I have to extract all the records to ensure all data is recent.
(I have now set this up as a Playwright script so it is automated within n8n.)
So again, truncate the table and insert new data every day for this one.

If I were dealing with millions of records, then I would obviously be re-thinking my strategy, but for these “few thousand lines” tasks I never reach any API request or database storage limits, so I don’t worry too much about it.
I don’t know how your proposal would work for large data sets, or how you determine what a good new data set is to add to the static history data.
I do know, from my 3 months’ experience using n8n, that data of more than 50,000 lines tends to slow down the n8n user interface when debugging (the Split In Batches node is always my friend in these situations, or using a reduced data set during testing). Though I have not so far experienced any problems once the workflow is activated and run through a cron job.


@naskdev This node doesn’t work anymore

n8n version: 1.25.1

Hello all,
I have built a workflow to filter data and keep only the new ones:

Here is how it works:

  • Compute hash for each data
  • Get all data stored in windows chat buffer
  • Compare the data hash with stored hash and keep new data
  • Store new data hash
  • Output data

This workflow must be called from other workflows and is used as a library. The output of the node is new data only: