Google Sheets problem / Removing Duplicates

We are currently trying to make a workflow work.
I think we are having a logic flaw because there is already too much time spend on making this work.

Explanation for the workflow :
We want to retrieve companies that went bankrupt, get the companies, put them in google sheets. Do this once every day. The companies that are new we also want to send to a Mattermost channel. And save the new companies to the google sheet. So only the new and original companies should be send to Mattermost instead of just pushing it all to the google sheets.

Steps :
1 - Making a HTTP Request to a website and retrieve the raw HTML
- Method = get
- Link = https://www.faillissementen.com/insolventies/nederlandse-insolventies/?length=100
2 - Extract HTML From HTTP Request
- Source data = JSON
- JSON property = data
- Extraction Values
KVK = table#lijst tbody tr td:nth-child(2)
Datum = table#lijst tbody tr td:nth-child(3)
Bedrijf = table#lijst tbody tr td:nth-child(4)
Stad = table#lijst tbody tr td:nth-child(5)
Branche = table#lijst tbody tr td:nth-child(7) span
Status = table#lijst tbody tr td:nth-child(8)
3 - Code that will format the HTML in good table
const input = items[0].json;
const result = [];

for(let i = 0; i < input.Datum.length; i++){
const bedrijf = {
Datum: input.Datum[i],
Bedrijf: input.Bedrijf[i],
Stad: input.Stad[i],
Branche: input.Branche[i],
Status: input.Status[i],
KVK: Number(input.KVK[i])
};
result.push({json:bedrijf});
}

return result;

4 - Read from google sheets

5 - Merge the old google sheets and the new from the website

6 - Code 2
let kvkSet = new Set(); // Dit zal de KVK-nummers van bestaande bedrijven opslaan.
let nieuweBedrijven = []; // Dit zal de nieuwe bedrijven opslaan.

for(let i = 0; i < items.length; i++) {
let bedrijf = items[i].json;
// Als het KVK-nummer van het bedrijf niet in de set bestaat, voeg het dan toe aan de lijst van nieuwe bedrijven.
if(bedrijf.KVK && !kvkSet.has(bedrijf.KVK)) {
nieuweBedrijven.push({json: bedrijf});
kvkSet.add(bedrijf.KVK); // Voeg het KVK-nummer toe aan de set zodat we het niet opnieuw toevoegen.
}
}

return nieuweBedrijven; // Retourneer alleen de nieuwe bedrijven.

This should remove all the duplicates from google sheets, but it doesn’t work well.

7 - Send to google sheet
Here i just want to send the new ones to google sheet. After this I want to use the same new/original bankrupt companies and send them to Mattermost.

Making the google sheets from “append” to “append/update” helped a lot. Now i can filter on certain parts so only the new will be saved. But still this was not exactly what i was searching for.

Perhaps someone knows a better idea.

Hi @Jeffrey_Zschottche, I am sorry you are having trouble.

Btw, instead of describing your workflow you can simply select it on your n8n canvas, then use Ctrl+C top copy and paste it here on the forum using Ctrl+V. This will insert your wokrflow as a preview other, allowing other users to open and explore it (and saving yourself from having to write down a description of each field).

Now as for the actual challenge it seems you are using the Merge node in “Append” mode. Removing duplicates might be easier when in “Combine” mode which allows you to only let items pass that exist in one data source, but not the other. So assuming your sheet reflects the structure of the website table and has KVK as a unique key, something like below should do the job completely without code:

The Merge node in this example would compare the KVK field on Input 1 (the website) with the KVK field from Input 2 (the Google Sheet):

It would only items pass that exist on the website, but not my Google Sheet yet.

After the Merge node, I can then simply add my new rows to Google Sheets (or send out a notification about bankruptcies etc).

Hope this makes sense and helps :slight_smile:

1 Like