I’m scraping data from Apify, splitting it into items, and saving them to Google Sheets.
The problem: it saves all items again on every run, even if they already exist in the sheet.
I’m looking for a way to check if a row already exists and only add new entries.
I already tried using IF nodes and playing around with Remove Duplicates, but I’m a total beginner and couldn’t really get it to work the way I imagined.
Would really appreciate any tips or examples on how to set this up properly.
If there is a column you could use as a key to match, like an article ID or a URL, and assuming you wouldn’t lose information by replacing a row with a “new scrape” of the same page/content, you could use the Google Sheets node’s Append or Update Row operation instead of Append Row (plain Update Row only changes rows that already exist, so new entries would never be added). If you want the same result without replacing a row, you’d still need a key column: use the Google Sheets node’s Get Row(s) operation with a filter on that column, then an If node, and then (if no match was found) the Google Sheets node’s Append Row operation.
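To make the two options concrete, here is a rough sketch of the decision logic in plain Python. It is not n8n configuration, just the idea behind it. The `url` key column, the sample rows, and the `plan_writes` helper are all made up for illustration; swap in whatever unique column your sheet actually has.

```python
def plan_writes(existing_rows, scraped_items, key="url", replace_existing=False):
    """Decide, per scraped item, whether to append it, update a row, or skip it.

    existing_rows: rows already in the sheet (list of dicts)
    scraped_items: items from the current scrape (list of dicts)
    key: name of the column used to match rows (hypothetical: "url")
    replace_existing: True mimics "append or update", False mimics "append only if missing"
    """
    known = {row[key] for row in existing_rows}
    plan = []
    for item in scraped_items:
        if item[key] not in known:
            plan.append(("append", item))
            known.add(item[key])  # later repeats of this key count as existing
        elif replace_existing:
            plan.append(("update", item))
        else:
            plan.append(("skip", item))
    return plan


# Hypothetical sample data: one row already in the sheet, two scraped items.
sheet = [{"url": "https://example.com/a", "title": "A"}]
scrape = [
    {"url": "https://example.com/a", "title": "A (rescraped)"},
    {"url": "https://example.com/b", "title": "B"},
]

for action, item in plan_writes(sheet, scrape):
    print(action, item["url"])
```

With `replace_existing=False` the item that is already in the sheet is skipped and only the new one is appended, which is the behaviour asked for in the original question.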
Could you help me set this up exactly as you described, with the Get Rows node, an If node, and appending only if no match was found? I wasn’t able to get it working. Thanks!
I was just trying to give you enough of an idea to start with so you could figure it out. I would recommend you go through the courses to become familiar with how things work in n8n. You won’t always get someone to figure things out for you.
Here’s one way to make it work. It requires a Merge node to carry forward the input items that were not found in the existing sheet.
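In case it helps to see what that merge step is doing, it boils down to "keep the input items that have no matching row in the sheet". Below is a minimal Python sketch of that idea only. It is not the workflow itself, and the `keep_non_matches` name and the `url` key column are assumptions for the example.

```python
def keep_non_matches(input_items, sheet_rows, key="url"):
    """Return the input items whose key value does not appear in any sheet row.

    This mirrors the role of the merge step: the lookup only tells you which
    keys exist, so the original input items have to be carried forward and
    filtered against that result.
    """
    existing_keys = {row.get(key) for row in sheet_rows}
    return [item for item in input_items if item.get(key) not in existing_keys]


# Hypothetical sample data.
incoming = [
    {"url": "https://example.com/a", "title": "A"},
    {"url": "https://example.com/b", "title": "B"},
    {"url": "https://example.com/c", "title": "C"},
]
already_in_sheet = [
    {"url": "https://example.com/a", "title": "A"},
    {"url": "https://example.com/c", "title": "C"},
]

new_only = keep_non_matches(incoming, already_in_sheet)
print([item["url"] for item in new_only])
```

Only the items returned here would go on to the Append Row step. The full item is kept (not just the key), so every column is still available when the row is written.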