I have a Google Sheet full of data scraped from my email. Every row represents an invoice, and every cell holds invoice data. I need to create an automation that removes duplicate rows. I know Google Sheets has such a feature, but I must use this logic: “Order all rows by ‘Invoice number’. Find all duplicates. Compare all rows with the same ‘Invoice number’ and remove those with less data in ‘Amount without VAT’, ‘Total amount’, ‘VAT’, ‘Supplier’, ‘Subscriber’, and ‘Payment terms’. Zero represents no data.”
I am about to use the OpenAI node to clean it up, but I am struggling to set it up correctly. I need help figuring out how to approach it.
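The first two steps of that logic (order by invoice number, then find the duplicates) are deterministic and need no AI. Below is a minimal Python sketch, assuming each sheet row arrives as a dict keyed by its column header; the function name `find_duplicate_groups` and the sample rows are hypothetical.

```python
from itertools import groupby

# Column that identifies an invoice (header name assumed to match the sheet)
KEY = "Invoice number"

def find_duplicate_groups(rows):
    """Sort rows by invoice number and return only groups with more than one row."""
    # Step 1: order all rows by invoice number (str() keeps mixed types comparable)
    ordered = sorted(rows, key=lambda r: str(r.get(KEY, "")))
    # Step 2: groupby only merges *adjacent* equal keys, which is why we sort first
    groups = []
    for invoice, group in groupby(ordered, key=lambda r: str(r.get(KEY, ""))):
        members = list(group)
        if len(members) > 1:
            groups.append((invoice, members))
    return groups

# Hypothetical sample rows
rows = [
    {"Invoice number": "INV-2", "VAT": 21},
    {"Invoice number": "INV-1", "VAT": 0},
    {"Invoice number": "INV-2", "VAT": 0},
    {"Invoice number": "INV-1", "VAT": 10},
    {"Invoice number": "INV-3", "VAT": 5},
]
for invoice, members in find_duplicate_groups(rows):
    print(invoice, len(members))  # INV-1 2, then INV-2 2
```

INV-3 appears only once, so it is not reported as a duplicate.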
Please share your workflow
Share the output returned by the last node
"error": "Invalid input data format. Please provide valid JSON data."
Hey @Vlado_admin, it sounds like a simple task. To de-duplicate the data from the spreadsheet, use the Remove Duplicates node. You do not need AI for that.
Thanks for your reply, but I need to remove duplicates based on conditions.
Let’s say I have three rows with the same invoice number. The first row is missing data in all cells, such as VAT and total price. The second row is complete, and the third row is missing only the VAT. I want a node (OpenAI) to analyze these duplicates and keep only the row with the most data.
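"Keep the row with the most data" can be expressed as a simple score: count how many of the six listed columns are filled, then keep the highest-scoring row. This is a sketch, not a drop-in node; it assumes blanks and zeros (including text such as "0,00") mean "no data", and the sample values are made up to mirror the three-row scenario.

```python
# The six columns named in the question (header names assumed to match the sheet)
DATA_COLUMNS = ["Amount without VAT", "Total amount", "VAT",
                "Supplier", "Subscriber", "Payment terms"]

def has_data(value):
    """Return False for None, blank text, and zero (0, "0", "0.00", "0,00")."""
    if value is None:
        return False
    # Accept a decimal comma as well as a decimal point
    text = str(value).strip().replace(",", ".")
    if text == "":
        return False
    try:
        return float(text) != 0
    except ValueError:
        return True  # non-numeric text such as a supplier name counts as data

def completeness(row):
    """Count how many of the data columns in a row are filled in."""
    return sum(has_data(row.get(col)) for col in DATA_COLUMNS)

def keep_most_complete(group):
    """Return the row with the highest completeness; on a tie, max() keeps the first."""
    return max(group, key=completeness)

# Hypothetical rows: all missing, complete, missing only VAT
group = [
    {"Invoice number": "INV-7", "Amount without VAT": 0, "Total amount": 0, "VAT": 0,
     "Supplier": "", "Subscriber": "", "Payment terms": ""},
    {"Invoice number": "INV-7", "Amount without VAT": 100, "Total amount": 121, "VAT": 21,
     "Supplier": "Acme", "Subscriber": "Me", "Payment terms": "14 days"},
    {"Invoice number": "INV-7", "Amount without VAT": 100, "Total amount": 121, "VAT": 0,
     "Supplier": "Acme", "Subscriber": "Me", "Payment terms": "14 days"},
]
print([completeness(r) for r in group])  # [0, 6, 5]
print(keep_most_complete(group)["VAT"])  # 21
```

Because the score is a plain count, the result is reproducible run to run, which a language model does not guarantee.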
I hope it is clearer now.
Hey @Vlado_admin, thanks for the clarification. I still believe you could get by without AI. For example, you could aggregate the records that share the same “Invoice number” and work on each group individually, as demonstrated in the workflow below. Pretty simple (with a smart approach).
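The aggregate-then-process idea can be sketched in a few lines of Python, for example to adapt into a code step of the workflow (an assumption; the thread does not say which node implements it). Rows are bucketed by invoice number, and each bucket is reduced to its most complete row. The function names `filled` and `dedupe` are hypothetical, and blanks and zeros are assumed to mean "no data".

```python
from collections import defaultdict

# The six columns named in the question (header names assumed to match the sheet)
DATA_COLUMNS = ["Amount without VAT", "Total amount", "VAT",
                "Supplier", "Subscriber", "Payment terms"]

def filled(value):
    # Assumption: None, blanks and zeros ("0", "0.00", "0,00", 0) mean "no data"
    text = "" if value is None else str(value).strip().replace(",", ".")
    try:
        return float(text) != 0
    except ValueError:
        return text != ""  # any non-numeric, non-blank text counts as data

def dedupe(rows):
    # Aggregate: one bucket of rows per invoice number, in first-seen order
    buckets = defaultdict(list)
    for row in rows:
        buckets[str(row.get("Invoice number", "")).strip()].append(row)
    # Work on each group individually: keep the row with the most filled columns
    return [max(group, key=lambda r: sum(filled(r.get(c)) for c in DATA_COLUMNS))
            for group in buckets.values()]

# Hypothetical sample rows
rows = [
    {"Invoice number": "A1", "VAT": 0, "Total amount": 0},
    {"Invoice number": "B2", "VAT": 5, "Total amount": 25},
    {"Invoice number": "A1", "VAT": 21, "Total amount": 121},
]
print([f'{r["Invoice number"]}:{r["VAT"]}' for r in dedupe(rows)])  # ['A1:21', 'B2:5']
```

Unique invoices pass through unchanged, so the output has exactly one row per invoice number.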