Hey guys, I built a basic email classifier to catch spam that slips past the regular spam filter and clutters my inbox, but it is not as accurate as I would like.
My idea is to have n8n crawl through my entire mailbox so it knows who I have corresponded with before, whether a sender is new, what my past emails contain, and everything in between.
For this to work, I think I need two parts:
A node that lets the automation read and interpret all of my Gmail.
A way to make the AI learn through time and experience, like a real LLM.
To get the level of accuracy you want, especially when you’re trying to analyze all your historical email and improve over time, you’ll want to move beyond a simple classifier. What you’re describing is essentially a RAG (Retrieval-Augmented Generation) agent.
A RAG setup lets you:
Pull in all your historical Gmail data (senders, threads, who you’ve interacted with, patterns of real vs spam, etc.).
Store it in a vector database.
Let an LLM make decisions based on that stored context rather than trying to “learn” over time like a traditional ML model.
Update the database as new emails arrive, so the agent becomes more accurate over time without having to retrain a model.
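To make the retrieve-then-decide idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `embed` is a toy word-count embedding standing in for a real embedding model, `EmailStore` is an in-memory stand-in for a vector database, and the majority vote in `classify` is a placeholder for the LLM call. None of these names come from n8n.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a real setup would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class EmailStore:
    """In-memory stand-in for a vector database (hypothetical, for illustration only)."""

    def __init__(self):
        self.items = []  # each item is (vector, label, text)

    def add(self, text, label):
        # "Learning" here is just storing another example; nothing is retrained.
        self.items.append((embed(text), label, text))

    def nearest(self, text, k=3):
        query = embed(text)
        ranked = sorted(self.items, key=lambda it: cosine(query, it[0]), reverse=True)
        return [(label, body) for _, label, body in ranked[:k]]


def classify(store, text, k=3):
    """Majority vote over the k most similar stored emails (placeholder for the LLM decision)."""
    votes = Counter(label for label, _ in store.nearest(text, k))
    return votes.most_common(1)[0][0] if votes else "unknown"


store = EmailStore()
store.add("Meeting notes from Tuesday, see attached agenda", "ham")
store.add("Can you review the invoice before Friday?", "ham")
store.add("Win a free prize now, click this link", "spam")
store.add("Limited offer: claim your free gift card", "spam")

print(classify(store, "Claim your free prize today"))
```

The point of the sketch is the last item in the list above: calling `store.add(...)` for each new labeled email changes future decisions immediately, with no training step.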
In practical terms for n8n:
Use Gmail nodes to sync and periodically update your email data.
Store the message metadata + content in a vector store (Pinecone, Supabase, Qdrant, etc.).
Use an AI node that does retrieval before classification so the LLM can reference your historical email patterns.
Decide if an incoming email is spam based on both its content and your past interactions.
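As a rough illustration of that last step, the sketch below combines sender history with retrieved examples into a single prompt for the LLM. It is plain Python, not n8n node configuration, and the `Email` fields, the `replied` flag, `sender_history`, and `build_prompt` are all hypothetical names chosen for the example. In an actual workflow the history would come from your synced Gmail data and `similar` from the vector store lookup.

```python
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    subject: str
    body: str
    replied: bool = False  # assumption: True if you replied to this message


def sender_history(past, sender):
    """Summarize prior contact with a sender from stored email metadata."""
    matches = [e for e in past if e.sender.lower() == sender.lower()]
    return {"received": len(matches), "replied": sum(e.replied for e in matches)}


def build_prompt(incoming, history, similar):
    """Assemble the context the LLM sees: past interactions plus similar emails."""
    lines = [
        "Classify the incoming email as SPAM or NOT_SPAM.",
        f"Sender: {incoming.sender}",
        f"Previous emails from this sender: {history['received']}",
        f"Of those, emails I replied to: {history['replied']}",
        "Similar past emails:",
    ]
    lines += [f"- [{label}] {subject}" for label, subject in similar]
    lines += ["Incoming email:", f"Subject: {incoming.subject}", incoming.body]
    return "\n".join(lines)


past = [
    Email("anna@example.com", "Lunch?", "Are you free on Thursday?", replied=True),
    Email("anna@example.com", "Slides", "Here are the slides.", replied=False),
    Email("promo@example.net", "Free gift", "Click to claim.", replied=False),
]
incoming = Email("anna@example.com", "Quick question", "Do you have the report?")
history = sender_history(past, incoming.sender)
similar = [("ham", "Slides")]  # placeholder for the vector store retrieval result
prompt = build_prompt(incoming, history, similar)
print(prompt)
```

A sender you have replied to before is strong evidence against spam, so putting those counts in the prompt gives the model something beyond the email text to reason about.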
This gives you the “gets smarter over time” effect without trying to train a full ML model inside n8n.
Basically, you’re setting up the brain that can power a whole suite of automations around your inbox.
I’d highly suggest you watch the video below. Great content IMO.