Automation of semantic comparison

hi guys, I want to create a solution that automates this process:

  • take every LLM output for a client across 1-4 different queries
  • compare the previous output vs the latest one semantically to see if the output changed (especially if there’s no mention of legal cases)
  • send a notification to the client only if the output changed

I’ve already automated the data gathering, which goes into a Google Sheet

one sheet for Gemini and one for ChatGPT

the problem is how to compare the texts.

I’m trying an AI Agent node with the Google Sheets as tools, but it retrieves the wrong dates and only compares outputs randomly, even though I described in detail in the prompt how it should compare them.

I don’t know how to get reliable comparisons.

Should I store the outputs in a vector DB to get more consistent results, and if so, how?

Should I use an AI agent or create an AI workflow?

How would you approach it?

ps: it would be cool to highlight significant differences in the text too

Hi there, from what I see here, what you are doing is not comparing the texts semantically. For texts to be compared semantically, they first need to be embedded into vectors; then to compare them, you use something like this:

You calculate the similarity between the two vectors.

similarity = cosine_similarity(textA_embedding, textB_embedding)
  • If result is close to 1.0 → texts are semantically similar.
  • If result is close to 0 → they’re unrelated.
  • If result is negative → opposite meanings (rare in practice for embeddings).
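
To make the scale concrete, here is a quick toy example with made-up 3-dimensional vectors standing in for real embeddings (text-embedding-3-small actually returns 1536-dimensional vectors); the same function is reused in step 4 below:

// Toy vectors only, to show how the score behaves
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const normB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (normA * normB);
}

console.log(cosineSimilarity([1, 2, 3], [1, 2, 3]));    // 1.0  -> same direction, same meaning
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0]));    // 0.0  -> unrelated
console.log(cosineSimilarity([1, 2, 3], [-1, -2, -3])); // -1.0 -> opposite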

so the workflow would look something like this


n8n Semantic Similarity Workflow

1. Google Sheets – Fetch Texts

  • Use “Google Sheets → Read Rows”
  • Read a row with two columns: text_1 and text_2
  • Limit to 1 row for testing
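
Assuming your sheet has columns named text_1 (previous output) and text_2 (latest output), adjust these to whatever your columns are really called, each item coming out of the Google Sheets node will look roughly like this:

{
  "text_1": "previous LLM output for the query...",
  "text_2": "latest LLM output for the query..."
}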

2. HTTP Request – Embed text_1

  • Method: POST

  • URL: https://api.openai.com/v1/embeddings

  • Headers:

    • Authorization: Bearer YOUR_OPENAI_API_KEY
    • Content-Type: application/json
  • Body (wrapping the text in JSON.stringify keeps the JSON valid even if the output contains quotes or line breaks):

{
  "input": {{ JSON.stringify($json["text_1"]) }},
  "model": "text-embedding-3-small"
}
  • Store the resulting vector as embedding1 (in the API response it sits at data[0].embedding)
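
For reference, the embeddings endpoint returns the vector nested under data[0].embedding, so that is the path you need to carry forward (trimmed response):

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, ...]
    }
  ],
  "model": "text-embedding-3-small"
}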

3. HTTP Request – Embed text_2

  • Duplicate previous node
  • Change input to:
"input": {{$json["text_2"]}}
  • Store result as embedding2

4. Code Node – Cosine Similarity

  • Add a Code node with this script:
// Cosine similarity between two vectors: 1 = same direction, 0 = unrelated
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const normB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (normA * normB);
}

// These fields must contain the raw vectors, i.e. data[0].embedding from
// each HTTP Request response (adjust the paths to match your items)
const a = items[0].json.embedding1;
const b = items[0].json.embedding2;
const similarity = cosineSimilarity(a, b);

return [
  {
    json: {
      similarity,
      // 0.8 is only a starting threshold, tune it on your own outputs
      result: similarity > 0.8 ? "Similar" : "Not Similar"
    }
  }
];
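
If the two embeddings don’t end up merged into a single item, the Code node can also read them straight from the earlier HTTP Request nodes; the node names below are just assumptions, rename them to match your workflow:

const a = $('Embed text_1').first().json.data[0].embedding;
const b = $('Embed text_2').first().json.data[0].embedding;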

5. Output

  • Use a Set node, Webhook Response, or any output node to show:

    • similarity
    • result
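
Since you only want to notify the client when the output actually changed, you can follow this with an IF node that checks the score and only then continues to your email/Slack node. Something like this as the condition, with a threshold you would tune on your own data:

{{ $json.similarity < 0.8 }}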

If this answers your question, please mark it as the solution and give it a like.

Thank you so much for the detailed answer, Fahmi.
But I wanted to ask: do you think it would perform better if I directly created a vector database, to have a more solid knowledge base for comparing outputs, or is it better for this use case to just do cosine similarity on the fly?

the choice only depends on whether you want to persist the data later on or not

if you want to persist the data, then use a database for that; if not, then you don’t need to store it in a DB

Usually when you store data in a vector DB, you’re not just storing the vectors, but also metadata to identify them, like the document names, document IDs, etc.

which I don’t think you need in the workflow you want to build, because what you want to achieve is just comparing two things that you already know where to get from.
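
Just to illustrate the metadata point: if you did decide to persist the outputs, a record in a vector store would typically look something like this (field names and values are only illustrative):

{
  "id": "gemini-query1-2025-01-01",
  "vector": [0.0123, -0.0456, ...],
  "metadata": {
    "client": "client_a",
    "model": "gemini",
    "query": "query_1",
    "captured_at": "2025-01-01"
  }
}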