Embeddings and clustering

I’m building a workflow for news monitoring. My idea is something like this:

  1. Once a day, fetch a bunch of RSS feeds.
  2. Use an LLM to keep only the items that match my interests.
  3. Calculate embeddings for the filtered items.
  4. Cluster related items together.
  5. Send each group of related items to an LLM for a summary of the group.
  6. Compile an email with each summary included.
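Steps 3 and 4 can be sketched outside of any particular node, assuming you already have one embedding vector per item. The sketch below uses a simple threshold-based, single-linkage clustering over cosine similarity. The function names and the default `threshold` of 0.8 are hypothetical choices for illustration, not part of any specific tool. Two items land in the same group if a chain of sufficiently similar pairs connects them.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def cluster_by_threshold(vectors: list[list[float]], threshold: float = 0.8) -> list[list[int]]:
    """Group item indices whose embeddings are linked by cosine similarity >= threshold.

    Single-linkage via union-find: every pair above the threshold is merged,
    so clusters are the connected components of the similarity graph.
    """
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once (O(n^2), fine for a daily batch of news items).
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    # Collect indices by their root to form the final groups.
    groups: dict[int, list[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Toy 2-D "embeddings": items 0/1 point one way, items 2/3 another.
example = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
print(cluster_by_threshold(example))
```

For a daily batch of a few hundred items the pairwise loop is cheap. With much larger volumes, a library clustering method (e.g. agglomerative clustering or HDBSCAN) would be the more usual choice. Either way, the threshold needs tuning against your embedding model, because similarity scales differ between models.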

What nodes do I need for embedding and clustering?

/Anders

Have you built anything so far that I could give input on?

No, I'm still sketching out the idea. I can't find a way to create embeddings for whole articles (I only find ways to embed smaller chunks, which I suppose are meant for RAG-like implementations), and I haven't figured out the clustering either, so I wonder if I'm approaching this the wrong way. :)
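If an embedding step only produces per-chunk vectors, one common workaround is to embed every chunk of an article and average the chunk vectors into a single article-level vector, which can then be clustered. This is a general technique, not something confirmed for any particular node. The `embed` argument below is a hypothetical stand-in for whatever embedding model or service you call.

```python
from typing import Callable, Sequence


def article_embedding(chunks: Sequence[str], embed: Callable[[str], list[float]]) -> list[float]:
    """Combine per-chunk embeddings into one article vector by element-wise mean.

    `embed` is a hypothetical callable mapping a text chunk to a vector;
    all vectors it returns are assumed to have the same dimension.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    vectors = [embed(c) for c in chunks]
    dim = len(vectors[0])
    # Average each dimension across all chunk vectors.
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


# Toy embedding for demonstration only: [length, count of "a"].
toy_embed = lambda s: [float(len(s)), float(s.count("a"))]
print(article_embedding(["aa", "abcd"], toy_embed))
```

Averaging dilutes the signal if an article mixes unrelated topics, but for short news items it is usually a reasonable first approximation. Because cosine similarity ignores vector length, the mean does not need to be re-normalized before clustering.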

You should be able to set a larger, custom chunk size for this.
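As a rough illustration of what a character-based splitter with a configurable chunk size and overlap does, here is a minimal sketch. The function name and default values are hypothetical, and a real splitter node may count tokens rather than characters. If `chunk_size` is at least as long as the article, the splitter returns a single chunk, so you get one embedding per article, provided the article still fits within the embedding model's input limit.

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into character chunks of at most `chunk_size`, each overlapping
    the previous one by `overlap` characters. Returns [] for empty text."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
print(chunk_text("a short article", chunk_size=100, overlap=10))
```

Setting the chunk size above the longest expected article is the simplest route to whole-article embeddings. Keeping smaller chunks and averaging their vectors is the fallback when articles exceed the model's limit.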
