General newbie question: since my workflows using AI agents will run in non-English languages, my research suggests it would be a good idea to use embedding models trained on non-English datasets, as this should give better similarity-search results than models like OpenAI's. So I found that a much better fit would be a model like multilingual-e5. However, I ran into an issue: how do I actually use this model in my workflow as a sub-node to my vector DB nodes? None of the providers in the embeddings node offer it. Pinecone includes embeddings in their service (incl. multilingual-e5), but that is not yet supported in n8n. So what are my options here?
It looks like your topic is missing some important information. Could you provide the following, if applicable?
- n8n version:
- Database (default: SQLite):
- n8n EXECUTIONS_PROCESS setting (default: own, main):
- Running n8n via (Docker, npm, n8n cloud, desktop app):
- Operating system:
I’ve used vector embeddings in Brazilian Portuguese and Spanish without any issues. It’s actually very accurate.
It’s very simple to set up embeddings using the OpenAI nodes, so I think it’s worth a shot!
Sometimes we think we need the ABSOLUTELY BEST solution in the world, but in this case I think the benefits would be minimal.
Check out these tutorials from Cole Medin. He provides templates and documentation:
- Watch this first → https://www.youtube.com/watch?v=PEI_ePNNfJQ
- Then this → https://www.youtube.com/watch?v=T1ZKEmDN8AA
If my reply answers your question, please remember to mark it as a solution.
I think this feature is needed, as if you look at the MTEB leaderboard, there are so many embeddings models that could be used here.
I think we need a generic embeddings model endpoint that we can configure and connect to a Vector Store … the equivalent of an HTTP Request module.
I know you can use the OpenAI module for embedding models that share the same API structure, but you’re also limited in what parameters you can pass.
For example: I’m trying to use Jina.ai’s v3 embeddings model, which supports “late chunking”, a flag that can be passed as a parameter (see their Embedding API docs).
Jina’s embeddings endpoint doesn’t follow the same standard as OpenAI’s embeddings (the input is an array, for example),
and there are custom parameters that can be passed, like late_chunking.
It would be great to allow custom embedding models!
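For context, here is a minimal sketch of the request body such an endpoint would need to send to Jina, and why the OpenAI node can’t express it. Field names follow Jina’s docs at the time of writing, but treat them as assumptions and verify against the current Embedding API reference:

```python
import json

def build_jina_payload(texts, late_chunking=True):
    """Sketch of a Jina v3 embeddings request body (field names assumed from Jina's docs)."""
    return {
        "model": "jina-embeddings-v3",
        # Unlike OpenAI's endpoint, which also accepts a bare string,
        # Jina's "input" must be an array of strings.
        "input": list(texts),
        # Custom flag that is not part of OpenAI's embeddings schema,
        # so the OpenAI node has no way to pass it through.
        "late_chunking": late_chunking,
    }

print(json.dumps(build_jina_payload(["first chunk", "second chunk"])))
```

A generic embeddings node would only need to let us set the URL plus arbitrary extra body fields like `late_chunking`, much like the HTTP Request node already does for plain APIs.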
I strongly agree with your view on ‘a generic embeddings model endpoint’. I also think that n8n currently has too few embedding nodes and lacks a customizable node, like an OpenAI-compatible mode that can call third-party embedding models. I hope this will be available soon.
I came across this thread looking for the same thing, and it is indeed very important; otherwise we are pushed toward using Code nodes to embed, which defeats the purpose of the entire set of vector-store embedding nodes n8n has built. A simple workaround for the moment would be an open “embedding model node” for us to fill with code, as suggested above.
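For anyone stuck today, the Code-node workaround might look roughly like this. It is a minimal sketch, assuming a generic OpenAI-style embeddings endpoint; the URL, model name, and key are placeholders, and the request/response shape should be checked against your provider’s docs:

```python
import json
import urllib.request

def embed_texts(texts, url="https://example.com/v1/embeddings",
                model="multilingual-e5-large", api_key="YOUR_KEY",
                opener=urllib.request.urlopen):
    """Call an arbitrary embeddings endpoint (placeholder URL/model/key).

    Assumes an OpenAI-style API: {"model": ..., "input": [...]} in,
    {"data": [{"embedding": [...]}, ...]} out.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps({"model": model, "input": list(texts)}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with opener(req) as resp:
        body = json.load(resp)
    # One embedding vector per input text, in order
    return [item["embedding"] for item in body["data"]]
```

The `opener` parameter is only there so the function can be exercised without a live endpoint; in a Code node you would drop it and call the endpoint directly. But as noted above, this bypasses the embedding sub-node system entirely, which is exactly why a fillable “embedding model node” would be better.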
That’s a good idea, I like it.