So basically I often found it hard to determine exactly how many tokens a prompt will take before submitting it to OpenAI, and the textLength/4 heuristic never really made the cut for me.
Naturally we want to make that as efficient as possible, so I created this node.
With this node you can (rough code sketch after the list):
Encode a string into BPE tokens (may be cool for custom training)
Decode an array of BPE tokens back to a string (for funzies?)
Determine a string's token length before submitting to the OpenAI API
Calculate costs before submitting to the OpenAI API
Split a text into chunks that exactly match a definable token limit
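To give an idea of what happens under the hood, here is a minimal sketch of those operations, assuming the `gpt-3-encoder` npm package (the node itself may use a different tokenizer library, and the per-1K-token price below is just an illustrative placeholder):

```typescript
import { encode, decode } from 'gpt-3-encoder';

const text = 'The quick brown fox jumps over the lazy dog.';

// Encode a string into BPE tokens
const tokens: number[] = encode(text);
console.log('Token count:', tokens.length);

// Decode the tokens back into the original string
console.log('Decoded:', decode(tokens));

// Estimate cost before submitting (price per 1K tokens is an assumed example value)
const pricePer1kTokens = 0.002;
console.log('Estimated cost ($):', (tokens.length / 1000) * pricePer1kTokens);

// Split a text into chunks that each stay within a definable token limit
function chunkByTokenLimit(input: string, tokenLimit: number): string[] {
  const allTokens = encode(input);
  const chunks: string[] = [];
  for (let i = 0; i < allTokens.length; i += tokenLimit) {
    chunks.push(decode(allTokens.slice(i, i + tokenLimit)));
  }
  return chunks;
}

console.log(chunkByTokenLimit(text, 5));
```

Slicing on token boundaries rather than characters is what lets each chunk hit the limit exactly instead of approximately.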
This is great! Useful for many tasks that utilize OpenAI’s API.
I am on the lookout for a node that uses the Embeddings model (creating embeddings, storing them in vector databases, and searching the embeddings). If you are aware of any options, please point me to them.
Hey, thanks for this. I got it working with the HTTP Request node, as you suggested. I was testing it with a free version of Pinecone DB (also via the HTTP Request node). I am able to generate, store, and query embeddings.
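For anyone who wants to reproduce this, here is a rough sketch of the raw calls those HTTP Request nodes make, assuming Node 18+ (built-in fetch); the API keys and the Pinecone index URL are placeholders you would replace with your own:

```typescript
// Placeholders: set your own keys and index URL
const OPENAI_API_KEY = process.env.OPENAI_API_KEY!;
const PINECONE_API_KEY = process.env.PINECONE_API_KEY!;
const PINECONE_INDEX_URL = 'https://my-index-abc123.svc.us-east-1-aws.pinecone.io'; // example URL

// 1. Create an embedding with the OpenAI Embeddings API
async function createEmbedding(text: string): Promise<number[]> {
  const res = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'text-embedding-ada-002', input: text }),
  });
  const json: any = await res.json();
  return json.data[0].embedding;
}

// 2. Store the embedding in Pinecone
async function upsertVector(id: string, values: number[]): Promise<void> {
  await fetch(`${PINECONE_INDEX_URL}/vectors/upsert`, {
    method: 'POST',
    headers: { 'Api-Key': PINECONE_API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ vectors: [{ id, values }] }),
  });
}

// 3. Query Pinecone with a fresh embedding
async function queryVectors(values: number[], topK = 3) {
  const res = await fetch(`${PINECONE_INDEX_URL}/query`, {
    method: 'POST',
    headers: { 'Api-Key': PINECONE_API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ vector: values, topK, includeMetadata: true }),
  });
  return res.json();
}

// Example usage: embed, store, then search
(async () => {
  const embedding = await createEmbedding('n8n makes workflow automation easy.');
  await upsertVector('doc-1', embedding);
  const queryEmbedding = await createEmbedding('workflow automation');
  console.log(await queryVectors(queryEmbedding));
})();
```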
Awesome! I’m sure the community will contribute nodes for the common vector databases in the future; if not, I’ll implement some myself when I finally take a deep dive into vector DBs.