n8n Community Node: GPT-Tokenizer

Hey, I’ve released a new node that should pair nicely with the OpenAI node:


So basically, I often found it hard to determine exactly how many tokens a prompt will take before submitting it to OpenAI. The textLength / 4 rule of thumb didn’t really make the cut for me.

Naturally, we want to use tokens as efficiently as possible, so I created this node.

With this node you can:

  • Encode a string into BPE tokens (may be cool for custom training)
  • Decode an array of BPE tokens back into a string (for funzies?)
  • Determine a string’s token length before submitting it to the OpenAI API
  • Calculate costs before submitting to the OpenAI API
  • Split a text into chunks that exactly match a definable token limit
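To make the chunking and cost items above a bit more concrete, here is a minimal sketch of the arithmetic, assuming you already have the token IDs from a BPE encoder (for example the node’s Encode operation). This is not the node’s actual implementation; `chunk_tokens`, `estimate_cost`, and the per-1K-token price parameter are hypothetical names for illustration, and real prices depend on the model you use.

```python
from typing import List


def chunk_tokens(tokens: List[int], max_tokens: int) -> List[List[int]]:
    """Split an encoded token array into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    # Every chunk except possibly the last one holds exactly max_tokens tokens.
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def estimate_cost(token_count: int, usd_per_1k_tokens: float) -> float:
    """Estimate a request's cost from its token count and a (hypothetical) price per 1K tokens."""
    return token_count / 1000 * usd_per_1k_tokens


if __name__ == "__main__":
    tokens = list(range(10))  # stand-in for real BPE token IDs
    print(chunk_tokens(tokens, 4))
    print(estimate_cost(1500, 0.002))
```

Each chunk can then be decoded back into text on its own. Note that cutting at arbitrary token boundaries can split a word between two chunks.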

It uses this npm package under the hood:

Let me know what you think of it! :wave:


Hey @geckse ,

Thank you for your efforts Marcel!

This is great! Useful for many tasks that utilize OpenAI’s API.

I am on the lookout for a way to use the Embeddings model as a node (creating embeddings, storing them in vector databases, and searching them). If you are aware of any options, please point me to them.
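For context on the "searching" part: a vector database essentially ranks stored vectors by their similarity to a query vector. A brute-force, in-memory sketch using cosine similarity might look like this; the `store` dict and document IDs are hypothetical, and a real vector DB uses approximate indexes to do this at scale.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k stored (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}  # toy 2-D "embeddings"
    print(top_k([1.0, 0.0], store, k=2))
```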

Might indeed be Interesting to give the OpenAI Node the capabilities of embeddings but I’m afraid theres currently no node for that.

You could maybe get that done with the HTTP Request node.
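Roughly, the HTTP Request node would have to send the equivalent of the request below to OpenAI’s embeddings endpoint. This Python sketch only builds the request and parses a response dict; it assumes the `text-embedding-ada-002` model (check OpenAI’s docs for the models currently available), and `build_embeddings_request` / `extract_embeddings` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embeddings_request(texts: List[str], api_key: str,
                             model: str = "text-embedding-ada-002") -> urllib.request.Request:
    """Build (but do not send) a POST request asking OpenAI to embed the given texts."""
    body = {"model": model, "input": texts}
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def extract_embeddings(response: Dict[str, Any]) -> List[List[float]]:
    """Pull the vectors out of a parsed response, ordered like the input texts."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

To actually send it outside n8n: `with urllib.request.urlopen(req) as resp: data = json.load(resp)`, then `extract_embeddings(data)`. In the HTTP Request node, the same URL, method, headers, and JSON body apply.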

It might also be worth noting that Weaviate comes with a built-in OpenAI vectorizer:

Maybe that helps with your intention. :slight_smile:

Hey, thanks for this. I got it working with the HTTP Request node, as you suggested. I tested it with the free tier of Pinecone (also through the HTTP Request node) and I am able to generate, store, and query embeddings.
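The exact requests used here aren’t shown, but for anyone trying the same thing, a query might look roughly like this. It assumes Pinecone’s REST query endpoint (`POST https://<index-host>/query` with an `Api-Key` header); the index host is a placeholder, and `build_query_body` / `best_matches` are hypothetical helpers that only build the JSON body and filter the parsed response.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_query_body(vector: List[float], top_k: int = 3,
                     namespace: Optional[str] = None,
                     metadata_filter: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """JSON body for a similarity query against a Pinecone index (assumed field names)."""
    body: Dict[str, Any] = {"vector": vector, "topK": top_k, "includeMetadata": True}
    if namespace:
        body["namespace"] = namespace
    if metadata_filter:
        # e.g. {"source": {"$eq": "rss"}} to restrict matches by metadata
        body["filter"] = metadata_filter
    return body


def best_matches(response: Dict[str, Any], min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Keep (id, score) pairs from a parsed query response whose score reaches min_score."""
    return [(m["id"], m["score"]) for m in response.get("matches", []) if m["score"] >= min_score]
```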

Jayavel S

Awesome! I’m sure the community will build nodes for the common vector databases in the future; if not, I’ll implement some myself once I finally take a deep dive into vector DBs. :eyes:

Would love to see it.

@geckse, is it possible to use your community node to automatically turn text/data into embeddings and upsert them into Pinecone with specific metadata?

For example: take an RSS feed, a CSV, or plain text, turn it into embeddings, assign metadata, and upload it to Pinecone.

If so, do you have any sample workflows or tips on how to do it?
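One possible shape for the CSV case, as a rough sketch rather than a tested workflow: read the rows, embed one text column, keep the remaining columns as metadata, and group the records into upsert-sized batches. The `embed` callable is a hypothetical placeholder for the actual OpenAI embeddings call (for example via the HTTP Request node); the `{"id", "values", "metadata"}` record shape and the `{"vectors": [...]}` body follow Pinecone’s upsert format as I understand it, and the batch size of 100 is an assumption.

```python
import csv
import io
from typing import Any, Callable, Dict, Iterator, List


def csv_to_records(csv_text: str, text_column: str,
                   embed: Callable[[str], List[float]]) -> List[Dict[str, Any]]:
    """Turn CSV rows into upsert records: embed one column, keep the others as metadata."""
    records = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        text = row.pop(text_column)  # the column that gets embedded
        records.append({
            "id": f"row-{row_number}",           # hypothetical ID scheme
            "values": embed(text),
            "metadata": {**row, "text": text},   # remaining columns plus the original text
        })
    return records


def batched(records: List[Dict[str, Any]], batch_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield upsert bodies holding at most batch_size records each."""
    for start in range(0, len(records), batch_size):
        yield {"vectors": records[start:start + batch_size]}
```

Each yielded body would then be sent as a `POST https://<index-host>/vectors/upsert` request. An RSS feed could go through the same steps once its items are mapped to rows with a text field.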