OpenAI Predicted Output

Option for using predicted outputs, a feature of the OpenAI API.

Predicted Outputs enable you to speed up API responses from Chat Completions when many of the output tokens are known ahead of time. This is most common when you are regenerating a text or code file with minor modifications.

It helps reduce latency and save money.

My use case

My SaaS users can generate copywriting texts based on templates.
90% of the text is the same, except for the variables that are specific to the end-user’s product.

The AI should be able to adjust the text when needed, but most of the text will remain the same even when adjustments are needed.

I think it would be beneficial to add this because

It would save money and time.

Any resources to support this?

https://platform.openai.com/docs/guides/predicted-outputs?lang=curl

Are you willing to work on this?

Sorry, I have no idea how to implement this.

:+1: Upvote this request if you think it would be useful

I think claude does this automatically and cache the tokens to save money and time but im not sure if its exactly like this feature or not.