The idea is:
Gemini-2.0-flash-thinking-exp-1219 has been released, along with the new GenAI API (google.genai instead of google.generativeai). The old generativeai API is slowly being phased out as the new one becomes feature complete.
I would like to request support for generate_content_stream, with the output's "thinking" and "response" sections formatted separately based on response.candidate…, and with an option for the user to collapse the thinking section.
My use case:
It would allow a streamed, chunk-by-chunk response while letting the user easily view the internal thoughts and the external response separately.
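To illustrate the shape of this, here is a minimal sketch. It assumes the google.genai SDK, where each streamed Part on thinking models carries a boolean thought flag marking internal reasoning; the split_thoughts helper and the Part stand-in class are hypothetical names introduced here for illustration, not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class Part:
    # Stand-in for a google.genai response part: `text` holds the
    # content, `thought` is True for internal "thinking" parts.
    text: str
    thought: bool = False


def split_thoughts(parts):
    """Partition a chunk's parts into (thinking, response) text."""
    thinking = "".join(p.text for p in parts if p.thought)
    response = "".join(p.text for p in parts if not p.thought)
    return thinking, response


# Hypothetical streaming loop (needs an API key, shown for shape only):
#
# from google import genai
# client = genai.Client(api_key="...")
# for chunk in client.models.generate_content_stream(
#         model="gemini-2.0-flash-thinking-exp-1219",
#         contents="Hello"):
#     thinking, response = split_thoughts(chunk.candidates[0].content.parts)
#     # render `thinking` in a collapsible section, `response` normally
```

The UI would then append each chunk's thinking text to a collapsible "thinking" pane and the rest to the normal response view.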
I think it would be beneficial to add this because:
People want to use these new flash-thinking models for various small tasks. The TTFT (time to first token) of these flash models is incredible, which makes them very useful as a middleman handling simpler tasks while waiting for Claude or OpenAI requests to go through. Unlocking the full capability of the Google Gemini API is probably a good idea, right?
Any resources to support this?
Gemini 2.0 Flash (experimental) | Gemini API
Gemini 2.0 Flash Thinking Mode | Vertex API
Gemini 2.0 Flash Thinking Mode | Gemini API
Are you willing to work on this?
Yes, if there is Python work to be done, though I assume this project uses the JS SDK.