The idea is:
Gemini-2.0-flash-thinking-exp-1219 has been released, along with the new GenAI API (google.genai instead of google.generativeai). The old generativeai API is slowly being phased out as the new one becomes feature complete.
I would like to request support for generate_content_stream, with the output's "thinking" and "response" sections formatted separately based on response.candidate…, and with an option for the user to collapse the thinking section.
My use case:
It would allow a streamed, chunk-by-chunk response while letting the user easily view the internal thoughts and the external response separately.
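To illustrate the shape of this, here is a minimal sketch. It assumes the google.genai SDK, where each streamed Part on thinking models carries a boolean thought flag marking internal reasoning; the split_thoughts helper and the Part stand-in class are hypothetical names introduced here for illustration, not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class Part:
    # Stand-in for a google.genai response part: `text` holds the
    # content, `thought` is True for internal "thinking" parts.
    text: str
    thought: bool = False


def split_thoughts(parts):
    """Partition a chunk's parts into (thinking, response) text."""
    thinking = "".join(p.text for p in parts if p.thought)
    response = "".join(p.text for p in parts if not p.thought)
    return thinking, response


# Hypothetical streaming loop (needs an API key, shown for shape only):
#
# from google import genai
# client = genai.Client(api_key="...")
# for chunk in client.models.generate_content_stream(
#         model="gemini-2.0-flash-thinking-exp-1219",
#         contents="Hello"):
#     thinking, response = split_thoughts(chunk.candidates[0].content.parts)
#     # render `thinking` in a collapsible section, `response` normally
```

The UI would then append each chunk's thinking text to a collapsible "thinking" pane and the rest to the normal response view.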
I think it would be beneficial to add this because:
People want to use these new flash-thinking models for various small tasks. The TTFT (time to first token) of these flash models is incredible, which makes them very useful as a middleman handling simpler tasks while waiting for Claude or OpenAI requests to go through. Unlocking the full capability of the Google Gemini API is probably a good idea, right?
Any resources to support this?
Gemini 2.0 Flash (experimental) | Gemini API
Gemini 2.0 Flash Thinking Mode | Vertex API
Gemini 2.0 Flash Thinking Mode | Gemini API
Are you willing to work on this?
Yes, if there is Python work to be done, though I assume this project uses the JS SDK.