Google DeepMind Releases Gemini 2.5 Pro And Flash Models, Introduces Flash-Lite 2.5 In Preview

June 18, 2025

In Brief: Google DeepMind’s Gemini 2.5 Flash and Pro models are now generally available, and Gemini 2.5 Flash-Lite, the most cost-efficient and fastest model in the 2.5 series, has been introduced in preview.

Google DeepMind, the AI division of Google, has made its Gemini 2.5 Pro and Gemini 2.5 Flash models generally available. It has also introduced a preview version of Gemini 2.5 Flash-Lite, positioned as the most cost-effective and fastest model in the 2.5 series to date.

Gemini 2.5 Pro is the most advanced model in the Gemini series, intended for tasks requiring complex reasoning, code generation, problem-solving, multimodal input processing, and extended context understanding. The model supports multimodal inputs including text, images, audio, video, and documents, and currently features a context window of approximately one million tokens, with an upcoming expansion to two million. It incorporates structured reasoning mechanisms and a capability referred to as Deep Think, which enables parallel processing of complex reasoning steps. Performance metrics reportedly show high scores in areas such as coding, scientific reasoning, and mathematics, based on results from benchmark tests including LMArena and Humanity’s Last Exam.

Gemini 2.5 Flash is a high-throughput model optimized for efficiency and cost while maintaining strong performance in general-use scenarios. It has been generally available since mid-June 2025 and has reasoning enabled by default, a behavior that can be adjusted through API settings. The model shows improvements in benchmarks for coding, reasoning, long-context comprehension, and multimodal tasks. Token efficiency has also improved, reducing cost per operation, with pricing listed at $0.30 per one million input tokens and $2.50 per one million output tokens.
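To make the listed Gemini 2.5 Flash prices concrete, the following minimal sketch estimates the cost of a single request. It assumes simple linear per-token billing at the two rates quoted above. The function name `estimate_cost_usd` and the example token counts are hypothetical, and real invoices may depend on factors the article does not cover.

```python
# Prices as listed in the article for Gemini 2.5 Flash (USD per one million tokens).
INPUT_PRICE_PER_MILLION = 0.30
OUTPUT_PRICE_PER_MILLION = 2.50
TOKENS_PER_MILLION = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost, assuming linear per-token billing (an assumption)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 20,000 input tokens and 1,500 output tokens.
example_cost = estimate_cost_usd(20_000, 1_500)
print(f"Estimated cost: ${example_cost:.5f}")
```

Under these assumptions, the hypothetical request costs just under one cent, and output tokens account for a disproportionate share of the total because they are priced more than eight times higher than input tokens.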

According to the announcement, Gemini 2.5 models are currently in use by developers and organizations such as Spline, Rooms, Snap, and SmartBear for production-level applications.

Gemini 2.5 Flash-Lite has been released in preview and is described as the fastest and most cost-efficient model in the 2.5 family to date. The model is available for early use, and developers are encouraged to provide feedback during the preview phase.

Gemini 2.5 Flash-Lite is reported to improve on its predecessor, Gemini 2.0 Flash-Lite, across coding, mathematics, scientific reasoning, and multimodal tasks. It is optimized for high-volume, low-latency applications such as translation and classification, and it responds faster than both 2.0 Flash-Lite and 2.0 Flash across a diverse set of input prompts.

The model includes key capabilities found in other Gemini 2.5 versions: reasoning that can be turned on within a configurable budget, integration with tools such as Google Search and code execution, support for multimodal input, and a one-million-token context window.
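As an illustration of how a reasoning budget and tools might be expressed in a request, the sketch below builds a JSON body without sending it. The field names (`contents`, `generationConfig.thinkingConfig.thinkingBudget`, and the `google_search` and `code_execution` tool entries) reflect one reading of the public Gemini API request format and should be treated as assumptions to check against the current documentation. The helper `build_generate_request` is hypothetical, and whether a given model accepts both tools in one request is not confirmed by the article.

```python
import json
from typing import Any, Dict, Optional


def build_generate_request(
    prompt: str,
    thinking_budget: Optional[int] = None,
    use_google_search: bool = False,
    use_code_execution: bool = False,
) -> Dict[str, Any]:
    """Build a request body as a plain dict (hypothetical helper; no network call)."""
    body: Dict[str, Any] = {
        "contents": [{"parts": [{"text": prompt}]}],
    }

    # Assumed field names: a token budget that caps how much reasoning the model does.
    if thinking_budget is not None:
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }

    # Assumed tool entries for Google Search grounding and code execution.
    tools = []
    if use_google_search:
        tools.append({"google_search": {}})
    if use_code_execution:
        tools.append({"code_execution": {}})
    if tools:
        body["tools"] = tools

    return body


# Example: a low-latency classification prompt with a small reasoning budget.
request_body = build_generate_request(
    "Classify the sentiment of: 'The update made everything faster.'",
    thinking_budget=512,
)
print(json.dumps(request_body, indent=2))
```

Keeping the budget and tools optional mirrors the article's description of these features as configurable: a high-volume task such as classification can omit them entirely, while a harder prompt can opt in.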

The preview version of Gemini 2.5 Flash-Lite is currently accessible through Google AI Studio and Vertex AI, where it is offered alongside the stable versions of Gemini 2.5 Flash and Gemini 2.5 Pro. These models are also available through the Gemini mobile app, and customized deployments of 2.5 Flash and Flash-Lite have been integrated into Google Search services.

