From Scans To Speech: How Google Is Redefining Healthcare AI
In Brief
Google has updated its open-source MedGemma medical AI model to interpret high-dimensional scans such as CT and MRI, and has released MedASR, an open speech-to-text model tuned for medical dictation.
Google has announced an update to its MedGemma AI model, expanding support for medical imaging applications.
The new MedGemma 1.5 4B model incorporates feedback from the developer community to better support multiple medical imaging modalities, including high-dimensional scans such as CT and MRI, histopathology images, longitudinal imaging like chest X-ray time series, and anatomical localization tasks.
It also improves medical document understanding, enabling extraction of structured data from lab reports. Compared with the previous MedGemma 1 4B, the 1.5 4B update offers enhanced accuracy for text, medical records, and 2D imaging, while remaining compact enough to run offline.
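MedGemma performs lab-report extraction through prompting rather than hand-written rules, but to illustrate what "structured data extraction" means in practice, here is a minimal Python sketch that turns free-text lab lines into structured records. The regex and field names are illustrative assumptions, not the model's method or output schema.

```python
import re

# Illustrative only: parse "Analyte: value unit" lines into dicts.
# MedGemma 1.5 does this via prompts over real reports; this sketch
# just shows the kind of structured output the task targets.
LAB_LINE = re.compile(
    r"(?P<analyte>[A-Za-z ]+?):\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z/%]+)"
)

def extract_lab_values(report: str) -> list[dict]:
    """Return one dict per recognized lab result line."""
    return [m.groupdict() for m in LAB_LINE.finditer(report)]

report = "Hemoglobin: 13.5 g/dL\nGlucose: 92 mg/dL"
records = extract_lab_values(report)
# records[0] -> {"analyte": "Hemoglobin", "value": "13.5", "unit": "g/dL"}
```

A model-based extractor would replace the regex with a prompt, but the downstream consumers (databases, analytics) expect the same record shape.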
For more complex text-based applications, developers can continue using the larger 27B parameter MedGemma model. Full details and benchmarks are available in the MedGemma 1.5 model card.
MedGemma was originally built as a multimodal system to reflect the complex data environment of medicine, with early versions supporting interpretation of two-dimensional medical images such as chest X-rays, dermatology images, retinal scans and histopathology samples. The latest release, MedGemma 1.5, expands these capabilities to include high-dimensional medical imaging, incorporating three-dimensional CT and MRI data as well as whole-slide histopathology. Developers can now create applications that process multiple image slices or patches together with task-specific prompts, enabling more advanced diagnostic and analytical use cases.
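The "multiple image slices together with task-specific prompts" pattern can be sketched as a chat-style request that interleaves several slice images with one instruction. The message schema below mirrors common Hugging Face chat formats; the exact fields MedGemma 1.5 expects may differ, so treat this as an assumed shape rather than the official API.

```python
# Sketch: package several CT slices plus a task prompt into one request.
# Field names ("type", "image", "path", "text") are assumptions modeled on
# typical multimodal chat templates, not a documented MedGemma contract.
def build_volume_request(slice_paths: list[str], task_prompt: str) -> dict:
    content = [{"type": "image", "path": p} for p in slice_paths]
    content.append({"type": "text", "text": task_prompt})
    return {"role": "user", "content": content}

req = build_volume_request(
    ["ct_slice_01.png", "ct_slice_02.png", "ct_slice_03.png"],
    "Identify any pulmonary nodules across these axial slices.",
)
```

The same structure extends to histopathology by substituting whole-slide patches for CT slices.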
According to internal evaluations, MedGemma 1.5 demonstrates notable performance improvements across several domains, including classification of CT and MRI findings, histopathology analysis, anatomical localization in chest X-rays, longitudinal image review, and structured data extraction from laboratory reports. The model also shows substantial gains in medical text comprehension and electronic health record question-answering, reflecting broader advances in both vision and language performance.
This expanded functionality builds on Google’s earlier CT foundation tools and represents one of the first publicly available open multimodal models capable of handling high-dimensional medical data alongside traditional text and 2D imagery. While these features are still evolving, the company expects developers to achieve further improvements through domain-specific fine-tuning, supported by newly released tutorials and resources for CT and histopathology applications on Hugging Face and Model Garden.
Google Introduces MedASR To Enhance Medical Speech Recognition And AI Clinical Workflows
In addition, Google has released MedASR, an open automated speech recognition model fine-tuned for medical dictation, which converts speech to text and pairs with MedGemma for advanced reasoning tasks.
While text remains the dominant interface for large language models, spoken communication continues to play a central role in clinical practice, from physician dictation to real-time patient consultations, making accurate speech recognition an essential capability.
MedASR is designed specifically for medical language, enabling more reliable transcription of domain-specific terminology and serving as a natural input method for MedGemma. In comparative testing against the general-purpose Whisper large-v3 model, MedASR demonstrated significantly higher accuracy, producing substantially fewer transcription errors on both chest X-ray dictations and a broad internal benchmark covering multiple medical specialties and speaker profiles.
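The MedASR-to-MedGemma pairing amounts to a two-stage pipeline: transcribe the dictation, then hand the transcript to the language model with a reasoning instruction. A minimal sketch of the glue step, with the prompt wording as my own assumption (the actual model calls are omitted):

```python
# Hypothetical handoff: wrap a MedASR transcript into a MedGemma prompt.
# Only the composition pattern is shown; the transcription and generation
# calls themselves would come from the respective model APIs.
def compose_reasoning_prompt(transcript: str) -> str:
    return (
        "The following is a physician's dictation transcript:\n"
        f"{transcript}\n"
        "Summarize the key findings as a structured impression."
    )

transcript = "No focal consolidation. Mild cardiomegaly. No pleural effusion."
prompt = compose_reasoning_prompt(transcript)
```

Keeping the two stages separate also lets developers log or correct the transcript before any clinical reasoning is applied to it.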
All HAI-DEF models, including MedGemma 1.5, MedASR, and the MedSigLIP image encoder, remain free for research and commercial use and can be accessed on Hugging Face or integrated into scalable applications on Vertex AI.
MedGemma Gains Global Traction As Healthcare Systems And Researchers Expand AI Adoption
According to Google, adoption of MedGemma is expanding among health technology startups and research teams worldwide, with the model increasingly used to accelerate development across a wide range of medical applications.
In Malaysia, Qmed Asia has integrated MedGemma into askCPG, a conversational system designed to provide access to more than 150 national clinical practice guidelines. According to the Ministry of Health Malaysia, the interface has improved the usability of these guidelines in routine clinical decision-making, while early pilot programs have reported particularly strong feedback on the platform’s multimodal medical imaging features powered by MedGemma.
In Taiwan, the National Health Insurance Administration has applied MedGemma to analyze preoperative assessments for lung cancer surgery. By extracting structured insights from tens of thousands of pathology reports and other unstructured clinical data, the initiative supports large-scale statistical analysis intended to inform policy decisions and improve surgical planning and patient outcomes.
Since its release earlier this year, MedGemma has also been widely referenced in academic medical AI research, where it has demonstrated strong performance as a foundational model for tasks such as medical text comprehension, multidisciplinary clinical decision support, and mammography reporting.
About The Author
Alisa, a dedicated journalist at the MPost, specializes in cryptocurrency, zero-knowledge proofs, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.