A Comprehensive Polish Medical Speech Dataset for Enhancing Automatic Medical Dictation

Bibliographic Details
Published in: Scientific Data, Vol. 12, No. 1, Article 1436 (13 pages)
Main Authors: Czyżewski, Andrzej; Cygert, Sebastian; Marciniuk, Karolina; Szczodrak, Maciej; Harasimiuk, Arkadiusz; Odya, Piotr; Galanina, Marina; Szczuko, Piotr; Kostek, Bożena; Graff, Beata; Szplit, Dariusz; Budzisz, Mariusz; Narkiewicz, Krzysztof
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK (Nature Portfolio), 16.08.2025

Summary: Pre-trained models have become widely adopted for their strong zero-shot performance, often minimizing the need for task-specific data. However, specialized domains like medical speech recognition still benefit from tailored datasets. We present ADMEDVOICE, a novel Polish medical speech dataset, collected using a high-quality text corpus and diverse recording conditions to reflect real-world scenarios. The dataset includes domain-specific vocabulary such as drug names and illnesses, with nearly 15 hours of audio from 28 speakers, including noisy environments. Additionally, we release two enhanced versions: one anonymized for privacy-sensitive use and another synthetic version created via text-to-speech, totaling over 83 hours and nearly 50,000 samples. Evaluating the Whisper model, we observe a word error rate (WER) of 24.03 on our test set. Fine-tuning with human recordings reduces the WER to 15.47, and incorporating anonymized and synthetic data further lowers it to 13.91. We open-source the dataset, fine-tuned model, and code on Kaggle to support continued research in medical speech recognition.
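
The evaluation described in the summary (transcribing Polish medical audio with Whisper and scoring it by WER) can be sketched roughly as below. This is a minimal illustration, not the authors' released code: the checkpoint name, audio file paths, reference transcripts, and the use of the transformers and jiwer libraries are assumptions.

# Minimal sketch of WER evaluation for Whisper on Polish audio.
# Assumptions: the "openai/whisper-small" checkpoint, local .wav files, and the
# transformers/jiwer libraries; the paper's actual setup may differ.
from transformers import pipeline
import jiwer

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    generate_kwargs={"language": "polish", "task": "transcribe"},
)

# Hypothetical (audio file, reference transcript) pairs standing in for the test set.
test_set = [
    ("recording_001.wav", "pacjent przyjmuje metforminę dwa razy dziennie"),
    ("recording_002.wav", "rozpoznano nadciśnienie tętnicze"),
]

references, hypotheses = [], []
for audio_path, reference in test_set:
    hypotheses.append(asr(audio_path)["text"])
    references.append(reference)

# Corpus-level WER, reported on a 0-100 scale like the figures quoted in the summary.
print(f"WER: {100 * jiwer.wer(references, hypotheses):.2f}")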
ISSN: 2052-4463
DOI: 10.1038/s41597-025-05776-1