A Comprehensive Polish Medical Speech Dataset for Enhancing Automatic Medical Dictation

Bibliographic Details
Published in Scientific Data Vol. 12; no. 1; article no. 1436 (13 pages)
Main Authors Czyżewski, Andrzej, Cygert, Sebastian, Marciniuk, Karolina, Szczodrak, Maciej, Harasimiuk, Arkadiusz, Odya, Piotr, Galanina, Marina, Szczuko, Piotr, Kostek, Bożena, Graff, Beata, Szplit, Dariusz, Budzisz, Mariusz, Narkiewicz, Krzysztof
Format Journal Article
Language English
Published London Nature Publishing Group UK 16.08.2025
Nature Publishing Group
Nature Portfolio
Abstract Pre-trained models have become widely adopted for their strong zero-shot performance, often minimizing the need for task-specific data. However, specialized domains like medical speech recognition still benefit from tailored datasets. We present ADMEDVOICE, a novel Polish medical speech dataset, collected using a high-quality text corpus and diverse recording conditions to reflect real-world scenarios. The dataset includes domain-specific vocabulary such as drug names and illnesses, with nearly 15 hours of audio from 28 speakers, including noisy environments. Additionally, we release two enhanced versions: one anonymized for privacy-sensitive use and another synthetic version created via text-to-speech, totaling over 83 hours and nearly 50,000 samples. Evaluating the Whisper model, we observe a 24.03 WER on our test set. Fine-tuning with human recordings reduces WER to 15.47, and incorporating anonymized and synthetic data further lowers it to 13.91. We open-source the dataset, fine-tuned model, and code on Kaggle to support continued research in medical speech recognition.
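The word error rate (WER) figures in the abstract can be put in context with a short calculation. The sketch below is illustrative only and is not taken from the authors' released code: it assumes the conventional WER definition (word-level edit distance divided by the number of reference words), assumes the reported values are percentages, and uses hypothetical function names and a made-up Polish example sentence.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent using word-level Levenshtein distance.

    Assumption: whitespace tokenization with no text normalization;
    the article's own evaluation pipeline may differ.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative WER reduction in percent, measured against the baseline."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical transcript pair: one substituted word out of three.
example_wer = word_error_rate(
    "pacjent przyjmuje metforminę",
    "pacjent przyjmuje metformina",
)

# Figures reported in the abstract: 24.03 (zero-shot Whisper),
# 15.47 (fine-tuned on human recordings), 13.91 (plus anonymized
# and synthetic data).
human_only = relative_reduction(24.03, 15.47)
all_data = relative_reduction(24.03, 13.91)
```

Under these assumptions, fine-tuning on human recordings corresponds to a relative WER reduction of roughly 35.6%, and adding the anonymized and synthetic data to roughly 42.1% against the zero-shot baseline.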
ArticleNumber 1436
Author_xml – sequence: 1
  givenname: Andrzej
  surname: Czyżewski
  fullname: Czyżewski, Andrzej
  organization: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics
– sequence: 2
  givenname: Sebastian
  surname: Cygert
  fullname: Cygert, Sebastian
  organization: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics
– sequence: 3
  givenname: Karolina
  surname: Marciniuk
  fullname: Marciniuk, Karolina
  organization: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics
– sequence: 4
  givenname: Maciej
  surname: Szczodrak
  fullname: Szczodrak, Maciej
  organization: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics
– sequence: 5
  givenname: Arkadiusz
  surname: Harasimiuk
  fullname: Harasimiuk, Arkadiusz
  organization: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics
– sequence: 6
  givenname: Piotr
  orcidid: 0000-0003-0288-6178
  surname: Odya
  fullname: Odya, Piotr
  organization: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics
– sequence: 7
  givenname: Marina
  surname: Galanina
  fullname: Galanina, Marina
  organization: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics
– sequence: 8
  givenname: Piotr
  surname: Szczuko
  fullname: Szczuko, Piotr
  organization: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics
– sequence: 9
  givenname: Bożena
  orcidid: 0000-0001-6288-2908
  surname: Kostek
  fullname: Kostek, Bożena
  email: bozena.kostek@pg.edu.pl
  organization: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics
– sequence: 10
  givenname: Beata
  surname: Graff
  fullname: Graff, Beata
  organization: Medical University of Gdańsk, Department of Hypertension and Diabetology
– sequence: 11
  givenname: Dariusz
  surname: Szplit
  fullname: Szplit, Dariusz
  organization: University Clinical Center, Medical University of Gdańsk, Department of Quality in Healthcare
– sequence: 12
  givenname: Mariusz
  surname: Budzisz
  fullname: Budzisz, Mariusz
  organization: eTrust Co. Ltd
– sequence: 13
  givenname: Krzysztof
  surname: Narkiewicz
  fullname: Narkiewicz, Krzysztof
  organization: Medical University of Gdańsk, Department of Hypertension and Diabetology
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40819123 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright The Author(s) 2025
The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1038/s41597-025-05776-1
Discipline Sciences (General)
EISSN 2052-4463
EndPage 13
ExternalDocumentID oai_doaj_org_article_c2958718131c43edae19ce3dec90e725
PMC12357909
40819123
10_1038_s41597_025_05776_1
Genre Dataset
Journal Article
GeographicLocations Poland
GrantInformation_xml – fundername: Ministry of Science and Higher Education | Narodowe Centrum Badań i Rozwoju (National Centre for Research and Development)
  grantid: INFOSTRATEG4/0003/2022
  funderid: 501100005632
ISSN 2052-4463
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License 2025. The Author(s).
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
ORCID 0000-0001-6288-2908
0000-0003-0288-6178
OpenAccessLink https://doaj.org/article/c2958718131c43edae19ce3dec90e725
PMID 40819123
PageCount 13
PublicationDate 2025-08-16
PublicationPlace London, England
PublicationTitle Scientific data
PublicationTitleAbbrev Sci Data
PublicationYear 2025
Publisher Nature Publishing Group UK
Nature Publishing Group
Nature Portfolio
StartPage 1436
SubjectTerms 692/700/1538
692/700/228
Automation
Data Descriptor
Datasets
Documentation
Human performance
Humanities and Social Sciences
Humans
Language
Linguistics
Medical Subject Headings-MeSH
multidisciplinary
Multilingualism
Natural language processing
Neural networks
Poland
Science
Science (multidisciplinary)
Speech
Speech recognition
Speech Recognition Software
Voice recognition
Title A Comprehensive Polish Medical Speech Dataset for Enhancing Automatic Medical Dictation
URI https://link.springer.com/article/10.1038/s41597-025-05776-1
https://www.ncbi.nlm.nih.gov/pubmed/40819123
https://www.proquest.com/docview/3240178818
https://www.proquest.com/docview/3240293971
https://pubmed.ncbi.nlm.nih.gov/PMC12357909
https://doaj.org/article/c2958718131c43edae19ce3dec90e725
Volume 12