A Comprehensive Polish Medical Speech Dataset for Enhancing Automatic Medical Dictation
Pre-trained models have become widely adopted for their strong zero-shot performance, often minimizing the need for task-specific data. However, specialized domains like medical speech recognition still benefit from tailored datasets. We present ADMEDVOICE, a novel Polish medical speech dataset, col...
Published in | Scientific Data, Vol. 12, no. 1, article 1436 (13 pages) |
---|---|
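Main Authors | Czyżewski, Andrzej; Cygert, Sebastian; Marciniuk, Karolina; Szczodrak, Maciej; Harasimiuk, Arkadiusz; Odya, Piotr; Galanina, Marina; Szczuko, Piotr; Kostek, Bożena; Graff, Beata; Szplit, Dariusz; Budzisz, Mariusz; Narkiewicz, Krzysztof |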
Format | Journal Article |
Language | English |
Published | London: Nature Publishing Group UK (also listed as Nature Publishing Group; Nature Portfolio), 16.08.2025 |
Abstract | Pre-trained models have become widely adopted for their strong zero-shot performance, often minimizing the need for task-specific data. However, specialized domains like medical speech recognition still benefit from tailored datasets. We present ADMEDVOICE, a novel Polish medical speech dataset, collected using a high-quality text corpus and diverse recording conditions to reflect real-world scenarios. The dataset includes domain-specific vocabulary such as drug names and illnesses, with nearly 15 hours of audio from 28 speakers, including noisy environments. Additionally, we release two enhanced versions: one anonymized for privacy-sensitive use and another synthetic version created via text-to-speech, totaling over 83 hours and nearly 50,000 samples. Evaluating the Whisper model, we observe a 24.03 WER on our test set. Fine-tuning with human recordings reduces WER to 15.47, and incorporating anonymized and synthetic data further lowers it to 13.91. We open-source the dataset, fine-tuned model, and code on Kaggle to support continued research in medical speech recognition. |
---|---|
ArticleNumber | 1436 |
Author | Budzisz, Mariusz Narkiewicz, Krzysztof Cygert, Sebastian Graff, Beata Czyżewski, Andrzej Harasimiuk, Arkadiusz Odya, Piotr Marciniuk, Karolina Galanina, Marina Szczuko, Piotr Kostek, Bożena Szplit, Dariusz Szczodrak, Maciej |
Author_xml | (1) Czyżewski, Andrzej: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics; (2) Cygert, Sebastian: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics; (3) Marciniuk, Karolina: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics; (4) Szczodrak, Maciej: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics; (5) Harasimiuk, Arkadiusz: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics; (6) Odya, Piotr (ORCID 0000-0003-0288-6178): Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics; (7) Galanina, Marina: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics; (8) Szczuko, Piotr: Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics; (9) Kostek, Bożena (ORCID 0000-0001-6288-2908, email: bozena.kostek@pg.edu.pl): Gdańsk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics; (10) Graff, Beata: Medical University of Gdańsk, Department of Hypertension and Diabetology; (11) Szplit, Dariusz: University Clinical Center, Medical University of Gdańsk, Department of Quality in Healthcare; (12) Budzisz, Mariusz: eTrust Co. Ltd; (13) Narkiewicz, Krzysztof: Medical University of Gdańsk, Department of Hypertension and Diabetology |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40819123$$D View this record in MEDLINE/PubMed |
ContentType | Journal Article |
Copyright | The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.1038/s41597-025-05776-1 |
Discipline | Sciences (General) |
EISSN | 2052-4463 |
EndPage | 13 |
ExternalDocumentID | oai_doaj_org_article_c2958718131c43edae19ce3dec90e725 PMC12357909 40819123 10_1038_s41597_025_05776_1 |
Genre | Dataset Journal Article |
GeographicLocations | Poland |
GeographicLocations_xml | – name: Poland |
GrantInformation_xml | – fundername: Ministry of Science and Higher Education | Narodowe Centrum Badań i Rozwoju (National Centre for Research and Development) grantid: INFOSTRATEG4/0003/2022 funderid: 501100005632 |
ISSN | 2052-4463 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Language | English |
License | 2025. The Author(s). Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. |
ORCID | 0000-0001-6288-2908 0000-0003-0288-6178 |
OpenAccessLink | https://doaj.org/article/c2958718131c43edae19ce3dec90e725 |
PMID | 40819123 |
PageCount | 13 |
PublicationCentury | 2000 |
PublicationDate | 20250816 |
PublicationDateYYYYMMDD | 2025-08-16 |
PublicationDate_xml | – month: 8 year: 2025 text: 20250816 day: 16 |
PublicationDecade | 2020 |
PublicationPlace | London |
PublicationPlace_xml | – name: London – name: England |
PublicationTitle | Scientific data |
PublicationTitleAbbrev | Sci Data |
PublicationTitleAlternate | Sci Data |
PublicationYear | 2025 |
Publisher | Nature Publishing Group UK Nature Publishing Group Nature Portfolio |
Publisher_xml | – name: Nature Publishing Group UK – name: Nature Publishing Group – name: Nature Portfolio |
SourceID | doaj pubmedcentral proquest pubmed crossref springer |
SourceType | Open Website Open Access Repository Aggregation Database Index Database Publisher |
StartPage | 1436 |
SubjectTerms | 692/700/1538; 692/700/228; Automation; Data Descriptor; Datasets; Documentation; Human performance; Humanities and Social Sciences; Humans; Language; Linguistics; Medical Subject Headings-MeSH; multidisciplinary; Multilingualism; Natural language processing; Neural networks; Poland; Science; Science (multidisciplinary); Speech; Speech recognition; Speech Recognition Software; Voice recognition |
Title | A Comprehensive Polish Medical Speech Dataset for Enhancing Automatic Medical Dictation |
URI | https://link.springer.com/article/10.1038/s41597-025-05776-1 https://www.ncbi.nlm.nih.gov/pubmed/40819123 https://www.proquest.com/docview/3240178818 https://www.proquest.com/docview/3240293971 https://pubmed.ncbi.nlm.nih.gov/PMC12357909 https://doaj.org/article/c2958718131c43edae19ce3dec90e725 |
Volume | 12 |
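
The abstract reports word error rate (WER) figures of 24.03, 15.47, and 13.91 for Whisper on the test set. As a reading aid, the sketch below shows how WER is conventionally computed (word-level edit distance divided by the number of reference words) and how the quoted figures translate into relative reductions, roughly 35.6% after fine-tuning on human recordings and roughly 42.1% with anonymized and synthetic data added. It is not the authors' evaluation code: the function names and the Polish example sentence are hypothetical, and it assumes the reported figures are percentages and that no text normalization is applied before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming Levenshtein distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative improvement over the baseline, in percent."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical dictation example (not taken from the dataset).
    reference = "pacjent przyjmuje amoksycylinę dwa razy dziennie"
    hypothesis = "pacjent przyjmuje amoksycylina dwa razy"
    print(f"example WER: {word_error_rate(reference, hypothesis):.2f}")

    # WER values quoted in the abstract.
    zero_shot, human_only, all_data = 24.03, 15.47, 13.91
    print(f"human recordings: {relative_reduction(zero_shot, human_only):.1f}% lower")
    print(f"plus anonymized and synthetic: {relative_reduction(zero_shot, all_data):.1f}% lower")
```

A single misrecognized drug name counts as one substitution here, which is why domain-specific vocabulary can weigh heavily on WER for short dictated utterances.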