An improved data augmentation approach and its application in medical named entity recognition

Bibliographic Details
Published in	BMC medical informatics and decision making Vol. 24; no. 1; article 221 (13 pages)
Main Authors	Chen, Hongyu; Dan, Li; Lu, Yonghe; Chen, Minghong; Zhang, Jinxia
Format	Journal Article
Language	English
Published	London, England: BioMed Central Ltd (BMC), 5 August 2024
Subjects	Algorithms; Analysis; Annotations; Classification; Computational linguistics; Data augmentation; Data mining; Deep learning; Electronic health records; Evaluation; Humans; Language; Language processing; Machine learning; Medical named entity recognition; Medical records; Methods; Natural language interfaces; Natural language processing; Neural networks; Recognition; Replacement augmentation; Semantics; Terminology; Text categorization; Text features; Training; China
Abstract	Performing data augmentation in medical named entity recognition (NER) is crucial due to the unique challenges posed by this field. Medical data is characterized by high acquisition costs, specialized terminology, imbalanced distributions, and limited training resources. These factors make achieving high performance in medical NER particularly difficult. Data augmentation methods help to mitigate these issues by generating additional training samples, thus balancing data distribution, enriching the training dataset, and improving model generalization. This paper proposes two data augmentation methods, Contextual Random Replacement based on Word2Vec Augmentation (CRR) and Targeted Entity Random Replacement Augmentation (TER), aimed at addressing the scarcity and imbalance of data in the medical domain. When combined with a deep learning-based Chinese NER model, these methods can significantly enhance performance and recognition accuracy under limited resources. Experimental results demonstrate that both augmentation methods effectively improve the recognition capability of medical named entities. Specifically, the BERT-BiLSTM-CRF model achieved the highest F1 score of 83.587%, representing a 1.49% increase over the baseline model. This validates the importance and effectiveness of data augmentation in medical NER.
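The abstract names two replacement-based augmentation methods but this record does not describe their procedures. As a rough illustration of the general idea only, the sketch below shows one plausible reading on BIO-tagged tokens: entity spans are swapped for same-type entities (TER-style), and non-entity tokens are swapped for similar words (CRR-style). The names `ENTITY_POOL`, `NEIGHBOURS`, `find_spans`, `replace_entities`, and `replace_context` are hypothetical, the toy English data is invented, and the `NEIGHBOURS` table merely stands in for Word2Vec nearest neighbours; none of this is taken from the paper.

```python
import random

# Hypothetical toy resources (assumptions for illustration, not from the paper).
ENTITY_POOL = {
    "DISEASE": [["pneumonia"], ["type", "2", "diabetes"]],
    "DRUG": [["aspirin"], ["metformin"]],
}
# Stand-in for Word2Vec nearest neighbours of context (non-entity) tokens.
NEIGHBOURS = {"has": ["shows"], "takes": ["receives"]}


def find_spans(labels):
    """Return (start, end, type) for each BIO entity span; end is exclusive."""
    spans, start, etype = [], None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        if lab.startswith("I-") and start is not None and lab[2:] == etype:
            continue  # still inside the current span
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if lab.startswith("B-"):
            start, etype = i, lab[2:]
    return spans


def replace_entities(tokens, labels, pool, rng, p=1.0):
    """TER-style sketch: swap each entity span for a same-type entity from pool."""
    out_tokens, out_labels, cursor = [], [], 0
    for start, end, etype in find_spans(labels):
        # Copy the untouched context before this span.
        out_tokens += tokens[cursor:start]
        out_labels += labels[cursor:start]
        candidates = pool.get(etype, [])
        if candidates and rng.random() < p:
            new = rng.choice(candidates)
        else:
            new = tokens[start:end]  # keep the original entity
        out_tokens += new
        # Re-derive BIO labels so they match the new span length.
        out_labels += ["B-" + etype] + ["I-" + etype] * (len(new) - 1)
        cursor = end
    out_tokens += tokens[cursor:]
    out_labels += labels[cursor:]
    return out_tokens, out_labels


def replace_context(tokens, labels, neighbours, rng, p=0.3):
    """CRR-style sketch: swap non-entity tokens for a listed similar word."""
    out = []
    for tok, lab in zip(tokens, labels):
        if lab == "O" and tok in neighbours and rng.random() < p:
            out.append(rng.choice(neighbours[tok]))
        else:
            out.append(tok)
    return out, list(labels)  # labels are unchanged: lengths are preserved


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    tokens = ["patient", "has", "pneumonia", "and", "takes", "aspirin"]
    labels = ["O", "O", "B-DISEASE", "O", "O", "B-DRUG"]
    print(replace_entities(tokens, labels, ENTITY_POOL, rng))
    print(replace_context(tokens, labels, NEIGHBOURS, rng, p=1.0))
```

Regenerating the labels from the replacement span, rather than copying the old ones, keeps tokens and tags aligned when the substituted entity has a different length. In a real setting the neighbour table would presumably come from a trained embedding model rather than a hand-written dictionary.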
ArticleNumber	221
Audience	Academic
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/39103849 (View this record in MEDLINE/PubMed)
ContentType	Journal Article
Copyright	2024. The Author(s). COPYRIGHT 2024 BioMed Central Ltd. This work is licensed under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI	10.1186/s12911-024-02624-x
Discipline	Medicine
EISSN	1472-6947
EndPage	13
ExternalDocumentID	oai_doaj_org_article_e6c4591213904221bf92d917fde65c3b PMC11302003 A803918262 39103849 10_1186_s12911_024_02624_x
Genre	Journal Article
GeographicLocations	China
GrantInformation	Guangzhou Science and Technology Planning Project (grant 202002020036)
ISSN	1472-6947
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	1
Keywords	Data augmentation; Deep learning; Medical named entity recognition; Text features; Replacement augmentation
Language	English
License	2024. The Author(s). Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
OpenAccessLink	http://journals.scholarsportal.info/openUrl.xqy?doi=10.1186/s12911-024-02624-x
PMID	39103849
PageCount	13
PublicationCentury	2000
PublicationDate	2024-08-05
PublicationDateYYYYMMDD	2024-08-05
PublicationDate_xml	– month: 08 year: 2024 text: 2024-08-05 day: 05
PublicationDecade	2020
PublicationPlace	London, England
PublicationTitle	BMC medical informatics and decision making
PublicationTitleAlternate	BMC Med Inform Decis Mak
PublicationYear	2024
Publisher	BioMed Central Ltd; BioMed Central; BMC
StartPage	221
Title	An improved data augmentation approach and its application in medical named entity recognition
URI	https://www.ncbi.nlm.nih.gov/pubmed/39103849 https://www.proquest.com/docview/3091289954 https://www.proquest.com/docview/3089514462 https://pubmed.ncbi.nlm.nih.gov/PMC11302003 https://doaj.org/article/e6c4591213904221bf92d917fde65c3b
Volume	24