The quest for better clinical word vectors: Ontology based and lexical vector augmentation versus clinical contextual embeddings
Published in | Computers in biology and medicine Vol. 134; p. 104433 |
---|---|
Main Authors | Nath, Namrata; Lee, Sang-Heon; McDonnell, Mark D.; Lee, Ivan |
Format | Journal Article |
Language | English |
Published | United States: Elsevier Ltd, 01.07.2021 |
Abstract | Word vectors or word embeddings are n-dimensional representations of words and form the backbone of Natural Language Processing of textual data. This research experiments with algorithms that augment word vectors with lexical constraints that are popular in NLP research and clinical domain constraints derived from the Unified Medical Language System (UMLS). It also compares the performance of the augmented vectors with Bio + Clinical BERT vectors, which have been trained and fine-tuned on clinical datasets.
Word2vec vectors are generated for words in a publicly available de-identified Electronic Health Records (EHR) dataset and augmented by ontologies using three algorithms that have fundamentally different approaches to vector augmentation. The augmented vectors are then evaluated alongside publicly available Bio + Clinical BERT on their correlation with human-annotated lists using Spearman's correlation coefficient. They are also evaluated on the downstream task of Named Entity Recognition (NER). Quantitative and empirical evaluations are used to highlight the strengths and weaknesses of the different approaches.
The counter-fitted word2vec vectors augmented with information from the UMLS ontology produced the best overall correlation with human-annotated evaluation lists (Spearman's correlation of 0.733 with mini mayo-doctors' annotation), while Bio + Clinical BERT produced the best results in the NER task (F1 of 0.87 and 0.811 on the i2b2 2010 and i2b2 2012 datasets, respectively) in our experiments.
Clinically adapted word2vec vectors successfully encapsulate concepts of lexical and clinical synonymy and antonymy and, to a lesser extent, hyponymy and hypernymy. Bio + Clinical BERT vectors perform better at NER and avoid out-of-vocabulary words.
• Evaluated different styles of word2vec vector generation in the clinical context.
• Evaluated various linguistic and domain adaptation algorithms and constraints.
• Evaluated publicly available vector space models.
• Spearman's correlation of 0.73 with mini mayo doctors' list for adapted vectors.
• Bio + Clinical BERT gives best results on NER task using Bi-LSTM CRF. |
---|---|
ArticleNumber | 104433 |
Author | Lee, Sang-Heon; Lee, Ivan; McDonnell, Mark D.; Nath, Namrata |
Author_xml | – sequence: 1 givenname: Namrata orcidid: 0000-0001-5793-3731 surname: Nath fullname: Nath, Namrata email: namrata.nath@mymail.unisa.edu.au – sequence: 2 givenname: Sang-Heon orcidid: 0000-0002-3655-7981 surname: Lee fullname: Lee, Sang-Heon – sequence: 3 givenname: Mark D. surname: McDonnell fullname: McDonnell, Mark D. – sequence: 4 givenname: Ivan surname: Lee fullname: Lee, Ivan |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34004575$$D View this record in MEDLINE/PubMed |
CitedBy_id | crossref_primary_10_1016_j_ijmedinf_2023_105122 crossref_primary_10_3390_electronics12132846 crossref_primary_10_1007_s41870_022_01145_y crossref_primary_10_1016_j_jbi_2023_104400 crossref_primary_10_1016_j_jbi_2022_104092 crossref_primary_10_1016_j_compbiomed_2023_107422 crossref_primary_10_1186_s12911_022_02049_4 |
Cites_doi | 10.1080/00437956.1954.11659520 10.1093/nar/gkh061 10.1145/219717.219748 10.1162/COLI_a_00237 10.1136/amiajnl-2013-001628 10.1016/j.yjbinx.2019.100057 10.1093/jamia/ocz096 10.1016/j.jbi.2018.09.008 10.1038/sdata.2016.35 10.1162/tacl_a_00143 10.1136/amiajnl-2011-000203 |
ContentType | Journal Article |
Copyright | 2021 Copyright © 2021. Published by Elsevier Ltd. Copyright Elsevier Limited Jul 2021 |
Copyright_xml | – notice: 2021 – notice: Copyright © 2021. Published by Elsevier Ltd. – notice: Copyright Elsevier Limited Jul 2021 |
DOI | 10.1016/j.compbiomed.2021.104433 |
Discipline | Medicine |
EISSN | 1879-0534 |
EndPage | 104433 |
ExternalDocumentID | 10_1016_j_compbiomed_2021_104433 34004575 S0010482521002274 |
Genre | Journal Article |
ISSN | 0010-4825 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Augmentation; Clinical word vectors; Word embedding; Antonymy |
Language | English |
License | Copyright © 2021. Published by Elsevier Ltd. |
ORCID | 0000-0002-3655-7981 0000-0001-5793-3731 |
PMID | 34004575 |
PageCount | 1 |
PublicationCentury | 2000 |
PublicationDate | 2021-07-01 |
PublicationDateYYYYMMDD | 2021-07-01 |
PublicationDate_xml | – month: 07 year: 2021 text: 2021-07-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States – name: Oxford |
PublicationTitle | Computers in biology and medicine |
PublicationTitleAlternate | Comput Biol Med |
PublicationYear | 2021 |
Publisher | Elsevier Ltd Elsevier Limited |
Publisher_xml | – name: Elsevier Ltd – name: Elsevier Limited |
Snippet | Word vectors or word embeddings are n-dimensional representations of words and form the backbone of Natural Language Processing of textual data. This research... |
StartPage | 104433 |
SubjectTerms | Algorithms; Annotations; Antonymy; Augmentation; Clinical word vectors; Correlation coefficient; Correlation coefficients; Datasets; Electronic health records; Electronic medical records; Hypotheses; Language; Linguistics; Medical research; Natural language processing; Neural networks; Ontology; Physicians; Semantics; Synonymy; Word embedding |
Title | The quest for better clinical word vectors: Ontology based and lexical vector augmentation versus clinical contextual embeddings |
URI | https://dx.doi.org/10.1016/j.compbiomed.2021.104433 https://www.ncbi.nlm.nih.gov/pubmed/34004575 https://www.proquest.com/docview/2547635247 https://search.proquest.com/docview/2528910608 |
Volume | 134 |
Citation | Nath, Namrata; Lee, Sang-Heon; McDonnell, Mark; Lee, Ivan. The quest for better clinical word vectors: Ontology based and lexical vector augmentation versus clinical contextual embeddings. Computers in biology and medicine, vol. 134, p. 104433, 2021-07-01. eISSN 1879-0534. doi: 10.1016/j.compbiomed.2021.104433. PMID: 34004575 |