Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach
Published in | PeerJ (San Francisco, CA) Vol. 3; p. e1279 |
---|---|
Main Authors | Mouriño García, Marcos Antonio; Pérez Rodríguez, Roberto; Anido Rifón, Luis E. |
Format | Journal Article |
Language | English |
Published | United States: PeerJ. Ltd; PeerJ, Inc., 29.09.2015 |
Subjects | |
ISSN | 2167-8359 |
DOI | 10.7717/peerj.1279 |
Abstract | Automatic classification of text documents into a set of categories has many applications, and the automatic classification of biomedical literature stands out among them as an important use case for document classification strategies. Biomedical staff and researchers must deal with large volumes of literature in their daily activities, so a system that provides simple and effective access to documents of interest would be useful; this requires sorting those documents according to some criteria, that is, classifying them. Documents to be classified are usually represented following the bag-of-words (BoW) paradigm: features are the words in the text, so the representation suffers from synonymy and polysemy, and feature weights are based only on frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge, specifically Wikipedia, to create bag-of-concepts (BoC) representations of documents, where a concept is understood as a “unit of meaning”; this tackles synonymy and polysemy. In addition, concepts are weighted according to their semantic relevance in the text. To evaluate the proposal, empirical experiments were conducted with OHSUMED, one of the corpora commonly used for evaluating classification and retrieval of biomedical information, and with UVigoMED, a purpose-built corpus of MEDLINE biomedical abstracts. The results show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation by up to 157% in the single-label classification problem and up to 100% in the multi-label problem for the OHSUMED corpus, and by up to 122% in the single-label problem and up to 155% in the multi-label problem for the UVigoMED corpus. |
---|---|
ArticleNumber | e1279 |
Audience | Academic |
Author | Mouriño García, Marcos Antonio; Anido Rifón, Luis E.; Pérez Rodríguez, Roberto |
Author_xml | – sequence: 1 givenname: Marcos Antonio surname: Mouriño García fullname: Mouriño García, Marcos Antonio organization: Department of Telematics Engineering, University of Vigo, Vigo, Spain – sequence: 2 givenname: Roberto surname: Pérez Rodríguez fullname: Pérez Rodríguez, Roberto organization: Department of Telematics Engineering, University of Vigo, Vigo, Spain – sequence: 3 givenname: Luis E. surname: Anido Rifón fullname: Anido Rifón, Luis E. organization: Department of Telematics Engineering, University of Vigo, Vigo, Spain |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/26468436 (View this record in MEDLINE/PubMed) |
CitedBy_id | crossref_primary_10_1007_s11517_021_02456_1 crossref_primary_10_1016_j_artmed_2018_04_007 crossref_primary_10_3414_ME17_01_0028 crossref_primary_10_1016_j_ins_2017_04_024 crossref_primary_10_1038_s41538_024_00299_2 crossref_primary_10_1186_s13326_016_0109_6 crossref_primary_10_1016_j_jbi_2016_03_026 crossref_primary_10_1093_bioinformatics_btab331 crossref_primary_10_1103_PhysRevE_94_022409 crossref_primary_10_1007_s00500_018_3101_5 |
Cites_doi | 10.1305/ndjfl/1093634995 10.1613/jair.2669 10.1093/nar/gkh061 10.1001/jama.1994.03510380059038 10.1007/BFb0026683 10.4018/jdwm.2007070101 10.1016/j.jbi.2011.12.009 10.7551/mitpress/6393.001.0001 10.1037/0033-295X.104.2.211 10.1007/11899402_10 10.1002/asi.21382 10.1007/978-3-540-24775-3_5 10.1023/A:1009982220290 10.1023/A:1007649029923 10.1186/1471-2105-7-58 10.1108/eb046814 10.1109/5254.708428 10.1145/361219.361220 10.1007/s10115-008-0152-4 10.3115/1220355.1220425 10.1007/978-3-540-36668-3_150 10.1145/1961209.1961211 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9 10.1016/j.artint.2012.06.007 10.1145/505282.505283 10.1109/WAINA.2008.137 10.1002/asi.22689 |
ContentType | Journal Article |
Copyright | COPYRIGHT 2015 PeerJ. Ltd.; 2015 Mouriño García et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DBID | AAYXX CITATION NPM 3V. 7XB 88I 8FE 8FH 8FK ABUWG AFKRA AZQEC BBNVY BENPR BHPHI CCPQU DWQXO GNUQQ HCIFZ LK8 M2P M7P PHGZM PHGZT PIMPY PKEHL PQEST PQGLB PQQKQ PQUKI PRINS Q9U 7X8 5PM DOA |
DatabaseName | CrossRef PubMed ProQuest Central (Corporate) ProQuest Central (purchase pre-March 2016) Science Database (Alumni Edition) ProQuest SciTech Collection ProQuest Natural Science Collection ProQuest Central (Alumni) (purchase pre-March 2016) ProQuest Central (Alumni Edition) ProQuest Central UK/Ireland ProQuest Central Essentials Biological Science Collection ProQuest Central Natural Science Collection ProQuest One Community College ProQuest Central Korea ProQuest Central Student SciTech Premium Collection ProQuest Biological Science Collection Science Database (ProQuest) Biological Science Database ProQuest Central Premium ProQuest One Academic (New) ProQuest Publicly Available Content Database ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Applied & Life Sciences ProQuest One Academic ProQuest One Academic UKI Edition ProQuest Central China ProQuest Central Basic MEDLINE - Academic PubMed Central (Full Participant titles) DOAJ Directory of Open Access Journals |
DatabaseTitle | CrossRef PubMed Publicly Available Content Database ProQuest Central Student ProQuest One Academic Middle East (New) ProQuest Central Essentials ProQuest Central (Alumni Edition) SciTech Premium Collection ProQuest One Community College ProQuest Natural Science Collection ProQuest Central China ProQuest Central ProQuest One Applied & Life Sciences Natural Science Collection ProQuest Central Korea Biological Science Collection ProQuest Central (New) ProQuest Science Journals (Alumni Edition) ProQuest Biological Science Collection ProQuest Central Basic ProQuest Science Journals ProQuest One Academic Eastern Edition Biological Science Database ProQuest SciTech Collection ProQuest One Academic UKI Edition ProQuest One Academic ProQuest One Academic (New) ProQuest Central (Alumni) MEDLINE - Academic |
DatabaseTitleList | CrossRef MEDLINE - Academic Publicly Available Content Database PubMed |
Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website – sequence: 2 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 3 dbid: BENPR name: ProQuest Central url: https://www.proquest.com/central sourceTypes: Aggregation Database |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Medicine |
EISSN | 2167-8359 |
ExternalDocumentID | oai_doaj_org_article_bea7e5be9d2d4581ad24995a922f118c PMC4592155 A543444890 26468436 10_7717_peerj_1279 |
Genre | Journal Article |
GrantInformation_xml | – fundername: Galician Regional Government grantid: GRC2013-006 – fundername: REDPLIR (Red Gallega de Procesamiento del Lenguaje y Recuperación de Información; Galician Network for Language Processing and Information Retrieval) grantid: R2014/034 |
GroupedDBID | 53G 5VS 88I 8FE 8FH AAFWJ AAYXX ABUWG ADBBV ADRAZ AENEX AFKRA AFPKN ALMA_UNASSIGNED_HOLDINGS AOIJS AZQEC BAWUL BBNVY BCNDV BENPR BHPHI BPHCQ CCPQU CITATION DIK DWQXO GNUQQ GROUPED_DOAJ GX1 H13 HCIFZ HYE IAO IEA IHR IHW ITC KQ8 LK8 M2P M48 M7P M~E OK1 PHGZM PHGZT PIMPY PQQKQ PROAC RPM W2D YAO 3V. ECGQY NPM PMFND 7XB 8FK PKEHL PQEST PQGLB PQUKI PRINS Q9U 7X8 5PM PUEGO |
ID | FETCH-LOGICAL-c570t-2c1129ee26c16d2d93065144ec86494c94f07bdf4b8be2eb231dc3987d3e55013 |
IEDL.DBID | M48 |
IngestDate | Wed Aug 27 01:28:03 EDT 2025 Thu Aug 21 18:24:57 EDT 2025 Fri Jul 11 07:58:47 EDT 2025 Fri Jul 25 11:57:52 EDT 2025 Tue Jun 17 21:17:56 EDT 2025 Tue Jun 10 20:48:36 EDT 2025 Thu May 22 21:19:56 EDT 2025 Thu Jan 02 22:20:22 EST 2025 Tue Jul 01 02:29:45 EDT 2025 Thu Apr 24 23:03:55 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Document representation; Encyclopedic knowledge; OHSUMED; Biomedical literature; Classification; Bag-of-words; Wikipedia; Bag-of-concepts |
License | http://creativecommons.org/licenses/by/4.0 This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c570t-2c1129ee26c16d2d93065144ec86494c94f07bdf4b8be2eb231dc3987d3e55013 |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
OpenAccessLink | https://www.proquest.com/docview/1957884403 |
PMID | 26468436 |
PQID | 1957884403 |
PQPubID | 2045935 |
ParticipantIDs | doaj_primary_oai_doaj_org_article_bea7e5be9d2d4581ad24995a922f118c pubmedcentral_primary_oai_pubmedcentral_nih_gov_4592155 proquest_miscellaneous_1722925836 proquest_journals_1957884403 gale_infotracmisc_A543444890 gale_infotracacademiconefile_A543444890 gale_healthsolutions_A543444890 pubmed_primary_26468436 crossref_citationtrail_10_7717_peerj_1279 crossref_primary_10_7717_peerj_1279 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2015-09-29 |
PublicationDateYYYYMMDD | 2015-09-29 |
PublicationDate_xml | – month: 09 year: 2015 text: 2015-09-29 day: 29 |
PublicationDecade | 2010 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States – name: San Diego – name: San Francisco, USA |
PublicationTitle | PeerJ (San Francisco, CA) |
PublicationTitleAlternate | PeerJ |
PublicationYear | 2015 |
Publisher | PeerJ. Ltd; PeerJ, Inc; PeerJ Inc |
Publisher_xml | – name: PeerJ. Ltd – name: PeerJ, Inc – name: PeerJ Inc |
SSID | ssj0000826083 |
Score | 2.10247 |
SourceID | doaj pubmedcentral proquest gale pubmed crossref |
SourceType | Open Website; Open Access Repository; Aggregation Database; Index Database; Enrichment Source |
StartPage | e1279 |
SubjectTerms | Algorithms; Analysis; Artificial intelligence; Automatic classification; Bag-of-concepts; Bioinformatics; Biomedical literature; Classification; Computational Science; Data mining; Document representation; Encyclopedic knowledge; Human-Computer Interaction; Information science; International conferences; Knowledge; Language; Linguistics; Medical Subject Headings-MeSH; Natural language processing; Neural networks; Science and Medical Education; Semantic analysis; Semantics; Synonymy; Text categorization; Text editing; Wikipedia |
Title | Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach |
URI | https://www.ncbi.nlm.nih.gov/pubmed/26468436 https://www.proquest.com/docview/1957884403 https://www.proquest.com/docview/1722925836 https://pubmed.ncbi.nlm.nih.gov/PMC4592155 https://doaj.org/article/bea7e5be9d2d4581ad24995a922f118c |
Volume | 3 |
Authors | Mouriño García, Marcos Antonio; Pérez Rodríguez, Roberto; Anido Rifón, Luis E |
PMID | 26468436 |