Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach

Bibliographic Details
Published in PeerJ (San Francisco, CA) Vol. 3; p. e1279
Main Authors Mouriño García, Marcos Antonio, Pérez Rodríguez, Roberto, Anido Rifón, Luis E.
Format Journal Article
Language English
Published United States PeerJ. Ltd 29.09.2015
PeerJ, Inc
PeerJ Inc
ISSN 2167-8359
DOI 10.7717/peerj.1279

Abstract Automatic classification of text documents into a set of categories has many applications. Among them, the automatic classification of biomedical literature stands out as an important one. Biomedical staff and researchers must deal with a large volume of literature in their daily activities, so a system that gives them simple and effective access to the documents of interest would be useful; this requires sorting the documents according to some criteria, that is to say, classifying them. Documents to classify are usually represented following the bag-of-words (BoW) paradigm. The features are the words in the text, which therefore suffer from synonymy and polysemy, and their weights are based only on their frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge, specifically Wikipedia, to create bag-of-concepts (BoC) representations of documents, understanding a concept as a “unit of meaning” and thus tackling synonymy and polysemy. In addition, concepts are weighted according to their semantic relevance in the text. To evaluate the proposal, empirical experiments were conducted with OHSUMED, one of the corpora commonly used for evaluating classification and retrieval of biomedical information, and with UVigoMED, a purpose-built corpus of MEDLINE biomedical abstracts. The results show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation by up to 157% in the single-label classification problem and up to 100% in the multi-label problem for the OHSUMED corpus, and by up to 122% in the single-label classification problem and up to 155% in the multi-label problem for the UVigoMED corpus.
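To make the BoW/BoC contrast concrete, here is a minimal Python sketch. It is not the authors' pipeline: the tiny `LEXICON` dictionary (a hypothetical stand-in for Wikipedia anchor texts and redirects), the greedy longest-match annotator, and the cue-word disambiguation are all illustrative assumptions, and the article titles in the lexicon are examples only. Concepts here are weighted by raw counts, not by the semantic-relevance weighting the paper describes.

```python
import re
from collections import Counter

# Hypothetical lexicon standing in for Wikipedia anchor texts and redirects:
# lowercase surface form -> {candidate Wikipedia article title: cue words}.
# Cue words drive the naive disambiguation below; they are illustrative only.
LEXICON = {
    "heart attack": {"Myocardial infarction": set()},
    "myocardial infarction": {"Myocardial infarction": set()},
    "cold": {
        "Common cold": {"virus", "cough", "symptoms", "infection"},
        "Cold": {"temperature", "freezing", "weather", "exposure"},
    },
}
# Longest surface form in the lexicon, measured in tokens.
MAX_NGRAM = max(len(form.split()) for form in LEXICON)


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """BoW: every token is a feature, weighted by its raw frequency."""
    return Counter(tokenize(text))


def disambiguate(candidates, context):
    """Pick the candidate concept whose cue words overlap most with the
    document's tokens (a simplified Lesk-style heuristic, not the paper's).
    max() keeps the first candidate listed when scores tie."""
    return max(candidates, key=lambda concept: len(candidates[concept] & context))


def bag_of_concepts(text):
    """BoC: surface forms are mapped to concepts, so synonyms share one
    feature and an ambiguous form is resolved to a single sense.
    Weights are raw concept counts (an assumption of this sketch)."""
    tokens = tokenize(text)
    context = set(tokens)
    concepts = Counter()
    i = 0
    while i < len(tokens):
        # Greedy longest match: try the longest n-gram starting at i first.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            form = " ".join(tokens[i:i + n])
            if form in LEXICON:
                concepts[disambiguate(LEXICON[form], context)] += 1
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here; the token yields no concept
    return concepts


if __name__ == "__main__":
    doc = "Patient admitted after a heart attack; history of myocardial infarction."
    print(bag_of_words(doc))
    print(bag_of_concepts(doc))
```

For the example sentence, BoW keeps heart, attack, myocardial and infarction as four unrelated features, while the BoC counter contains a single feature, Myocardial infarction, with weight 2. Likewise, "cold" resolves to Common cold in a sentence about coughs and viruses, and to Cold in one about temperature and exposure.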
ArticleNumber e1279
Audience Academic
Author Mouriño García, Marcos Antonio (Department of Telematics Engineering, University of Vigo, Vigo, Spain)
Pérez Rodríguez, Roberto (Department of Telematics Engineering, University of Vigo, Vigo, Spain)
Anido Rifón, Luis E. (Department of Telematics Engineering, University of Vigo, Vigo, Spain)
BackLink https://www.ncbi.nlm.nih.gov/pubmed/26468436 (view this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1007_s11517_021_02456_1
crossref_primary_10_1016_j_artmed_2018_04_007
crossref_primary_10_3414_ME17_01_0028
crossref_primary_10_1016_j_ins_2017_04_024
crossref_primary_10_1038_s41538_024_00299_2
crossref_primary_10_1186_s13326_016_0109_6
crossref_primary_10_1016_j_jbi_2016_03_026
crossref_primary_10_1093_bioinformatics_btab331
crossref_primary_10_1103_PhysRevE_94_022409
crossref_primary_10_1007_s00500_018_3101_5
ContentType Journal Article
Copyright COPYRIGHT 2015 PeerJ. Ltd.
2015 Mouriño García et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
2015 Mouriño García et al.
Discipline Medicine
EISSN 2167-8359
ExternalDocumentID oai_doaj_org_article_bea7e5be9d2d4581ad24995a922f118c
PMC4592155
A543444890
26468436
10_7717_peerj_1279
Genre Journal Article
GrantInformation Galician Regional Government, grant GRC2013-006
REDPLIR (Red Gallega de Procesamiento del Lenguaje y Recuperacion de Informacion, i.e., Galician Network for Language Processing and Information Retrieval), grant R2014/034
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Document representation
Encyclopedic knowledge
OHSUMED
Biomedical literature
Classification
Bag-of-words
Wikipedia
Bag-of-concepts
Language English
License http://creativecommons.org/licenses/by/4.0
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
OpenAccessLink https://www.proquest.com/docview/1957884403
PMID 26468436
PublicationDate 2015-09-29
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: San Diego
– name: San Francisco, USA
PublicationTitle PeerJ (San Francisco, CA)
PublicationTitleAlternate PeerJ
PublicationYear 2015
Publisher PeerJ. Ltd
PeerJ, Inc
PeerJ Inc
StartPage e1279
SubjectTerms Algorithms
Analysis
Artificial intelligence
Automatic classification
Bag-of-concepts
Bioinformatics
Biomedical literature
Classification
Computational Science
Data mining
Document representation
Encyclopedic knowledge
Human-Computer Interaction
Information science
International conferences
Knowledge
Language
Linguistics
Medical Subject Headings-MeSH
Natural language processing
Neural networks
Science and Medical Education
Semantic analysis
Semantics
Synonymy
Text categorization
Text editing
Wikipedia
Title Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach
URI https://www.ncbi.nlm.nih.gov/pubmed/26468436
https://www.proquest.com/docview/1957884403
https://www.proquest.com/docview/1722925836
https://pubmed.ncbi.nlm.nih.gov/PMC4592155
https://doaj.org/article/bea7e5be9d2d4581ad24995a922f118c
Volume 3