The Role of Different Thesauri Terms and Captions in Automated Subject Classification

Bibliographic Details
Published in 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings) (WI '06), 18-22 December 2006, Hong Kong, China, pp. 961-965
Main Author Golub, K.
Format Conference Proceeding Book Chapter
Language English
Published IEEE 01.12.2006
ISBN 9780769527475
0769527477
DOI 10.1109/WI.2006.169

Abstract The paper explores the degree to which different types of terms in the Engineering Information (Ei) thesaurus and classification scheme influence automated subject classification performance. Preferred terms, their synonyms, broader, narrower, and related terms, and captions are examined in combination with a stemmer and a stop-word list. The algorithm comprises string-to-string matching between words in the documents to be classified and words in term lists derived from the Ei thesaurus and classification scheme. The data collection for evaluation consists of some 35,000 scientific paper abstracts from the Compendex database. A subset of the Ei thesaurus and classification scheme is used, comprising 92 classes at up to five hierarchical levels from General Engineering. The results show that preferred terms perform best, whereas captions perform worst. Stemming improves performance in most cases, whereas the stop-word list does not have a significant impact.
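The string-to-string matching described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the class identifiers, the term lists, the stop-word list, and the toy suffix stripper standing in for a real stemmer are hypothetical and are not taken from the Ei thesaurus or from the paper. The abstract does not describe any term weighting, so the sketch simply counts matches per class.

```python
import re
from collections import Counter

# Hypothetical stop-word list and term lists (NOT from the Ei thesaurus).
STOP_WORDS = {"the", "of", "a", "an", "in", "and", "for", "to", "is", "are"}

TERM_LISTS = {
    "class_A": ["heat transfer", "thermodynamics"],
    "class_B": ["fluid flow", "pipes"],
}


def stem(word):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text, use_stemming=True, use_stop_words=True):
    """Lowercase, tokenize, then optionally drop stop words and stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if use_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if use_stemming:
        words = [stem(w) for w in words]
    return words


def count_matches(doc_words, term_words):
    """Count positions where the term's word sequence occurs in the document."""
    n = len(term_words)
    if n == 0:
        return 0
    return sum(
        1
        for i in range(len(doc_words) - n + 1)
        if doc_words[i : i + n] == term_words
    )


def classify(text, term_lists=TERM_LISTS, **options):
    """Return (class_id, match_count) pairs, best first, omitting zero scores."""
    doc_words = normalize(text, **options)
    scores = Counter()
    for class_id, terms in term_lists.items():
        for term in terms:
            scores[class_id] += count_matches(doc_words, normalize(term, **options))
    return [(c, s) for c, s in scores.most_common() if s > 0]
```

Calling `classify("Heating of a pipe")` matches the hypothetical term "pipes" only when stemming is enabled, which is the kind of effect the stemmer comparison in the abstract is about. Passing `use_stemming=False` or `use_stop_words=False` switches the respective step off.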
Author affiliation Dept. of Inf. Technol., Lund Univ
CorporateAuthor Lund University
Faculty of Engineering, LTH
Departments at LTH
Department of Electrical and Information Technology
Discipline Engineering
SubjectTerms Abstracts
automated subject classification
Automatic control
Colon
Compendex database
data collection
document classification
Electrical Engineering, Electronic Engineering, Information Engineering
Engineering and Technology
engineering information
Gases
Information technology
Mechanical variables measurement
Solids
string-to-string matching
Thesauri
thesauri term
Vocabulary
URI https://ieeexplore.ieee.org/document/4061503
https://www.proquest.com/docview/31633522
https://lup.lub.lu.se/record/617040