The Role of Different Thesauri Terms and Captions in Automated Subject Classification
Published in | 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 main conference proceedings) : (WI '06) : proceedings : 18-22 December, 2006, Hong Kong, China, pp. 961-965 |
---|---|
Main Author | Golub, Koraljka |
Format | Conference Proceeding Book Chapter |
Language | English |
Published | IEEE, 01.12.2006 |
ISBN | 9780769527475 0769527477 |
DOI | 10.1109/WI.2006.169 |
Abstract | The paper explores to what degree different types of terms in the Engineering Information (Ei) thesaurus and classification scheme influence automated subject classification performance. Preferred terms, their synonyms, broader, narrower, and related terms, as well as captions, are examined in combination with a stemmer and a stop-word list. The algorithm performs string-to-string matching between words in the documents to be classified and words in term lists derived from the Ei thesaurus and classification scheme. The evaluation collection consists of some 35,000 scientific paper abstracts from the Compendex database. A subset of the Ei thesaurus and classification scheme is used, comprising 92 classes at up to five hierarchical levels from General Engineering. The results show that preferred terms perform best, whereas captions perform worst. Stemming improves performance in most cases, whereas the stop-word list has no significant impact. |
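The abstract describes the classifier only as string-to-string matching between document words and per-class term lists, optionally combined with stemming and stop-word removal. The Python sketch below illustrates that idea under loudly stated assumptions: the class names (`class_A`, `class_B`), their term lists, the stop-word list, the crude suffix-stripping stemmer, and the scoring rule (one point per matched term occurrence) are all hypothetical stand-ins, not the paper's actual data, stemmer, or weighting.

```python
import re

# Hypothetical stop-word list; the list used in the paper is not given here.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}

# Hypothetical term lists: class -> term type -> terms.
# Real lists would be derived from the Ei thesaurus and classification scheme.
TERM_LISTS = {
    "class_A": {
        "preferred": ["control systems"],
        "synonyms": ["control engineering"],
        "captions": ["automatic control"],
    },
    "class_B": {
        "preferred": ["gas dynamics"],
        "related": ["fluid flow"],
        "captions": ["fluid mechanics"],
    },
}


def naive_stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (not the paper's)."""
    for suffix in ("ations", "ation", "ings", "ing", "ers", "er", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str, stem: bool = True, stop: bool = True) -> list[str]:
    """Lowercase and tokenize, then optionally drop stop words and stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if stop:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


def count_matches(doc: list[str], term: list[str]) -> int:
    """Count contiguous occurrences of a (possibly multi-word) term in the document."""
    n = len(term)
    if n == 0:
        return 0
    return sum(1 for i in range(len(doc) - n + 1) if doc[i : i + n] == term)


def classify(text, term_lists, term_types, stem=True, stop=True):
    """Score each class by matched term occurrences; return (class, score) pairs."""
    doc = normalize(text, stem, stop)
    scores = {}
    for cls, by_type in term_lists.items():
        score = sum(
            count_matches(doc, normalize(term, stem, stop))
            for ttype in term_types
            for term in by_type.get(ttype, [])
        )
        if score > 0:
            scores[cls] = score
    # Highest score first; ties broken alphabetically by class name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    abstract = "A control system for gas turbines uses fluid flow models."
    for types in (["preferred"], ["preferred", "related"], ["captions"]):
        print(types, classify(abstract, TERM_LISTS, types))
```

Calling `classify` once per choice of `term_types`, with `stem` and `stop` toggled on and off, mirrors the kind of comparison the abstract reports (preferred terms versus captions, with and without stemming). Raw occurrence counts are used only because they are the simplest scoring rule; this record does not say how the paper weights term types, selects final classes, or measures performance.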
---|---|
Author | Golub, Koraljka (Dept. of Information Technology, Lund University) |
BackLink | https://lup.lub.lu.se/record/617040 (view record from Swedish Publication Index); oai:portal.research.lu.se:publications/d0aaf5e3-2bc8-4a02-b519-75040f8d8788 (view record from Swedish Publication Index) |
CorporateAuthor | Department of Electrical and Information Technology (Institutionen för elektro- och informationsteknik); Departments at LTH; Faculty of Engineering, LTH (Lunds Tekniska Högskola); Lund University |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present; Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional; SwePub; SwePub Conference; SWEPUB Lunds universitet; SwePub Book Chapter |
Discipline | Engineering |
EndPage | 965 |
Genre | orig-research; Conference Paper |
IsPeerReviewed | false |
IsScholarly | false |
PageCount | 5 |
PublicationCentury | 2000 |
PublicationDate | 2006-Dec.; 20061218; 2006 |
PublicationDateYYYYMMDD | 2006-12-01; 2006-12-18; 2006-01-01 |
PublicationDecade | 2000 |
PublicationTitleAbbrev | WI |
PublicationYear | 2006 |
Publisher | IEEE |
StartPage | 961 |
SubjectTerms | Abstracts; automated subject classification; Automatic control; Colon; Compendex database; data collection; document classification; Electrical Engineering, Electronic Engineering, Information Engineering; Engineering and Technology; engineering information; Gases; Information technology; Mechanical variables measurement; Solids; string-to-string matching; Thesauri; thesauri term; Vocabulary |
URI | https://ieeexplore.ieee.org/document/4061503; https://www.proquest.com/docview/31633522; https://lup.lub.lu.se/record/617040; oai:portal.research.lu.se:publications/d0aaf5e3-2bc8-4a02-b519-75040f8d8788 |