The Role of Different Thesauri Terms and Captions in Automated Subject Classification


Bibliographic Details
Published in 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 main conference proceedings) : (WI '06) : proceedings : 18-22 December, 2006, Hong Kong, China, pp. 961-965
Main Author Golub, K.
Format Conference Proceeding Book Chapter
Language English
Published IEEE 01.12.2006
ISBN 9780769527475
0769527477
DOI 10.1109/WI.2006.169

Summary: The paper explores to what degree different types of terms in the Engineering Information (Ei) thesaurus and classification scheme influence automated subject classification performance. Preferred terms, their synonyms, broader, narrower, and related terms, and captions are examined in combination with a stemmer and a stop-word list. The algorithm comprises string-to-string matching between words in the documents to be classified and words in term lists derived from the Ei thesaurus and classification scheme. The data collection for evaluation consists of some 35,000 scientific paper abstracts from the Compendex database. A subset of the Ei thesaurus and classification scheme is used, comprising 92 classes at up to five hierarchical levels from general engineering. The results show that preferred terms perform best, whereas captions perform worst. Stemming improves performance in most cases, whereas the stop-word list has no significant impact.
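
The string-to-string matching described in the summary can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class labels, the term lists, the stop-word set, the `toy_stem` suffix stripper (a stand-in for a real stemmer), and the scoring by raw match count are all assumptions made for illustration, since the record does not specify them.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the one used in the paper is not given here.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text, use_stemming=True, use_stop_words=True):
    """Lowercase, tokenize, then optionally drop stop words and stem."""
    words = re.findall(r"[a-z]+", text.lower())
    if use_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if use_stemming:
        words = [toy_stem(w) for w in words]
    return words


def count_occurrences(words, term_words):
    """Count positions where the term's word sequence appears in the document."""
    n = len(term_words)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == term_words)


def classify(document, term_lists, **options):
    """Score each class by how often its terms match the document's words."""
    doc_words = normalize(document, **options)
    scores = Counter()
    for class_id, terms in term_lists.items():
        for term in terms:
            term_words = normalize(term, **options)
            if term_words:
                scores[class_id] += count_occurrences(doc_words, term_words)
    # Keep only classes with at least one match, best score first.
    return [(c, s) for c, s in scores.most_common() if s > 0]


# Hypothetical classes and terms, not taken from the Ei thesaurus.
term_lists = {
    "A": ["welding", "welded joints"],
    "B": ["engineering education"],
}

print(classify("Welds were inspected", term_lists))
print(classify("Welds were inspected", term_lists, use_stemming=False))
```

In this sketch, the `use_stemming` and `use_stop_words` flags play the role of the stemmer and stop-word list that the summary says were switched on and off; separate term lists per term type (preferred terms, synonyms, captions, and so on) could be compared by passing a different `term_lists` dictionary for each.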