Automated subject classification of textual Web pages, based on a controlled vocabulary: Challenges and recommendations


Bibliographic Details
Published in	The new review of hypermedia and multimedia Vol. 12; no. 1; pp. 11 - 27
Main Author	Golub, Koraljka
Format	Journal Article
Language	English
Published	Taylor & Francis Group 01.06.2006
Subjects	Automated subject classification; Biblioteks- och informationsvetenskap; Controlled vocabulary; Electrical Engineering, Electronic Engineering, Information Engineering; Elektroteknik och elektronik; Engineering and Technology; Engineering Information thesaurus and classification scheme; Library and Information Science; Teknik

Summary:	The primary objective of this study was to identify and address problems of applying a controlled vocabulary in automated subject classification of textual Web pages, in the area of engineering. Web pages have special characteristics such as structural information, but are at the same time rather heterogeneous. The classification approach used comprises string-to-string matching between words in a term list extracted from the Ei (Engineering Information) thesaurus and classification scheme, and words in the text to be classified. Based on a sample of 70 Web pages, a number of problems with the term list are identified. Reasons for those problems are discussed and improvements proposed. Methods for implementing the improvements are also specified, suggesting further research.
ISSN:	1361-4568 1740-7842
DOI:	10.1080/13614560600774313
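
The summary describes classification by string-to-string matching between a term list and the words of the text. A minimal Python sketch of that general idea follows. It is not the study's implementation: the term list, the class labels, the tokenization rule, and the count-based ranking are all assumptions made for illustration, and none of the entries are taken from the Ei thesaurus.

```python
import re
from collections import Counter

# Hypothetical term list (not from the Ei thesaurus): term -> class label.
TERM_LIST = {
    "heat transfer": "Class A (hypothetical): Heat transfer",
    "signal processing": "Class B (hypothetical): Signal processing",
    "amplifier": "Class C (hypothetical): Electronics",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify(text, term_list=TERM_LIST):
    """Rank classes by how often their terms occur as exact word sequences."""
    tokens = tokenize(text)
    scores = Counter()
    for term, label in term_list.items():
        term_tokens = tokenize(term)
        n = len(term_tokens)
        # Slide a window of n tokens over the text and count exact matches.
        hits = sum(
            1
            for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == term_tokens
        )
        if hits:
            scores[label] += hits
    # Highest count first; ties are broken alphabetically by label.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    page = "An amplifier for signal processing. The amplifier is low-noise."
    for label, count in classify(page):
        print(count, label)
```

Exact matching of this kind misses inflected forms such as "amplifiers" and cannot tell apart different senses of the same word, which are among the kinds of term-list problems a study like this would need to examine.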