Abstracting for Dimensionality Reduction in Text Classification

Bibliographic Details
Published in: International Journal of Intelligent Systems, Vol. 28, No. 2, pp. 115-138
Main Authors: McAllister, Richard A.; Angryk, Rafal A.
Format: Journal Article
Language: English
Published: Hoboken, NJ: Blackwell Publishing Ltd (Wiley; John Wiley & Sons, Inc.), 01.02.2013
ISSN: 0884-8173, 1098-111X
DOI: 10.1002/int.21543

More Information
Summary: There is a growing interest in efficient models of text mining and an emergent need for new data structures that address word relationships. Detailed knowledge about the taxonomic environment of keywords used in text documents can provide valuable insight into the nature of the subject matter contained therein. Such insight may be used to enhance the data structures used in the text data mining task as relationships become usefully apparent. A popular scalable technique used to infer these relationships, while reducing dimensionality, has been Latent Semantic Analysis. We present a new approach, which uses an ontology of lexical abstractions to create abstraction profiles of documents and uses these profiles to perform text organization based on a process that we call frequent abstraction analysis. We introduce TATOO, the Text Abstraction TOOlkit, which is a full implementation of this new approach. We present our data model via an example of how taxonomically derived abstractions can be used to supplement semantic data structures for the text classification task.
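The summary describes the approach only at a high level. The Python sketch below is an illustrative approximation of the general idea, not the authors' TATOO implementation: it assumes WordNet (via NLTK) as the lexical ontology, a naive first-sense choice, and a single hypernym step, and the function name abstraction_profile and the toy token list are placeholders introduced here. It shows how raw terms can be collapsed into coarser taxonomic abstractions, giving each document a lower-dimensional "abstraction profile" for classification.

# Minimal sketch: map document terms to coarser taxonomic abstractions
# (one-level WordNet hypernyms) and count them, so documents are represented
# over a smaller abstraction vocabulary instead of the raw word vocabulary.
from collections import Counter

from nltk.corpus import wordnet as wn  # assumes NLTK with the WordNet data installed


def abstraction_profile(tokens):
    """Count one-level hypernym abstractions for a document's tokens."""
    profile = Counter()
    for token in tokens:
        synsets = wn.synsets(token)
        if not synsets:
            continue  # term not covered by the ontology: skipped in this sketch
        hypernyms = synsets[0].hypernyms()  # naive sense choice: first synset only
        if hypernyms:
            profile[hypernyms[0].name()] += 1
        else:
            profile[synsets[0].name()] += 1  # top-level concept: keep the synset itself
    return profile


if __name__ == "__main__":
    doc = ["dog", "wolf", "cat", "airplane", "helicopter"]
    print(abstraction_profile(doc))
    # 'dog' and 'wolf' collapse into the same canine abstraction, and 'airplane'
    # and 'helicopter' into a shared aircraft-related hypernym, so the feature
    # space shrinks relative to the raw word vocabulary.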
Bibliography:
ArticleID: INT21543
istex: B312D4FFF3294448CF75EC28F424E7D94987B795
ark:/67375/WNG-TT3GX6HX-F