A theory of term importance in automatic text analysis

Bibliographic Details
Published in	Journal of the American Society for Information Science, Vol. 26, no. 1, pp. 33–44
Main Authors	Salton, G.; Yang, C. S.; Yu, C. T.
Format	Journal Article
Language	English
Published	Washington, D.C.: American Documentation Institute; Wiley Subscription Services, Inc., A Wiley Company; Wiley Periodicals Inc; 01.01.1975
Subjects	Associative indexing; Automatic text analysis; Collections; Content analysis; Correlation; Discrimination; Documentation; Information processing; Information retrieval; Retrieval performance measures; Separation; Statistical analysis; Terms; Text analysis; Thesauri; Value analysis; Weighting

More Information
Summary:	A good deal of work has been done over the years on using statistical or probabilistic techniques as a basis for automatic indexing and content analysis. (1–10) Unfortunately, many of these methods lack effectiveness, and the more refined procedures are computationally unattractive. A new technique, known as discrimination value analysis, ranks the text words according to how well they discriminate the documents of a collection from each other; that is, the value of a term depends on how much the average separation between individual documents changes when the given term is assigned for content identification. The best words are those that achieve the greatest separation. Discrimination value analysis is computationally simple, and it assigns a specific role in content analysis to single words, juxtaposed words and phrases, and word groups or thesaurus categories. Experimental results demonstrating the effectiveness of the technique are given.
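The summary describes discrimination value analysis only in words. The following is a minimal Python sketch of one way to compute it, under assumptions the summary does not state: documents are sparse term-weight dicts, similarity is the cosine measure, and the collection's "space density" Q is the average pairwise similarity, so lower density means greater separation. A term's discrimination value is taken here as DV_k = Q_k - Q, where Q_k is the density recomputed with term k removed from every document; a positive value then means the term spreads documents apart. The centroid-based density is an assumed cheaper O(N) alternative to the O(N^2) pairwise average. All function names are hypothetical, and the toy collection is illustrative, not data from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a document emptied by term removal matches nothing
    return dot / (norm_u * norm_v)


def pairwise_density(docs):
    """Space density Q: average similarity over all unordered document pairs."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def centroid_density(docs):
    """Assumed cheaper O(N) density: mean similarity of each document to the centroid."""
    if not docs:
        return 0.0
    centroid = {}
    for d in docs:
        for t, w in d.items():
            centroid[t] = centroid.get(t, 0.0) + w / len(docs)
    return sum(cosine(centroid, d) for d in docs) / len(docs)


def discrimination_values(docs, density=pairwise_density):
    """Return {term: DV}, with DV = Q_k - Q and Q_k the density without term k."""
    q = density(docs)
    vocabulary = sorted({t for d in docs for t in d})
    values = {}
    for term in vocabulary:
        without_term = [{t: w for t, w in d.items() if t != term} for d in docs]
        values[term] = density(without_term) - q
    return values


# Hypothetical toy collection with binary weights.
collection = [
    {"common": 1, "x": 1},
    {"common": 1, "y": 1},
    {"common": 1, "z": 1},
]
dv = discrimination_values(collection)
# Best discriminators first; "common" occurs in every document, so it ranks last.
ranking = sorted(dv, key=dv.get, reverse=True)
print(ranking[-1], round(dv["common"], 3))
```

In this toy case, removing "common" leaves the documents with no shared terms, so the density drops and its discrimination value is negative; the document-specific terms x, y and z receive positive values, matching the summary's point that the best words are those that increase separation.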
Identifiers:	ArticleID:ASI4630260106; ark:/67375/WNG-N5X7Z8SF-H; istex:D4BAC7F550A01890903B5C82638FB2B3D12DCB22
ISSN:	0002-8231; 1097-4571
DOI:	10.1002/asi.4630260106