From paragraph to graph: Latent semantic analysis for information visualization

Bibliographic Details
Published in	Proceedings of the National Academy of Sciences (PNAS), Vol. 101, Suppl. 1, pp. 5214–5219
Main Authors	Landauer, Thomas K.; Laham, Darrell; Derr, Marcia
Format	Journal Article
Language	English
Published	National Academy of Sciences, United States, 6 April 2004
Series	Colloquium Paper
Subjects	Analysis; Animals; Biochemical Phenomena; Biochemistry; Documentation; Information; Pattern Recognition, Automated; Semantics; Subject Headings; Visualization

More Information
Summary:	Most techniques for relating textual information rely on intellectually created links such as author-chosen keywords and titles, authority indexing terms, or bibliographic citations. Similarity of the semantic content of whole documents, rather than just titles, abstracts, or overlap of keywords, offers an attractive alternative. Latent semantic analysis provides an effective dimension reduction method for the purpose that reflects synonymy and the sense of arbitrary word combinations. However, latent semantic analysis correlations with human text-to-text similarity judgments are often empirically highest at ≈300 dimensions. Thus, two- or three-dimensional visualizations are severely limited in what they can show, and the first and/or second automatically discovered principal component, or any three such for that matter, rarely capture all of the relations that might be of interest. It is our conjecture that linguistic meaning is intrinsically and irreducibly very high dimensional. Thus, some method to explore a high dimensional similarity space is needed. But the 2.7 × 10⁷ projections and infinite rotations of, for example, a 300-dimensional pattern are impossible to examine. We suggest, however, that the use of a high dimensional dynamic viewer with an effective projection pursuit routine and user control, coupled with the exquisite abilities of the human visual system to extract information about objects and from moving patterns, can often succeed in discovering multiple revealing views that are missed by current computational algorithms. We show some examples of the use of latent semantic analysis to support such visualizations and offer views on future needs.
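A minimal sketch of the dimension-reduction step the summary refers to, assuming the textbook formulation of LSA: a truncated singular value decomposition (SVD) of a term-by-document count matrix, with documents compared by cosine in the reduced space. The function names, the toy corpus, and the use of NumPy are illustrative assumptions, not taken from the paper; practical LSA systems usually weight the raw counts first (for example with log-entropy weighting), which is omitted here. The helper `ordered_axis_triples` reflects our own reading of the 2.7 × 10⁷ figure as the number of ordered choices of three distinct display axes from 300 dimensions (300 × 299 × 298 = 26,730,600); the record does not state how the figure was derived.

```python
import math

import numpy as np


def lsa_document_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Map each document (column of term_doc) to a k-dimensional LSA vector.

    Uses a thin SVD, term_doc = U @ diag(s) @ Vt, keeps the k largest singular
    values, and returns the rows of V_k @ diag(s_k), one row per document.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # np.linalg.svd returns singular values in descending order,
    # so the first k rows of vt belong to the k largest ones.
    return (s[:k, None] * vt[:k, :]).T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def ordered_axis_triples(n_dims: int) -> int:
    """Ways to assign three distinct dimensions to the x, y and z axes."""
    return math.perm(n_dims, 3)


if __name__ == "__main__":
    # Hypothetical toy corpus. Rows are terms, columns are documents:
    #   doc0 = "gene", doc1 = "gene protein", doc2 = "protein",
    #   doc3 = "graph", doc4 = "graph vector", doc5 = "vector"
    counts = np.array(
        [
            [1, 1, 0, 0, 0, 0],  # gene
            [0, 1, 1, 0, 0, 0],  # protein
            [0, 0, 0, 1, 1, 0],  # graph
            [0, 0, 0, 0, 1, 1],  # vector
        ],
        dtype=float,
    )
    docs = lsa_document_vectors(counts, k=2)
    print("raw cos(doc0, doc2):", cosine(counts[:, 0], counts[:, 2]))
    print("LSA cos(doc0, doc2):", cosine(docs[0], docs[2]))
    print("LSA cos(doc0, doc3):", cosine(docs[0], docs[3]))
    print("ordered axis triples, 300 dims:", ordered_axis_triples(300))
```

In the toy corpus, doc0 ("gene") and doc2 ("protein") share no terms, so their raw cosine is 0; because "gene" and "protein" co-occur in doc1, their 2-dimensional LSA vectors have a cosine of approximately 1, while doc0 and doc3 stay near 0. This is a small-scale version of the synonymy effect the summary attributes to LSA, not a reproduction of the paper's results.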
Notes:	Abbreviations: LSA, latent semantic analysis; SVD, singular value decomposition; MeSH, medical subject heading; cos, cosine. This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, “Mapping Knowledge Domains,” held May 9–11, 2003, at the Arnold and Mabel Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. Correspondence: landauer@psych.colorado.edu.
ISSN:	0027-8424 (print); 1091-6490 (online)
DOI:	10.1073/pnas.0400341101