“Text” and “Text Mining” in the Academic Field of Natural Language Processing

Bibliographic Details
Published in Japanese Sociological Review Vol. 68; no. 3; pp. 351-367
Main Author TANAKA, Shosaku
Format Journal Article
Language Japanese
Published The Japan Sociological Society 2017
Summary: Text mining is used to discover new knowledge or verify hypotheses based on a large collection of electronic text, and it has become one of the standard methods in various academic fields, including sociology. Natural language processing (NLP), which studies computer-based means of processing natural languages such as Japanese and English, is an interdisciplinary field involving disciplines such as computer science, linguistics, and cognitive science. NLP is also one of the essential components of text mining, which needs to process large collections of text. This paper provides an overview of NLP and its model of “text”, and discusses “text mining” in the anticipation that it will become increasingly common. NLP drastically approximates a language and its texts by employing formal, mathematical, and simple models to develop new techniques. Consequently, considerable linguistic information, such as that related to context, is inevitably lost during text mining when these general NLP techniques are used. This loss also affects the fragments of knowledge acquired and their interpretation in the last phase of text mining, so experts must complement them with their knowledge of the object domain. Conversely, text mining is also an important field to which NLP has been applied. NLP not only provides generic analyses of texts but also tries to develop issue-based methods that aid the entire process of text mining.
ISSN: 0021-5414, 1884-2755
DOI: 10.4057/jsr.68.351