Apparatus and method for classifying document, and computer program product

Bibliographic Details
Main Authors Manabe Toshihiko, Kokubu Tomoharu, Nakano Wataru, Inaba Masumi
Format Patent
Language English
Published 29.11.2016
Summary: According to an embodiment, a document classification apparatus includes an extraction unit, a clustering unit, a classification unit, and a label assignment unit. The extraction unit is configured to extract feature words from documents. The clustering unit is configured to cluster the feature words into clusters so that the difference between the number of documents that include any of the feature words belonging to one cluster and the number of documents that include any of the feature words belonging to another cluster is equal to or less than a predetermined reference value. The classification unit is configured to classify the documents into the clusters so that each document belongs to the cluster to which a feature word included in that document belongs. The label assignment unit is configured to assign to each cluster a classification label, namely a word representative of the feature words in that cluster.
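
The four units in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the summary does not say how feature words are chosen, how clusters are formed, or how the representative word is picked, so the sketch assumes whitespace tokenization, a hypothetical `min_df` threshold for feature words, a greedy balancing heuristic for clustering, and the most frequent word in a cluster as its label. All function and parameter names are hypothetical.

```python
from collections import defaultdict


def extract_feature_words(docs, min_df=2):
    """Extraction unit (assumed rule): keep words found in at least min_df documents.

    Returns a dict mapping each feature word to the set of document ids containing it.
    """
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in set(text.lower().split()):
            postings[word].add(doc_id)
    return {w: ids for w, ids in postings.items() if len(ids) >= min_df}


def cluster_feature_words(postings, k):
    """Clustering unit (assumed heuristic): greedy balancing over k clusters.

    Words are placed most-frequent first, each into the cluster whose set of
    covered documents would be smallest after the word is added. Tracking the
    covered documents also performs the classification step: a document
    belongs to every cluster that holds one of its feature words.
    """
    clusters = [{"words": [], "docs": set()} for _ in range(k)]
    for word in sorted(postings, key=lambda w: (-len(postings[w]), w)):
        target = min(clusters, key=lambda c: len(c["docs"] | postings[word]))
        target["words"].append(word)
        target["docs"] |= postings[word]
    return clusters


def is_balanced(clusters, reference_value):
    """Check the summary's condition: cluster document counts differ by at most reference_value."""
    sizes = [len(c["docs"]) for c in clusters]
    return max(sizes) - min(sizes) <= reference_value


def assign_labels(clusters):
    """Label assignment unit (assumed rule): use each cluster's most frequent word.

    Words were appended in descending document frequency, so the first is the label.
    """
    return [c["words"][0] if c["words"] else None for c in clusters]


docs = ["cat dog", "cat bird", "dog fish", "bird fish"]
postings = extract_feature_words(docs)
clusters = cluster_feature_words(postings, k=2)
labels = assign_labels(clusters)
```

A greedy pass does not guarantee that the reference value is met for every corpus, which is why `is_balanced` is kept as a separate check; a caller could retry with a different `k` when it returns `False`.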
Bibliography: Application Number: US201313845989