Apparatus and method for classifying document, and computer program product
Main Authors | , , , |
---|---|
Format | Patent |
Language | English |
Published | 29.11.2016 |
Subjects | |
Summary: | According to an embodiment, a document classification apparatus includes an extraction unit, a clustering unit, a classification unit, and a label assignment unit. The extraction unit is configured to extract feature words from documents. The clustering unit is configured to cluster the feature words so that, for any two clusters, the number of documents containing at least one feature word of one cluster differs from the number of documents containing at least one feature word of the other cluster by no more than a predetermined reference value. The classification unit is configured to classify the documents into the clusters so that each document belongs to the cluster containing a feature word that the document includes. The label assignment unit is configured to assign to each cluster a classification label, namely a word representative of that cluster's feature words. |
---|---|
Bibliography: | Application Number: US201313845989 |
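
The four units named in the summary can be illustrated with a minimal Python sketch. The abstract does not say how feature words are selected, how the balanced clustering is computed, or how the representative word is chosen, so everything below is an assumption made for illustration: feature words are words that occur in at least `min_df` documents, clustering is a simple greedy balancing heuristic, and the label is the most widely occurring feature word in the cluster. All function and parameter names are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def extract_feature_words(docs, min_df=2):
    """Extraction unit (assumed rule): keep words found in at least min_df documents.

    Returns a dict mapping each feature word to the set of document ids containing it.
    """
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in set(text.lower().split()):
            postings[word].add(doc_id)
    return {w: ids for w, ids in postings.items() if len(ids) >= min_df}


def cluster_feature_words(postings, n_clusters):
    """Clustering unit (assumed heuristic): greedy balancing.

    Words are visited from most to least frequent, and each one is placed in the
    cluster that currently covers the fewest documents.
    """
    clusters = [{"words": [], "docs": set()} for _ in range(n_clusters)]
    for word in sorted(postings, key=lambda w: (-len(postings[w]), w)):
        target = min(clusters, key=lambda c: len(c["docs"]))
        target["words"].append(word)
        target["docs"] |= postings[word]
    return clusters


def is_balanced(clusters, reference_value):
    """Check the summary's condition: document counts differ by at most reference_value."""
    sizes = [len(c["docs"]) for c in clusters]
    return max(sizes) - min(sizes) <= reference_value


def classify_and_label(clusters, postings):
    """Classification and label assignment units.

    A document belongs to every cluster that holds one of its feature words.
    The label (assumed rule) is the cluster's feature word with the highest
    document frequency. Empty clusters are skipped.
    """
    result = {}
    for cluster in clusters:
        if not cluster["words"]:
            continue
        label = max(cluster["words"], key=lambda w: len(postings[w]))
        result[label] = sorted(cluster["docs"])
    return result


if __name__ == "__main__":
    docs = ["apple banana", "apple cherry", "banana cherry", "durian apple"]
    postings = extract_feature_words(docs)
    clusters = cluster_feature_words(postings, n_clusters=2)
    print(is_balanced(clusters, reference_value=1))
    print(classify_and_label(clusters, postings))
```

The greedy pass does not guarantee that the reference value is met, which is why `is_balanced` is a separate check; a fuller implementation would need to re-cluster or adjust the number of clusters when the check fails.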