SYSTEM AND METHOD FOR CATEGORIZING DOCUMENTS, AND APPARATUS APPLIED TO THE SAME


Bibliographic Details
Main Authors: HWANG, YOUNG SOOK; YIN, CHANG HAO; LEE, JUNE SUP
Format: Patent
Language: English, Korean
Published: 21.04.2014

Summary: The present invention discloses a document classification system, a method thereof, and an apparatus applied to the same. The invention rearranges the configuration information defined in a web-based unstructured document to generate a structured document. It then determines whether the structured document is related to a category, based on the similarity between the structured document and the reference documents of the categories set for document classification. When the document is determined to be related, the invention assigns it one of the categories based on which of the keywords set for each category the document contains. In this way, unstructured documents can be efficiently classified and applied to actual services.
[Reference numerals]
(AA) Start
(BB) End
(S210) Collect an informal document
(S220) Generate a formal document
(S230) Reference document generation required?
(S240) Select and store the reference document through grouping
(S250) Check the reference document for selecting the related document
(S260) Determine the similarity
(S270) Similarity at or above the reference value?
(S280) Determine as the related document
(S290) Category path exists?
(S300) Allocate a category by using the category path
(S310) Allocate the category by using the weight and the frequency of keywords
(S320) Remaining formal document exists?
(S330) Allocate the category by using the similarity
(S340) Remaining formal document exists?
(S350) Exclude non-relevant documents
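The relatedness check and the keyword-based allocation described in the summary can be illustrated with a minimal sketch. The record does not specify the similarity measure, the threshold, or the scoring formula, so everything below is an assumption: bag-of-words cosine similarity stands in for the similarity determination, `threshold` stands in for the reference value, and the keyword score is simply weight times frequency. The names `tokenize`, `cosine`, and `classify` are hypothetical, keywords are assumed to be single lowercase tokens, and the category-path step (S290, S300) is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc_text, reference_docs, category_keywords, threshold=0.2):
    """Return a category name, or None for a non-relevant document.

    reference_docs: {category: [reference document text, ...]}
    category_keywords: {category: {keyword: weight}}
    """
    tokens = Counter(tokenize(doc_text))

    # Best similarity to any reference document of each category.
    best = {
        cat: max((cosine(tokens, Counter(tokenize(r))) for r in refs), default=0.0)
        for cat, refs in reference_docs.items()
    }

    # Below the assumed reference value: treat as non-relevant.
    if max(best.values(), default=0.0) < threshold:
        return None

    # Related document: score each category by keyword weight * frequency.
    scores = {
        cat: sum(weight * tokens[kw] for kw, weight in kws.items())
        for cat, kws in category_keywords.items()
    }
    top = max(scores, key=scores.get, default=None)
    if top is not None and scores[top] > 0:
        return top

    # No keyword matched: fall back to the most similar category.
    return max(best, key=best.get)
```

With one reference document and a few weighted keywords per category, a document that shares no terms with any reference document is excluded, a document containing category keywords is allocated by its keyword score, and a related document with no keyword hits falls back to the most similar category.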
Bibliography: Application Number: KR20120110625