ANALYZING DEDUPLICATED DATA BLOCKS ASSOCIATED WITH UNSTRUCTURED DOCUMENTS
Main Authors | , , |
---|---|
Format | Patent |
Language | English |
Published | 01.06.2023 |
Summary: | Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition. |
---|---|
Bibliography: | Application Number: US202117537470 |
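
The loop described in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the block frequency metric is taken to be the number of documents that contain a block, the stopping condition is a fixed budget of analyzed blocks, and the names `analyze_by_block_frequency`, `toy_analytics`, `block_to_docs`, `block_text`, and `max_blocks` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


def analyze_by_block_frequency(
    block_to_docs: Dict[str, Set[str]],
    block_text: Dict[str, str],
    analyze: Callable[[str], List[str]],
    max_blocks: int,
) -> Dict[str, List[str]]:
    """Run analytics once per block and copy the result to every containing document."""
    # Assumed frequency metric: how many documents reference the block.
    # Sort descending by that count; ties are broken by block id for determinism.
    ordered = sorted(block_to_docs, key=lambda b: (-len(block_to_docs[b]), b))

    results: Dict[str, List[str]] = defaultdict(list)
    for processed, block_id in enumerate(ordered):
        # Assumed stopping condition: a fixed budget of analyzed blocks.
        if processed >= max_blocks:
            break
        findings = analyze(block_text[block_id])  # analytics run once per block
        for doc_id in sorted(block_to_docs[block_id]):
            results[doc_id].extend(findings)  # reuse the result for each document
    return dict(results)


def toy_analytics(text: str) -> List[str]:
    """Hypothetical stand-in for text analytics: return capitalized tokens."""
    return [tok for tok in text.split() if tok[:1].isupper()]


# Hypothetical sample data: three deduplicated blocks shared across three documents.
block_to_docs = {
    "b1": {"doc_a", "doc_b", "doc_c"},
    "b2": {"doc_a"},
    "b3": {"doc_b", "doc_c"},
}
block_text = {
    "b1": "Contact Alice for details",
    "b2": "internal note only",
    "b3": "Shipped from Berlin",
}
annotations = analyze_by_block_frequency(
    block_to_docs, block_text, toy_analytics, max_blocks=2
)
```

With a budget of two blocks, only `b1` and `b3` are analyzed. The most widely shared blocks are handled first, so a small number of analytics calls covers the largest number of documents before the stopping condition is reached.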