Dynamic extraction of contextually-coherent text blocks


Bibliographic Details
Main Authors Izhaki-Allerhand, Liron; Mizrachi, Ran; Asi, Abedelkader; Ronen, Royi; Jassin, Ohad
Format Patent
Language English
Published 08.06.2021
Summary: Technology is disclosed for dynamically identifying and extracting, or tagging, contextually-coherent text blocks from an electronic document. In an embodiment, an electronic document may be parsed into a plurality of content tokens, each corresponding to a portion of the electronic document, such as a sentence or a paragraph. Employing a sliding window approach, a number of token groups are independently analyzed, where each token group includes a different number of tokens. Each token group is analyzed to determine confidence scores for various determinable contexts based on the content included in the token group. The confidence scores for each token group can then be processed to determine an entropy score for that token group. In this way, one of the analyzed token groups can be selected as a representative text block that corresponds to one of the plurality of determinable contexts. The corresponding portion of the electronic document can be tagged with the context determined from the analyzed content included therein, and provided for output.
Bibliography: Application Number: US201815990405
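
The pipeline described in the Summary (sliding windows of varying size, per-context confidence scores, an entropy score per token group) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the patent: the names `shannon_entropy`, `keyword_classifier`, and `select_text_block` are hypothetical, the keyword-counting classifier is a toy stand-in for whatever context model an implementation would use, and picking the group with the lowest Shannon entropy is only one plausible reading of how the entropy score drives selection.

```python
import math
import re
from typing import Callable, Dict, List, Optional, Set, Tuple

Scores = Dict[str, float]


def shannon_entropy(scores: Scores) -> float:
    """Entropy in bits of the scores after normalizing them to sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        return float("inf")  # no evidence at all: treat as maximally uncertain
    probs = [s / total for s in scores.values() if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def keyword_classifier(group: List[str], keywords: Dict[str, Set[str]]) -> Scores:
    """Toy context classifier (assumption): keyword hits plus light smoothing."""
    words = re.findall(r"[a-z]+", " ".join(group).lower())
    return {
        context: 0.1 + sum(1 for w in words if w in vocab)
        for context, vocab in keywords.items()
    }


def select_text_block(
    tokens: List[str],
    classify: Callable[[List[str]], Scores],
    min_size: int = 1,
    max_size: int = 3,
) -> Optional[Tuple[float, int, int, str]]:
    """Slide windows of each size over the tokens and keep the group whose
    confidence scores have the lowest entropy (assumed selection rule).
    Returns (entropy, start, end, context), with end exclusive."""
    best: Optional[Tuple[float, int, int, str]] = None
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            group = tokens[start:start + size]
            scores = classify(group)
            entropy = shannon_entropy(scores)
            if best is None or entropy < best[0]:
                context = max(scores, key=lambda c: scores[c])
                best = (entropy, start, start + size, context)
    return best


# Usage: sentences act as content tokens; contexts and keywords are made up.
sentences = [
    "The invoice total is due in thirty days.",
    "Payment may be made by wire transfer.",
    "Our team enjoyed the company picnic last week.",
]
contexts = {
    "billing": {"invoice", "payment", "due", "wire", "total"},
    "social": {"picnic", "team", "enjoyed"},
}
result = select_text_block(sentences, lambda g: keyword_classifier(g, contexts))
if result is not None:
    entropy, start, end, context = result
    print(context, sentences[start:end], round(entropy, 3))
```

In this sketch a low entropy means the confidence mass is concentrated on a single context, so the selected group is the one whose content points most unambiguously to one context. The window bounds `min_size` and `max_size` are illustrative parameters rather than values given in the record.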