Indexing and querying hash sequence matrices

Bibliographic Details
Main Authors GANDHI MAUKTIK H, LAMANNA CHARLES WILLIAM, BREWER JASON ERIC
Format Patent
Language English
Published 08.09.2015
Summary: Embodiments are directed to indexing and querying a sequence of hash values in an indexing matrix. A computer system accesses a document and extracts a portion of its text. The computer system applies a hashing algorithm to the extracted text, and the resulting hash values form a representative sequence of hash values. The computer system inserts each hash value of the sequence into an indexing matrix, which is configured to store multiple different hash value sequences. The computer system also queries the indexing matrix to determine how similar the stored hash value sequences are to a selected hash value sequence, based on how many hash values of the selected sequence overlap with the hash values of the stored sequences.
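The steps in the summary can be sketched in Python. This is only an illustration under stated assumptions, not the patented method: the record does not say how text is segmented, which hash function is used, or how the matrix is laid out. The sketch assumes overlapping three-word windows, SHA-1 truncated to 64 bits, and a sparse layout in which each distinct hash value is a row holding the ids of the sequences that contain it. The names `hash_sequence`, `HashIndex`, `insert`, and `query` are hypothetical.

```python
import hashlib
from collections import defaultdict


def hash_sequence(text, window=3):
    """Turn text into a sequence of hash values, one per word window.

    Assumption: overlapping windows of `window` words; the source record
    does not specify how the extracted text is split before hashing.
    """
    words = text.lower().split()
    count = max(len(words) - window + 1, 1)
    sequence = []
    for i in range(count):
        chunk = " ".join(words[i:i + window])
        digest = hashlib.sha1(chunk.encode("utf-8")).digest()
        # Keep the first 8 bytes as an integer hash value (an assumption).
        sequence.append(int.from_bytes(digest[:8], "big"))
    return sequence


class HashIndex:
    """Sparse stand-in for the indexing matrix (hypothetical layout)."""

    def __init__(self):
        # Row key: hash value. Row content: ids of sequences containing it.
        self.rows = defaultdict(set)

    def insert(self, seq_id, sequence):
        """Insert every hash value of one sequence into the index."""
        for value in sequence:
            self.rows[value].add(seq_id)

    def query(self, sequence):
        """Count, per stored sequence, how many distinct hash values of
        the selected sequence it shares."""
        overlap = defaultdict(int)
        for value in set(sequence):
            for seq_id in self.rows.get(value, ()):
                overlap[seq_id] += 1
        return dict(overlap)


index = HashIndex()
index.insert("doc-a", hash_sequence("the quick brown fox jumps over the lazy dog"))
index.insert("doc-b", hash_sequence("completely unrelated words appear in this sentence"))
print(index.query(hash_sequence("the quick brown fox sleeps")))
```

A higher overlap count indicates a stored sequence that is more similar to the selected one. Dividing the count by the length of the selected sequence would give a normalized score, though the record does not say whether such normalization is used.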
Bibliography: Application Number: US20100943780