Indexing and querying hash sequence matrices
Format | Patent |
---|---|
Language | English |
Published | 08.09.2015 |
Summary: | Embodiments are directed to indexing and querying a sequence of hash values in an indexing matrix. A computer system accesses a document and extracts a portion of text from it. The computer system applies a hashing algorithm to the extracted text, and the resulting hash values form a representative sequence of hash values. The computer system inserts each hash value of the sequence into an indexing matrix, which is configured to store multiple different hash value sequences. The computer system also queries the indexing matrix with a selected hash value sequence to determine how similar each stored hash value sequence is to it, based on how many hash values of the selected sequence overlap with the hash values of the stored sequence. |
Bibliography: | Application Number: US20100943780 |
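The summary describes a pipeline: hash extracted text into a sequence, insert each sequence into an indexing matrix, and score stored sequences by how many hash values they share with a selected sequence. It does not name the hashing algorithm or the matrix layout, so the following is only a minimal Python sketch under stated assumptions. The helper names `hash_sequence` and `HashIndexMatrix`, the use of k-word shingles hashed with BLAKE2b, and the layout (one row per hash bucket, one column per stored sequence) are all hypothetical illustrative choices, not the patented method.

```python
import hashlib
import re


def hash_sequence(text: str, k: int = 3) -> list[int]:
    """Hash each k-word shingle of `text` into a 32-bit value.

    Assumption: shingling and BLAKE2b are illustrative choices; the source
    does not specify a hashing algorithm.
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        return []
    if len(words) < k:
        shingles = [" ".join(words)]
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    # BLAKE2b with a 4-byte digest gives a stable 32-bit hash per shingle.
    return [
        int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=4).digest(), "big")
        for s in shingles
    ]


class HashIndexMatrix:
    """Hypothetical indexing matrix: one row per hash bucket, one column per stored sequence."""

    def __init__(self, num_buckets: int = 1 << 16) -> None:
        self.num_buckets = num_buckets
        # Stored column-wise: columns[j][r] == 1 if sequence j has a hash in bucket r.
        self.columns: list[bytearray] = []

    def _buckets(self, sequence: list[int]) -> set[int]:
        # Map each hash value to a row of the fixed-height matrix.
        return {h % self.num_buckets for h in sequence}

    def insert(self, sequence: list[int]) -> int:
        """Add a hash sequence as a new column and return its column index."""
        column = bytearray(self.num_buckets)
        for bucket in self._buckets(sequence):
            column[bucket] = 1
        self.columns.append(column)
        return len(self.columns) - 1

    def query(self, selected: list[int]) -> list[int]:
        """For each stored sequence, count the buckets it shares with `selected`."""
        buckets = self._buckets(selected)
        return [sum(column[b] for b in buckets) for column in self.columns]


if __name__ == "__main__":
    matrix = HashIndexMatrix()
    matrix.insert(hash_sequence("the quick brown fox jumps over the lazy dog"))
    matrix.insert(hash_sequence("an entirely different sentence about hash matrices"))
    # Higher counts mean more hash values in common with the selected sequence.
    print(matrix.query(hash_sequence("the quick brown fox jumps over a sleeping cat")))
```

Reducing hash values modulo `num_buckets` keeps the matrix a fixed height, but distinct hash values can land in the same bucket, so the overlap counts are approximate. A larger `num_buckets` makes such collisions rarer at the cost of memory per stored sequence.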