Method for duplicate checking of mass data and system thereof
Main Author | |
---|---|
Format | Patent |
Language | Chinese, English |
Published | 28.11.2012 |
Summary: | The invention discloses a method and a system for duplicate checking of mass data. The method comprises the following steps: extracting data keywords from the mass data, where a data keyword is used to distinguish a piece of data from other data; partitioning the data keywords by their first N+M letters and placing the keywords that share the same first N+M letters in one file, yielding keyword data files, where the data keywords have the same first N letters but their first N+M letters are not all the same (N and M are nonnegative integers); and performing duplicate checking on the data in each keyword data file to obtain the duplicate checking result. The method makes independent duplicate checking of mass data possible in a low-configuration environment. |
Bibliography: | Application Number: CN20091108569 |
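
The steps in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. It only shows the general idea of partitioning keywords into files by a fixed-length prefix and then checking each file for duplicates on its own, so that only one file's keywords need to be held in memory at a time. The names `bucket_name` and `find_duplicates`, the parameter `prefix_len` (standing in for N+M), the one-keyword-per-line file layout, and the hex-encoded file names are all assumptions made for illustration. Keywords are assumed to contain no newline characters.

```python
import os
import tempfile
from collections import defaultdict


def bucket_name(keyword: str, prefix_len: int) -> str:
    """Return a bucket file name derived from the first prefix_len characters.

    The prefix is hex-encoded so that any character is safe in a file name
    (an illustrative choice, not something the patent summary specifies).
    """
    prefix = keyword[:prefix_len]
    return (prefix.encode("utf-8").hex() or "empty") + ".txt"


def find_duplicates(keywords, prefix_len, work_dir):
    """Partition keywords into prefix files, then check each file separately.

    Returns a dict mapping each duplicated keyword to its occurrence count.
    """
    # Step 1: write every keyword to the file for its first prefix_len letters.
    # Reopening the file per keyword is slow but keeps the sketch simple.
    for kw in keywords:
        path = os.path.join(work_dir, bucket_name(kw, prefix_len))
        with open(path, "a", encoding="utf-8") as f:
            f.write(kw + "\n")

    # Step 2: duplicate-check one file at a time; equal keywords always
    # share a prefix, so they can never land in different files.
    duplicates = {}
    for name in sorted(os.listdir(work_dir)):
        counts = defaultdict(int)
        with open(os.path.join(work_dir, name), encoding="utf-8") as f:
            for line in f:
                counts[line.rstrip("\n")] += 1
        duplicates.update({k: c for k, c in counts.items() if c > 1})
    return duplicates


if __name__ == "__main__":
    sample = ["alpha01", "alpha02", "alpha01", "beta", "beta"]
    with tempfile.TemporaryDirectory() as work_dir:
        print(find_duplicates(sample, prefix_len=6, work_dir=work_dir))
```

A larger `prefix_len` produces more, smaller files, which lowers the peak memory needed per file at the cost of more file handles and disk operations.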