Method for duplicate checking of mass data and system thereof

Bibliographic Details
Main Author NIU GUOYANG
Format Patent
Language Chinese, English
Published 28.11.2012
Summary: The invention discloses a method and a system for duplicate checking of mass data. The method comprises the following steps: extracting data key words from the mass data, where a data key word is used to distinguish a piece of data from other data; dividing the data key words according to their first N+M letters and putting the key words that share the same first N+M letters into one file, to obtain key word data files, where the first N letters of all the data key words are the same but their first N+M letters are not all the same (N and M are nonnegative integers); and performing duplicate checking on the data in each key word data file to obtain the duplicate checking result. The method makes independent duplicate checking of mass data feasible in a low-configuration environment.
Bibliography: Application Number: CN20091108569
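
The steps in the summary (partition the key words by their first N+M letters, then check each file on its own) can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: the function names, the hex-encoded bucket file names, the use of a temporary directory, and the assumption that key words contain no newline characters are all hypothetical choices made for the example.

```python
import os
import tempfile


def partition_by_prefix(keywords, prefix_len, out_dir):
    """Append each key word to a file chosen by its first prefix_len characters.

    Assumption: the prefix is hex-encoded so that any character is safe
    in a file name. Returns the sorted list of bucket file paths.
    """
    paths = set()
    for kw in keywords:
        prefix = kw[:prefix_len]
        name = "bucket_" + prefix.encode("utf-8").hex() + ".txt"
        path = os.path.join(out_dir, name)
        # Reopening per key word keeps the sketch simple; a real system
        # would likely buffer writes.
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(kw + "\n")
        paths.add(path)
    return sorted(paths)


def find_duplicates_in_file(path):
    """Return the key words that occur more than once in one bucket file."""
    seen = set()
    dups = set()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            kw = line.rstrip("\n")
            if kw in seen:
                dups.add(kw)
            else:
                seen.add(kw)
    return dups


def check_duplicates(keywords, n, m):
    """Partition by the first n + m characters, then check each bucket alone.

    Only one bucket's key words are held in memory at a time, which is
    the point of splitting the data before checking it.
    """
    with tempfile.TemporaryDirectory() as tmp:
        duplicates = set()
        for path in partition_by_prefix(keywords, n + m, tmp):
            duplicates |= find_duplicates_in_file(path)
        return sorted(duplicates)


if __name__ == "__main__":
    sample = ["abc1", "abd2", "abc1", "xyz", "abd2", "abe"]
    print(check_duplicates(sample, n=2, m=1))  # ['abc1', 'abd2']
```

Because identical key words always share the same prefix, they always land in the same file, so checking each file separately finds every duplicate while bounding the working set to a single bucket.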