Method for checking and processing repeated data

Bibliographic Details
Main Authors: XIONG DAOYONG, LONG QINGLIN, CHEN CHENGZHI, LIANG GUOHUI, LI AIMIN
Format: Patent
Language: English
Published: 04.03.2015
Summary: The invention discloses a method for checking and processing repeated data. The method comprises the following steps: (A) acquiring the data to be verified and initializing its data structure; (B) calculating a hash code for each datum in the data to be verified; (C) checking, from the hash codes, whether repeated data exist among the data, and updating the tag code of each datum according to the result; (D) transmitting each datum with its updated tag code to every distributed computing node, so that each node determines whether the datum repeats any of its local data; (E) transmitting the data compared by each distributed computing node to a summarizing node. According to the abstract, the method shortens the comparison time for massive data and improves the efficiency of data lookup and cleaning.
Bibliography: Application Number: CN20141633391
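
Steps A to E of the summary can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the patented implementation: the record layout, the tag values (UNIQUE, DUP_IN_BATCH, DUP_ON_NODE), the choice of SHA-256 as the hash, the modelling of each distributed node as an in-memory set of hashes, and the rule the summarizing node uses to merge tags are all assumptions introduced here, and every function name is hypothetical.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical tag codes; the abstract does not specify their values.
UNIQUE, DUP_IN_BATCH, DUP_ON_NODE = 0, 1, 2


@dataclass
class Record:
    value: str
    hash_code: str = ""
    tag: int = UNIQUE


def hash_of(value):
    # Assumption: SHA-256 over the UTF-8 bytes stands in for the "hash code".
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def init_records(values):
    # Step A: wrap the data to be verified in an initialized structure.
    return [Record(v) for v in values]


def compute_hashes(records):
    # Step B: compute a hash code for each datum.
    for r in records:
        r.hash_code = hash_of(r.value)


def tag_batch_duplicates(records):
    # Step C: tag every datum whose hash was already seen in this batch.
    seen = set()
    for r in records:
        if r.hash_code in seen:
            r.tag = DUP_IN_BATCH
        else:
            seen.add(r.hash_code)


def node_compare(records, local_hashes):
    # Step D: one node compares the tagged data against its local data,
    # modelled here as a set of hash codes held by that node.
    compared = []
    for r in records:
        tag = r.tag
        if tag == UNIQUE and r.hash_code in local_hashes:
            tag = DUP_ON_NODE
        compared.append(Record(r.value, r.hash_code, tag))
    return compared


def summarize(per_node_results):
    # Step E: the summarizing node merges the per-node results; here a datum
    # keeps the highest tag any node assigned to it (an assumed merge rule).
    return [max(r.tag for r in group) for group in zip(*per_node_results)]


def run_pipeline(values, node_indexes):
    records = init_records(values)
    compute_hashes(records)
    tag_batch_duplicates(records)
    if not node_indexes:
        return [r.tag for r in records]
    return summarize([node_compare(records, idx) for idx in node_indexes])


if __name__ == "__main__":
    nodes = [{hash_of("b")}, {hash_of("z")}]
    print(run_pipeline(["a", "b", "a", "c"], nodes))
```

In this sketch the in-batch check runs before anything is sent to the nodes, so data already tagged as repeated within the batch are not compared again against local data; each node only needs a set-membership test per datum, which is where a hash-based approach would save comparison time on large inputs.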