Automatic Data Cleaning System for Large-Scale Location Image Databases Using a Multilevel Extractor and Multiresolution Dissimilarity Calculation

Bibliographic Details
Published in: IEEE Intelligent Systems, Vol. 36, No. 5, pp. 49-56
Main Authors: Cheng, Hsu-Yung; Yu, Chih-Chang
Format: Journal Article
Language: English
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), Los Alamitos, 01.09.2021

Summary: In this article, we propose a system for automatically classifying and cleaning location images in large-scale image databases uploaded by arbitrary users. Detecting incorrect scenes uploaded by users and maintaining the correctness of the database through automatic data cleaning are essential because human inspection cannot feasibly verify such massive amounts of data. In this study, we compared different feature extractors based on deep convolutional neural networks trained on big data. We designed a multilevel extractor to improve feature extraction. Moreover, we designed a detector based on multiresolution dissimilarity calculation to overcome the issue of large intraclass distances and successfully identify incorrect scenes. The proposed system was validated on a highly challenging dataset of 138,000 images collected from Google Places. The experiments show that the multilevel extractor and the detector based on multiresolution dissimilarity calculation can improve the accuracy of identifying incorrect scenes and achieve satisfactory data cleaning results.
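The abstract does not give the formulas behind the multilevel extractor or the multiresolution dissimilarity calculation. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' method. It assumes that each CNN layer's output has already been pooled into a vector. From these vectors it builds a hypothetical multilevel descriptor by L2-normalizing and concatenating them. It then scores a query image against one location class at several hypothetical "resolutions" (the whole class, sub-groups of the class, and individual reference images) using cosine distance. The function names, the grouping scheme, the min-over-resolutions decision rule, and the threshold are all illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def multilevel_feature(layer_outputs: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical multilevel descriptor: normalize each layer's pooled
    output separately, then concatenate, so no layer dominates by scale."""
    feature: Vector = []
    for layer in layer_outputs:
        feature.extend(l2_normalize(layer))
    return feature


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a_n, b_n))


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty group of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def multiresolution_dissimilarity(
    query: Sequence[float], resolutions: Sequence[Sequence[Sequence[Vector]]]
) -> List[float]:
    """For each resolution (a list of groups of reference vectors), return the
    distance from the query to the nearest group centroid."""
    return [
        min(cosine_distance(query, centroid(group)) for group in groups)
        for groups in resolutions
    ]


def is_incorrect_scene(query, resolutions, threshold: float) -> bool:
    """Assumed decision rule: flag the image only if it is dissimilar at
    every resolution, i.e. not close to the class or any of its sub-groups."""
    return min(multiresolution_dissimilarity(query, resolutions)) > threshold


# Toy 3-D features for one location class with two visually distinct sub-scenes.
refs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
resolutions = [
    [refs],                # coarse: the whole class
    [refs[:2], refs[2:]],  # medium: two sub-groups
    [[r] for r in refs],   # fine: individual reference images
]

genuine = [1.0, 0.05, 0.0]  # resembles only the first sub-scene
wrong = [0.05, 0.05, 1.0]   # unrelated scene

print(multiresolution_dissimilarity(genuine, resolutions))
print(is_incorrect_scene(genuine, resolutions, threshold=0.1))  # False
print(is_incorrect_scene(wrong, resolutions, threshold=0.1))    # True
```

With a threshold of 0.1, the genuine image is kept because it is close to one sub-group. Its distance to the whole-class centroid alone (about 0.26) would have exceeded the threshold. In this sketch, that is how the finer resolutions compensate for large intraclass distances. The unrelated image is far from the class at every resolution and is flagged.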
ISSN: 1541-1672, 1941-1294
DOI: 10.1109/MIS.2020.3021704