Automatic Detection of Four-Panel Cartoon in Large-Scale Korean Digitized Newspapers using Deep Learning

Bibliographic Details
Published in Journal of Open Humanities Data Vol. 10; p. 36
Main Authors Seojoon Lee, Byungjun Kim, Bong Gwan Jun
Format Journal Article
Language English
Published Ubiquity Press 01.06.2024
Summary: In cultural and historical studies, collecting image-based content from big data is a fundamental part of data analysis, yet the process is as intricate as extracting resources from vast terrains. Scholars increasingly value the “Four-panel Cartoon” (FPC) as an image content source in the digitized newspaper archives of the Republic of Korea. Identifying FPCs in such large archives is time-intensive and costly, especially because the data are unstructured images. To address this issue, this paper presents a novel computational FPC detection mechanism: the YOLOv5_FPC model, developed by fine-tuning the You Only Look Once Version 5 (YOLOv5) deep learning model specifically for FPC image detection. We applied the YOLOv5_FPC model to the Chosun Ilbo News Library archive (1920–1940) for automatic FPC data mining across 47,777 JPG image files and identified 1,040 FPC objects within 1,035 files, including FPCs that previous researchers had not found. We describe our methodology in detail, covering the collection, labeling, training, detection, and distribution of the data we discovered in the newspaper archives. Our findings, now available as an open-access dataset in the Journal of Open Humanities Data (JOHD) Dataverse, invite discussion among humanities researchers focusing on the culture and history of Korea between 1920 and 1940.
ISSN: 2059-481X
DOI: 10.5334/johd.205
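
The summary reports its results as two separate counts: detected FPC objects and files containing at least one FPC. The sketch below shows one way such per-box detections could be tallied into those counts. It is an illustration under stated assumptions, not the authors' pipeline: the function name `summarize_detections`, the `(filename, confidence)` record shape, the 0.5 confidence cutoff, and the toy file names are all hypothetical and do not come from the article.

```python
from collections import Counter


def summarize_detections(detections, total_files, min_confidence=0.5):
    """Tally per-box detections into object-level and file-level counts.

    detections: iterable of (filename, confidence) pairs, one per detected box.
    total_files: number of image files that were scanned.
    min_confidence: hypothetical cutoff; the article does not state one here.
    """
    # Count the boxes kept for each file after the confidence filter.
    per_file = Counter(
        name for name, conf in detections if conf >= min_confidence
    )
    n_files = len(per_file)
    return {
        "objects": sum(per_file.values()),
        "files_with_fpc": n_files,
        "files_with_multiple": sum(1 for c in per_file.values() if c > 1),
        "share_of_archive": n_files / total_files if total_files else 0.0,
    }


# Toy records (hypothetical file names and scores, not data from the article).
toy = [
    ("page_001.jpg", 0.91),
    ("page_001.jpg", 0.88),  # second box on the same page
    ("page_007.jpg", 0.76),
    ("page_009.jpg", 0.31),  # dropped by the confidence cutoff
]
summary = summarize_detections(toy, total_files=10)
print(summary)
```

Counting objects and files separately explains why the two reported figures can differ: a page with more than one detected cartoon adds several objects but only one file. With the figures in the summary, 1,035 of 47,777 files is roughly 2.2% of the scanned archive.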