Classification With Unbalanced Samples by Self-Sampling and Semicorrelated Co-Training: An Application to Algal Bloom Detection

Bibliographic Details
Published in: IEEE Transactions on Geoscience and Remote Sensing, Vol. 61, pp. 1-12
Main Authors: Lyu, Xinrong; Zhou, Jun; Ren, Peng; Frery, Alejandro C.
Format: Journal Article
Language: English
Published: New York: IEEE, 2023
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Machine-learning-based methods offer attractive solutions to algal bloom detection, but making effective use of training sets remains a crucial challenge. Taking the extraction of Ulva prolifera as an example, this manuscript presents a model based on self-sampling and semicorrelated co-training to improve detection accuracy. The self-sampling module combines balanced sampling and gradient descent to extract useful information from U. prolifera training sets more efficiently: balanced sampling optimizes the distribution of sampling points, while gradient descent determines the optimal number of sampling points. During iteration, the self-sampling module continuously extracts useful information, which serves as the training input for the subsequent machine-learning algorithm. Co-training is a classical and very effective semisupervised approach, but it requires two views that are sufficient and independent, a condition that is difficult to meet in practical applications. To address this issue, we developed a semicorrelated co-training module to achieve the two-view condition. To mitigate the problem of limited labeled samples, both labeled and unlabeled samples are used as inputs for the semicorrelated co-training module. Experimental results on different U. prolifera datasets from MODIS and Sentinel-1 synthetic aperture radar (SAR) show that, benefiting from the self-sampling and semicorrelated co-training modules, the proposed model improves the detection accuracy of U. prolifera.
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2023.3299312
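
The summary refers to classical co-training, in which two classifiers, each trained on its own view of the data, take turns labeling the unlabeled samples they are most confident about. The following is a minimal sketch of that classical scheme only, not the semicorrelated variant or the self-sampling module proposed in the article, whose details the summary does not give. The nearest-centroid classifier, the function names (`centroid_fit`, `centroid_predict`, `co_train`), the parameters `rounds` and `per_round`, and the toy data are all assumptions made for illustration.

```python
def centroid_fit(X, y):
    """Return {label: mean feature vector} for a nearest-centroid classifier."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, x):
    """Return (label, margin): nearest centroid and its gap to the runner-up."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), label)
        for label, c in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(view_a, view_b, labels, rounds=5, per_round=2):
    """Classical co-training over two views (illustrative sketch).

    labels[i] is a class label, or None if sample i is unlabeled.
    Returns a new label list with pseudo-labels filled in.
    """
    labels = list(labels)  # do not modify the caller's list
    for _ in range(rounds):
        for view in (view_a, view_b):  # the two views take turns as teacher
            known = [i for i, y in enumerate(labels) if y is not None]
            unknown = [i for i, y in enumerate(labels) if y is None]
            if not unknown:
                return labels
            model = centroid_fit([view[i] for i in known],
                                 [labels[i] for i in known])
            scored = [(centroid_predict(model, view[i]), i) for i in unknown]
            # Most confident predictions (largest margin) first.
            scored.sort(key=lambda item: item[0][1], reverse=True)
            for (label, _margin), i in scored[:per_round]:
                labels[i] = label  # add the pseudo-label to the shared pool
    return labels


# Toy data: two views of six samples, only two of which are labeled.
view_a = [[0.0], [0.1], [0.2], [1.0], [0.9], [0.8]]
view_b = [[5.0], [5.2], [5.1], [9.0], [9.1], [8.8]]
initial = [0, None, None, 1, None, None]
print(co_train(view_a, view_b, initial))  # prints [0, 0, 0, 1, 1, 1]
```

In this toy case each view separates the classes on its own, so both classifiers agree. The summary notes that real data rarely provide two sufficient and independent views, which is the gap the semicorrelated module is meant to address.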