Predicting the Depression of the South Korean Elderly using SMOTE and an Imbalanced Binary Dataset

Bibliographic Details
Published in International journal of advanced computer science & applications Vol. 12; no. 1
Main Author Byeon, Haewon
Format Journal Article
Language English
Published West Yorkshire Science and Information (SAI) Organization Limited 2021
Abstract Because healthy people far outnumber ill people, class imbalance is highly likely when big data are used to predict depression among the elderly living in the community. When raw data with imbalanced class ratios are analyzed directly, without a supplementary technique such as a sampling algorithm, prediction errors arise in the analysis and the performance of machine learning decreases. A data sampling technique is therefore needed to overcome the imbalance. This study sought an effective way of processing imbalanced data for developing ensemble-based machine learning by comparing the performance of sampling methods on depression data from the elderly living in South Korean communities, which had highly imbalanced class ratios. Models for predicting depression among the elderly living in the community were developed with logistic regression, a gradient boosting machine (GBM), and random forest, and their prediction performance was evaluated by comparing accuracy, sensitivity, and specificity. The study analyzed 4,085 elderly people (≥60 years old) living in the community. The data were imbalanced: the depression screening test showed that 87.5% of subjects did not have depression, while 12.5% did. Oversampling, undersampling, and SMOTE were used to overcome the imbalance of the binary dataset, and the prediction performance (accuracy, sensitivity, and specificity) of each sampling method was compared. The results confirmed that the SMOTE-based random forest, which showed the highest accuracy with a sensitivity ≥ 0.6 and a specificity ≥ 0.6, had the best prediction performance among random forest, GBM, and logistic regression. Further studies are needed to compare the accuracy of SMOTE, undersampling, and oversampling for imbalanced data with high dimensional y-variables.
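The sampling and evaluation ideas named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names `smote` and `screening_metrics`, the neighbour count `k`, and the two-feature toy data are assumptions made for illustration, and only the 87.5% / 12.5% class ratio is taken from the abstract. The sketch interpolates between a minority-class row and one of its nearest minority-class neighbours, which is the core idea of SMOTE, and then shows why accuracy alone is misleading on such data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    randomly chosen minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority-class neighbours of the base row (excluding itself).
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity, with 1 = depression."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy data with the same 87.5% / 12.5% split as the abstract.
data_rng = random.Random(42)
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(35)]
minority = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(5)]

# Oversample the minority class until both classes have 35 rows.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_rows

# A model that always predicts "no depression" on 7 healthy + 1 depressed subject:
# accuracy 0.875, sensitivity 0.0, specificity 1.0.
metrics = screening_metrics([0] * 7 + [1], [0] * 8)
print(len(balanced_minority), metrics)
```

In practice, synthetic rows are usually generated from the training split only, so that the test data keep the original class ratio. Whether the study followed that protocol is not stated in the abstract.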
Copyright 2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.14569/IJACSA.2021.0120110
Discipline Computer Science
EISSN 2156-5570
ISSN 2158-107X
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Issue 1
OpenAccessLink https://www.proquest.com/docview/2655119284?pq-origsite=%requestingapplication%
SubjectTerms Accuracy
Algorithms
Big Data
Data sampling
Datasets
Machine learning
Medical screening
Older people
Oversampling
Performance evaluation
Performance prediction
Regression analysis
Regression models
Sampling methods
URI https://www.proquest.com/docview/2655119284