Predicting the Depression of the South Korean Elderly using SMOTE and an Imbalanced Binary Dataset
Published in | International journal of advanced computer science & applications Vol. 12; no. 1 |
---|---|
Main Author | Byeon, Haewon |
Format | Journal Article |
Language | English |
Published | West Yorkshire: Science and Information (SAI) Organization Limited, 2021 |
Abstract | Because healthy people far outnumber ill people, class imbalance is highly likely when big data are used to predict depression among community-dwelling elderly people. When a dataset with an imbalanced class ratio is analyzed directly, without a supplementary technique such as a sampling algorithm, prediction errors during analysis can degrade machine learning performance. A data sampling technique is therefore needed to overcome the imbalance. This study sought an effective way to process imbalanced data for ensemble-based machine learning by comparing the performance of sampling methods on depression data from elderly people living in South Korean communities, which had a highly imbalanced class ratio. Models for predicting depression were developed using logistic regression, gradient boosting machine (GBM), and random forest, and their accuracy, sensitivity, and specificity were compared to evaluate prediction performance. The study analyzed 4,085 community-dwelling elderly people (≥60 years old). The data were imbalanced: the depression screening test showed that 87.5% of subjects did not have depression, while 12.5% did. Oversampling, undersampling, and SMOTE were used to address the imbalance of the binary dataset, and the prediction performance (accuracy, sensitivity, and specificity) of each sampling method was compared. The results confirmed that the SMOTE-based random forest, which showed the highest accuracy (with sensitivity ≥ 0.6 and specificity ≥ 0.6), had the best prediction performance among random forest, GBM, and logistic regression. Further studies are needed to compare the accuracy of SMOTE, undersampling, and oversampling for imbalanced data with high-dimensional y-variables. |
---|---|
Author | Byeon, Haewon |
CitedBy_id | crossref_primary_10_3390_asi5060120 crossref_primary_10_1007_s42001_024_00356_6 crossref_primary_10_5498_wjp_v12_i2_204 |
ContentType | Journal Article |
Copyright | 2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.14569/IJACSA.2021.0120110 |
Discipline | Computer Science |
EISSN | 2156-5570 |
ISSN | 2158-107X |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 1 |
Language | English |
OpenAccessLink | https://www.proquest.com/docview/2655119284?pq-origsite=%requestingapplication% |
PublicationDateYYYYMMDD | 2021-01-01 |
PublicationPlace | West Yorkshire |
PublicationTitle | International journal of advanced computer science & applications |
PublicationYear | 2021 |
Publisher | Science and Information (SAI) Organization Limited |
SubjectTerms | Accuracy; Algorithms; Big Data; Data sampling; Datasets; Machine learning; Medical screening; Older people; Oversampling; Performance evaluation; Performance prediction; Regression analysis; Regression models; Sampling methods |
Title | Predicting the Depression of the South Korean Elderly using SMOTE and an Imbalanced Binary Dataset |
URI | https://www.proquest.com/docview/2655119284 |
Volume | 12 |
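
The abstract compares sampling methods by accuracy, sensitivity, and specificity on data with an 87.5% / 12.5% class split. The sketch below is an illustration only and is not the author's implementation: it shows why accuracy alone can mislead at that split, and how SMOTE-style interpolation creates synthetic minority samples. The function names `confusion_metrics` and `smote_sketch`, the toy minority points, and the rounded class count of 511 (about 12.5% of 4,085) are assumptions made for the example. The SMOTE routine is simplified: it picks base points at random instead of cycling through every minority sample.

```python
import math
import random


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels, 1 = depressed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the segment between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Class counts approximated from the abstract (assumption: 511 of 4,085).
n_pos = 511
n_neg = 4085 - n_pos
y_true = [1] * n_pos + [0] * n_neg

# A model that never predicts depression still looks accurate.
always_negative = [0] * len(y_true)
acc, sens, spec = confusion_metrics(y_true, always_negative)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")

# Hypothetical two-feature minority samples, for illustration only.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=6, k=2)
print(len(new_points), "synthetic minority points")
```

In practice a maintained library implementation would likely be preferable to this sketch, and resampling would normally be applied to the training split only so that the test metrics reflect the original class ratio.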