Breast Cancer Classification Using Outlier Detection and Variance Inflation Factor
Published in | Engineering, MAthematics and Computer Science (EMACS) Journal Vol. 5; no. 1; pp. 17 - 23 |
---|---|
Main Author | Juarto, Budi |
Format | Journal Article |
Language | English |
Published | 31.01.2023 |
ISSN | 2686-2573 2686-2573 |
DOI | 10.21512/emacsjournal.v5i1.9223 |
Abstract | Breast cancer is one of the most prevalent malignant tumors. It develops when cells in the breast tissue grow uncontrollably and overtake the surrounding healthy tissue. Several features or patient conditions can be used in a machine learning approach to predict breast cancer; here, machine learning is used to determine whether a tumor is malignant or benign. The dataset used in this research is the Wisconsin Breast Cancer (Diagnostic) Data Set, which contains 32 attributes and 569 records. Feature selection is done by first eliminating outliers using the upper and lower quartiles of each feature and then removing features that have a high variance inflation factor. The machine learning methods used are Logistic Regression, Random Forest, KNN, SVC, XGBoost, Gradient Boosting, and Ridge Classifier. These methods were selected because the target to be predicted has two labels, benign and malignant. Feature selection using the variance inflation factor increases the accuracy of the Logistic Regression and Random Forest methods from 98.25% to 99.12%, which is the highest accuracy among the methods tested. Future research will try other optimization techniques for hyperparameter tuning. |
---|---|
Author | Juarto, Budi |
CitedBy_id | crossref_primary_10_59652_jeime_v2i4_353 crossref_primary_10_1016_j_measen_2023_100901 crossref_primary_10_34248_bsengineering_1387431 crossref_primary_10_1016_j_tfp_2025_100774 |
ContentType | Journal Article |
DBID | AAYXX CITATION |
DOI | 10.21512/emacsjournal.v5i1.9223 |
DatabaseName | CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | CrossRef |
DeliveryMethod | fulltext_linktorsrc |
EISSN | 2686-2573 |
EndPage | 23 |
ISSN | 2686-2573 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Issue | 1 |
Language | English |
License | https://creativecommons.org/licenses/by-sa/4.0 |
OpenAccessLink | https://journal.binus.ac.id/index.php/EMACS/article/download/9223/4690 |
PageCount | 7 |
PublicationCentury | 2000 |
PublicationDate | 2023-01-31 |
PublicationDateYYYYMMDD | 2023-01-31 |
PublicationDate_xml | – month: 01 year: 2023 text: 2023-01-31 day: 31 |
PublicationDecade | 2020 |
PublicationTitle | Engineering, MAthematics and Computer Science (EMACS) Journal |
PublicationYear | 2023 |
StartPage | 17 |
Title | Breast Cancer Classification Using Outlier Detection and Variance Inflation Factor |
Volume | 5 |
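The abstract names two preprocessing steps: outlier elimination based on each feature's upper and lower quartiles, and removal of features with a high variance inflation factor (VIF). The following is a minimal sketch of both ideas, not the article's implementation. The record does not give the fence multiplier or the VIF cutoff, so `k=1.5` (the common Tukey fence) and `threshold=10.0` (a common rule of thumb) are assumptions, and all function names are hypothetical. VIF is computed here from the identity that the VIF of feature j equals the j-th diagonal entry of the inverse correlation matrix.

```python
from math import sqrt
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences for one feature column.

    k=1.5 is an assumed multiplier; the source only mentions quartiles.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(rows, k=1.5):
    """Keep only rows whose every feature lies inside its column's fences."""
    columns = list(zip(*rows))
    bounds = [iqr_bounds(col, k) for col in columns]
    return [row for row in rows
            if all(lo <= x <= hi for x, (lo, hi) in zip(row, bounds))]


def correlation_matrix(rows):
    """Pearson correlation matrix of the feature columns."""
    columns = list(zip(*rows))
    n = len(rows)
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [sqrt(sum(x * x for x in col)) for col in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(rows):
    """VIF per feature: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(rows))
    return [inv[j][j] for j in range(len(inv))]


def high_vif_columns(rows, threshold=10.0):
    """Indices of features whose VIF exceeds the (assumed) threshold."""
    return [j for j, v in enumerate(vif(rows)) if v > threshold]
```

In use, `drop_outlier_rows` would be applied to the feature rows first and `high_vif_columns` to the rows that remain; in practice the flagged features are usually dropped one at a time, recomputing VIF after each removal, since removing one collinear feature lowers the VIF of the others. Constant columns and perfectly collinear features are not handled by this sketch.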