Breast Cancer Classification Using Outlier Detection and Variance Inflation Factor

Bibliographic Details
Published in: Engineering, MAthematics and Computer Science (EMACS) Journal, Vol. 5, No. 1, pp. 17-23
Main Author: Juarto, Budi
Format: Journal Article
Language: English
Published: 31.01.2023
ISSN: 2686-2573
EISSN: 2686-2573
DOI: 10.21512/emacsjournal.v5i1.9223

Abstract
Breast cancer is one of the most prevalent malignant tumors. It develops when cells in the breast grow uncontrollably and overtake the surrounding healthy breast tissue. Several features, or patient conditions, can be used in a machine learning approach to predict breast cancer; here, machine learning is used to determine whether a tumor is malignant or benign. This research uses the Wisconsin Breast Cancer (Diagnostic) Data Set, which contains 569 samples described by 32 attributes. Outliers are first eliminated using the lower and upper quartiles of each feature, and feature selection then removes features that have a high variance inflation factor (VIF). The machine learning methods used are Logistic Regression, Random Forest, k-nearest neighbors (KNN), support vector classifier (SVC), XGBoost, Gradient Boosting, and Ridge Classifier, chosen because the target to be predicted has two labels: benign and malignant. VIF-based feature selection increases the accuracy of Logistic Regression and Random Forest from 98.25% to 99.12%, the highest accuracy among the methods tested. Future research will try other optimization techniques for hyperparameter tuning.
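The abstract names two preprocessing steps, quartile-based outlier removal and VIF-based feature selection, but does not give their parameters. The following is a minimal sketch in Python with NumPy, not the author's implementation. It assumes Tukey's fences (rows outside Q1 - 1.5·IQR or Q3 + 1.5·IQR on any feature are dropped) and an iterative rule that removes the feature with the largest VIF until every VIF is at most 10. The function names `remove_iqr_outliers`, `vif`, and `drop_high_vif`, the multiplier 1.5, and the threshold 10 are hypothetical choices made for illustration.

```python
import numpy as np


def remove_iqr_outliers(X, k=1.5):
    """Drop rows where any feature lies outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 (Tukey's fences) is an assumption, not a value from the paper.
    Returns the filtered array and the boolean row mask that was applied.
    """
    X = np.asarray(X, dtype=float)
    q1, q3 = np.percentile(X, [25, 75], axis=0)  # per-feature quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inside = np.all((X >= lower) & (X <= upper), axis=1)  # keep row only if every feature is in range
    return X[inside], inside


def vif(X):
    """Variance inflation factor of each column of X.

    Uses the identity VIF_j = [R^-1]_jj, where R is the feature correlation
    matrix; this equals 1 / (1 - R_j^2) from regressing feature j on all
    other features with an intercept. Perfectly collinear columns make R
    singular, and np.linalg.inv then raises LinAlgError.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


def drop_high_vif(X, threshold=10.0):
    """Repeatedly drop the column with the largest VIF until all VIFs <= threshold.

    threshold=10 is a common rule of thumb, assumed here. Returns the indices
    of the kept columns, relative to the input X.
    """
    X = np.asarray(X, dtype=float)
    kept = list(range(X.shape[1]))
    while len(kept) > 1:  # VIF is undefined for a single remaining column
        scores = vif(X[:, kept])
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        del kept[worst]  # recompute after each removal
    return kept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=200)
    b = 2 * a + rng.normal(scale=0.01, size=200)  # nearly collinear with a
    c = rng.normal(size=200)                      # independent of a and b
    X = np.column_stack([a, b, c])
    X_clean, _ = remove_iqr_outliers(X)
    print("VIF before:", np.round(vif(X_clean), 1))
    print("kept columns:", drop_high_vif(X_clean))
```

The VIFs are recomputed after every removal because dropping one member of a collinear pair usually brings the other member's VIF back below the threshold. Removing every feature above the threshold in a single pass would discard both members of the pair.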
CitedBy_id crossref_primary_10_59652_jeime_v2i4_353
crossref_primary_10_1016_j_measen_2023_100901
crossref_primary_10_34248_bsengineering_1387431
crossref_primary_10_1016_j_tfp_2025_100774
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
License https://creativecommons.org/licenses/by-sa/4.0
OpenAccessLink https://journal.binus.ac.id/index.php/EMACS/article/download/9223/4690
PageCount 7