Comparison of Prediction Models for Korean Professional Baseball Golden Glove Award Winners

Bibliographic Details
Published in	한국체육측정평가학회지 Vol. 26; no. 4; pp. 69 - 81
Main Authors	권순규(Soongyu KWON), 최형준(Hyongjun CHOI)
Format	Journal Article
Language	Korean
Published	한국체육측정평가학회 01.12.2024
Subjects	Physical Education; Golden Glove Award Predictions; Korean Professional Baseball; Prediction Techniques; Logistic Regression Analysis; Machine Learning; Sabermetrics
Summary:	Baseball offers a wider variety of statistics than most other sports, and its records are preserved thoroughly enough that the content and outcome of any game can be reconstructed from them. Korean professional baseball presents the “Golden Glove” award to the best player at each position. Because the winners are chosen by a vote of media personnel such as reporters, broadcast producers, and commentators, it is difficult to identify clear selection criteria or the variables that matter most at each position.
Therefore, this study designs models that predict Golden Glove winners from the records of candidates and winners in Korean professional baseball from 2003 to 2022, using logistic regression analysis and machine learning, and compares their performance to identify the model best suited to the task. Logistic regression, support vector machine, random forest, and XGBoost models were designed, and the hyperparameters of each are reported. Each model was built in two forms: model 1, whose inputs are standardized to z-scores, and model 2, whose inputs are rescaled by min-max normalization. Performance was evaluated after searching for the best set of variables for each model. First, in designing the prediction models, the logistic regression models used L1, L2, and elasticnet penalties. The support vector machine models used rbf and poly kernels, and because these models are nonlinear, their important variables could not be identified. The random forest models used gini and entropy as the split criterion, and the XGBoost models used exact, approx, and hist as the tree construction method. Variables were removed one at a time to raise the F1 score, and the variables selected for each position differed from model to model. Second, the comparison of prediction performance showed that, although results varied by position, support vector machine model 2 and the XGBoost models achieved high accuracy and F1 scores, while the logistic regression and random forest models scored relatively lower on both. Overall, the models using min-max normalization (model 2) predicted better than those using z-score standardization (model 1).
Therefore, to predict Golden Glove winners it is advisable to use XGBoost model 2 with min-max normalization, and including defensive statistics in future models is expected to yield even better results.
KCI Citation Count:	0
ISSN:	1229-4225 2671-9134
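
The two input scalings and the one-at-a-time variable removal described in the summary can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the function names are hypothetical, and `score_fn` is an assumed callback that would fit a model on the given variables and return its F1 score. The record does not say how the study implemented these steps.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize to mean 0 and population standard deviation 1.

    Assumes the values are not all equal (otherwise sigma is 0).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly onto [0, 1]. Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def backward_eliminate(features, score_fn):
    """Greedily drop one variable at a time while the score improves.

    score_fn (hypothetical) takes a list of variable names and returns
    the F1 score of a model trained on exactly those variables.
    """
    selected = list(features)
    best = score_fn(selected)
    while len(selected) > 1:
        # Score every candidate subset with exactly one variable removed.
        trials = [
            (score_fn([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        score, drop = max(trials)
        if score <= best:
            break  # no single removal raises the score any further
        selected.remove(drop)
        best = score
    return selected, best
```

Min-max scaling keeps every variable inside a fixed range, while z-scores are unbounded and expressed in standard deviations; which one suits a given model better is an empirical question, as the comparison of model 1 and model 2 in the summary suggests.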