UsIL-6: An unbalanced learning strategy for identifying IL-6 inducing peptides by undersampling technique
Published in: Computer Methods and Programs in Biomedicine, Vol. 250, p. 108176
Format: Journal Article
Language: English
Published: Elsevier B.V., Ireland, 01.06.2024
Summary:

• We propose UsIL-6, a bioinformatics tool for accurately identifying IL-6-inducing peptides.
• The model combines the NearMiss3 undersampling technique, the Boruta feature selection method, and an extremely randomized trees (ExtraTrees) classifier.
• To explain how each feature relates to the prediction, and whether it pushes the prediction toward the positive or the negative class, we used the SHapley Additive exPlanations (SHAP) framework to interpret the output of the machine learning (ML) classifier.
• On the independent test dataset, UsIL-6 achieved an AUC of 0.870 and a balanced accuracy (BACC) of 0.808, outperforming state-of-the-art models.
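SHAP attributes a classifier's output for one sample to its input features using Shapley values from cooperative game theory. The sketch below is only illustrative. It computes exact Shapley values by enumerating every feature coalition for a hypothetical three-feature toy function, not the UsIL-6 model. Features that are absent from a coalition are filled from a single baseline vector, which is a simplifying assumption. The `shap` library normally averages over a background dataset, and it uses faster algorithms instead of this exponential enumeration (for example `TreeExplainer` for tree ensembles such as ExtraTrees).

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    Features outside a coalition are replaced by the corresponding baseline
    value (a simplifying assumption). Cost grows as 2**n, so this is only
    feasible for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):  # coalition sizes 0 .. n-1, not counting feature i
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for S in combinations(others, r):
                # Weighted marginal contribution of feature i to coalition S.
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi


# Hypothetical toy model: additive effect of feature 0 plus a 1x2 interaction.
def toy_model(z):
    return 2 * z[0] + z[1] * z[2]


x, baseline = [1, 2, 3], [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 6) for v in phi])  # → [2.0, 3.0, 3.0]
# Additivity: contributions sum to f(x) - f(baseline) = 8 - 0.
print(round(sum(phi), 6), toy_model(x) - toy_model(baseline))  # → 8.0 8
```

A positive value pushes the output above the baseline. For a positive-class score, that means toward the positive class; a negative value pushes the output down. In the toy example, the interaction term is split evenly between features 1 and 2, and the additive term goes entirely to feature 0.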
Interleukin-6 (IL-6) is a critical indicator for early warning, monitoring, and prognosis of the inflammatory storm in COVID-19 cases. IL-6-inducing peptides, which induce production of the cytokine IL-6, are important for developing diagnostics and immunotherapies. Existing methods have had some success in predicting IL-6-inducing peptides, but their performance in practical applications still leaves room for improvement.
In this study, we proposed UsIL-6, a high-performance bioinformatics tool for identifying IL-6-inducing peptides. First, we extracted five groups of physicochemical and sequence-structural features from IL-6-inducing peptide sequences, obtaining a 636-dimensional feature vector. We then applied the NearMiss3 undersampling method to balance the classes and standardized the features with StandardScaler. Next, the Boruta feature selection method reduced the representation to a 40-dimensional optimal feature vector. Finally, we combined this feature vector with an extremely randomized trees classifier to build the final UsIL-6 model.
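The processing steps above (standardization, NearMiss-3 undersampling, Boruta-style selection, and an ExtraTrees classifier) can be sketched with NumPy and scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The data are synthetic: 60 hypothetical features stand in for the 636 peptide descriptors.
- Scaling is applied before undersampling, because NearMiss is distance-based. The abstract does not state the order.
- `nearmiss3` is hand-written. It follows the two-step definition behind imbalanced-learn's `NearMiss(version=3)`.
- `boruta_like` is a hypothetical, simplified stand-in for Boruta that uses shadow features plus a binomial test. It is not the BorutaPy package.
- All hyperparameters are assumptions.

```python
import math

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def nearmiss3(X, y, n_short=3, n_avg=3, minority=1):
    """NearMiss-3 undersampling for a binary problem, using Euclidean distance."""
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    # Distance from every minority sample to every majority sample: (n_min, n_maj).
    d = np.linalg.norm(X[min_idx][:, None, :] - X[maj_idx][None, :, :], axis=2)
    # Step 1: short-list the n_short nearest majority neighbours of each minority sample.
    short = np.unique(np.argsort(d, axis=1)[:, :n_short].ravel())
    # Step 2: among short-listed samples, keep those whose mean distance to their
    # n_avg nearest minority samples is largest (at most one per minority sample).
    mean_d = np.sort(d[:, short], axis=0)[:n_avg].mean(axis=0)
    n_keep = min(len(min_idx), len(short))
    keep = short[np.argsort(-mean_d, kind="stable")[:n_keep]]
    idx = np.concatenate([min_idx, maj_idx[keep]])
    return X[idx], y[idx]


def boruta_like(X, y, n_iter=20, alpha=0.05, seed=0):
    """Hypothetical simplified Boruta-style selection (not the full algorithm)."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    hits = np.zeros(n_feat, dtype=int)
    for i in range(n_iter):
        # Shadow features: copies of every column with its values shuffled.
        shadow = np.column_stack([rng.permutation(col) for col in X.T])
        rf = RandomForestClassifier(n_estimators=100, random_state=seed + i)
        rf.fit(np.hstack([X, shadow]), y)
        imp = rf.feature_importances_
        # A "hit": the real feature beats the best shadow feature in this round.
        hits += imp[:n_feat] > imp[n_feat:].max()
    # One-sided binomial test against chance (p = 0.5), Bonferroni-corrected.
    p_vals = np.array([sum(math.comb(n_iter, j) for j in range(int(k), n_iter + 1))
                       / 2**n_iter for k in hits])
    return np.flatnonzero(p_vals < alpha / n_feat)


# Hypothetical imbalanced data; shuffle=False puts the 8 informative columns first.
X, y = make_classification(n_samples=400, n_features=60, n_informative=8,
                           n_redundant=0, weights=[0.8, 0.2], shuffle=False,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

scaler = StandardScaler().fit(X_tr)           # fit on training data only
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
X_bal, y_bal = nearmiss3(X_tr_s, y_tr)        # undersample the training set only
selected = boruta_like(X_bal, y_bal)
clf = ExtraTreesClassifier(n_estimators=500, random_state=0)
clf.fit(X_bal[:, selected], y_bal)
proba = clf.predict_proba(X_te_s[:, selected])[:, 1]
print("selected features:", selected.tolist())
print("AUC  =", round(roc_auc_score(y_te, proba), 3))
print("BACC =", round(balanced_accuracy_score(y_te, (proba >= 0.5).astype(int)), 3))
```

Scaling and undersampling are fitted on the training split only. The held-out test set therefore keeps its natural class imbalance, which is one reason to report BACC alongside AUC.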
On the independent test dataset, UsIL-6 achieved an AUC of 0.870 and a BACC of 0.808, indicating better performance than existing methods in identifying IL-6-inducing peptides.
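Both reported metrics can be computed from first principles. The sketch below assumes the standard definitions:

- **AUC** is the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half.
- **BACC** is the mean of sensitivity and specificity at a decision threshold, here an assumed 0.5.

The eight labels and scores are made-up toy values, not data from the study.

```python
def auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(round(auc(y_true, scores), 3))               # → 0.867  (13 of 15 pairs ranked correctly)
print(round(balanced_accuracy(y_true, y_pred), 3))  # → 0.633  ((2/3 + 3/5) / 2)
```

BACC gives the minority class the same weight as the majority class. A model that always predicts the negative class therefore scores a BACC of 0.5 on this toy set, even though its plain accuracy would be 5/8 = 0.625.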
The performance comparison on the independent test dataset showed that UsIL-6 achieved the highest performance, the best robustness, and the strongest generalization ability among the compared methods. We hope that UsIL-6 will become a valuable method for identifying, annotating, and characterizing new IL-6-inducing peptides.
ISSN: 0169-2607; 1872-7565
DOI: 10.1016/j.cmpb.2024.108176