A critical review of five machine learning-based algorithms for predicting protein stability changes upon mutation
Abstract A number of machine learning (ML)-based algorithms have been proposed for predicting mutation-induced stability changes in proteins. In this critical review, we used hypothetical reverse mutations to evaluate the performance of five representative algorithms and found all of them suffer fro...
Saved in:
Published in | Briefings in bioinformatics Vol. 21; no. 4; pp. 1285 - 1292 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published |
England
Oxford University Press
15.07.2020
Oxford Publishing Limited (England) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Abstract
A number of machine learning (ML)-based algorithms have been proposed for predicting mutation-induced stability changes in proteins. In this critical review, we used hypothetical reverse mutations to evaluate the performance of five representative algorithms and found all of them suffer from the problem of overfitting. This approach is based on the fact that if a wild-type protein is more stable than a mutant protein, then the same mutant is less stable than the wild-type protein. We analyzed the underlying issues and suggest that the main causes of the overfitting problem include that the numbers of training cases were too small, and the features used in the models were not sufficiently informative for the task. We make recommendations on how to avoid overfitting in this important research area and improve the reliability and robustness of ML-based algorithms in general. |
---|---|
Bibliography: | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-3 content type line 23 ObjectType-Review-1 |
ISSN: | 1477-4054 1467-5463 1477-4054 |
DOI: | 10.1093/bib/bbz071 |