Detecting anomalous proteins using deep representations

Bibliographic Details
Published in: NAR Genomics and Bioinformatics, Vol. 6, no. 1, p. lqae021
Main Authors: Michael-Pitschaze, Tomer; Cohen, Niv; Ofer, Dan; Hoshen, Yedid; Linial, Michal
Format: Journal Article
Language: English
Published: England, Oxford University Press, 01.03.2024

Summary: Many advances in biomedicine can be attributed to identifying unusual proteins and genes. Many of these proteins’ unique properties were discovered by manual inspection, which is becoming infeasible at the scale of modern protein datasets. Here, we propose to tackle this challenge using anomaly detection methods that automatically identify unexpected properties. We adopt a state-of-the-art anomaly detection paradigm from computer vision to highlight unusual proteins. We generate meaningful representations without labeled inputs, using pretrained deep neural network models. We apply these protein language models (pLMs) to detect anomalies in function, phylogenetic families, and segmentation tasks. We compute protein anomaly scores to highlight human prion-like proteins, distinguish viral proteins from their host proteome, and mark non-classical ion/metal binding proteins and enzymes. Other tasks concern segmentation of protein sequences into folded and unstructured regions. We provide candidates for rare functionality (e.g. prion proteins). Additionally, we show that the anomaly score is useful in 3D folding-related segmentation. Our novel method outperforms strong baselines and achieves high performance across a variety of tasks. We conclude that combining pLMs with anomaly detection techniques is a valid method for discovering a range of global and local protein characteristics.
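The summary describes scoring proteins by how unusual their pLM representations are, both per protein and per residue (for segmentation). As a rough illustration only, the sketch below uses one common form of the feature-based paradigm from computer vision: k-nearest-neighbour (kNN) distance in a pretrained embedding space. Whether the article uses exactly this scoring rule is an assumption, not something stated in this record. The function names (`knn_anomaly_score`, `mean_pool`, `residue_scores`), the Euclidean metric, mean pooling of per-residue vectors, and the toy 2-D vectors standing in for real pLM embeddings are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical stand-in type: one embedding vector. Real pLM embeddings
# typically have hundreds or thousands of dimensions.
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_anomaly_score(query: Vector, reference: List[Vector], k: int = 2) -> float:
    """Mean distance from `query` to its k nearest neighbours in `reference`.

    `reference` is assumed to hold embeddings of "normal" proteins (or residues);
    a larger score means the query lies farther from typical examples.
    """
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    dists = sorted(euclidean(query, r) for r in reference)
    return sum(dists[:k]) / k


def mean_pool(residue_embeddings: List[Vector]) -> List[float]:
    """Average per-residue vectors into one per-protein vector (an assumed pooling)."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(e[i] for e in residue_embeddings) / n for i in range(dim)]


def residue_scores(residues: List[Vector], reference: List[Vector], k: int = 2) -> List[float]:
    """Score each residue separately, giving a per-position profile for segmentation."""
    return [knn_anomaly_score(r, reference, k) for r in residues]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones would come from a pretrained pLM.
    reference = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
    protein = [[0.0, 0.05], [0.1, 0.05], [3.0, 4.0]]  # last residue is an outlier
    print("protein score:", knn_anomaly_score(mean_pool(protein), reference))
    print("residue scores:", residue_scores(protein, reference))
```

In this sketch, a larger k smooths the score against isolated reference points, while k = 1 reacts to the single closest neighbour. Per-residue scores could in principle be thresholded to flag candidate unstructured regions, but any such threshold would need calibration on labeled data.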
The first two authors should be regarded as Joint First Authors.
ISSN: 2631-9268
DOI: 10.1093/nargab/lqae021