Large-Scale Structure-Based Prediction and Identification of Novel Protease Substrates Using Computational Protein Design

Bibliographic Details
Published in	Journal of molecular biology Vol. 429; no. 2; pp. 220 - 236
Main Authors	Pethe, Manasi A.; Rubenstein, Aliza B.; Khare, Sagar D.
Format	Journal Article
Language	English
Published	England: Elsevier Ltd, 20 January 2017
Subjects	Algorithms; Amino Acid Sequence; Catalytic Domain; Computational Biology; computational modeling; Computer Simulation; Endopeptidases - chemistry; proteases; Proteins - chemistry; Reproducibility of Results; Rosetta software; specificity prediction; Substrate Specificity; TEV; YESS; ROC; SVM; HIVPR; AUC; TI; HCV
Summary:	Characterizing the substrate specificity of protease enzymes is critical for illuminating the molecular basis of their diverse and complex roles in a wide array of biological processes. Rapid and accurate prediction of their extended substrate specificity would also aid in the design of custom proteases capable of selectively and controllably cleaving biotechnologically or therapeutically relevant targets. However, current in silico approaches for protease specificity prediction rely on, and are therefore limited by, machine learning of sequence patterns in known experimental data. Here, we describe a general approach for predicting peptidase substrates de novo using protein structure modeling and biophysical evaluation of enzyme–substrate complexes. We construct atomic-resolution models of thousands of candidate substrate–enzyme complexes for each of five model proteases belonging to the four major protease mechanistic classes (serine, cysteine, aspartyl, and metallo-proteases) and develop a discriminatory scoring function using enzyme design modules from Rosetta and AMBER's MMPBSA. We rank putative substrates based on calculated interaction energy with a modeled near-attack conformation of the enzyme active site. We show that the energetic patterns obtained from these simulations can be used to robustly rank and classify known cleaved and uncleaved peptides and that these structural-energetic patterns have greater discriminatory power than purely sequence-based statistical inference. Combining sequence and energetic patterns using machine-learning algorithms further improves classification performance, and analysis of structural models provides physical insight into the structural basis for the observed specificities. We further tested the predictive capability of the model by designing and experimentally characterizing the cleavage of four novel substrate motifs for the hepatitis C virus NS3/4 protease using an in vivo assay.
The presented structure-based approach is generalizable to other protease enzymes with known or modeled structures, and complements existing experimental methods for specificity determination.
• Develop a general, structure-based approach for predicting protease substrate specificity using Rosetta and AMBER MMPBSA.
• Recapitulate known protease specificity profiles with accuracy comparable to sequence-only methods.
• Combining sequence and structure energy features using machine learning helps increase discrimination performance.
• Validated approach experimentally in yeast cells.
• Discovered novel sequence specificities for HCV NS3/4A protease using our computational approach.
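The ranking and classification step described in the summary can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, peptide labels, and energy values below are hypothetical, and the only assumption carried over from the summary is that a lower (more favorable) calculated interaction energy should indicate a more likely substrate. The AUC is computed directly as the fraction of cleaved/uncleaved pairs that the energy orders correctly.

```python
from itertools import product


def rank_candidates(energies):
    """Return peptide labels sorted from most to least favorable energy.

    `energies` maps a peptide label to a calculated interaction energy;
    lower values are assumed to be more favorable.
    """
    return sorted(energies, key=energies.get)


def auc_from_energies(cleaved, uncleaved):
    """Area under the ROC curve for energy as a cleaved/uncleaved classifier.

    Equals the probability that a randomly chosen cleaved peptide has a
    lower energy than a randomly chosen uncleaved one (ties count 0.5).
    """
    if not cleaved or not uncleaved:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for c, u in product(cleaved, uncleaved):
        if c < u:
            wins += 1.0
        elif c == u:
            wins += 0.5
    return wins / (len(cleaved) * len(uncleaved))


# Hypothetical toy values, for illustration only (not data from the article).
toy_energies = {"pep_A": -1.0, "pep_B": -3.0, "pep_C": -2.0}
ranking = rank_candidates(toy_energies)

cleaved_scores = [-12.4, -10.1, -9.8]
uncleaved_scores = [-9.9, -6.3, -4.0]
auc = auc_from_energies(cleaved_scores, uncleaved_scores)
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation of cleaved from uncleaved peptides, which is one way the ROC and AUC subject terms listed for this record are commonly used.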
ISSN:	0022-2836, 1089-8638
DOI:	10.1016/j.jmb.2016.11.031