Construct a variable-length fragment library for de novo protein structure prediction

Bibliographic Details
Published in	Briefings in Bioinformatics, Vol. 23, no. 3
Main Authors	Feng, Qiongqiong, Hou, Minghua, Liu, Jun, Zhao, Kailong, Zhang, Guijun
Format	Journal Article
Language	English
Published	Oxford University Press (Oxford Publishing Limited), England, 13 May 2022
Subjects	Amino acid sequence; Clustering; Cutting; Fragments; Libraries; Markov chains; Predictions; Protein folding; Protein structure; Proteins; Queries; Secondary structure; de novo protein structure prediction; fragment library; hidden Markov model
Summary:	Although remarkable progress has been made in end-to-end structure prediction, exemplified by AlphaFold2, fragment libraries remain essential for de novo protein structure prediction and can help explore and understand the protein-folding mechanism. In this work, we developed a variable-length fragment library (VFlib). In VFlib, a master structure database was first constructed from the Protein Data Bank through sequence clustering. The hidden Markov model (HMM) profile of each protein in the master structure database was generated with HHsuite, and the secondary structure of each protein was calculated with DSSP. For a query sequence, its HMM profile was first constructed. Then, variable-length fragments were retrieved from the master structure database through dynamically variable-length profile–profile comparison. During this process, a complete method for chopping the query HMM profile was proposed to obtain more diverse fragments. Finally, secondary structure information was used to further screen the retrieved fragments, yielding the final fragment library for the specific query sequence. Experimental results on a set of 120 nonredundant proteins show that the global precision and coverage of the fragment library generated by VFlib were 55.04% and 94.95%, respectively, at an RMSD cutoff of 1.5 Å. Compared with the benchmark method NNMake, VFlib increased global precision by 62.89% with equivalent coverage. Furthermore, the fragments generated by VFlib and NNMake were used to predict structure models through fragment assembly; in these controlled experiments, the average TM-score of models built with VFlib fragments was 16.00% higher than that of models built with NNMake fragments.
ISSN:	1467-5463 (print); 1477-4054 (online)
DOI:	10.1093/bib/bbac086
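The summary reports global precision and coverage at an RMSD cutoff of 1.5 Å without defining either metric. The Python sketch below assumes a definition common in fragment-library benchmarks, which VFlib's may not match exactly: precision is the share of all library fragments whose CA RMSD to the corresponding native segment, after optimal superposition with the standard Kabsch algorithm, is within the cutoff; coverage is the share of query residues spanned by at least one such good fragment. The `Fragment` record and all function names are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """CA RMSD between two (n, 3) arrays after optimal rigid superposition (Kabsch)."""
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that r is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


@dataclass
class Fragment:
    """Hypothetical record: where a fragment maps on the query plus its template CA trace."""
    start: int          # 0-based query position of the fragment's first residue
    coords: np.ndarray  # (length, 3) template CA coordinates


def precision_and_coverage(fragments, native_ca, cutoff=1.5):
    """Assumed definitions: precision = good / all fragments; coverage = residues under a good fragment."""
    covered = np.zeros(len(native_ca), dtype=bool)
    n_good = 0
    for frag in fragments:
        end = frag.start + len(frag.coords)
        if kabsch_rmsd(frag.coords, native_ca[frag.start:end]) <= cutoff:
            n_good += 1
            covered[frag.start:end] = True  # every residue spanned by a good fragment
    precision = n_good / len(fragments) if fragments else 0.0
    coverage = float(covered.mean()) if len(native_ca) else 0.0
    return precision, coverage


# Usage: an ideal alpha-helix CA trace (100 degrees and 1.5 Angstrom rise per residue).
i = np.arange(20)
theta = np.deg2rad(100.0) * i
native = np.stack([2.3 * np.cos(theta), 2.3 * np.sin(theta), 1.5 * i], axis=1)
c, s = np.cos(0.5), np.sin(0.5)
rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
good = Fragment(0, native[0:8] @ rot.T + 4.0)  # rotated, shifted copy of residues 0-7
k = np.arange(6)
strand = np.stack([3.3 * k, (-1.0) ** k, np.zeros(6)], axis=1)
bad = Fragment(10, strand)  # extended zigzag placed over helical residues 10-15
print(precision_and_coverage([good, bad], native))  # (0.5, 0.4)
```

Counting coverage per residue rather than per fragment start is one design choice among several; with variable-length fragments it credits long good fragments for every position they span.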
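The summary also states that the query HMM profile is chopped into variable-length windows and that retrieved fragments are screened by secondary structure, but it gives no parameters. The Python sketch below illustrates both ideas under stated assumptions: the length range of 7 to 15 residues and the 0.7 agreement threshold are hypothetical; the query is assumed to carry a 3-state predicted secondary structure (the summary does not say how it is obtained); and DSSP's 8 states are reduced to 3 with one common mapping (H, G, I to helix; E, B to strand; everything else to coil). The profile–profile comparison itself is not modeled, candidates are simply given as (query start, template DSSP string) pairs, and all names are illustrative rather than VFlib's API.

```python
from collections.abc import Iterator

# One common 8-to-3 state reduction of DSSP codes; anything unlisted counts as coil.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(ss8: str) -> str:
    """Map a DSSP 8-state string to H/E/C."""
    return "".join(DSSP_TO_3.get(code, "C") for code in ss8)


def chop_windows(query_len: int, min_len: int = 7, max_len: int = 15) -> Iterator[tuple[int, int]]:
    """Yield every (start, length) query window with length in [min_len, max_len] (hypothetical range)."""
    for length in range(min_len, max_len + 1):
        for start in range(query_len - length + 1):
            yield start, length


def ss_agreement(query_ss3: str, template_ss8: str) -> float:
    """Fraction of aligned positions where query and reduced template states match."""
    template_ss3 = reduce_dssp(template_ss8)
    if len(query_ss3) != len(template_ss3) or not template_ss3:
        raise ValueError("query window and template fragment must be non-empty and equal length")
    matches = sum(a == b for a, b in zip(query_ss3, template_ss3))
    return matches / len(template_ss3)


def screen_fragments(candidates, query_ss3: str, min_agreement: float = 0.7):
    """Keep (start, template_ss8) candidates whose states agree with the query window (hypothetical threshold)."""
    kept = []
    for start, template_ss8 in candidates:
        window = query_ss3[start:start + len(template_ss8)]
        if ss_agreement(window, template_ss8) >= min_agreement:
            kept.append((start, template_ss8))
    return kept


# Usage with a toy query whose predicted secondary structure is coil-helix-coil.
query_ss3 = "CCHHHHHHCC"
print(sum(1 for _ in chop_windows(len(query_ss3), 7, 9)))  # 9
candidates = [(2, "HHHHHH"), (2, "EEEEEE"), (0, "TSHHGG")]
print(screen_fragments(candidates, query_ss3))  # [(2, 'HHHHHH'), (0, 'TSHHGG')]
```

Enumerating every start position for every length in the range is what makes the chopping exhaustive; in practice the cost grows roughly with the query length times the number of allowed lengths, which is one reason a threshold-based screen is applied afterwards.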