Using Fully Convolutional Network to Locate Transcription Factor Binding Sites Based on DNA Sequence and Conservation Information

Transcription factors (TFs) play a part in gene expression. TFs can form complex gene expression regulation system by combining with DNA. Thereby, identifying the binding regions has become an indispensable step for understanding the regulatory mechanism of gene expression. Due to the great achievem...

Full description

Saved in:
Bibliographic Details
Published inIEEE/ACM transactions on computational biology and bioinformatics Vol. 20; no. 5; pp. 2690 - 2699
Main Authors Zhang, Qinhu, Xu, Youhong, Wang, Siguo, Wu, Yong, Ye, Yuannong, Yuan, Chang-An, Gribova, Valeriya, Filaretov, Vladimir Fedorovich, Huang, De-Shuang
Format Journal Article
LanguageEnglish
Published United States IEEE 01.09.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Transcription factors (TFs) play a part in gene expression. TFs can form complex gene expression regulation system by combining with DNA. Thereby, identifying the binding regions has become an indispensable step for understanding the regulatory mechanism of gene expression. Due to the great achievements of applying deep learning (DL) to computer vision and language processing in recent years, many scholars are inspired to use these methods to predict TF binding sites (TFBSs), achieving extraordinary results. However, these methods mainly focus on whether DNA sequences include TFBSs. In this paper, we propose a fully convolutional network (FCN) coupled with refinement residual block (RRB) and global average pooling layer (GAPL), namely FCNARRB. Our model could classify binding sequences at nucleotide level by outputting dense label for input data. Experimental results on human ChIP-seq datasets show that the RRB and GAPL structures are very useful for improving model performance. Adding GAPL improves the performance by 9.32% and 7.61% in terms of IoU (Intersection of Union) and PRAUC (Area Under Curve of Precision and Recall), and adding RRB improves the performance by 7.40% and 4.64%, respectively. In addition, we find that conservation information can help locate TFBSs.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1545-5963
1557-9964
DOI:10.1109/TCBB.2022.3219831