pDHS-SVM: A prediction method for plant DNase I hypersensitive sites based on support vector machine
Published in | Journal of Theoretical Biology, Vol. 426, pp. 126-133 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | England, Elsevier Ltd, 07.08.2017 |
Summary: | • A benchmark dataset with DHS sequences of two model plants was built. • An SVM-based prediction method with an accuracy of up to 87% was proposed. • Global sequence-order information and local DNA properties were integrated.
DNase I hypersensitive sites (DHSs) are accessible chromatin regions that are hypersensitive to cleavage by the DNase I endonuclease. DHSs are indicative of cis-regulatory DNA elements (CREs), which play important roles in global gene expression regulation, so recognizing DHSs in a genome helps in discovering CREs. Developing cost-effective computational methods to identify DHSs is an important complement that can accelerate this investigation. However, tools for identifying DHSs in plant genomes are lacking. Here we present pDHS-SVM, a computational predictor for identifying plant DHSs. To integrate global sequence-order information and local DNA properties, the reverse complement k-mer and the dinucleotide-based auto covariance of DNA sequences were used to construct the feature space. Fifteen physicochemical properties of dinucleotides were used, and a Support Vector Machine (SVM) was employed. To further improve the predictor's performance and to extract an optimized subset of nucleotide physicochemical properties that contribute positively to DHS prediction, a heuristic nucleotide physicochemical property selection algorithm was introduced. With the optimized subset of properties, experiments on Arabidopsis thaliana and rice (Oryza sativa) showed that pDHS-SVM achieved accuracies of up to 87.00% and 85.79%, respectively. These results indicate the effectiveness of the proposed method for predicting DHSs. Furthermore, pDHS-SVM could provide a helpful complement for predicting CREs in plant genomes. Our implementation of pDHS-SVM is freely available as source code at https://github.com/shanxinzhang/pDHS-SVM. |
---|---|
ISSN: | 0022-5193 1095-8541 |
DOI: | 10.1016/j.jtbi.2017.05.030 |
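
The two feature families named in the summary can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes the commonly used definitions: reverse complement k-mer frequencies merge each k-mer with its reverse complement, and dinucleotide-based auto covariance at a given lag averages the product of mean-centered property values of dinucleotides that lie that many positions apart. The function names, the defaults `k=2` and `max_lag=2`, and `toy_gc_property` are hypothetical; in particular, `toy_gc_property` is a stand-in and is not one of the fifteen properties used in the paper.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def revc_kmer_features(seq, k=2):
    """Normalized k-mer frequencies, merging each k-mer with its reverse complement."""
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    # Canonical form: the lexicographically smaller of a k-mer and its reverse complement.
    canonical = sorted({min(km, reverse_complement(km)) for km in all_kmers})
    counts = dict.fromkeys(canonical, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        key = min(km, reverse_complement(km))
        if key in counts:  # windows containing N or other symbols are skipped
            counts[key] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in canonical]


def toy_gc_property(dinuc):
    """Hypothetical placeholder property: fraction of G/C bases in a dinucleotide."""
    return sum(base in "GC" for base in dinuc) / 2.0


def dac_features(seq, properties, max_lag=2):
    """Auto covariance of each dinucleotide property for lags 1..max_lag."""
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if len(dinucs) <= max_lag:
        raise ValueError("sequence is too short for the requested max_lag")
    features = []
    for prop in properties:
        values = [prop(d) for d in dinucs]
        mean = sum(values) / len(values)
        for lag in range(1, max_lag + 1):
            n = len(values) - lag  # number of dinucleotide pairs at this lag
            cov = sum((values[i] - mean) * (values[i + lag] - mean)
                      for i in range(n)) / n
            features.append(cov)
    return features


def feature_vector(seq, k=2, max_lag=2, properties=(toy_gc_property,)):
    """Concatenate both feature families into one vector for a classifier."""
    seq = seq.upper()
    return revc_kmer_features(seq, k) + dac_features(seq, properties, max_lag)
```

With `k=2` the first part has 10 entries (16 dinucleotides, 4 of which are their own reverse complement), and the second part has one entry per property and lag. Vectors built this way for labeled DHS and non-DHS sequences could then be passed to any SVM implementation.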