Correlation approach to identify coding regions in DNA sequences

Bibliographic Details
Published in	Biophysical Journal, Vol. 67, no. 1, pp. 64-70
Main Authors	Ossadnik, S.M., Buldyrev, S.V., Goldberger, A.L., Havlin, S., Mantegna, R.N., Peng, C.K., Simons, M., Stanley, H.E.
Format	Journal Article
Language	English
Published	Legacy CDMS Elsevier Inc 01.07.1994
Subjects	Algorithms; Base Sequence; Chloroplasts - genetics; Chromosomes, Fungal; coding sequence finder algorithms; correlation analysis; DNA; DNA - chemistry; DNA - genetics; DNA, Fungal - genetics; False Positive Reactions; Genetic Code; Life Sciences (General); nucleotide sequences; open reading frames; Plants - genetics; prediction; Saccharomyces cerevisiae; Space life sciences; structural genes; NASA Discipline Cardiopulmonary; Non-NASA Center
Summary:	Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
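The summary does not say which statistic the algorithm uses to separate long-range from short-range correlations. As a rough illustration only, the sketch below assumes a purine/pyrimidine "DNA walk" and a detrended fluctuation exponent estimated in a sliding window; the mapping rule, the box sizes, the window and step lengths, and every function name here are assumptions made for this example, not details taken from the record. Under these assumptions, an exponent near 0.5 would suggest short-range correlations only, and a clearly larger value would suggest long-range correlations.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: +1 for pyrimidines (C, T), -1 for purines (A, G).

    The mapping rule is an assumption of this sketch.
    """
    walk, total = [], 0
    for base in seq.upper():
        if base in "CT":
            total += 1
        elif base in "AG":
            total -= 1
        else:
            continue  # skip ambiguous symbols such as N (assumption)
        walk.append(total)
    return walk


def _linfit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_fluctuation(walk, box):
    """RMS deviation of the walk from a linear trend fitted in each box."""
    n_boxes = len(walk) // box
    xs = list(range(box))
    squared = 0.0
    for b in range(n_boxes):
        segment = walk[b * box:(b + 1) * box]
        slope, intercept = _linfit(xs, segment)
        squared += sum((v - (slope * x + intercept)) ** 2
                       for x, v in zip(xs, segment))
    return math.sqrt(squared / (n_boxes * box))


def scaling_exponent(seq, boxes=(4, 8, 16, 32, 64)):
    """Slope of log F(box) against log box for one stretch of sequence."""
    walk = dna_walk(seq)
    points = []
    for box in boxes:
        if len(walk) // box < 4:
            continue  # too few boxes for a stable estimate
        f = dfa_fluctuation(walk, box)
        if f > 0:
            points.append((math.log(box), math.log(f)))
    if len(points) < 2:
        raise ValueError("sequence too short or too regular to fit an exponent")
    slope, _ = _linfit([p[0] for p in points], [p[1] for p in points])
    return slope


def window_exponents(seq, window=800, step=200):
    """(start, exponent) pairs for windows slid along the sequence."""
    return [(start, scaling_exponent(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def random_sequence(n, seed=0):
    """Hypothetical helper: an uncorrelated test sequence, not real DNA."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))
```

Calling `window_exponents(random_sequence(4000))` returns one exponent per window position. On a real sequence, stretches of windows with low exponents would be the candidates for coding regions under the assumptions above; the threshold for "low" would have to be calibrated against annotated data.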
ISSN:	0006-3495, 1542-0086
DOI:	10.1016/S0006-3495(94)80455-2