Clustering DNA sequences by feature vectors

Bibliographic Details
Published in	Molecular phylogenetics and evolution Vol. 41; no. 1; pp. 64 - 69
Main Authors	Liu, Libin; Ho, Yee-kin; Yau, Stephen
Format	Journal Article
Language	English
Published	United States Elsevier Inc 01.10.2006
Subjects	DNA; DNA sequences; Genetic Techniques; Genomic space; Global comparison of gene structures; Globins - genetics; Histones - genetics; Models, Genetic; Muramidase - genetics; Myoglobin - genetics; Rhodopsin - genetics; Sequence Alignment - methods; Software; Vector distance
Summary:	We represent all DNA sequences as points in twelve-dimensional space in such a way that homologous DNA sequences are clustered together, creating a new genomic space for the global comparison of millions of genes simultaneously. More specifically, based on the contents of the four nucleotides, their distances from the origin, and their distribution along the sequence, a twelve-dimensional vector is assigned to any DNA sequence. The applicability of this analysis to the global comparison of gene structures was tested on the myoglobin, β-globin, histone-4, lysozyme, and rhodopsin families. Members of each family exhibit smaller vector distances to one another than to members of different families. The vector distance also distinguishes random sequences generated with the same base composition. Sequence comparisons were consistent with the BLAST method. Once a new gene is discovered, we can compute its location in our genomic space. It is natural to predict that the properties of this new gene are similar to those of known genes located nearby. Biologists can then run experiments to test these properties.
ISSN:	1055-7903 1095-9513
DOI:	10.1016/j.ympev.2006.05.019
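
The Summary describes a twelve-dimensional vector built from three quantities per nucleotide, but this record does not give the formulas. The sketch below is one plausible reading, not the authors' definition: for each of A, C, G, and T it takes the count, the sum of 1-based positions (standing in for "distance from the origin"), and the variance of those positions (standing in for "distribution along the sequence"). The names `feature_vector` and `nearest_known_gene` are hypothetical, and the Euclidean metric is an assumption.

```python
import math

NUCLEOTIDES = "ACGT"


def feature_vector(seq):
    """Map a DNA string to 12 numbers: (count, position sum, position variance)
    for each of A, C, G, T. These three definitions are assumptions made for
    illustration; the article itself should be consulted for the exact ones."""
    seq = seq.upper()
    vec = []
    for base in NUCLEOTIDES:
        # 1-based positions at which this nucleotide occurs
        positions = [i for i, ch in enumerate(seq, start=1) if ch == base]
        n = len(positions)
        total = sum(positions)
        if n:
            mean = total / n
            variance = sum((p - mean) ** 2 for p in positions) / n
        else:
            variance = 0.0  # nucleotide absent from the sequence
        vec.extend([float(n), float(total), variance])
    return vec


def nearest_known_gene(new_seq, known):
    """Return the name in `known` (a dict of name -> sequence) whose vector
    lies closest to that of `new_seq` under Euclidean distance (assumed)."""
    target = feature_vector(new_seq)
    return min(known, key=lambda name: math.dist(target, feature_vector(known[name])))
```

With vectors in hand, placing a newly discovered gene amounts to a nearest-neighbour lookup among the vectors of known genes, which is the use the Summary suggests. In practice the three components have very different scales, so some normalisation would likely be needed before distances are compared across sequences of different lengths.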