Neural network input representations that produce accurate consensus sequences from DNA fragment assemblies

Bibliographic Details
Published in	Bioinformatics, Vol. 15, no. 9, pp. 723-728
Main Authors	Allex, C. F.; Shavlik, J. W.; Blattner, F. R.
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press, 01.09.1999; Oxford Publishing Limited (England)
Subjects	Base Sequence; Biological and medical sciences; Consensus Sequence; DNA Fragmentation; DNA, Bacterial - analysis; Escherichia coli - genetics; Fundamental and applied biological sciences. Psychology; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Molecular Sequence Data; Neural Networks (Computer); Sequence Analysis, DNA - methods; Characterization; Consensus sequence; DNA; Computerized processing; Sequence alignment; Method; Neural network; Comparative study

Summary:	Given inputs extracted from an aligned column of DNA bases and the underlying Perkin Elmer Applied Biosystems (ABI) fluorescent traces, our goal is to train a neural network to determine correctly the consensus base for the column. Choosing an appropriate network input representation is critical to success in this task. We empirically compare five representations; one uses only base calls and the others include trace information. We attained the most accurate results from networks that incorporate trace information into their input representations. Based on estimates derived from using 10-fold cross-validation, the best network topology produces consensus accuracies ranging from 99.26% to >99.98% for coverages from two to six aligned sequences. With a coverage of six, it makes only three errors in 20 000 consensus calls. In contrast, the network that only uses base calls in its input representation has over double that error rate: eight errors in 20 000 consensus calls. Contact: allex@cs.wisc.edu
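The error counts quoted in the summary can be checked against the stated accuracies with a few lines of arithmetic. The following is a minimal sketch, not code from the article: the function and variable names are hypothetical, and the only inputs are the figures reported above (three and eight errors in 20 000 consensus calls).

```python
def accuracy_percent(errors: int, calls: int) -> float:
    """Return consensus accuracy as the percentage of correct calls."""
    return 100.0 * (calls - errors) / calls


# Figures taken from the summary; the names are illustrative only.
CALLS = 20_000
trace_errors = 3       # best network, which uses trace information
base_only_errors = 8   # network that uses only base calls

trace_acc = accuracy_percent(trace_errors, CALLS)
base_acc = accuracy_percent(base_only_errors, CALLS)
error_ratio = base_only_errors / trace_errors

print(f"trace-based network accuracy:    {trace_acc:.3f}%")
print(f"base-call-only network accuracy: {base_acc:.3f}%")
print(f"error-rate ratio:                {error_ratio:.2f}x")
```

Under these figures, three errors in 20 000 calls gives an accuracy above 99.98%, and eight errors against three is a ratio above two, which agrees with the "over double" wording in the summary.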
ISSN:	1367-4803, 1367-4811, 1460-2059
DOI:	10.1093/bioinformatics/15.9.723