Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals

Bibliographic Details
Published in	BMC genetics Vol. 8; no. 1; p. 2
Main Authors	Li, Shuying Sue; Cheng, Jacob Jen-Hao; Zhao, Lue Ping
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd, 29.01.2007
Subjects	African Continental Ancestry Group - genetics; Bayes Theorem; Chromosome Mapping; Chromosomes, Human, X - genetics; European Continental Ancestry Group - genetics; Female; Genome, Human; Genotype; Haplotypes; Humans; Male; Models, Genetic; Polymorphism, Single Nucleotide
Summary:	The completion of the HapMap project has stimulated further development of haplotype-based methodologies for disease associations. A key aspect of such development is the statistical inference of individual diplotypes from unphased genotypes. Several methodologies for inferring haplotypes have been developed, but they have not been evaluated extensively to determine which method not only performs well but can also be easily incorporated into downstream haplotype-based association analyses. In this paper, we attempt to do so. Our evaluation was carried out by comparing the two leading Bayesian methods, implemented in PHASE and HAPLOTYPER, and the two leading empirical methods, implemented in PL-EM and HPlus. We used these methods to analyze real data, namely the dense genotypes on the X chromosome of 30 European and 30 African trios provided by the International HapMap Project, and simulated genotype data. Our conclusions are based on these analyses. All programs performed very well on the X-chromosome data, with an average similarity index of 0.99 and an average prediction rate of 0.99 for both European and African trios. On data simulated with a coalescence approximation, PHASE, which implements a Bayesian method based on that approximation, outperformed the other programs on small sample sizes. When the sample size increased, the other programs performed as well as PHASE. PL-EM and HPlus, which implement empirical methods, required much less running time than the programs implementing the Bayesian methods: only one hundredth to one thousandth of the running time required by PHASE, particularly when analyzing large sample sizes and large numbers of SNPs. For large sample sizes (hundreds or more), which most association studies require, the two empirical methods might be used, since they infer haplotypes as accurately as the Bayesian methods and can be incorporated easily into downstream haplotype-based analyses such as haplotype-association analyses.
ISSN:	1471-2156
DOI:	10.1186/1471-2156-8-2
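
Illustration: the summary refers to inferring individual diplotypes from unphased genotypes. The sketch below shows the general idea with a textbook expectation-maximization (EM) estimate of haplotype frequencies. It is not the algorithm used by PL-EM, HPlus, PHASE, or HAPLOTYPER. The function names (compatible_pairs, em_haplotype_freqs, most_likely_diplotype) and the 0/1/2 genotype coding are assumptions made for this example only.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with one unphased genotype.

    genotype: tuple with one entry per SNP, coded 0/1/2 = copies of allele 1
    (an assumed coding for this sketch).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites are fixed; het sites set below
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # the two haplotypes differ at every het site
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by plain EM, assuming random mating."""
    pairs_per_ind = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pairs_per_ind for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: weight each compatible pair by the product of its frequencies.
        for pairs in pairs_per_ind:
            weights = [freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts over 2N chromosomes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq


def most_likely_diplotype(genotype, freq):
    """Pick the compatible pair with the highest estimated probability."""
    return max(
        compatible_pairs(genotype),
        key=lambda p: freq.get(p[0], 0.0) * freq.get(p[1], 0.0),
    )


# Toy data: eight unambiguous homozygotes and one ambiguous double heterozygote.
genotypes = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)]
freq = em_haplotype_freqs(genotypes)
best = most_likely_diplotype((1, 1), freq)
```

In this toy data set the double heterozygote could carry either 00/11 or 01/10. Because only 00 and 11 are observed in the unambiguous individuals, the estimate should favor the 00/11 resolution. Enumerating all compatible pairs grows exponentially with the number of heterozygous sites, so this sketch is suitable only for a handful of SNPs.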