Single Nucleotide Polymorphism (SNP)-Strings: An Alternative Method for Assessing Genetic Associations

Genome-wide association studies (GWAS) identify disease-associations for single-nucleotide-polymorphisms (SNPs) from scattered genomic-locations. However, SNPs frequently reside on several different SNP-haplotypes, only some of which may be disease-associated. This circumstance lowers the observed o...

Full description

Saved in:
Bibliographic Details
Published inPloS one Vol. 9; no. 4; p. e90034
Main Authors Goodin, Douglas S., Khankhanian, Pouya
Format Journal Article
LanguageEnglish
Published United States Public Library of Science 01.04.2014
Public Library of Science (PLoS)
Subjects
Online AccessGet full text
ISSN1932-6203
1932-6203
DOI10.1371/journal.pone.0090034

Cover

More Information
Summary:Genome-wide association studies (GWAS) identify disease-associations for single-nucleotide-polymorphisms (SNPs) from scattered genomic-locations. However, SNPs frequently reside on several different SNP-haplotypes, only some of which may be disease-associated. This circumstance lowers the observed odds-ratio for disease-association. Here we develop a method to identify the two SNP-haplotypes, which combine to produce each person's SNP-genotype over specified chromosomal segments. Two multiple sclerosis (MS)-associated genetic regions were modeled; DRB1 (a Class II molecule of the major histocompatibility complex) and MMEL1 (an endopeptidase that degrades both neuropeptides and β-amyloid). For each locus, we considered sets of eleven adjacent SNPs, surrounding the putative disease-associated gene and spanning ∼200 kb of DNA. The SNP-information was converted into an ordered-set of eleven-numbers (subject-vectors) based on whether a person had zero, one, or two copies of particular SNP-variant at each sequential SNP-location. SNP-strings were defined as those ordered-combinations of eleven-numbers (0 or 1), representing a haplotype, two of which combined to form the observed subject-vector. Subject-vectors were resolved using probabilistic methods. In both regions, only a small number of SNP-strings were present. We compared our method to the SHAPEIT-2 phasing-algorithm. When the SNP-information spanning 200 kb was used, SHAPEIT-2 was inaccurate. When the SHAPEIT-2 window was increased to 2,000 kb, the concordance between the two methods, in both of these eleven-SNP regions, was over 99%, suggesting that, in these regions, both methods were quite accurate. Nevertheless, correspondence was not uniformly high over the entire DNA-span but, rather, was characterized by alternating peaks and valleys of concordance. Moreover, in the valleys of poor-correspondence, SHAPEIT-2 was also inconsistent with itself, suggesting that the SNP-string method is more accurate across the entire region. Accurate haplotype identification will enhance the detection of genetic-associations. The SNP-string method provides a simple means to accomplish this and can be extended to cover larger genomic regions, thereby improving a GWAS's power, even for those published previously.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
Conceived and designed the experiments: DSG. Performed the experiments: DSG. Analyzed the data: DSG PK. Wrote the paper: DSG PK.
Competing Interests: The authors have declared that no competing interests exist.
ISSN:1932-6203
1932-6203
DOI:10.1371/journal.pone.0090034