Segmenting the human genome based on states of neutral genetic divergence


Bibliographic Details
Published in Proceedings of the National Academy of Sciences - PNAS Vol. 110; no. 36; pp. 14699 - 14704
Main Authors Don, Prabhani Kuruppumullage; Ananda, Guruprasad; Chiaromonte, Francesca; Makova, Kateryna D.
Format Journal Article
Language English
Published United States National Academy of Sciences 03.09.2013
National Acad Sciences
Summary: Many studies have demonstrated that divergence levels generated by different mutation types vary and covary across the human genome. To improve our still-incomplete understanding of the mechanistic basis of this phenomenon, we analyze several mutation types simultaneously, anchoring their variation to specific regions of the genome. Using hidden Markov models on insertion, deletion, nucleotide substitution, and microsatellite divergence estimates inferred from human–orangutan alignments of neutrally evolving genomic sequences, we segment the human genome into regions corresponding to different divergence states, each uniquely characterized by a specific combination of divergence levels. We then parse the mutagenic contributions of various biochemical processes by associating divergence states with a broad range of genomic landscape features. We find that high divergence states inhabit guanine- and cytosine (GC)-rich, highly recombining subtelomeric regions; low divergence states cover inner parts of autosomes; chromosome X forms its own state with the lowest divergence; and a state of elevated microsatellite mutability is interspersed across the genome. These general trends are mirrored in human diversity data from the 1000 Genomes Project, and departures from them highlight the evolutionary history of primate chromosomes. We also find that genes and noncoding functional marks [annotations from the Encyclopedia of DNA Elements (ENCODE)] are concentrated in high divergence states. Our results provide a powerful tool for biomedical data analysis: segmentations can be used to screen personal genome variants, including those associated with cancer and other diseases, and to improve computational predictions of noncoding functional elements.
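The segmentation step described in the summary can be illustrated with a small sketch. It is not the authors' implementation: it assumes a two-state model, independent Gaussian emissions for the four divergence measures, and hand-set parameters, all of which are hypothetical choices made for illustration. In a real analysis the number of states and all parameters would be estimated from the data. The function names (`log_gauss`, `viterbi`, `segments`) and the toy window values are likewise invented here.

```python
import math

def log_gauss(x, mean, var):
    """Log density of independent Gaussians, summed over features (assumed emission model)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def viterbi(obs, log_start, log_trans, means, variances):
    """Return the most probable state path for a sequence of feature vectors."""
    n_states = len(means)
    # score[s] = best log-probability of any path ending in state s
    score = [log_start[s] + log_gauss(obs[0], means[s], variances[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + log_gauss(x, means[s], variances[s]))
        back.append(pointers)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def segments(path):
    """Collapse a per-window state path into (start, end_exclusive, state) runs."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((start, i, path[start]))
            start = i
    return runs

# Hypothetical toy input: one row per genomic window, with columns for
# insertion, deletion, substitution, and microsatellite divergence.
low, high = [1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]
windows = [low] * 3 + [high] * 3 + [low] * 2

means = [low, high]                      # state 0 = low divergence, state 1 = high
variances = [[1.0] * 4, [1.0] * 4]
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],   # "sticky" transitions favor long segments
             [math.log(0.1), math.log(0.9)]]

path = viterbi(windows, log_start, log_trans, means, variances)
print(path)
print(segments(path))
```

The transition matrix is what turns per-window classification into segmentation: a high self-transition probability penalizes state changes, so isolated noisy windows tend to be absorbed into the surrounding segment.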
Bibliography: http://dx.doi.org/10.1073/pnas.1221792110
Edited by Wen-Hsiung Li, The University of Chicago, Chicago, IL, and approved July 23, 2013 (received for review December 13, 2012)
Author contributions: P.K.D., G.A., F.C., and K.D.M. designed research; P.K.D. and G.A. performed research; P.K.D. and G.A. analyzed data; and P.K.D., G.A., F.C., and K.D.M. wrote the paper.
1 P.K.D. and G.A. contributed equally to this work.
2 Present address: Computational Sciences, The Jackson Laboratory, Bar Harbor, ME 04609.
ISSN: 0027-8424
1091-6490
DOI: 10.1073/pnas.1221792110