High-throughput SNP genotyping of historical and modern samples of five bird species via sequence capture of ultraconserved elements

Bibliographic Details
Published in: Molecular Ecology Resources, Vol. 16, no. 5, pp. 1204–1223
Main authors: Lim, Haw Chuan; Braun, Michael J.
Format: Journal article
Language: English
Published: England, Blackwell Publishing Ltd, 01.09.2016; Wiley Subscription Services, Inc

Summary: Sample availability limits population genetics research on many species, especially taxa from regions with high diversity. However, many such species are well represented in museum collections assembled before the molecular era. Development of techniques to recover genetic data from these invaluable specimens will benefit biodiversity science. Using a mixture of freshly preserved and historical tissue samples, and a sequence capture probe set targeting >5000 loci, we produced high‐confidence genotype calls on thousands of single nucleotide polymorphisms (SNPs) in each of five South‐East Asian bird species and their close relatives (N = 27–43). On average, 66.2% of the reads mapped to the pseudo‐reference genome of each species. Of these mapped reads, an average of 52.7% was identified as PCR or optical duplicates. We achieved deeper effective sequencing for historical samples (122.7×) compared to modern samples (23.5×). The number of nucleotide sites with at least 8× sequencing depth was high, with averages ranging from 0.89 × 10⁶ bp (Arachnothera, modern samples) to 1.98 × 10⁶ bp (Stachyris, modern samples). Linear regression revealed that the amount of sequence data obtained from each historical sample (represented by per cent of the pseudo‐reference genome recovered with ≥8× sequencing depth) was positively and significantly (P ≤ 0.013) related to how recently the sample was collected. We observed characteristic post‐mortem damage in the DNA of historical samples. However, we were able to reduce the error rate significantly by truncating ends of reads during read mapping (local alignment) and conducting stringent SNP and genotype filtering.
Bibliography: Fig. S1. Agilent Bioanalyzer results showing the fragment size of pooled enriched products to be sequenced. Products are separated into three pools: one containing enriched products of modern samples (modDNA) and two containing enriched products of historical samples (pool 7-40 and pool 23-39).
Fig. S2. Number of cleaned read pairs per sample. Each sample is represented by one dot, and the x-axis indicates arbitrarily numbered enrichment pools. The last eight pools on the right contain libraries of modern samples, while the remainder contain libraries of historical samples. Sequencing output (zero or close to zero) of failed libraries is shown.
Fig. S3. Bar charts of the number of cleaned read pairs per sample. Each bar chart shows one enrichment pool; the numbering scheme follows that of Fig. S2. Sequencing output (zero or close to zero) of failed libraries is not shown.
Fig. S4. Bowtie mapping rate of each sample. Each sample is represented by one dot, and the x-axis indicates arbitrarily numbered enrichment pools. The numbering scheme follows that of Fig. S2.
Fig. S5. Read duplication rate of each sample. Each sample is represented by one dot, and the x-axis indicates arbitrarily numbered enrichment pools. The numbering scheme follows that of Fig. S2.
Fig. S6. Number of sites (averaged over samples; y-axis, log10 scale) at various sequencing depths (x-axis). Bar charts are sorted by species group and sample age (modern vs. historical).
Fig. S7. Count of UCE loci with various proportions of sites having at least 8× average sequencing depth. Frequency histograms are sorted by species group (rows) and sample age (modern, left column; historical, right column): A. longirostra group (A and B); I. puella group (C and D); N. grandis group (E and F); P. atriceps group (G and H); and S. nigriceps group (I and J).
Fig. S8. Rates of C to T (left column) and G to A (right column) substitutions in five exemplar historical samples (one from each of the five study species). Rates are shown for each of the first 25 bases from the 5′ end (left column) or the 3′ end (right column).
Fig. S9. Empirical values (line) and Bayesian estimates (filled circles; error bars = 95% posterior prediction intervals) of rates of various substitutions at the first 11 base positions from the 5′ end (positive values on the x-axis) and the 3′ end (negative values on the x-axis) of each read. Red = C to T substitution, green = G to A substitution, blue = other substitutions. Results from five exemplar historical samples are shown.
Fig. S10. The top four panels show the frequency of each of the four nucleotides within reads (demarcated by grey boxes; the first 10 and last 10 positions are shown) and just upstream or downstream of reads (based on the pseudo-reference genome). Each dot represents the average frequency for one UCE locus at each position, and solid lines show the 'genome-wide' values. The bottom left panel shows (1) the observed C to T substitution rate when soft-clipped bases are included (yellow line) and when they are excluded (red line), and (2) the G to A substitution rate (blue line). The bottom right panel shows (1) the observed G to A substitution rate when soft-clipped bases are included (yellow line) and when they are excluded (blue line), and (2) the C to T substitution rate (red line). Positive x-axis labels are base positions from the 5′ end of each read (going downstream); negative x-axis labels are base positions from the 3′ end of each read (going upstream).
Fig. S11. Average number of called genotypes per sample for each of the five species groups, sorted by sample age (hist. = historical, mod. = modern). Labels for the five species groups are: A - Arachnothera; I - Irena; N - Niltava; P - Pycnonotus; S - Stachyris. Error bars show standard deviations.
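The per-position damage profiles described in Figs S8 and S10 can be approximated with a short sketch. The profiles are the C to T rate counted from the 5′ end and the G to A rate counted from the 3′ end of each read. The sketch makes several simplifying assumptions. Each read is already paired with the pseudo-reference segment it maps to. Both strings are in the read's 5′ to 3′ orientation, and there are no indels. Real alignments would need CIGAR parsing and strand handling. It computes empirical rates only and does not reproduce the Bayesian estimates of Fig. S9. The function name `damage_profile` is hypothetical.

```python
def damage_profile(pairs, n_positions=25):
    """Empirical C->T rate by distance from the 5' end and G->A rate by
    distance from the 3' end of each read.

    `pairs` holds (reference, read) strings of equal length, both written in
    the read's 5'->3' orientation with no indels (a simplifying assumption;
    real alignments would need CIGAR parsing and strand handling).
    """
    ct_hits = [0] * n_positions
    ct_sites = [0] * n_positions
    ga_hits = [0] * n_positions
    ga_sites = [0] * n_positions
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("reference and read must have the same length")
        for i in range(min(n_positions, len(ref))):
            # i-th base counted from the 5' end: reference C read as T?
            if ref[i] == "C":
                ct_sites[i] += 1
                ct_hits[i] += int(read[i] == "T")
            # i-th base counted from the 3' end: reference G read as A?
            j = len(ref) - 1 - i
            if ref[j] == "G":
                ga_sites[i] += 1
                ga_hits[i] += int(read[j] == "A")
    # Positions with no reference C (or G) get a rate of 0.0.
    ct_rate = [h / s if s else 0.0 for h, s in zip(ct_hits, ct_sites)]
    ga_rate = [h / s if s else 0.0 for h, s in zip(ga_hits, ga_sites)]
    return ct_rate, ga_rate


# Toy alignments for illustration only.
toy_pairs = [("CAGG", "TAGA"), ("CACG", "CACG")]
ct, ga = damage_profile(toy_pairs, n_positions=2)
print(ct, ga)  # [0.5, 0.0] [0.5, 0.0]
```

To mimic the "soft-clipped bases excluded" curves in Fig. S10, clipped positions would have to be dropped before counting. That step is not shown here.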
Funding: Smithsonian Institution Molecular Evolution Fellowship; National Museum of Natural History Small Grants Program; American Ornithologists' Union Wetmore Research Award
ArticleID: MEN12568
ark:/67375/WNG-DSNPSFN9-4
istex:4F3DE83A55983D43FB32E503DDC0D45C5A318CF4
ISSN: 1755-098X; 1755-0998
DOI: 10.1111/1755-0998.12568