Evaluating the performance of targeted sequence capture, RNA‐Seq, and degenerate‐primer PCR cloning for sequencing the largest mammalian multigene family

Bibliographic Details
Published in Molecular ecology resources Vol. 20; no. 1; pp. 140-153
Main Authors Yohe, Laurel R., Davies, Kalina T. J., Simmons, Nancy B., Sears, Karen E., Dumont, Elizabeth R., Rossiter, Stephen J., Dávalos, Liliana M.
Format Journal Article
Language English
Published England Wiley Subscription Services, Inc 01.01.2020
Summary: Multigene families evolve from single‐copy ancestral genes via duplication, and typically encode proteins critical to key biological processes. Molecular analyses of these gene families require high‐confidence sequences, but the high sequence similarity of the members can create challenges for sequencing and downstream analyses. Focusing on the common vampire bat, Desmodus rotundus, we evaluated how different sequencing approaches performed in recovering the largest mammalian protein‐coding multigene family: olfactory receptors (OR). Using the genome as a reference, we determined the proportion of intact protein‐coding receptors recovered by: (a) amplicons from degenerate primers sequenced via Sanger technology, (b) RNA‐Seq of the main olfactory epithelium, and (c) targeted sequence capture with probes designed from transcriptomes of closely related species. Our initial re‐annotation of the high‐quality vampire bat genome resulted in >400 intact OR genes, more than doubling the original estimate. Sanger‐sequenced amplicons performed the worst of the three approaches, detecting <33% of receptors in the genome. In contrast, the transcriptome reliably recovered >50% of the annotated genomic ORs, and targeted sequence capture recovered nearly 75% of annotated genes. Each sequencing approach assembled high‐quality sequences, even if it did not recover all receptors in the genome. While some variation may be due to limitations of the study design (e.g., different individuals), variation among approaches was mostly caused by low coverage of some receptors rather than high rates of assembly error. Given this variability, we caution against using the counts of intact receptors per species to model the birth‐death process of multigene families. Instead, our results support the use of orthologous sequences to explore and model the evolutionary processes shaping these genes.
ISSN: 1755-098X, 1755-0998
DOI: 10.1111/1755-0998.13093