Quantification of Representative Sequences pipeline for amplicon sequencing: case study on within‐population ITS1 sequence variation in a microparasite infecting Daphnia

Bibliographic Details
Published in Molecular ecology resources Vol. 15; no. 6; pp. 1385-1395
Main Authors González-Tortuero, E., Rusek, J., Petrusek, A., Gießler, S., Lyras, D., Grath, S., Castro-Monzón, F., Wolinska, J.
Format Journal Article
Language English
Published England Blackwell Pub 01.11.2015
Blackwell Publishing Ltd
Wiley Subscription Services, Inc
Summary: Next generation sequencing (NGS) platforms are replacing traditional molecular biology protocols like cloning and Sanger sequencing. However, accuracy of NGS platforms has rarely been measured when quantifying relative frequencies of genotypes or taxa within populations. Here we developed a new bioinformatic pipeline (QRS) that pools similar sequence variants and estimates their frequencies in NGS data sets from populations or communities. We tested whether the estimated frequency of representative sequences, generated by 454 amplicon sequencing, differs significantly from that obtained by Sanger sequencing of cloned PCR products. This was performed by analysing sequence variation of the highly variable first internal transcribed spacer (ITS1) of the ichthyosporean Caullerya mesnili, a microparasite of cladocerans of the genus Daphnia. This analysis also serves as a case example of the usage of this pipeline to study within‐population variation. Additionally, a public Illumina data set was used to validate the pipeline on community‐level data. Overall, there was a good correspondence in relative frequencies of C. mesnili ITS1 sequences obtained from Sanger and 454 platforms. Furthermore, analyses of molecular variance (AMOVA) revealed that population structure of C. mesnili differs across lakes and years independently of the sequencing platform. Our results support not only the usefulness of amplicon sequencing data for studies of within‐population structure but also the successful application of the QRS pipeline on Illumina‐generated data. The QRS pipeline is freely available together with its documentation under GNU Public Licence version 3 at http://code.google.com/p/quantification-representative-sequences.
Bibliography: http://dx.doi.org/10.1111/1755-0998.12396
ark:/67375/WNG-GWG44D3P-H
istex:C2F17A4EE700DFBB4FF69B39E44CB6CFE9D29970
DFG-SPP 1399 - No. WO 1587/2-2
DFG-SNF - No. WO 1587/3-1
ArticleID: MEN12396
European Science Foundation
Czech Science Foundation - No. EEF/10/E022
Data S1 Quantification of Representative Sequences (QRS). Manual.
Data S2 Assessment of effects of the de-noising step in the QRS pipeline.
Data S3 Validation of the QRS pipeline on an Illumina dataset.
Table S1 Number of Caullerya mesnili ITS1 sequences generated in the 454 run as well as number of retrieved sequences in the 'filtered 454' and 'raw 454' datasets (provided separately for each of the 16 analysed population samples).
Table S2 Type and number of representative ITS1 sequence variants as assigned by statistical parsimony analysis.
Table S3 Brief summary of all parameters of the QRS pipeline when it is executed in batch mode.
Table S4 Comparison of the frequencies of representative ITS1 variants between the 'filtered 454' and 'Sanger' datasets after excluding the most abundant representative ITS1 sequence type (C2.14) across all samples (provided separately for each of the 16 analysed population samples).
Fig. S1 Comparison of the results generated by three pipelines (QRS, UPARSE and mothur) that were used for validation of the Illumina dataset.
German Science Foundation (DFG) - No. ME 3134/4-1
ISSN: 1755-098X, 1755-0998
DOI: 10.1111/1755-0998.12396