GBScleanR: Robust genotyping error correction using hidden Markov model with error pattern recognition
Published in | bioRxiv |
---|---|
Main Authors | , , |
Format | Paper |
Language | English |
Published | Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 22.03.2022 |
Subjects | |
Summary: | Advances in sequencing technology have enabled researchers to acquire genotype data from large populations with dense markers. Recently developed methods based on reduced representation sequencing (RRS), such as Genotyping By Sequencing (GBS), provide cost-effective and time-saving genotyping platforms. However, these technologies have drawbacks, such as missing calls and false homozygous calls at heterozygous sites, that substantially reduce genotyping accuracy. Several error correction methods that incorporate allele read counts into a hidden Markov model (HMM) have been developed to address these issues. These methods assume that all markers share a uniform error rate and that allele read ratios are unbiased, so that a read from a heterozygous site has a 50% chance of carrying either allele. In practice, bias arises from uneven amplification of genomic fragments and from read mismapping. In this paper we introduce a novel error correction tool, GBScleanR, which enables robust and precise error correction of noisy RRS-based genotype data by incorporating marker-specific error rates into the HMM. On simulated datasets, GBScleanR improves accuracy by 10-40 percentage points compared with existing tools, and on real data it achieves the most reliable genotype estimation, even with error-prone markers. Competing Interest Statement: The authors have declared no competing interest. Footnotes: * Added a few lines to the Results section on algorithm evaluation using real data. |
---|---|
DOI: | 10.1101/2022.03.18.484886 |
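The summary contrasts HMMs that score allele read counts under a uniform error rate and a fixed 50% allele ratio at heterozygous sites with a model that uses marker-specific parameters. The Python sketch below illustrates that contrast. It is not GBScleanR's actual model or code, and it rests on several assumptions: a biparental F2-like population with genotype states AA/AB/BB, binomial emissions on alternative-allele read counts, per-marker parameters `err` (read error rate) and `het_alt_ratio` (alt-read probability at a heterozygous site) that are supplied rather than estimated, known recombination fractions between adjacent markers, and Viterbi decoding. All function names are hypothetical.

```python
from __future__ import annotations

import math

# Hypothetical illustration, not GBScleanR's implementation.
# Hidden states for an assumed biparental F2-like population.
STATES = ("AA", "AB", "BB")


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log-probability of k alt reads out of n under Binomial(n, p); requires 0 < p < 1."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1.0 - p)


def emission_logprobs(ref: int, alt: int, err: float, het_alt_ratio: float) -> list[float]:
    """Log-likelihood of one marker's read counts under each genotype.

    err           -- assumed marker-specific probability that a read shows the wrong allele
    het_alt_ratio -- assumed marker-specific alt-read probability at a heterozygous site;
                     0.5 reproduces the unbiased assumption described in the summary
    A marker with no reads (missing) gets log-likelihood 0 for every genotype.
    """
    n = ref + alt
    p_alt = (err, het_alt_ratio, 1.0 - err)  # P(alt read | AA), P(alt read | AB), P(alt read | BB)
    return [log_binom_pmf(alt, n, p) for p in p_alt]


def transition_logprobs(r: float) -> list[list[float]]:
    """Assumed F2 genotype transitions for recombination fraction r (0 < r < 0.5)."""
    s = 1.0 - r
    m = [
        [s * s, 2 * r * s, r * r],      # from AA
        [r * s, s * s + r * r, r * s],  # from AB
        [r * r, 2 * r * s, s * s],      # from BB
    ]
    return [[math.log(x) for x in row] for row in m]


def viterbi(reads, errs, het_ratios, rfs) -> list[str]:
    """Most likely genotype path along one chromosome.

    reads      -- (ref_count, alt_count) per marker
    errs       -- per-marker error rates
    het_ratios -- per-marker alt-read ratios at heterozygous sites
    rfs        -- recombination fractions between marker i and i + 1
    """
    prior = [math.log(0.25), math.log(0.5), math.log(0.25)]  # Mendelian 1:2:1 in an F2
    emit0 = emission_logprobs(*reads[0], errs[0], het_ratios[0])
    score = [prior[g] + emit0[g] for g in range(3)]
    back = []
    for i in range(1, len(reads)):
        trans = transition_logprobs(rfs[i - 1])
        emit = emission_logprobs(*reads[i], errs[i], het_ratios[i])
        new_score, ptr = [], []
        for g in range(3):
            # Best previous state for reaching genotype g at marker i
            best = max(range(3), key=lambda h: score[h] + trans[h][g])
            new_score.append(score[best] + trans[best][g] + emit[g])
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace back from the highest-scoring final state
    g = max(range(3), key=lambda h: score[h])
    path = [g]
    for ptr in reversed(back):
        g = ptr[g]
        path.append(g)
    return [STATES[g] for g in reversed(path)]
```

As a usage example, a heterozygous middle marker whose reads are skewed toward the reference allele is miscalled as AA under the uniform 50% assumption, but it is kept as AB once that marker's bias is modeled:

```python
reads = [(5, 5), (9, 1), (5, 5)]
errs = [0.01, 0.01, 0.01]
rfs = [0.3, 0.3]
print(viterbi(reads, errs, [0.5, 0.5, 0.5], rfs))  # ['AB', 'AA', 'AB']
print(viterbi(reads, errs, [0.5, 0.1, 0.5], rfs))  # ['AB', 'AB', 'AB']
```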