GenoPipe: identifying the genotype of origin within (epi)genomic datasets


Bibliographic Details
Published in Nucleic acids research Vol. 51; no. 22; pp. 12054-12068
Main Authors Lang, Olivia W, Srivastava, Divyanshi, Pugh, B Franklin, Lai, William K M
Format Journal Article
Language English
Published England Oxford University Press 11.12.2023
Summary: Confidence in experimental results is critical for discovery. As the scale of data generation in genomics has grown exponentially, experimental error has likely kept pace despite the best efforts of many laboratories. Technical mistakes can and do occur at nearly every stage of a genomics assay (e.g. cell line contamination, reagent swapping, tube mislabelling) and are often difficult to identify post-execution. However, the DNA sequenced in genomic experiments contains certain markers (e.g. indels) encoded within it, and these can often be ascertained forensically from experimental datasets. We developed the Genotype validation Pipeline (GenoPipe), a suite of heuristic tools that operate together directly on raw and aligned sequencing data from individual high-throughput sequencing experiments to characterize the underlying genome of the source material. We demonstrate how GenoPipe validates and rescues erroneously annotated experiments by identifying unique markers inherent to an organism's genome (i.e. epitope insertions, gene deletions and SNPs).
ISSN: 0305-1048, 1362-4962
DOI: 10.1093/nar/gkad950