GenoPipe: identifying the genotype of origin within (epi)genomic datasets

Bibliographic Details
Published in Nucleic Acids Research, Vol. 51, no. 22, pp. 12054-12068
Main Authors Lang, Olivia W; Srivastava, Divyanshi; Pugh, B Franklin; Lai, William K M
Format Journal Article
Language English
Published England: Oxford University Press, 11 December 2023
Abstract Confidence in experimental results is critical for discovery. As the scale of data generation in genomics has grown exponentially, experimental error has likely kept pace despite the best efforts of many laboratories. Technical mistakes can and do occur at nearly every stage of a genomics assay (e.g. cell line contamination, reagent swapping, tube mislabelling) and are often difficult to identify post-execution. However, the DNA sequenced in genomic experiments contains certain markers (e.g. indels) that can often be ascertained forensically from experimental datasets. We developed the Genotype validation Pipeline (GenoPipe), a suite of heuristic tools that operate together directly on raw and aligned sequencing data from individual high-throughput sequencing experiments to characterize the underlying genome of the source material. We demonstrate how GenoPipe validates and rescues erroneously annotated experiments by identifying unique markers inherent to an organism's genome (i.e. epitope insertions, gene deletions and SNPs).
Author_xml – sequence: 1
  givenname: Olivia W
  surname: Lang
  fullname: Lang, Olivia W
– sequence: 2
  givenname: Divyanshi
  surname: Srivastava
  fullname: Srivastava, Divyanshi
– sequence: 3
  givenname: B Franklin
  orcidid: 0000-0001-8341-4476
  surname: Pugh
  fullname: Pugh, B Franklin
– sequence: 4
  givenname: William K M
  orcidid: 0000-0003-4351-7037
  surname: Lai
  fullname: Lai, William K M
  email: wkl29@cornell.edu
BackLink https://www.ncbi.nlm.nih.gov/pubmed/37933851 (record in MEDLINE/PubMed)
ContentType Journal Article
Copyright The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research. 2023
The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
Copyright_xml – notice: The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research. 2023
– notice: The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
DOI 10.1093/nar/gkad950
Discipline Anatomy & Physiology
Chemistry
EISSN 1362-4962
EndPage 12068
ExternalDocumentID 10_1093_nar_gkad950
37933851
10.1093/nar/gkad950
Genre Journal Article
GrantInformation_xml – fundername: NIEHS NIH HHS
  grantid: R01 ES034353
– fundername: (not given in record)
  grantid: BIO220026
ISSN 0305-1048
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 22
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
ORCID 0000-0001-8341-4476
0000-0003-4351-7037
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10711449/
PMID 37933851
PQID 2886942411
PQPubID 23479
PageCount 15
PublicationCentury 2000
PublicationDate 2023-12-11
PublicationDateYYYYMMDD 2023-12-11
PublicationDate_xml – month: 12
  year: 2023
  text: 2023-12-11
  day: 11
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Nucleic acids research
PublicationTitleAlternate Nucleic Acids Res
PublicationYear 2023
Publisher Oxford University Press
Publisher_xml – name: Oxford University Press
StartPage 12054
SubjectTerms Computational Biology
Title GenoPipe: identifying the genotype of origin within (epi)genomic datasets
URI https://www.ncbi.nlm.nih.gov/pubmed/37933851
https://search.proquest.com/docview/2886942411
https://pubmed.ncbi.nlm.nih.gov/PMC10711449
Volume 51