Top-Down Crawl: a method for the ultra-rapid and motif-free alignment of sequences with associated binding metrics
Published in | Bioinformatics (Oxford, England) Vol. 38; no. 22; pp. 5121-5123 |
---|---|
Main Authors | Cooper, Brendon H; Chiu, Tsu-Pei; Rohs, Remo |
Format | Journal Article |
Language | English |
Published | England: Oxford University Press, 15.11.2022 |
Abstract | Several high-throughput protein-DNA binding methods currently available produce highly reproducible measurements of binding affinity at the level of the k-mer. However, understanding where a k-mer is positioned along a binding site sequence depends on alignment. Here, we present Top-Down Crawl (TDC), an ultra-rapid tool designed for the alignment of k-mer level data in a rank-dependent and position weight matrix (PWM)-independent manner. As the framework only depends on the rank of the input, the method can accept input from many types of experiments (protein binding microarray, SELEX-seq, SMiLE-seq, etc.) without the need for specialized parameterization. Measuring the performance of the alignment using multiple linear regression with 5-fold cross-validation, we find TDC to perform as well as or better than computationally expensive PWM-based methods.
TDC can be run online at https://topdowncrawl.usc.edu or locally as a Python package installable with pip from https://pypi.org/project/TopDownCrawl.
Supplementary data are available at Bioinformatics online. |
---|---|
Author | Cooper, Brendon H; Chiu, Tsu-Pei; Rohs, Remo (ORCID 0000-0003-1752-1884) |
Affiliations | Cooper, Brendon H and Chiu, Tsu-Pei: Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA. Rohs, Remo: Departments of Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA |
PubMed record | https://www.ncbi.nlm.nih.gov/pubmed/36179084 |
Cited by | 10.1016/j.bpj.2023.12.013; 10.1093/nar/gkad372 |
Copyright | The Author(s) 2022. Published by Oxford University Press. |
DOI | 10.1093/bioinformatics/btac653 |
Discipline | Biology |
EISSN | 1367-4811 |
Editor | Alkan, Can |
Genre | Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't |
Grant information | Human Frontier Science Program (RGP0021/2018); NIGMS NIH HHS (R35 GM130376) |
ISSN | 1367-4803 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
License | The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9665867/ |
PMID | 36179084 |
PageCount | 3 |
Subject terms | Applications Note; Binding Sites; Position-Specific Scoring Matrices; Protein Binding; Sequence Analysis, DNA - methods; Software |
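The abstract states that TDC aligns k-mer level binding data using only the rank of each k-mer, without building a position weight matrix. To illustrate what a rank-only, PWM-free alignment can look like, the Python sketch below seeds the alignment with the top-ranked k-mer and then places each lower-ranked k-mer at the shift that best agrees with the bases already placed. This is a hypothetical toy and not the TDC algorithm. The function names `greedy_rank_align` and `render`, the fixed shift window of k-1 positions, the count-weighted agreement score, and the omission of reverse complements are all assumptions made for this illustration.

```python
from collections import Counter


def greedy_rank_align(kmer_scores, max_shift=None):
    """Toy rank-ordered k-mer alignment (hypothetical; NOT the TDC algorithm).

    kmer_scores maps equal-length k-mers to a binding metric where higher
    means stronger binding. Only the rank order of the metric is used.
    Returns (kmer, offset) pairs in rank order; offsets are relative to the
    top-ranked k-mer, which is placed at offset 0.
    """
    ranked = sorted(kmer_scores, key=kmer_scores.get, reverse=True)
    k = len(ranked[0])
    if max_shift is None:
        # Assumption: allow any shift that keeps at least one base of overlap with the seed.
        max_shift = k - 1
    columns = {}  # absolute position -> Counter of bases already placed there
    aligned = []
    for kmer in ranked:
        best_offset, best_score = 0, -1
        # Try small shifts first so ties favour the least-shifted placement.
        for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
            # Agreement = number of already-placed bases that match this k-mer.
            score = sum(
                columns[shift + i][base]
                for i, base in enumerate(kmer)
                if shift + i in columns
            )
            if score > best_score:
                best_offset, best_score = shift, score
        # Record this k-mer's bases in the columns it now occupies.
        for i, base in enumerate(kmer):
            columns.setdefault(best_offset + i, Counter())[base] += 1
        aligned.append((kmer, best_offset))
    return aligned


def render(aligned):
    """Pad each aligned k-mer with '-' so that the columns line up."""
    lo = min(offset for _, offset in aligned)
    width = max(offset + len(kmer) for kmer, offset in aligned) - lo
    return [("-" * (offset - lo) + kmer).ljust(width, "-") for kmer, offset in aligned]


if __name__ == "__main__":
    # Made-up scores for illustration only (not data from the paper).
    scores = {"ACGTG": 9.0, "CGTGA": 7.5, "TACGT": 6.0, "GGGGG": 1.0}
    for row in render(greedy_rank_align(scores)):
        print(row)
    # -ACGTG-
    # --CGTGA
    # TACGT--
    # -GGGGG-
```

Because the scores are used only to sort the k-mers, any order-preserving transformation of the binding metric (for example, squaring positive scores) produces the same alignment. This mirrors the abstract's point that a rank-only framework can take input from different experiment types without specialized parameterization.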