An Algorithm to Calculate the p-Value of the Monge-Elkan Distance
Published in | Journal of Computational Biology Vol. 32; no. 8; pp. 797-812 |
---|---|
Main Authors | Ryšavý, Petr; Železný, Filip |
Format | Journal Article |
Language | English |
Published | United States: Mary Ann Liebert, Inc., publishers, 01.08.2025 |
ISSN | 1557-8666 |
DOI | 10.1089/cmb.2024.0854 |
Abstract | The Monge-Elkan distance is a straightforward yet popular distance measure used to estimate the mutual similarity of two sets of objects. It was initially proposed in the field of databases and has since found broad use in other fields. Nowadays, it is especially relevant to the analysis of next-generation sequencing data, as it represents a measure of dissimilarity between the genomes of two distinct organisms, particularly when applied to unassembled reads. This article provides an algorithm to calculate the p-value associated with the Monge-Elkan distance. Given the object-level null distribution, that is, the distribution of distances between independently and identically sampled objects such as reads, the method yields the null distribution of the Monge-Elkan distance, which in turn allows the p-value to be calculated. We also demonstrate an application to sequencing data, where individual reads are compared using the Levenshtein distance. |
---|---|
Author | Ryšavý, Petr (ORCID 0000-0002-6597-6616); Železný, Filip |
Copyright | 2025, Mary Ann Liebert, Inc., publishers |
Discipline | Biology; Mathematics |
EISSN | 1557-8666 |
IsPeerReviewed | true |
Keywords | Monge-Elkan distance; null distribution; p-value |
License | https://www.liebertpub.com/nv/resources-tools/text-and-data-mining-policy/121 |
PMID | 40488654 |
PublicationTitleAlternate | J Comput Biol |
SubjectTerms | Algorithms; Computational Biology - methods; High-Throughput Nucleotide Sequencing - methods; Humans; Original Articles; Sequence Analysis, DNA - methods |
URI | https://www.liebertpub.com/doi/abs/10.1089/cmb.2024.0854; https://www.ncbi.nlm.nih.gov/pubmed/40488654; https://www.proquest.com/docview/3216919297 |
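The abstract describes a Monge-Elkan distance between two sets of reads, with individual reads compared by the Levenshtein distance, and a p-value derived from a null distribution. The Python sketch below illustrates these ingredients under stated assumptions. It uses the common directed form of the Monge-Elkan distance (the mean, over reads in A, of the distance to the nearest read in B) and averages both directions for symmetry; whether the article uses this symmetrization is an assumption. It is not the authors' algorithm: the article derives the null distribution of the Monge-Elkan distance from the object-level null distribution, whereas this sketch swaps in plain Monte Carlo simulation under a hypothetical null model of uniformly random reads with the same lengths. The names `monge_elkan`, `symmetric_monge_elkan`, and `monte_carlo_p_value` are illustrative, and the p-value takes the lower tail on the assumption that small distances are the evidence of interest.

```python
import random
from typing import Callable, List, Sequence

Distance = Callable[[str, str], float]


def levenshtein(s: str, t: str) -> int:
    """Unit-cost edit distance, computed one DP row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]


def monge_elkan(a_set: Sequence[str], b_set: Sequence[str],
                dist: Distance = levenshtein) -> float:
    """Directed Monge-Elkan distance: mean over a in A of min over b in B of dist(a, b)."""
    return sum(min(dist(a, b) for b in b_set) for a in a_set) / len(a_set)


def symmetric_monge_elkan(a_set: Sequence[str], b_set: Sequence[str],
                          dist: Distance = levenshtein) -> float:
    """Average of both directions (an assumed symmetrization, not taken from the article)."""
    return (monge_elkan(a_set, b_set, dist) + monge_elkan(b_set, a_set, dist)) / 2


def _random_like(reads: Sequence[str], rng: random.Random, alphabet: str) -> List[str]:
    """Hypothetical null model: uniform random reads with the same lengths as `reads`."""
    return ["".join(rng.choice(alphabet) for _ in read) for read in reads]


def monte_carlo_p_value(a_set: Sequence[str], b_set: Sequence[str], n_sim: int = 200,
                        seed: int = 0, alphabet: str = "ACGT") -> float:
    """Lower-tail Monte Carlo p-value (assumption: small distances indicate similarity)."""
    rng = random.Random(seed)
    observed = symmetric_monge_elkan(a_set, b_set)
    hits = 0
    for _ in range(n_sim):
        null_a = _random_like(a_set, rng, alphabet)
        null_b = _random_like(b_set, rng, alphabet)
        if symmetric_monge_elkan(null_a, null_b) <= observed:
            hits += 1
    # The +1 terms count the observed value itself, so the estimate is never zero.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    reads_a = ["ACGTACGTTGCA", "TTGCAACGTAGG"]
    reads_b = ["ACGTACGTTGCA", "TTGCAACGTAGC"]
    # One exact match plus one read off by a single substitution: (0 + 1) / 2
    print(monge_elkan(reads_a, reads_b))  # 0.5
    print(monte_carlo_p_value(reads_a, reads_b, n_sim=100))
```

Each simulated draw costs 2·|A|·|B| edit-distance computations, so this brute-force simulation scales poorly to large read sets.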