An Algorithm to Calculate the p-Value of the Monge-Elkan Distance

Bibliographic Details
Published in Journal of Computational Biology, Vol. 32, no. 8, pp. 797-812
Main Authors Ryšavý, Petr, Železný, Filip
Format Journal Article
Language English
Published United States, Mary Ann Liebert, Inc., publishers, 2025-08-01
ISSN 1557-8666
DOI 10.1089/cmb.2024.0854

Abstract The Monge-Elkan distance is a straightforward yet popular distance measure used to estimate the mutual similarity of two sets of objects. It was initially proposed in the field of databases, and it found broad usage in other fields. Nowadays, it is especially relevant to the analysis of new-generation sequencing data as it represents a measure of dissimilarity between genomes of two distinct organisms, particularly when applied to unassembled reads. This article provides an algorithm to calculate the p-value associated with the Monge-Elkan distance. Given the object-level null distribution, that is, the distribution of distances between independently and identically sampled objects such as reads, the method yields the null distribution of the Monge-Elkan distance, which in turn allows for calculating the p-value. We also demonstrate an application on sequencing data, where individual reads are compared by the Levenshtein distance.
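
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the article's algorithm. It assumes the common asymmetric form of the Monge-Elkan distance (the mean, over objects in the first set, of the minimum distance to any object in the second set), integer-valued object distances such as the Levenshtein distance, and, as a strong simplification, that all pairwise object distances are mutually independent under the null hypothesis. The function names (`levenshtein`, `monge_elkan`, `min_pmf`, `null_sum_pmf`, `p_value`) and the left-tailed definition of the p-value are illustrative choices, not taken from the article.

```python
def levenshtein(s, t):
    """Edit distance between two strings (two-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]


def monge_elkan(set_a, set_b, dist=levenshtein):
    """Asymmetric Monge-Elkan distance: mean over a of min over b of dist(a, b)."""
    return sum(min(dist(a, b) for b in set_b) for a in set_a) / len(set_a)


def min_pmf(object_pmf, n_b):
    """pmf of the minimum of n_b independent object distances.

    object_pmf[k] is the null probability that an object distance equals k.
    """
    surv, tail = [], 0.0
    for p in reversed(object_pmf):
        tail += p
        surv.append(tail)
    surv.reverse()          # surv[k] = P(distance >= k)
    surv.append(0.0)
    min_surv = [s ** n_b for s in surv]   # P(min >= k), by independence
    return [min_surv[k] - min_surv[k + 1] for k in range(len(object_pmf))]


def convolve(p, q):
    """pmf of the sum of two independent integer-valued variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def null_sum_pmf(object_pmf, n_a, n_b):
    """pmf of the sum of n_a minima, treated as independent (an assumption)."""
    single = min_pmf(object_pmf, n_b)
    total = [1.0]
    for _ in range(n_a):
        total = convolve(total, single)
    return total


def p_value(observed, object_pmf, n_a, n_b):
    """Left-tail p-value: P(Monge-Elkan distance <= observed) under the null."""
    total = null_sum_pmf(object_pmf, n_a, n_b)
    threshold = observed * n_a + 1e-9   # the distance is the sum divided by n_a
    return sum(p for s, p in enumerate(total) if s <= threshold)


if __name__ == "__main__":
    reads_a = ["ACGT", "AAAA"]
    reads_b = ["ACGA", "TTTT"]
    d = monge_elkan(reads_a, reads_b)
    # Toy object-level null: distances 0..4 equally likely (made-up numbers).
    print(d, p_value(d, [0.2] * 5, len(reads_a), len(reads_b)))
```

In practice the minima computed for different objects of the first set share the same second set and are therefore dependent, so this sketch should be read only as an illustration of how an object-level null distribution can be propagated to the set-level distance.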
Copyright 2025, Mary Ann Liebert, Inc., publishers
Discipline Biology
Mathematics
EISSN 1557-8666
IsPeerReviewed true
IsScholarly true
Keywords Monge-Elkan distance
null distribution
p-value
License https://www.liebertpub.com/nv/resources-tools/text-and-data-mining-policy/121
ORCID 0000-0002-6597-6616
PMID 40488654
PageCount 16
SubjectTerms Algorithms
Computational Biology - methods
High-Throughput Nucleotide Sequencing - methods
Humans
Original Articles
Sequence Analysis, DNA - methods
URI https://www.liebertpub.com/doi/abs/10.1089/cmb.2024.0854
https://www.ncbi.nlm.nih.gov/pubmed/40488654
https://www.proquest.com/docview/3216919297