Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery

Bibliographic Details
Main Authors Lahiri, Sounak; Pai, Sumit; Weninger, Tim; Bhattacharya, Sanmitra
Format Journal Article
Language English
Published 29.05.2024
Abstract Electronic Discovery (eDiscovery) involves identifying relevant documents from a vast collection based on legal production requests. The integration of artificial intelligence (AI) and natural language processing (NLP) has transformed this process, aiding document review and enhancing efficiency and cost-effectiveness. Although traditional approaches like BM25 or fine-tuned pre-trained models are common in eDiscovery, they face performance, computational, and interpretability challenges. In contrast, Large Language Model (LLM)-based methods prioritize interpretability but sacrifice performance and throughput. This paper introduces DISCOvery Graph (DISCOG), a hybrid approach that combines the strengths of two worlds: a heterogeneous graph-based method for accurate document relevance prediction and a subsequent LLM-driven approach for reasoning. Graph representational learning generates embeddings and predicts links, ranking the corpus for a given request, and the LLMs provide reasoning for document relevance. Our approach handles datasets with balanced and imbalanced distributions, outperforming baselines in F1-score, precision, and recall by an average of 12%, 3%, and 16%, respectively. In an enterprise context, our approach drastically reduces document review costs by 99.9% compared to manual processes and by 95% compared to LLM-based classification methods.
BackLink https://doi.org/10.48550/arXiv.2405.19164
ContentType Journal Article
Copyright http://creativecommons.org/licenses/by/4.0
DOI 10.48550/arxiv.2405.19164
DatabaseName arXiv Computer Science
arXiv.org
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
OpenAccessLink https://arxiv.org/abs/2405.19164
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Artificial Intelligence
Computer Science - Information Retrieval
URI https://arxiv.org/abs/2405.19164