Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery

Bibliographic Details
Main Authors	Lahiri, Sounak; Pai, Sumit; Weninger, Tim; Bhattacharya, Sanmitra
Format	Journal Article
Language	English
Published	29.05.2024
Subjects	Computer Science - Artificial Intelligence; Computer Science - Information Retrieval
Online Access	Get full text

Abstract	Electronic Discovery (eDiscovery) involves identifying relevant documents from a vast collection based on legal production requests. The integration of artificial intelligence (AI) and natural language processing (NLP) has transformed this process, aiding document review and enhancing efficiency and cost-effectiveness. Although traditional approaches like BM25 or fine-tuned pre-trained models are common in eDiscovery, they face performance, computational, and interpretability challenges. In contrast, Large Language Model (LLM)-based methods prioritize interpretability but sacrifice performance and throughput. This paper introduces DISCOvery Graph (DISCOG), a hybrid approach that combines the strengths of two worlds: a heterogeneous graph-based method for accurate document relevance prediction and a subsequent LLM-driven approach for reasoning. Graph representation learning generates embeddings and predicts links, ranking the corpus for a given request, and the LLMs provide reasoning for document relevance. Our approach handles datasets with balanced and imbalanced distributions, outperforming baselines in F1-score, precision, and recall by an average of 12%, 3%, and 16%, respectively. In an enterprise context, our approach drastically reduces document review costs by 99.9% compared to manual processes and by 95% compared to LLM-based classification methods.
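The abstract describes ranking a corpus for a request via embeddings and link prediction. As a rough illustration only, the following minimal Python sketch scores request-to-document links and sorts documents by that score. Everything in it is an assumption made for illustration: the function names `link_score` and `rank_documents`, the toy embedding values, and the choice of a sigmoid over a dot product as the link scorer. The record does not state how DISCOG computes its embeddings or link scores.

```python
import math


def link_score(request_vec, doc_vec):
    """Hypothetical link scorer: sigmoid of the dot product of two embeddings."""
    dot = sum(r * d for r, d in zip(request_vec, doc_vec))
    return 1.0 / (1.0 + math.exp(-dot))


def rank_documents(request_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least likely link."""
    scored = [(doc_id, link_score(request_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-made embeddings (assumed values, not taken from the paper).
request = [1.0, 0.0, 1.0]
documents = {
    "doc_a": [0.9, 0.1, 0.8],
    "doc_b": [-0.5, 0.2, -0.7],
    "doc_c": [0.1, 0.9, 0.0],
}

ranking = rank_documents(request, documents)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a pipeline of the kind the abstract outlines, only the top of such a ranking would presumably be passed on to an LLM for relevance reasoning, which is one plausible reading of where the reported cost savings over LLM-only classification come from.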
Author	Pai, Sumit; Weninger, Tim; Lahiri, Sounak; Bhattacharya, Sanmitra
Author_xml	– sequence: 1 givenname: Sounak surname: Lahiri fullname: Lahiri, Sounak – sequence: 2 givenname: Sumit surname: Pai fullname: Pai, Sumit – sequence: 3 givenname: Tim surname: Weninger fullname: Weninger, Tim – sequence: 4 givenname: Sanmitra surname: Bhattacharya fullname: Bhattacharya, Sanmitra
BackLink	https://doi.org/10.48550/arXiv.2405.19164$$DView paper in arXiv
ContentType	Journal Article
Copyright	http://creativecommons.org/licenses/by/4.0
Copyright_xml	– notice: http://creativecommons.org/licenses/by/4.0
DBID	AKY GOX
DOI	10.48550/arxiv.2405.19164
DatabaseName	arXiv Computer Science arXiv.org
DatabaseTitleList
Database_xml	– sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository
DeliveryMethod	fulltext_linktorsrc
ExternalDocumentID	2405_19164
GroupedDBID	AKY GOX
ID	FETCH-LOGICAL-a674-6f3cac949d08c5ba81668aeefe0b280bd381216e72dac9b1af3a3852436112e83
IEDL.DBID	GOX
IngestDate	Fri May 31 12:10:23 EDT 2024
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-a674-6f3cac949d08c5ba81668aeefe0b280bd381216e72dac9b1af3a3852436112e83
OpenAccessLink	https://arxiv.org/abs/2405.19164
ParticipantIDs	arxiv_primary_2405_19164
PublicationCentury	2000
PublicationDate	2024-05-29
PublicationDateYYYYMMDD	2024-05-29
PublicationDate_xml	– month: 05 year: 2024 text: 2024-05-29 day: 29
PublicationDecade	2020
PublicationYear	2024
SecondaryResourceType	preprint
SourceID	arxiv
SourceType	Open Access Repository
URI	https://arxiv.org/abs/2405.19164
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	Cornell University