Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 29.05.2024 |
Subjects | |
Summary: | Electronic Discovery (eDiscovery) involves identifying relevant documents
from a vast collection based on legal production requests. The integration of
artificial intelligence (AI) and natural language processing (NLP) has
transformed this process, aiding document review and enhancing efficiency and
cost-effectiveness. Although traditional approaches like BM25 or fine-tuned
pre-trained models are common in eDiscovery, they face performance,
computational, and interpretability challenges. In contrast, Large Language
Model (LLM)-based methods prioritize interpretability but sacrifice performance
and throughput. This paper introduces DISCOvery Graph (DISCOG), a hybrid
approach that combines the strengths of both worlds: a heterogeneous graph-based
method for accurate document relevance prediction and a subsequent LLM-driven
approach for reasoning. Graph representational learning generates embeddings
and predicts links, ranking the corpus for a given request, and the LLMs
provide reasoning for document relevance. Our approach handles datasets with
balanced and imbalanced distributions, outperforming baselines in F1-score,
precision, and recall by an average of 12%, 3%, and 16%, respectively. In an
enterprise context, our approach drastically reduces document review costs by
99.9% compared to manual processes and by 95% compared to LLM-based
classification methods. |
---|---|
DOI: | 10.48550/arxiv.2405.19164 |
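
The summary describes ranking a corpus for a production request by predicting request-document links from embeddings, then evaluating with precision, recall, and F1-score. The following is a minimal sketch of that idea only, not the DISCOG model itself: the sigmoid-of-dot-product scorer, the 0.6 threshold, the toy embeddings, and all function names are assumptions chosen for illustration.

```python
import math


def link_score(request_vec, doc_vec):
    """Score a request-document link as the sigmoid of the dot product.

    Assumption: a dot-product decoder stands in for whatever link
    predictor the paper actually trains.
    """
    dot = sum(r * d for r, d in zip(request_vec, doc_vec))
    return 1.0 / (1.0 + math.exp(-dot))


def rank_documents(request_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from highest to lowest score."""
    scored = [(doc_id, link_score(request_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def precision_recall_f1(predicted, relevant):
    """Compute precision, recall, and F1 for a set of predicted documents."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy embeddings, invented for illustration (hypothetical data).
request = [1.0, 0.5]
docs = {
    "doc_a": [2.0, 1.0],
    "doc_b": [-1.0, 0.0],
    "doc_c": [0.5, 0.5],
}

ranking = rank_documents(request, docs)

# Hypothetical cutoff: keep documents whose link score is at least 0.6.
predicted = [doc_id for doc_id, score in ranking if score >= 0.6]

# Hypothetical ground truth: only doc_a is responsive to the request.
precision, recall, f1 = precision_recall_f1(predicted, {"doc_a"})
```

In this toy setup the request ranks `doc_a` first and `doc_b` last. Only the top-ranked documents would then be passed to an LLM for relevance reasoning, which is where the cost savings reported in the summary would come from.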