Classification of Legal Documents in Portuguese Language Based on Summarization

Bibliographic Details
Published in: 2022 IEEE Latin American Conference on Computational Intelligence (LA-CCI), pp. 1-6
Main Authors: Medina, Marie Chantelle Cruz; Da Silva Oliveira, Lucas Matheus; Ferreira, Jean Felipe Coelho; Silva, Leandro Honorato De S.; Rodrigues, Cleyton Mario O.; De Oliveira, Joao Fausto L.; Sobral, Paulo Christiano; Souza, Bruno; Feitosa, Dionizio; Fernandes, Bruno J. T.
Format: Conference Proceeding
Language: English
Published: IEEE, 23.11.2022
Summary: Classification of legal documents in Portuguese is a research area that benefits greatly from computational intelligence techniques, given the availability of better processing and the ease of digitally recording the text of judicial proceedings. Different techniques have been explored to achieve reliable results in real-world conditions; however, the most suitable configuration of methods remains an open problem. This study proposes a model consisting of four stages: preprocessing, extractive summarization using the PageRank algorithm, feature extraction with bag-of-words, and classification with a Support Vector Classifier. Tests were conducted on three versions of the model as a means of comparison and evaluation. The first was a basic classifier without the preprocessing or summarization stages, the second included preprocessing but not summarization, and the third was an implementation of the complete proposed model. All three were evaluated on a separate set of examples labeled with six different categories, and their performance was recorded as weighted-average precision, recall, F1-score, and accuracy. The proposed model performed best, with precision, recall, and F1-score values of 96% each, which represents a 2% improvement on all three over the first version and a 1% improvement in precision and recall over the second version. With the F1-score in particular indicating the most balanced performance, the proposed model outperformed the versions of itself that excluded some stages, which suggests that preprocessing and extractive summarization have a positive impact on the text classification task for legal documents written in Portuguese.
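
The first three stages named in the summary (preprocessing, PageRank-based extractive summarization, bag-of-words features) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the stop-word list, the word-overlap similarity (borrowed from TextRank), the damping factor, and all function names are assumptions made for illustration. The classification stage is left out; the resulting count vectors could be passed to any support vector classifier, for example scikit-learn's `SVC`.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOPWORDS = {"a", "o", "de", "do", "da", "e", "em", "que", "para", "no", "na"}


def preprocess(sentence):
    """Stage 1: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]


def similarity(a, b):
    """TextRank-style word overlap between two token lists (assumed measure)."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(set(a) & set(b)) / (math.log(len(a)) + math.log(len(b)))


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a sentence-similarity graph."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, keep=2):
    """Stage 2: keep the `keep` highest-ranked sentences, in original order."""
    # Naive sentence splitter; abbreviations such as "art." would confuse it.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    tokens = [preprocess(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(sorted(range(n), key=lambda i: -scores[i])[:keep])
    return " ".join(sentences[i] for i in top)


def bag_of_words(text, vocabulary):
    """Stage 3: count how often each vocabulary word occurs in the text."""
    counts = Counter(preprocess(text))
    return [counts[word] for word in vocabulary]
```

In this sketch, sentences that share vocabulary with many others receive higher PageRank scores, so an off-topic sentence tends to be dropped before the bag-of-words vector is built.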
DOI: 10.1109/LA-CCI54402.2022.9981852