Named Entity Recognition for Public Interest Litigation Based on a Deep Contextualized Pretraining Approach

Bibliographic Details
Published in	Scientific Programming, Vol. 2022, pp. 1-14
Main Authors	Dong, Hongsong; Kong, Yuehui; Gao, Wenlian; Liu, Jihua
Format	Journal Article
Language	English
Published	New York Hindawi 11.10.2022 Hindawi Limited
Subjects	Artificial intelligence; Coders; Conditional random fields; Context; Court decisions; Criminal law; Datasets; Deep learning; Dictionaries; Environmental protection; Litigation; Machine learning; Names; Neural networks; Recognition; Segmentation
Summary:	Named entity recognition (NER) in the field of public interest litigation can assist prosecutors in handling cases and supply the specific entities they need when drafting legal documents. Previous approaches used context-free deep learning models for semantic understanding, in which a static word vector is obtained without considering the context. Moreover, such methods rely on word segmentation and cannot avoid the error propagation caused by inaccurate segmentation, which poses a major challenge for Chinese NER. To address these issues, an entity recognition method based on pretraining is proposed. First, building on the basic entities, three legal ontologies, NERP, NERCGP, and NERFPP, are developed to expand the named entity recognition corpus in the judicial field. Second, a variant of the pretrained model BERT (Bidirectional Encoder Representations from Transformers) called BERT-WWM-EXT (whole-word masking, extended) is introduced to capture hierarchical character-level vector features and bidirectional context features, which effectively addresses the problem of delimiting named entity boundaries. Then, to further improve recognition performance, the general knowledge learned by the pretrained model is used to fit the downstream BiLSTM (bidirectional long short-term memory) network, and a CRF (conditional random field) layer at the end of the architecture constrains the relationships between labels. Finally, experimental results show that the proposed method is more effective than existing methods, reaching F1 scores of 96% and 90% on NER and NERP entities, respectively.
ISSN:	1058-9244 1875-919X
DOI:	10.1155/2022/7682373
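
Illustration:	The Summary above says that a CRF layer at the end of the architecture constrains the relationships between labels. The sketch below is not the authors' implementation; it is a minimal pure-Python illustration of that idea under several assumptions. The label set (PER, ORG) is hypothetical and is not the paper's entity scheme, the emission scores are made-up numbers standing in for what a BiLSTM over BERT-WWM-EXT features might output, and the hard allowed/forbidden transition rule replaces the transition scores that a trained CRF would learn. It contrasts per-token argmax, which can emit an invalid BIO sequence, with Viterbi decoding under the constraint.

```python
import math

# Hypothetical BIO label set for illustration only (not the paper's scheme).
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]


def allowed(prev, curr):
    """BIO rule: I-X may only follow B-X or I-X; everything else is allowed."""
    if curr.startswith("I-"):
        entity = curr[2:]
        return prev in ("B-" + entity, "I-" + entity)
    return True


def greedy(emissions):
    """Pick the highest-scoring label per token, ignoring label relationships."""
    return [max(LABELS, key=lambda label: em[label]) for em in emissions]


def viterbi(emissions):
    """Return the highest-scoring label path that satisfies the BIO rule.

    emissions: one dict per token, mapping each label to a finite score.
    """
    neg_inf = -math.inf
    # A sequence may not start with an I- label.
    scores = {
        label: (neg_inf if label.startswith("I-") else emissions[0][label])
        for label in LABELS
    }
    backpointers = []
    for em in emissions[1:]:
        new_scores, pointers = {}, {}
        for curr in LABELS:
            best_prev, best = None, neg_inf
            for prev in LABELS:
                if allowed(prev, curr) and scores[prev] > best:
                    best_prev, best = prev, scores[prev]
            # Unreachable labels keep a score of -inf and no backpointer.
            new_scores[curr] = best + em[curr] if best_prev is not None else neg_inf
            pointers[curr] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # "O" is always reachable, so the best final label has a finite score.
    path = [max(LABELS, key=lambda label: scores[label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up emission scores for a three-token input.
EMISSIONS = [
    {"O": 2.0, "B-PER": 1.5, "I-PER": 0.1, "B-ORG": 0.0, "I-ORG": 0.0},
    {"O": 0.2, "B-PER": 0.3, "I-PER": 2.0, "B-ORG": 0.0, "I-ORG": 0.0},
    {"O": 1.0, "B-PER": 0.0, "I-PER": 0.5, "B-ORG": 0.0, "I-ORG": 0.0},
]

if __name__ == "__main__":
    print("greedy :", greedy(EMISSIONS))
    print("viterbi:", viterbi(EMISSIONS))
```

With these scores, per-token argmax places I-PER directly after O, which is not a valid BIO sequence, while the constrained decoder selects a path that opens the entity with B-PER. A learned CRF would use soft transition scores instead of this hard rule, so the sketch shows only the constraint aspect.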