BiLSTM-CRF for geological named entity recognition from the geoscience literature

Bibliographic Details
Published in: Earth Science Informatics, Vol. 12, no. 4, pp. 565-579
Main Authors: Qiu, Qinjun; Xie, Zhong; Wu, Liang; Tao, Liufeng; Li, Wenjia
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01.12.2019
Springer Nature B.V.

Summary: Many detailed geoscience reports lie unused, offering both challenges and opportunities for information extraction. In geoscience research, geological named entity recognition (GNER) is an important task in the field of geoscience information extraction. Regarding numerical geoscience data, research on information extraction remains limited. Most conventional NER approaches depend heavily on feature engineering, and such sentence-level methods suffer from the tagging inconsistency problem. Based on these observations, this paper proposes a neural network approach, namely attention-based bidirectional long short-term memory with a conditional random field layer (Att-BiLSTM-CRF), for named entity recognition to extract entities describing geoscience information from geoscience reports. This approach leverages global information learned by an attention mechanism to enforce tagging consistency across multiple instances of the same token in a document. Experiments on the constructed dataset show that our method achieves performance comparable to that of other state-of-the-art systems. Additionally, our method achieved an average F1 score of 91.47% in the NER extraction task.
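The summary names a CRF layer on top of the attention-based BiLSTM. As a rough illustration only (not the authors' code), the sketch below shows Viterbi decoding for a generic linear-chain CRF: given per-token emission scores, which in this setting would come from the Att-BiLSTM (not shown), plus a tag-transition matrix, it returns the jointly highest-scoring tag sequence rather than picking each token's best tag independently. The function name `viterbi_decode`, the tag set `O`/`B-ROCK`/`I-ROCK`, and every score are hypothetical; training of the transition scores and the attention mechanism are omitted.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the best tag-index sequence and its total score.

    emissions[t][j]   -- score of tag j at token t (hypothetically, BiLSTM output)
    transitions[i][j] -- score of moving from tag i to tag j
    start[j]          -- score of beginning the sequence with tag j
    """
    n_tags = len(start)
    # score[j] = best score of any partial path ending in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at token t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Pick the best final tag, then follow backpointers to recover the path
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Hypothetical BIO tag set for a single "rock" entity type
TAGS = ["O", "B-ROCK", "I-ROCK"]
NEG = -1e4  # large penalty marking an invalid transition

transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-ROCK is invalid
    [0.0, 0.0, 0.5],  # from B-ROCK
    [0.0, 0.0, 0.5],  # from I-ROCK
]
start = [0.0, 0.0, NEG]  # a sequence cannot start with I-ROCK

# Made-up emission scores for three tokens, e.g. "the granite porphyry"
emissions = [
    [3.0, 0.0, 0.0],
    [0.0, 1.0, 1.5],  # greedy argmax here would pick I-ROCK right after O
    [0.0, 0.0, 2.0],
]

path, best = viterbi_decode(emissions, transitions, start)
print([TAGS[k] for k in path], best)  # ['O', 'B-ROCK', 'I-ROCK'] 6.5
```

With these made-up scores, a greedy per-token argmax would label the second token `I-ROCK` directly after `O`, an invalid BIO sequence; the transition penalty steers the decoder to `O, B-ROCK, I-ROCK` instead. The paper's attention mechanism addresses a different, document-level consistency problem that this sketch does not model.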
ISSN: 1865-0473, 1865-0481
DOI: 10.1007/s12145-019-00390-3