Tuning Out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature

Bibliographic Details
Published in: Proceedings of the Association for Information Science and Technology, Vol. 60, No. 1, pp. 681-685
Main Authors: Parulian, Nikolaus Nova; Dubnicek, Ryan; Evans, Daniel J.; Hu, Yuerong; Layne-Worthey, Glen; Downie, J. Stephen; Heaton, Raina; Lu, Kun; Orr, Raymond I.; Magni, Isabella; Walsh, John A.
Format: Journal Article
Language: English
Published: Hoboken, USA: John Wiley & Sons, Inc., 01.10.2023

Summary: Named Entity Recognition (NER), the automated identification and tagging of entities in text, is a popular natural language processing task and has the power to transform restricted data into open datasets of entities for further research. This project benchmarks four NER models (Stanford NER, BookNLP, spaCy-trf, and RoBERTa) to identify the most accurate approach and generate an open-access, gold-standard dataset of human-annotated entities. To meet a real-world use case, we benchmark these models on a sample dataset of sentences from Native American-authored literature, identifying edge cases and areas of improvement for future NER work.
ISSN: 2373-9231
DOI: 10.1002/pra2.839
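
Illustrative note: the "spaCy-trf" model named in the abstract refers to spaCy's transformer-based pipeline. Below is a minimal sketch of running that pipeline on a single sentence; the example sentence and the en_core_web_trf model name are assumptions for illustration and are not taken from the paper's benchmark code.

# Minimal sketch: entity extraction with spaCy's English transformer pipeline,
# one of the four NER models benchmarked in the paper. The sentence below is
# invented for illustration; "en_core_web_trf" is assumed here to correspond
# to the paper's "spaCy-trf" configuration.
import spacy

# Setup (outside this script):
#   pip install spacy
#   python -m spacy download en_core_web_trf
nlp = spacy.load("en_core_web_trf")

sentence = "Louise Erdrich was born in Little Falls, Minnesota."
doc = nlp(sentence)

# Print each recognized entity with its character span and predicted label.
for ent in doc.ents:
    print(f"{ent.text!r} [{ent.start_char}:{ent.end_char}] -> {ent.label_}")

Running a pipeline like this over a sentence sample and comparing predicted spans and labels against human-annotated gold entities is the kind of evaluation the abstract describes.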