Learning adaptive representations for entity recognition in the biomedical domain


Bibliographic Details
Published in	Journal of biomedical semantics Vol. 12; no. 1; p. 10
Main Authors	Lauriola, Ivano; Aiolli, Fabio; Lavelli, Alberto; Rinaldi, Fabio
Format	Journal Article
Language	English
Published	England, BioMed Central Ltd (BMC), 17.05.2021
Subjects	Algorithms; Computational linguistics; Dictionaries; Ensemble; HIV; Human immunodeficiency virus; Kernel methods; Knowledge representation; Language; Language processing; Learning algorithms; Machine learning; Medical informatics; Methods; Named entity recognition; Natural language interfaces; Natural language processing; Neural networks; Semantic networks; Support vector machines; Italy
Summary:	Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems address this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step in these applications is the choice of the representation used to describe the data. Several representations have been proposed in the literature. Some rely on strong domain knowledge and consist of features manually defined by domain experts. These representations usually describe the problem well, but they require considerable human effort and annotated data. General-purpose representations such as word embeddings, on the other hand, do not require human domain knowledge, but they may be too general for a specific task. This paper investigates methods to learn the best representation directly from data by combining several knowledge-based representations and word embeddings. Two mechanisms are considered for the combination: neural networks and Multiple Kernel Learning. To this end, we use a hybrid architecture for biomedical entity recognition which integrates dictionary look-up (also known as gazetteers) with machine learning techniques. Results on the CRAFT corpus show that the proposed algorithm improves the F-score. Our experiments show that the principled combination of general, domain-specific, word-level, and character-level representations improves the performance of entity recognition. We also discuss the contribution of each representation to the final solution.
ISSN:	2041-1480
DOI:	10.1186/s13326-021-00238-0
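
Note on the summary: combining several representations through kernels can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one linear kernel per representation, hypothetical toy feature vectors, and fixed equal weights, whereas a Multiple Kernel Learning algorithm would learn the weights from data. All function and variable names are illustrative.

```python
from math import sqrt

def linear_kernel(X):
    """Gram matrix of dot products between the rows of X."""
    return [[sum(a * b for a, b in zip(x, z)) for z in X] for x in X]

def normalize(K):
    """Rescale K so that every example has unit self-similarity."""
    n = len(K)
    return [[K[i][j] / sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]

def combine(kernels, weights):
    """Weighted sum of Gram matrices; non-negative weights keep it a valid kernel."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

# Hypothetical toy data: three tokens, each described by two representations.
embedding_features = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]   # stand-in for word embeddings
dictionary_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # stand-in for dictionary match flags

kernels = [normalize(linear_kernel(embedding_features)),
           normalize(linear_kernel(dictionary_features))]
weights = [0.5, 0.5]  # assumption: fixed weights; MKL would learn them instead
K = combine(kernels, weights)
```

The combined matrix `K` could then be passed to any kernel machine that accepts a precomputed Gram matrix, such as a support vector machine.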