New Arabic Medical Dataset for Diseases Classification

Bibliographic Details
Published in	Intelligent Data Engineering and Automated Learning – IDEAL 2021, Vol. 13113, pp. 196–203
Main Authors	Hammoud, Jaafar, Vatian, Aleksandra, Dobrenko, Natalia, Vedernikov, Nikolai, Shalyto, Anatoly, Gusarova, Natalia
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2021
Series	Lecture Notes in Computer Science
Subjects	Arabic Medical Text classification

More Information
Summary:	The Arabic language suffers from a severe shortage of datasets suitable for training deep learning models, and the existing ones cover only general, non-specialized categories. In this work, we introduce a new Arabic medical dataset of two thousand medical documents collected from several Arabic medical websites and from the Arab Medical Encyclopedia. The dataset was built for text classification and covers 10 disease classes: Blood, Bone, Cardiovascular, Ear, Endocrine, Eye, Gastrointestinal, Immune, Liver, and Nephrological. Experiments on the dataset were performed by fine-tuning three pre-trained models: BERT from Google; AraBERT, which is based on BERT and pre-trained on a large Arabic corpus; and AraBioNER, which is based on AraBERT and trained on an Arabic medical corpus.
ISBN:	3030916073 9783030916077
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-030-91608-4_20