Exploring Textual Features for Multi-label Classification of Portuguese Film Synopses
Published in | Progress in Artificial Intelligence, Vol. 11805, pp. 669-681 |
---|---|
Main Authors | , , |
Format | Book Chapter |
Language | English |
Published | Switzerland; Springer International Publishing AG; 2019; Springer International Publishing |
Series | Lecture Notes in Computer Science |
Summary: | The multi-label classification of film genres using features extracted from their synopses has recently gained some attention from the scientific community; however, the number of studies is still limited. Such studies are even scarcer for languages other than English. In this work we present the P-TMDb dataset, which contains 13,394 Portuguese film synopses, and explore film genre classification by experimenting with nine different groups of textual features and four multi-label algorithms. As our dataset is unbalanced, we also conducted experiments with an oversampled version of the dataset. The best result for the original dataset was achieved by a TF-IDF-based classifier, with an average F1 score of 0.478, while the best result for the oversampled dataset was achieved by a combination of several feature groups, with an average F1 score of 0.611. |
---|---|
ISBN: | 9783030302436 3030302431 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-030-30244-3_55 |
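
The summary reports an "average F1 score" over multi-label genre predictions but does not say which averaging was used. As a rough illustration only, the sketch below assumes a macro average (the unweighted mean of per-genre F1). The function names and the toy genre sets are hypothetical and are not taken from the chapter or from the P-TMDb dataset.

```python
# Sketch: macro-averaged F1 for multi-label genre predictions.
# Assumption: "average F1" is read here as the unweighted mean of per-genre F1;
# the catalog record does not state which averaging the authors used.

def per_label_f1(y_true, y_pred, label):
    """F1 for one genre, given parallel lists of label sets (one set per synopsis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if label in t and label in p)
    fp = sum(1 for t, p in pairs if label not in t and label in p)
    fn = sum(1 for t, p in pairs if label in t and label not in p)
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 (also avoids division by zero)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-genre F1 over every genre seen in truth or prediction."""
    labels = sorted(set().union(*y_true, *y_pred))
    if not labels:
        return 0.0
    return sum(per_label_f1(y_true, y_pred, lab) for lab in labels) / len(labels)

# Hypothetical toy data: each set holds the genres of one synopsis.
y_true = [{"Drama"}, {"Drama", "Comedy"}, {"Comedy"}, {"Horror"}]
y_pred = [{"Drama"}, {"Drama"}, {"Comedy", "Drama"}, {"Comedy"}]
score = macro_f1(y_true, y_pred)
```

Under a macro average, a rare genre that is never predicted correctly (such as "Horror" in the toy data) pulls the score down as much as a frequent one would, which is one reason results on an unbalanced dataset can differ noticeably from results on an oversampled version.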