Exploring Textual Features for Multi-label Classification of Portuguese Film Synopses

Bibliographic Details
Published in	Progress in Artificial Intelligence, Vol. 11805, pp. 669-681
Main Authors	Portolese, Giuseppe, Domingues, Marcos Aurélio, Feltrim, Valéria Delisandra
Format	Book Chapter
Language	English
Published	Springer International Publishing AG, Switzerland, 2019
Series	Lecture Notes in Computer Science
Subjects	Film genre; Multi-label classification; Natural Language Processing; Textual features

More Information
Summary:	The multi-label classification of film genres using features extracted from their synopses has recently gained some attention from the scientific community; however, the number of studies is still limited, and such studies are even scarcer for languages other than English. In this work we present the P-TMDb dataset, which contains 13,394 Portuguese film synopses, and explore film genre classification by experimenting with nine different groups of textual features and four multi-label algorithms. As our dataset is unbalanced, we also conducted experiments with an oversampled version of the dataset. The best result on the original dataset was achieved by a TF-IDF based classifier, with an average F1 score of 0.478, while the best result on the oversampled dataset was achieved by a combination of several feature groups, with an average F1 score of 0.611.
ISBN:	9783030302436; 3030302431
ISSN:	0302-9743; 1611-3349
DOI:	10.1007/978-3-030-30244-3_55
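
Illustrative sketch (not the authors' code): the summary names TF-IDF features and an average F1 score over multi-label genre predictions but does not specify the variants used. The Python sketch below assumes whitespace tokenisation, raw term counts as TF, idf = ln(N/df) + 1, and macro averaging (F1 per genre, then the unweighted mean). These choices, the function names `tfidf` and `macro_f1`, and the toy synopses and genre labels are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumption: whitespace tokenisation, raw counts as TF and
    idf = ln(N / df) + 1 (one common variant, not necessarily the paper's).
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append({t: c * (math.log(n_docs / df[t]) + 1) for t, c in tf.items()})
    return vectors


def macro_f1(true_sets, pred_sets, labels):
    """Macro-averaged F1 for multi-label data: F1 per label, then the unweighted mean."""
    scores = []
    for label in labels:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A label never predicted nor present gets F1 = 0 here (an assumption).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three Portuguese "synopses" with genre sets.
synopses = [
    "um detetive investiga um crime",
    "um casal se apaixona em paris",
    "um detetive se apaixona",
]
vecs = tfidf(synopses)

true_genres = [{"Crime"}, {"Romance"}, {"Crime", "Romance"}]
pred_genres = [{"Crime"}, {"Romance", "Drama"}, {"Crime"}]
score = macro_f1(true_genres, pred_genres, ["Crime", "Drama", "Romance"])
print(round(score, 3))  # → 0.556
```

Macro averaging weights rare genres as heavily as frequent ones, which is one reason scores on an unbalanced dataset can change noticeably after oversampling; whether the paper's "average F1" is macro-, micro- or sample-averaged is not stated in this record.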