Exploring Textual Features for Multi-label Classification of Portuguese Film Synopses

Bibliographic Details
Published in Progress in Artificial Intelligence Vol. 11805; pp. 669-681
Main Authors Portolese, Giuseppe; Domingues, Marcos Aurélio; Feltrim, Valéria Delisandra
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2019
Series Lecture Notes in Computer Science
Summary: The multi-label classification of film genres using features extracted from their synopses has recently gained some attention from the scientific community; however, the number of studies is still limited. These studies are even scarcer for languages other than English. In this work we present the P-TMDb dataset, which contains 13,394 Portuguese film synopses, and explore film genre classification by experimenting with nine different groups of textual features and four multi-label algorithms. As our dataset is unbalanced, we also conducted experiments with an oversampled version of the dataset. The best result for the original dataset was achieved by a TF-IDF based classifier, with an average F1 score of 0.478, while the best result for the oversampled dataset was achieved by a combination of several feature groups, with an average F1 score of 0.611.
ISBN: 9783030302436, 3030302431
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-030-30244-3_55