Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016

Bibliographic Details
Published in	Text Processing, pp. 61-79
Main Authors	Asghari, Habibollah; Mohtaj, Salar; Fatemi, Omid; Faili, Heshaam; Rosso, Paolo; Potthast, Martin
Format	Book Chapter
Language	English
Published	Cham: Springer International Publishing, 04.01.2018
Series	Lecture Notes in Computer Science
Subjects	Evaluation framework; Persian PlagDet; Plagiarism detection; Shared task; TIRA platform

More Information
Summary:	The task of plagiarism detection is to find passages of text reuse in a suspicious document. This task is of increasing relevance, since scholars around the world take advantage of the fact that information about nearly any subject can be found on the World Wide Web by reusing existing text instead of writing their own. We organized the Persian PlagDet shared task at PAN 2016 in an effort to promote the comparative assessment of NLP techniques for plagiarism detection, with a special focus on plagiarism that appears in a Persian text corpus. The goal of this shared task is to bring together researchers and practitioners around the topic of plagiarism detection and text-reuse detection. We report on the outcome of the shared task, which is divided into two subtasks: text alignment and corpus construction. In the first subtask, nine teams participated, and the best result achieved was a PlagDet score of 0.92. In the second subtask, corpus construction, five teams each submitted a corpus, and these corpora were evaluated using the systems submitted for the first subtask. The results show that significant challenges remain in evaluating newly constructed corpora.
ISBN:	3319736051 9783319736051
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-319-73606-8_5