Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016

Bibliographic Details
Published in Text Processing, pp. 61-79
Main Authors Asghari, Habibollah; Mohtaj, Salar; Fatemi, Omid; Faili, Heshaam; Rosso, Paolo; Potthast, Martin
Format Book Chapter
Language English
Published Cham: Springer International Publishing, 2018-01-04
Series Lecture Notes in Computer Science
Abstract The task of plagiarism detection is to find passages of reused text in a suspicious document. This task is of increasing relevance, since information about nearly any subject can be found on the World Wide Web, and scholars around the world take advantage of this by reusing existing text instead of writing their own. We organized the Persian PlagDet shared task at PAN 2016 to promote the comparative assessment of NLP techniques for plagiarism detection, with a special focus on plagiarism in a Persian text corpus. The goal of the shared task is to bring together researchers and practitioners working on plagiarism detection and text-reuse detection. We report on the outcome of the shared task, which is divided into two subtasks: text alignment and corpus construction. Nine teams participated in the first subtask, and the best result achieved was a PlagDet score of 0.92. For the second subtask, corpus construction, five teams submitted corpora, which were evaluated using the systems submitted for the first subtask. The results show that significant challenges remain in evaluating newly constructed corpora.
Author Mohtaj, Salar
Fatemi, Omid
Rosso, Paolo
Faili, Heshaam
Potthast, Martin
Asghari, Habibollah
Author_xml – sequence: 1
  givenname: Habibollah
  surname: Asghari
  fullname: Asghari, Habibollah
  email: habib.asghari@ictrc.ac.ir
  organization: School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
– sequence: 2
  givenname: Salar
  surname: Mohtaj
  fullname: Mohtaj, Salar
  email: salar.mohtaj@ictrc.ac.ir
  organization: ICT Research Institute, Academic Center for Education, Culture and Research (ACECR), Tehran, Iran
– sequence: 3
  givenname: Omid
  orcidid: 0000-0001-9654-0607
  surname: Fatemi
  fullname: Fatemi, Omid
  email: omid@fatemi.net
  organization: School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
– sequence: 4
  givenname: Heshaam
  surname: Faili
  fullname: Faili, Heshaam
  email: hfaili@ut.ac.ir
  organization: School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
– sequence: 5
  givenname: Paolo
  surname: Rosso
  fullname: Rosso, Paolo
  email: prosso@dsic.upv.es
  organization: PRHLT Research Center, Universitat Politècnica de València, Valencia, Spain
– sequence: 6
  givenname: Martin
  surname: Potthast
  fullname: Potthast, Martin
  email: martin.potthast@uni-weimar.de
  organization: Bauhaus-Universität Weimar, Weimar, Germany
ContentType Book Chapter
Copyright Springer International Publishing AG 2018
Copyright_xml – notice: Springer International Publishing AG 2018
DOI 10.1007/978-3-319-73606-8_5
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 331973606X
9783319736068
EISSN 1611-3349
Editor Mehta, Parth
Majumder, Prasenjit
Mitra, Mandar
Sankhavara, Jainisha
Editor_xml – sequence: 1
  givenname: Prasenjit
  surname: Majumder
  fullname: Majumder, Prasenjit
  email: prasenjt.majumder@gmail.com
– sequence: 2
  givenname: Mandar
  surname: Mitra
  fullname: Mitra, Mandar
  email: mandar.mitra@gmail.com
– sequence: 3
  givenname: Parth
  orcidid: 0000-0002-4509-1298
  surname: Mehta
  fullname: Mehta, Parth
  email: parth.mehta126@gmail.com
– sequence: 4
  givenname: Jainisha
  surname: Sankhavara
  fullname: Sankhavara, Jainisha
  email: jainisha.sankhavara@gmail.com
EndPage 79
GroupedDBID -DT
-~X
29L
2HA
2HV
ACGFS
ADCXD
ALMA_UNASSIGNED_HOLDINGS
EJD
F5P
LAS
LDH
P2P
RSU
~02
ID FETCH-LOGICAL-s1125-1d289215c216e515310ff18c702c0f4c016882a40a9f971f285144cf94fafbce3
ISBN 3319736051
9783319736051
ISSN 0302-9743
IngestDate Tue Jul 29 20:11:20 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-s1125-1d289215c216e515310ff18c702c0f4c016882a40a9f971f285144cf94fafbce3
ORCID 0000-0001-9654-0607
PageCount 19
ParticipantIDs springer_books_10_1007_978_3_319_73606_8_5
PublicationCentury 2000
PublicationDate 20180104
PublicationDateYYYYMMDD 2018-01-04
PublicationDate_xml – month: 01
  year: 2018
  text: 20180104
  day: 04
PublicationDecade 2010
PublicationPlace Cham
PublicationPlace_xml – name: Cham
PublicationSeriesSubtitle Information Systems and Applications, incl. Internet/Web, and HCI
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle FIRE 2016 International Workshop, Kolkata, India, December 7–10, 2016, Revised Selected Papers
PublicationTitle Text Processing
PublicationYear 2018
Publisher Springer International Publishing
Publisher_xml – name: Springer International Publishing
RelatedPersons Kleinberg, Jon M.
Mattern, Friedemann
Naor, Moni
Mitchell, John C.
Terzopoulos, Demetri
Steffen, Bernhard
Pandu Rangan, C.
Kanade, Takeo
Kittler, Josef
Weikum, Gerhard
Hutchison, David
Tygar, Doug
RelatedPersons_xml – sequence: 1
  givenname: David
  surname: Hutchison
  fullname: Hutchison, David
  organization: Lancaster University, Lancaster, United Kingdom
– sequence: 2
  givenname: Takeo
  surname: Kanade
  fullname: Kanade, Takeo
  organization: Carnegie Mellon University, Pittsburgh, USA
– sequence: 3
  givenname: Josef
  surname: Kittler
  fullname: Kittler, Josef
  organization: University of Surrey, Guildford, United Kingdom
– sequence: 4
  givenname: Jon M.
  surname: Kleinberg
  fullname: Kleinberg, Jon M.
  organization: Cornell University, Ithaca, USA
– sequence: 5
  givenname: Friedemann
  surname: Mattern
  fullname: Mattern, Friedemann
  organization: ETH Zurich, Zurich, Switzerland
– sequence: 6
  givenname: John C.
  surname: Mitchell
  fullname: Mitchell, John C.
  organization: Stanford University, Stanford, USA
– sequence: 7
  givenname: Moni
  surname: Naor
  fullname: Naor, Moni
  organization: Weizmann Institute of Science, Rehovot, Israel
– sequence: 8
  givenname: C.
  surname: Pandu Rangan
  fullname: Pandu Rangan, C.
  organization: Indian Institute of Technology, Chennai, India
– sequence: 9
  givenname: Bernhard
  surname: Steffen
  fullname: Steffen, Bernhard
  organization: TU Dortmund University, Dortmund, Germany
– sequence: 10
  givenname: Demetri
  surname: Terzopoulos
  fullname: Terzopoulos, Demetri
  organization: University of California, Los Angeles, USA
– sequence: 11
  givenname: Doug
  surname: Tygar
  fullname: Tygar, Doug
  organization: University of California, Berkeley, USA
– sequence: 12
  givenname: Gerhard
  surname: Weikum
  fullname: Weikum, Gerhard
  organization: Max Planck Institute for Informatics, Saarbrücken, Germany
SSID ssj0002792
ssj0001987138
Score 1.557318
SourceID springer
SourceType Publisher
StartPage 61
SubjectTerms Evaluation framework
Persian PlagDet
Plagiarism detection
Shared task
TIRA platform
Subtitle Overview of PAN at FIRE 2016
Title Algorithms and Corpora for Persian Plagiarism Detection
URI http://link.springer.com/10.1007/978-3-319-73606-8_5
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider Library Specific Holdings
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Text+Processing&rft.au=Asghari%2C+Habibollah&rft.au=Mohtaj%2C+Salar&rft.au=Fatemi%2C+Omid&rft.au=Faili%2C+Heshaam&rft.atitle=Algorithms+and+Corpora+for+Persian+Plagiarism+Detection&rft.series=Lecture+Notes+in+Computer+Science&rft.date=2018-01-04&rft.pub=Springer+International+Publishing&rft.isbn=9783319736051&rft.issn=0302-9743&rft.eissn=1611-3349&rft.spage=61&rft.epage=79&rft_id=info:doi/10.1007%2F978-3-319-73606-8_5