Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016
Published in | Text Processing pp. 61 - 79 |
---|---|
Main Authors | Asghari, Habibollah; Mohtaj, Salar; Fatemi, Omid; Faili, Heshaam; Rosso, Paolo; Potthast, Martin |
Format | Book Chapter |
Language | English |
Published | Cham: Springer International Publishing, 04.01.2018 |
Series | Lecture Notes in Computer Science |
Subjects | Evaluation framework; Persian PlagDet; Plagiarism detection; Shared task; TIRA platform |
Abstract | The task of plagiarism detection is to find passages of text reuse in a suspicious document. This task is of increasing relevance, since scholars around the world take advantage of the fact that information about nearly any subject can be found on the World Wide Web by reusing existing text instead of writing their own. We organized the Persian PlagDet shared task at PAN 2016 in an effort to promote the comparative assessment of NLP techniques for plagiarism detection, with a special focus on plagiarism in a Persian text corpus. The goal of this shared task is to bring together researchers and practitioners around the exciting topic of plagiarism detection and text-reuse detection. We report on the outcome of the shared task, which was divided into two subtasks: text alignment and corpus construction. In the first subtask, nine teams participated, and the best result achieved was a PlagDet score of 0.92. For the second subtask, corpus construction, five teams submitted corpora, which were evaluated using the systems submitted for the first subtask. The results show that significant challenges remain in evaluating newly constructed corpora. |
---|---|
Author | Mohtaj, Salar; Fatemi, Omid; Rosso, Paolo; Faili, Heshaam; Potthast, Martin; Asghari, Habibollah |
Author_xml | – sequence: 1 givenname: Habibollah surname: Asghari fullname: Asghari, Habibollah email: habib.asghari@ictrc.ac.ir organization: School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran – sequence: 2 givenname: Salar surname: Mohtaj fullname: Mohtaj, Salar email: salar.mohtaj@ictrc.ac.ir organization: ICT Research Institute, Academic Center for Education, Culture and Research (ACECR), Tehran, Iran – sequence: 3 givenname: Omid orcidid: 0000-0001-9654-0607 surname: Fatemi fullname: Fatemi, Omid email: omid@fatemi.net organization: School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran – sequence: 4 givenname: Heshaam surname: Faili fullname: Faili, Heshaam email: hfaili@ut.ac.ir organization: School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran – sequence: 5 givenname: Paolo surname: Rosso fullname: Rosso, Paolo email: prosso@dsic.upv.es organization: PRHLT Research Center, Universitat Politècnica de València, Valencia, Spain – sequence: 6 givenname: Martin surname: Potthast fullname: Potthast, Martin email: martin.potthast@uni-weimar.de organization: Bauhaus-Universität Weimar, Weimar, Germany |
ContentType | Book Chapter |
Copyright | Springer International Publishing AG 2018 |
DOI | 10.1007/978-3-319-73606-8_5 |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 331973606X 9783319736068 |
EISSN | 1611-3349 |
Editor | Mehta, Parth; Majumder, Prasenjit; Mitra, Mandar; Sankhavara, Jainisha |
Editor_xml | – sequence: 1 givenname: Prasenjit surname: Majumder fullname: Majumder, Prasenjit email: prasenjt.majumder@gmail.com – sequence: 2 givenname: Mandar surname: Mitra fullname: Mitra, Mandar email: mandar.mitra@gmail.com – sequence: 3 givenname: Parth orcidid: 0000-0002-4509-1298 surname: Mehta fullname: Mehta, Parth email: parth.mehta126@gmail.com – sequence: 4 givenname: Jainisha surname: Sankhavara fullname: Sankhavara, Jainisha email: jainisha.sankhavara@gmail.com |
EndPage | 79 |
ISBN | 3319736051 9783319736051 |
ISSN | 0302-9743 |
IngestDate | Tue Jul 29 20:11:20 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | OpenURL |
ORCID | 0000-0001-9654-0607 |
PageCount | 19 |
ParticipantIDs | springer_books_10_1007_978_3_319_73606_8_5 |
PublicationCentury | 2000 |
PublicationDate | 20180104 |
PublicationDateYYYYMMDD | 2018-01-04 |
PublicationDecade | 2010 |
PublicationPlace | Cham |
PublicationSeriesSubtitle | Information Systems and Applications, incl. Internet/Web, and HCI |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | FIRE 2016 International Workshop, Kolkata, India, December 7–10, 2016, Revised Selected Papers |
PublicationTitle | Text Processing |
PublicationYear | 2018 |
Publisher | Springer International Publishing |
RelatedPersons | Kleinberg, Jon M.; Mattern, Friedemann; Naor, Moni; Mitchell, John C.; Terzopoulos, Demetri; Steffen, Bernhard; Pandu Rangan, C.; Kanade, Takeo; Kittler, Josef; Weikum, Gerhard; Hutchison, David; Tygar, Doug |
RelatedPersons_xml | – sequence: 1 givenname: David surname: Hutchison fullname: Hutchison, David organization: Lancaster University, Lancaster, United Kingdom – sequence: 2 givenname: Takeo surname: Kanade fullname: Kanade, Takeo organization: Carnegie Mellon University, Pittsburgh, USA – sequence: 3 givenname: Josef surname: Kittler fullname: Kittler, Josef organization: University of Surrey, Guildford, United Kingdom – sequence: 4 givenname: Jon M. surname: Kleinberg fullname: Kleinberg, Jon M. organization: Cornell University, Ithaca, USA – sequence: 5 givenname: Friedemann surname: Mattern fullname: Mattern, Friedemann organization: ETH Zurich, Zurich, Switzerland – sequence: 6 givenname: John C. surname: Mitchell fullname: Mitchell, John C. organization: Stanford University, Stanford, USA – sequence: 7 givenname: Moni surname: Naor fullname: Naor, Moni organization: Weizmann Institute of Science, Rehovot, Israel – sequence: 8 givenname: C. surname: Pandu Rangan fullname: Pandu Rangan, C. organization: Indian Institute of Technology, Chennai, India – sequence: 9 givenname: Bernhard surname: Steffen fullname: Steffen, Bernhard organization: TU Dortmund University, Dortmund, Germany – sequence: 10 givenname: Demetri surname: Terzopoulos fullname: Terzopoulos, Demetri organization: University of California, Los Angeles, USA – sequence: 11 givenname: Doug surname: Tygar fullname: Tygar, Doug organization: University of California, Berkeley, USA – sequence: 12 givenname: Gerhard surname: Weikum fullname: Weikum, Gerhard organization: Max Planck Institute for Informatics, Saarbrücken, Germany |
SSID | ssj0002792 ssj0001987138 |
SourceID | springer |
SourceType | Publisher |
StartPage | 61 |
SubjectTerms | Evaluation framework; Persian PlagDet; Plagiarism detection; Shared task; TIRA platform |
Subtitle | Overview of PAN at FIRE 2016 |
Title | Algorithms and Corpora for Persian Plagiarism Detection |
URI | http://link.springer.com/10.1007/978-3-319-73606-8_5 |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Text+Processing&rft.au=Asghari%2C+Habibollah&rft.au=Mohtaj%2C+Salar&rft.au=Fatemi%2C+Omid&rft.au=Faili%2C+Heshaam&rft.atitle=Algorithms+and+Corpora+for+Persian+Plagiarism+Detection&rft.series=Lecture+Notes+in+Computer+Science&rft.date=2018-01-04&rft.pub=Springer+International+Publishing&rft.isbn=9783319736051&rft.issn=0302-9743&rft.eissn=1611-3349&rft.spage=61&rft.epage=79&rft_id=info:doi/10.1007%2F978-3-319-73606-8_5 |
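
The abstract reports the best text-alignment result as a PlagDet score of 0.92. To make that number easier to read, here is a minimal Python sketch of PlagDet as defined in the PAN evaluation framework. Character-level precision and recall are combined into F1, which is then divided by log2(1 + granularity), where granularity is the mean number of detections per detected case. The sketch assumes a single suspicious/source document pair and models every plagiarism case and detection as a pair of non-empty, half-open character ranges. The `Span` layout and the function names are hypothetical, and this is not the official PAN evaluation script.

```python
from math import log2

# Hypothetical layout (assumption): (susp_start, susp_end, src_start, src_end),
# half-open, non-empty character ranges within one suspicious/source document pair.
Span = tuple[int, int, int, int]


def chars(span: Span) -> set[tuple[str, int]]:
    """Character references covered by a span, tagged by document side."""
    a, b, c, d = span
    return {("susp", i) for i in range(a, b)} | {("src", j) for j in range(c, d)}


def detects(r: Span, s: Span) -> bool:
    """Detection r detects case s if they overlap on both the suspicious and the source side."""
    return r[0] < s[1] and s[0] < r[1] and r[2] < s[3] and s[2] < r[3]


def plagdet(cases: list[Span], dets: list[Span]) -> float:
    """PlagDet = F1 / log2(1 + granularity), computed at character level (sketch)."""
    if not cases or not dets:
        return 0.0
    # Precision: per detection, the share of its characters that lie in cases it detects.
    prec = sum(
        len(set().union(*[chars(s) & chars(r) for s in cases if detects(r, s)])) / len(chars(r))
        for r in dets
    ) / len(dets)
    # Recall: per case, the share of its characters covered by detections of that case.
    rec = sum(
        len(set().union(*[chars(s) & chars(r) for r in dets if detects(r, s)])) / len(chars(s))
        for s in cases
    ) / len(cases)
    if prec + rec == 0:
        return 0.0
    f1 = 2 * prec * rec / (prec + rec)
    # Granularity: mean number of detections per case that was detected at all.
    counts = [sum(detects(r, s) for r in dets) for s in cases]
    detected = [c for c in counts if c > 0]
    gran = sum(detected) / len(detected)
    return f1 / log2(1 + gran)
```

A single exact detection of a case scores 1.0. Splitting that same case into two adjacent detections leaves precision and recall at 1.0 but raises granularity to 2, so the score drops to 1/log2(3), about 0.63. This is why PlagDet rewards detectors that report each reused passage as one contiguous alignment.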