Comparison of document similarity measurements in scientific writing using Jaro-Winkler Distance method and Paragraph Vector method
Published in | IOP conference series. Materials Science and Engineering Vol. 662; no. 5; pp. 52016 - 52024 |
---|---|
Format | Journal Article |
Language | English |
Published | Bristol: IOP Publishing, 01.11.2019 |
Summary: | This research studies methods for measuring document similarity and identifies which is most suitable for Indonesian scientific writing. Two methods are compared: Jaro-Winkler Distance and Doc2Vec. Jaro-Winkler is a method that calculates the distance between strings and then measures their similarity. Doc2Vec (Paragraph Vector) is a method that represents documents as vectors, learned through a machine learning process, so that they can be compared. The study compares the plagiarism detection results of the Jaro-Winkler Distance method with those of the Doc2Vec method. The two methods are evaluated on the accuracy of their document comparisons and on their speed. On the dataset created for the study, Doc2Vec outperformed the Jaro-Winkler Distance algorithm in comparing document similarity. The authors therefore conclude that using Doc2Vec (Paragraph Vector) will make future development of document similarity methods for Indonesian scientific works easier. |
---|---|
ISSN: | 1757-8981 1757-899X |
DOI: | 10.1088/1757-899X/662/5/052016 |
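
The summary describes two ways of scoring similarity: a string-distance measure (Jaro-Winkler) and a comparison of document vectors (Doc2Vec). The sketch below illustrates both ideas in plain Python. It is not the authors' implementation, which this record does not describe. The prefix scaling factor `p=0.1` and the prefix cap of 4 are the conventional Jaro-Winkler defaults and are assumed here. The `cosine_similarity` helper is a hypothetical stand-in for comparing vectors that a trained Doc2Vec model would produce, and cosine similarity as the comparison measure is itself an assumption.

```python
import math


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Jaro-Winkler distance is conventionally 1 minus the similarity.
similarity = jaro_winkler("MARTHA", "MARHTA")
distance = 1 - similarity
```

Jaro-Winkler works on the character sequences themselves, so it rewards near-identical spelling. A vector comparison instead depends on how the documents were embedded, so two differently worded texts can still score as similar.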