Implementation of K-Nearest Neighbor with Cosine Similarity for Classification Abstract International Journal of Computer Science

Bibliographic Details
Published in	2018 International Conference on Information Technology Systems and Innovation (ICITSI) pp. 43 - 48
Main Authors	Nursalman, Muhammad, Kusnendar, Jajang, Fadhila, Ulva Fatma
Format	Conference Proceeding
Language	English
Published	IEEE 01.10.2018
Subjects	classification; Classification algorithms; Computer science; Computer security; cosine similarity; Data models; holdout method; k-nearest neighbor; ten-fold cross-validation; Testing; Training; Training data
Summary:	The number of international journals and scientific articles available as supporting material for research is increasing, which makes it more difficult to find and present relevant journals. Text processing techniques are needed that can categorize large numbers of text documents by type, so that the available information can be accessed easily and according to user needs. One way to categorize text documents is classification, a text mining method. This research uses the K-Nearest Neighbor classification algorithm with cosine similarity to categorize journal documents. The research was conducted in four stages. The first stage preprocesses the data through case folding, character removal, tokenizing, and stopword removal. The second stage applies TF-IDF weighting to each term and splits a dataset of 450 journals into training data and testing data, using the holdout method and ten-fold cross-validation. The third stage models the classification using cosine similarity and K-Nearest Neighbor with k values of 3, 6, 7, and 9. There are three abstract categories: Computer and Education, Computer and Security, and Computer in Human Behavior. The last stage analyzes the classification results. The results show that ten-fold cross-validation gives the better result at k = 9, with a precision of 80.18%, a recall of 51.11%, and an F1-measure of 62.42%.
DOI:	10.1109/ICITSI.2018.8696072
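
Illustrative sketch:	The stages named in the summary (preprocessing, TF-IDF weighting, cosine similarity, K-Nearest Neighbor voting) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The stopword list, the IDF variant (log(N/df) + 1), the tie-breaking rule, and the toy abstracts are all assumptions made for this example; only the three category names and the k value come from the record above.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (the paper's list is not given).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "on"}


def preprocess(text):
    """Case folding, character removal, tokenizing, stopword removal."""
    text = text.lower()                      # case folding
    text = re.sub(r"[^a-z\s]", " ", text)    # drop digits and punctuation
    return [t for t in text.split() if t not in STOPWORDS]


def fit_idf(token_docs):
    """IDF per term. Assumed variant: log(N / df) + 1."""
    n = len(token_docs)
    df = Counter(term for doc in token_docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are ignored."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k most similar training documents.

    Ties in the vote go to the label seen first among the neighbors,
    that is, the label of the most similar one (an assumption).
    """
    scored = [(cosine(query, v), lab) for v, lab in zip(train_vecs, train_labels)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(lab for _, lab in scored[:k])
    return votes.most_common(1)[0][0]


# Made-up toy abstracts, one line each, for the three categories in the record.
train = [
    ("Students use e-learning platforms in the classroom", "Computer and Education"),
    ("Teachers evaluate online learning courses for students", "Computer and Education"),
    ("Malware detection and network intrusion attacks", "Computer and Security"),
    ("Encryption protects networks against intrusion attacks", "Computer and Security"),
    ("Social media usage affects user behavior and emotion", "Computer in Human Behavior"),
    ("Online gaming addiction and user emotion", "Computer in Human Behavior"),
]

train_tokens = [preprocess(text) for text, _ in train]
train_labels = [label for _, label in train]
idf = fit_idf(train_tokens)
train_vecs = [tfidf(tokens, idf) for tokens in train_tokens]

query_vec = tfidf(preprocess("Detecting intrusion attacks on a network"), idf)
prediction = knn_predict(query_vec, train_vecs, train_labels, k=3)
print(prediction)
```

On a real corpus such as the 450 abstracts described above, the same functions would be applied inside each holdout or cross-validation split, with the IDF weights fitted on the training portion only.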