Optimization of Spaced K-mer Frequency Feature Extraction using Genetic Algorithms for Metagenome Fragment Classification

Bibliographic Details
Published in Journal of ICT Research and Applications Vol. 12; no. 2; p. 123
Main Authors Pekuwali, Arini; Kusuma, Wisnu Ananta; Buono, Agus
Format Journal Article
Language English
Published ITB Journal Publisher 01.01.2018
Summary: K-mer frequencies are commonly used to extract features from metagenome fragments. Despite this, researchers have found that their use is still inefficient. In this research, a genetic algorithm was employed to find optimally spaced k-mers. These were obtained by generating the possible combinations of match positions and don't care positions (written as *). This approach was adopted from the concept of spaced seeds in PatternHunter. The use of spaced k-mers can reduce the dimension of the k-mer frequency feature. To measure the accuracy of the proposed method, we used the naïve Bayesian classifier (NBC). The results showed that the chromosome 111111110001, representing the spaced k-mer model [111 1111 10001], was the best chromosome, with a higher fitness (85.42) than that of the k-mer frequency feature. Moreover, the proposed approach also reduced the feature extraction time.
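To make the idea of match and don't care positions concrete, the following is a minimal Python sketch of spaced k-mer frequency extraction. It is not the authors' implementation. It assumes that a 1 in the mask marks a match position and a 0 marks a don't care position (the summary writes don't care positions as *), and the function names `spaced_kmer_counts` and `feature_vector` are hypothetical. A mask with w match positions yields 4^w features rather than 4^k for a contiguous k-mer of the same span, which is where the reduction in dimension comes from.

```python
from collections import Counter
from itertools import product


def spaced_kmer_counts(sequence, mask):
    """Count spaced k-mers in `sequence` under a binary `mask`.

    Assumption: '1' is a match position (base is kept) and '0' is a
    don't care position (base is skipped).
    """
    keep = [i for i, bit in enumerate(mask) if bit == "1"]
    span = len(mask)
    counts = Counter()
    # Slide a window of the mask's full span over the sequence.
    for start in range(len(sequence) - span + 1):
        window = sequence[start:start + span]
        key = "".join(window[i] for i in keep)
        # Skip windows whose kept positions contain ambiguous bases.
        if set(key) <= set("ACGT"):
            counts[key] += 1
    return counts


def feature_vector(sequence, mask):
    """Return relative frequencies over all 4**w possible keys,
    where w is the number of match positions in `mask`."""
    weight = mask.count("1")
    counts = spaced_kmer_counts(sequence, mask)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts["".join(p)] / total
            for p in product("ACGT", repeat=weight)]
```

For example, `feature_vector(fragment, "10001")` produces 16 features (two match positions) instead of the 1024 needed for a contiguous 5-mer. The bracketed grouping [111 1111 10001] in the summary suggests that one mask per group could be applied and the resulting vectors concatenated, but that reading is an assumption and is not confirmed by the summary itself.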
ISSN: 2337-5787, 2338-5499
DOI: 10.5614/itbj.ict.res.appl.2018.12.2.2