Genetic Algorithm Based Feature Selection and Parameter Optimization for Support Vector Regression Applied to Semantic Textual Similarity


Bibliographic Details
Published in: Shanghai jiao tong da xue xue bao, Vol. 20, no. 2, pp. 143-148
Main Authors: SU Bai-hua (苏柏桦), WANG Ying-lin (王英林)
Format: Journal Article
Language: English
Published: Heidelberg, Shanghai Jiaotong University Press, 01.04.2015
ISSN: 1007-1172, 1995-8188
DOI: 10.1007/s12204-015-1602-2

Summary: Semantic textual similarity (STS) is a common task in natural language processing (NLP). STS measures the degree of semantic equivalence between two textual snippets. Recently, machine learning methods have been applied to this task, including methods based on support vector regression (SVR). However, the learning process involves a large number of features, some of which are noisy and irrelevant to the result. Furthermore, the choice of parameters significantly influences the prediction performance of the SVR model. In this paper, we propose using a genetic algorithm (GA) to select the effective features and optimize the parameters of the learning process simultaneously. To evaluate the proposed approach, we use the STS-2012 dataset in the experiment. Compared with grid search, the proposed GA-based approach achieves better regression performance.
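The idea of searching features and parameters at the same time can be illustrated with a minimal sketch. This record does not describe the paper's chromosome encoding, genetic operators, or fitness measure, so everything below is an assumption made for illustration: each chromosome pairs a Boolean feature mask with real-valued parameter genes (imagined here as log2 C and log2 gamma of an SVR), selection is by tournament, and `toy_fitness` is a hypothetical stand-in for what would normally be a cross-validated SVR score.

```python
import random

# Assumed search ranges for two SVR-style parameters (log2 C, log2 gamma).
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]

def random_chromosome(n_features, bounds, rng):
    """One chromosome: a feature mask plus one real-valued gene per parameter."""
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    params = [rng.uniform(lo, hi) for lo, hi in bounds]
    return mask, params

def crossover(a, b, rng):
    """Uniform crossover: each gene is taken from either parent with equal chance."""
    mask = [x if rng.random() < 0.5 else y for x, y in zip(a[0], b[0])]
    params = [x if rng.random() < 0.5 else y for x, y in zip(a[1], b[1])]
    return mask, params

def mutate(chrom, bounds, rng, rate=0.1):
    """Flip mask bits and perturb parameter genes, clamping to the bounds."""
    mask = [(not bit) if rng.random() < rate else bit for bit in chrom[0]]
    params = []
    for value, (lo, hi) in zip(chrom[1], bounds):
        if rng.random() < rate:
            value += rng.gauss(0.0, 0.1 * (hi - lo))
        params.append(min(hi, max(lo, value)))
    return mask, params

def tournament(population, scores, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    best = max(rng.sample(range(len(population)), k), key=lambda i: scores[i])
    return population[best]

def run_ga(fitness, n_features, bounds, pop_size=30, generations=40, seed=0):
    """Maximize fitness(mask, params); returns (best chromosome, best score)."""
    rng = random.Random(seed)
    population = [random_chromosome(n_features, bounds, rng)
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(mask, params) for mask, params in population]
        for chrom, score in zip(population, scores):
            if score > best_score:
                best, best_score = chrom, score
        next_pop = [best]  # elitism: carry the best individual forward
        while len(next_pop) < pop_size:
            child = crossover(tournament(population, scores, rng),
                              tournament(population, scores, rng), rng)
            next_pop.append(mutate(child, bounds, rng))
        population = next_pop
    return best, best_score

def toy_fitness(mask, params):
    """Hypothetical stand-in for a cross-validated SVR score (higher is better).

    Pretends features 0, 2 and 5 are informative and that the best
    parameters are log2 C = 3 and log2 gamma = -4.
    """
    useful = {0, 2, 5}
    selected = {i for i, bit in enumerate(mask) if bit}
    mask_error = len(selected ^ useful)
    param_error = sum((p - t) ** 2 for p, t in zip(params, (3.0, -4.0)))
    return -(mask_error + param_error)

best, score = run_ga(toy_fitness, n_features=8, bounds=BOUNDS)
print("selected features:", [i for i, bit in enumerate(best[0]) if bit])
print("parameters:", best[1], "score:", score)
```

In a real setting, `toy_fitness` would be replaced by a function that trains an SVR on the masked feature columns with the decoded parameters and returns a validation score. A grid search, by contrast, would only enumerate parameter values on a fixed lattice and leave the feature set untouched.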
Bibliography: 31-1943/U
support vector regression (SVR), feature selection, semantic textual similarity (STS)
SU Bai-hua, WANG Ying-lin (1. Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200240, China; 2. Department of Computer Science and Technology, Shanghai University of Finance and Economics, Shanghai 200433, China)