Identifying biological terms from text by support vector machine

Bibliographic Details
Published in	2011 6th IEEE Conference on Industrial Electronics and Applications, pp. 455-458
Main Authors	Zhenfei Ju, Meichen Zhou, Fei Zhu
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2011
Subjects	Biological Data Mining; Biological Terms Identification; Biology; Conferences; Data mining; Hidden Markov models; Machine Learning; Support Vector Machine; Support vector machines; Testing; Training
ISBN	9781424487547 1424487544
ISSN	2156-2318
DOI	10.1109/ICIEA.2011.5975627

Summary:	In contemporary society, an increasing number of people are involved in biomedical research. However, a large amount of biological knowledge still resides in unstructured documents, which makes biological data difficult to analyze. Identifying biological terms effectively from text is therefore one of the important problems in bioinformatics. The precision of the best biological term identification systems now exceeds 80%, but this is still lower than that of general-purpose systems. Here we aim to recognize names of a specified type in a biological data set, and we use a support vector machine (SVM) to do so. On the GENIA corpus, a collection of Medline abstracts, we obtain an overall precision of 84% and a recall of 81% for the two-category classification problem. For the multi-category classification problem, SVM identifies biological terms accurately, but the recall is very low. Increasing the amount of test data does not decrease precision, and recall increases.
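
The two-category setting described in the summary (biological term vs. other token, scored by precision and recall) can be illustrated with a minimal sketch. It is not the authors' system and does not use the GENIA corpus: the three orthographic features, the toy tokens, and the hyperparameters (`lam`, `lr`, `epochs`) are all assumptions made for illustration, and a small linear SVM trained by batch subgradient descent on the hinge loss stands in for whatever SVM implementation the paper used.

```python
from typing import List, Tuple


def features(token: str) -> List[float]:
    """Hypothetical orthographic features for one token, plus a bias term."""
    return [
        1.0 if any(ch.isdigit() for ch in token) else 0.0,      # contains a digit
        1.0 if any(ch.isupper() for ch in token[1:]) else 0.0,  # capital after the first char
        1.0 if "-" in token else 0.0,                           # contains a hyphen
        1.0,                                                    # bias
    ]


def train_linear_svm(xs: List[List[float]], ys: List[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 1000) -> List[float]:
    """Batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n = len(xs)
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 regularizer
        for x, y in zip(xs, ys):
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            if margin < 1.0:  # hinge loss is active for this sample
                for j, xj in enumerate(x):
                    grad[j] -= y * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w: List[float], token: str) -> int:
    """Return +1 (biological term) or -1 (other)."""
    score = sum(wj * xj for wj, xj in zip(w, features(token)))
    return 1 if score > 0 else -1


def precision_recall(gold: List[int], pred: List[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (+1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == -1 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labelled tokens (invented for this sketch, not taken from GENIA).
train = [("IL-2", 1), ("NF-kappaB", 1), ("p53", 1), ("CD28", 1),
         ("the", -1), ("of", -1), ("binds", -1), ("The", -1), ("activation", -1)]
held_out = [("IL-4", 1), ("p21", 1), ("interleukin", 1), ("and", -1), ("cells", -1)]

w = train_linear_svm([features(t) for t, _ in train], [y for _, y in train])
train_pred = [predict(w, t) for t, _ in train]
test_pred = [predict(w, t) for t, _ in held_out]
precision, recall = precision_recall([y for _, y in held_out], test_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy data the held-out term "interleukin" is missed because none of the assumed features fire for it, so recall comes out lower than precision. A realistic system would need far richer features and a real annotated corpus.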