An N-Gram Based Method for Bengali Keyphrase Extraction
Published in | Information Systems for Indian Languages, pp. 36-41 |
---|---|
Main Author | |
Format | Book Chapter |
Language | English |
Published | Berlin, Heidelberg: Springer Berlin Heidelberg, 2011 |
Series | Communications in Computer and Information Science |
Subjects | |
ISBN | 9783642194023, 3642194028 |
ISSN | 1865-0929, 1865-0937 |
DOI | 10.1007/978-3-642-19403-0_6 |
Summary: | Keyphrases provide subject metadata that gives clues about the content of a document. In this paper, we present a new method for Bengali keyphrase extraction. The proposed method has several steps: extracting n-grams, identifying candidate keyphrases, and assigning scores to the candidate keyphrases. Since Bengali is a highly inflectional language, we have developed a lightweight stemmer for stemming the candidate keyphrases. The proposed method has been tested on a collection of Bengali documents selected from a Bengali corpus downloadable from the TDIL website. |
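
The summary names three steps (n-gram extraction, candidate identification, scoring) without giving their details. The following is only a minimal sketch of what such a pipeline might look like, not the chapter's actual method: the function names `extract_ngrams` and `score_candidates`, the stopword-boundary filter, and the use of raw frequency as the score are all assumptions made for illustration. Whitespace tokenization is also assumed, and the stemming step is left out.

```python
from collections import Counter


def extract_ngrams(tokens, max_n=3):
    """Return all contiguous n-grams (as tuples) for n = 1..max_n."""
    ngrams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.append(tuple(tokens[i:i + n]))
    return ngrams


def score_candidates(text, stopwords=frozenset(), max_n=3):
    """Count candidate n-grams in `text`.

    Assumed candidate rule (hypothetical, not from the chapter):
    drop any n-gram that starts or ends with a stopword.
    Assumed score (hypothetical): raw frequency in the document.
    """
    tokens = text.split()  # assumes whitespace-delimited words
    counts = Counter()
    for gram in extract_ngrams(tokens, max_n):
        if gram[0] in stopwords or gram[-1] in stopwords:
            continue
        counts[" ".join(gram)] += 1
    return counts
```

Calling `score_candidates(document_text, stopwords=my_stopwords).most_common(10)` would then list the ten most frequent candidates, where `document_text` and `my_stopwords` are placeholders supplied by the caller. A real system for an inflectional language such as Bengali would likely stem the tokens before counting, so that inflected variants of the same phrase are merged.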