An N-Gram Based Method for Bengali Keyphrase Extraction

Bibliographic Details
Published in: Information Systems for Indian Languages, pp. 36-41
Main Author: Sarkar, Kamal
Format: Book Chapter
Language: English
Published: Berlin, Heidelberg: Springer Berlin Heidelberg, 2011
Series: Communications in Computer and Information Science
ISBN: 9783642194023, 3642194028
ISSN: 1865-0929, 1865-0937
DOI: 10.1007/978-3-642-19403-0_6

Summary: Keyphrases provide subject metadata that gives clues about the content of a document. In this paper, we present a new method for Bengali keyphrase extraction. The proposed method comprises several steps, including extraction of n-grams, identification of candidate keyphrases, and assignment of scores to the candidate keyphrases. Since Bengali is a highly inflectional language, we have developed a lightweight stemmer for stemming the candidate keyphrases. The proposed method has been tested on a collection of Bengali documents selected from a Bengali corpus downloadable from the TDIL website.
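
The steps named in the summary (n-gram extraction, candidate identification, stemming, scoring) can be illustrated with a minimal Python sketch. This is not the chapter's implementation: the stopword list, the suffix list, the romanized tokens, and the score (frequency times phrase length) are all hypothetical choices made only to show how such a pipeline might fit together.

```python
from collections import Counter

# Hypothetical, romanized placeholder resources (assumptions for illustration;
# the chapter's actual stopword list and stemmer rules are not given here).
STOPWORDS = {"o", "ebong", "ei"}
SUFFIXES = ("gulo", "der", "ra", "er", "ke", "te")  # checked in this order


def light_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 2 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def extract_ngrams(tokens, max_n=3):
    """Yield every contiguous n-gram of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i : i + n])


def is_candidate(ngram):
    """Assumed filter: a candidate may not start or end with a stopword."""
    return ngram[0] not in STOPWORDS and ngram[-1] not in STOPWORDS


def score_keyphrases(tokens, max_n=3):
    """Count stemmed candidates and rank them by an assumed score."""
    counts = Counter()
    for ngram in extract_ngrams(tokens, max_n):
        if is_candidate(ngram):
            counts[tuple(light_stem(w) for w in ngram)] += 1
    # Assumed score: frequency multiplied by phrase length in words.
    scored = {" ".join(key): freq * len(key) for key, freq in counts.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = ["chhatrader", "boi", "o", "chhatrara"]
    for phrase, score in score_keyphrases(sample):
        print(phrase, score)
```

Stemming before counting lets inflected variants of the same word contribute to a single candidate, which is the motivation the summary gives for the stemmer.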