Research on Automatic Summarization of Chinese Micro-blogs Based on a Hybrid Method (基于混合方法的中文微博自动摘要技术研究)
Published in | 计算机工程与科学 (Computer Engineering & Science), Vol. 38, no. 6, pp. 1257-1261 |
---|---|
Main Author | |
Format | Journal Article |
Language | Chinese |
Published | College of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014000, 2016 |
Subjects | |
ISSN | 1007-130X |
DOI | 10.3969/j.issn.1007-130X.2016.06.029 |
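Summary: | To address the heterogeneous content and sparse information of micro-blogs, this work reviews traditional automatic summarization techniques in depth and, taking the characteristics of micro-blog data into account, proposes a hybrid summarization method that combines statistics and comprehension and builds on micro-blog event extraction. First, an initial statistics-based summary is produced from text features such as word frequency and sentence position. Next, a semantic dictionary is used to compute sentence similarity and identify the event subject, and the summary is refined for readability on the basis of semantic understanding. Finally, the resulting summary is assessed with a suitable summary evaluation method. Experimental results show that the method yields summaries of stable quality and good readability at different compression ratios. |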
---|---|
Bibliography: | Micro-blogs feature heterogeneous content and sparse information. To address these problems, building on an in-depth study of traditional automatic summarization techniques and taking the characteristics of micro-blog data into account, we propose a hybrid automatic summarization method based on statistics and comprehension that builds on micro-blog event extraction. First, we obtain an initial statistics-based summary from word frequency and sentence position. Then we compute sentence similarity through a semantic dictionary, determine the event subject, and refine readability on the basis of semantic understanding, so that the final summary is more readable. Finally, a suitable summary evaluation method is used to assess the resulting summary. Experimental results show that the proposed method obtains summaries of stable quality and good readability under different compression ratios. 43-1258/TP GAO Yong-bing, ZHONG Zhen-hua, WANG Yu, MA Zhan-fei (College of Information Engineering, Inner Mongolia Universi |
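
The record describes the first stage of the method only in outline: sentences are scored by statistical features such as word frequency and sentence position, and the best ones are kept up to a given compression ratio. The sketch below illustrates that general idea and is not the authors' implementation. The function name `summarize`, the linear mix of the two features, the `position_weight` parameter, and the assumption that the input is already word-segmented are all illustrative choices that do not come from the article. The second stage (semantic-dictionary similarity and event-subject identification) is not covered here.

```python
import math
from collections import Counter


def summarize(sentences, ratio=0.3, position_weight=0.3):
    """Pick sentence indices for a statistics-based extractive summary.

    sentences: list of token lists (assumed already word-segmented).
    ratio: compression ratio, the fraction of sentences to keep.
    position_weight: hypothetical mixing weight between the position
        score and the word-frequency score.
    Returns the chosen sentence indices in original document order.
    """
    if not sentences:
        return []

    # Corpus-level word frequencies over all sentences.
    freq = Counter(tok for sent in sentences for tok in sent)
    total = sum(freq.values())
    n = len(sentences)

    # Frequency score: average relative frequency of a sentence's tokens.
    tf_scores = [
        sum(freq[t] for t in sent) / (total * len(sent)) if sent else 0.0
        for sent in sentences
    ]
    max_tf = max(tf_scores) or 1.0  # avoid division by zero

    scores = []
    for i, tf in enumerate(tf_scores):
        tf_norm = tf / max_tf      # scaled into [0, 1]
        pos = 1.0 - i / n          # earlier sentences score higher
        scores.append((1.0 - position_weight) * tf_norm + position_weight * pos)

    # Keep at least one sentence; round the kept count up.
    k = max(1, math.ceil(ratio * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Calling `summarize(token_lists, ratio=0.5)` returns the indices of the retained sentences, so the caller can join the original sentences in their original order. Returning indices rather than text keeps the sketch independent of any particular Chinese word segmenter.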