Automatic Stopword Detection Using Term Ranking between Written and Machine Speech Recognition Transcribed Reviews
Published in | 2019 12th International Conference on Developments in eSystems Engineering (DeSE) pp. 301 - 308 |
---|---|
Main Authors | , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.10.2019 |
Subjects | |
Summary: | Video feedback and machine speech recognition are fast becoming a popular choice for companies to gain insight into their products. In conjunction with this, text analytics can be used to extract insight from these video transcriptions. Currently, there is little work in the area to analyse and compare techniques for natural language processing, information retrieval and information extraction. A commonly practiced technique in text analytics is the extraction of stop words: words whose presence does not contribute context or information to a document. In this paper, we explore statistical techniques for the automated extraction of stop words, comparing four datasets of written and transcribed reviews. Using statistical variations of the successful technique 'term ranking', we evaluate their performance against a common list of stop words. Results suggest that one variation, TFnormIDFnorm, was the most successful, with a best precision of 46.7% and a recall of 86.6%. The best results were seen in the largest dataset, which used written reviews; however, comparison of the remaining three datasets revealed that spoken text performed 0.4% better in precision than the next best dataset and 2.6% better in recall. Initial results show marginally better performance in machine speech recognition transcribed texts from videos in comparison to comparably sized datasets of written reviews. |
---|---|
ISSN: | 2161-1351 |
DOI: | 10.1109/DeSE.2019.00063 |
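
The summary describes ranking terms statistically and scoring the top-ranked candidates against a common stop word list using precision and recall. The following is a minimal sketch of that general idea, not the paper's method: this record does not give the definition of TFnormIDFnorm, so the scoring formula (normalised term frequency multiplied by one minus normalised inverse document frequency) is an assumption, and the names `rank_stopword_candidates` and `precision_recall` are hypothetical.

```python
import math
from collections import Counter


def rank_stopword_candidates(docs):
    """Rank terms so that likely stop words come first.

    docs: list of tokenised documents (each a list of lowercase strings).
    ASSUMED score: tf_norm * (1 - idf_norm), where
      tf_norm  = corpus count of the term / total tokens in the corpus
      idf_norm = log(N / df) / log(N), scaled into [0, 1]
    This is an illustrative stand-in, not the paper's TFnormIDFnorm.
    """
    n_docs = len(docs)
    term_counts = Counter(tok for doc in docs for tok in doc)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    total_tokens = sum(term_counts.values())

    scores = {}
    for term, count in term_counts.items():
        tf_norm = count / total_tokens
        if n_docs > 1:
            idf_norm = math.log(n_docs / doc_freq[term]) / math.log(n_docs)
        else:
            idf_norm = 0.0  # a single document carries no IDF signal
        scores[term] = tf_norm * (1.0 - idf_norm)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


def precision_recall(predicted, reference):
    """Precision and recall of predicted stop words against a reference list."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy corpus of three tokenised reviews (made up for illustration).
docs = [
    ["the", "camera", "is", "good"],
    ["the", "battery", "is", "bad"],
    ["the", "screen", "cracked"],
]
ranked = rank_stopword_candidates(docs)
top_k = ranked[:2]  # treat the two highest-ranked terms as stop words
precision, recall = precision_recall(top_k, {"the", "is", "a"})
print(top_k, precision, recall)
```

In this sketch the cut-off (here the top two terms) controls the trade-off between precision and recall. How the paper chose its cut-off is not stated in this record.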