Automatic Stopword Detection Using Term Ranking between Written and Machine Speech Recognition Transcribed Reviews

Bibliographic Details
Published in: 2019 12th International Conference on Developments in eSystems Engineering (DeSE), pp. 301-308
Main Authors: Hind, Jade JK; Mahyoub, Mohamed; Woods, David; Wong, Carl; Hussain, Abir; Al-Jumeily, Dhiya
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2019
Summary: Video feedback and machine speech recognition are fast becoming a popular choice for companies seeking insight into their products. Text analytics can then be used to extract insight from the resulting video transcriptions. Currently, there is little work in this area that analyses and compares techniques for natural language processing, information retrieval and information extraction. A commonly practised technique in text analytics is the extraction of stop words: words whose presence does not contribute context or information to a document. In this paper, we explore statistical techniques for the automated extraction of stop words, comparing four datasets of written and machine-transcribed reviews. Using statistical variations of the successful 'term ranking' technique, we evaluate their performance against a common list of stop words. Results suggest that the variation TFnormIDFnorm was the most successful, with a best precision of 46.7% and a recall of 86.6%. The best results were seen in the largest dataset, which used written reviews; however, a comparison of the remaining three datasets revealed that spoken text performed 0.4% better in precision than the next best dataset and 2.6% better in recall. These initial results show marginally better performance on machine speech recognition transcriptions of videos than on comparably sized datasets of written reviews.
ISSN: 2161-1351
DOI: 10.1109/DeSE.2019.00063
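
Illustrative sketch (not part of the record): the summary describes ranking terms statistically and scoring the resulting candidates against a common stop word list using precision and recall. The Python below is a minimal sketch of that workflow under stated assumptions. The summary does not define TFnormIDFnorm, so the score used here (min-max normalised term frequency multiplied by one minus min-max normalised IDF) is a hypothetical stand-in, and the function names, sample reviews and reference list are invented for illustration.

```python
import math
from collections import Counter


def rank_stopword_candidates(docs):
    """Rank terms so that likely stop words come first.

    Assumed score (NOT taken from the paper): min-max normalised term
    frequency multiplied by one minus min-max normalised IDF, so that
    frequent terms spread across many documents rank highest.
    """
    tokenised = [doc.lower().split() for doc in docs]
    # Corpus-wide term frequency and per-document frequency.
    tf = Counter(tok for doc in tokenised for tok in doc)
    df = Counter(tok for doc in tokenised for tok in set(doc))
    n_docs = len(tokenised)
    idf = {t: math.log(n_docs / df[t]) for t in tf}

    def normalise(values):
        lo, hi = min(values.values()), max(values.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all values match
        return {t: (v - lo) / span for t, v in values.items()}

    tf_norm, idf_norm = normalise(tf), normalise(idf)
    score = {t: tf_norm[t] * (1.0 - idf_norm[t]) for t in tf}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(score, key=lambda t: (-score[t], t))


def precision_recall(candidates, reference):
    """Precision and recall of a candidate list against a reference stop list."""
    candidates, reference = set(candidates), set(reference)
    hits = len(candidates & reference)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data, invented for this sketch.
reviews = [
    "the camera is great and the battery is fine",
    "the screen is bright and the sound is clear",
    "the delivery was slow and the box was damaged",
]
reference_stop_words = {"the", "is", "and", "was"}

top_terms = rank_stopword_candidates(reviews)[:4]
precision, recall = precision_recall(top_terms, reference_stop_words)
print(top_terms, precision, recall)
```

Taking a fixed number of top-ranked terms as the candidate list is itself an assumption; the cut-off directly trades precision against recall, which may be one reason the reported recall is much higher than the reported precision.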