Two improvements to detect duplicates in Stack Overflow


Bibliographic Details
Published in: 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 563-564
Main Authors: Mizobuchi, Yuji; Takayama, Kuniharu
Format: Conference Proceeding
Language: English
Published: IEEE, 01.02.2017
Summary: Stack Overflow is one of the most popular question-and-answer sites for programmers. However, it contains a great number of duplicate questions, which should be detected automatically and quickly. In this paper, we introduce two approaches to improve detection accuracy: splitting the question body into different types of data, and using word embeddings to handle word ambiguities that general corpora do not cover. The evaluation shows that these approaches improve accuracy compared with the traditional method.
DOI: 10.1109/SANER.2017.7884678
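
Illustration: the two approaches named in the summary can be sketched roughly as follows. This record does not describe the paper's actual features, embedding model, or classifier, so everything here is an assumption for illustration only: the function names (`split_body`, `similarity`), the choice of prose and code as the "types of data", the Jaccard score for code, and the tiny hand-made `TOY_EMBEDDINGS` table, which stands in for embeddings trained on Stack Overflow text.

```python
import html
import math
import re

# Assumption: code in a question body is marked up as <pre><code>...</code></pre>.
CODE_BLOCK = re.compile(
    r"<pre[^>]*>\s*<code[^>]*>(.*?)</code>\s*</pre>", re.DOTALL | re.IGNORECASE
)
TAG = re.compile(r"<[^>]+>")

# Hypothetical 2-d vectors; a real system would load trained embeddings.
TOY_EMBEDDINGS = {
    "list": (1.0, 0.1),
    "array": (0.9, 0.2),
    "sort": (0.1, 1.0),
    "order": (0.2, 0.9),
}


def split_body(body_html):
    """Split a question body into (prose, code) strings."""
    code_parts = [html.unescape(c) for c in CODE_BLOCK.findall(body_html)]
    prose_html = CODE_BLOCK.sub(" ", body_html)
    prose = html.unescape(TAG.sub(" ", prose_html))
    return " ".join(prose.split()), "\n".join(code_parts)


def tokenize(text):
    """Lowercase and keep runs of letters, digits, and underscores."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def mean_vector(tokens, embeddings):
    """Average the vectors of known tokens; None if no token is known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is missing or zero."""
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity(body_a, body_b, embeddings=TOY_EMBEDDINGS):
    """Score prose and code separately instead of as one bag of words."""
    prose_a, code_a = split_body(body_a)
    prose_b, code_b = split_body(body_b)
    prose_sim = cosine(
        mean_vector(tokenize(prose_a), embeddings),
        mean_vector(tokenize(prose_b), embeddings),
    )
    tokens_a, tokens_b = set(tokenize(code_a)), set(tokenize(code_b))
    union = tokens_a | tokens_b
    code_sim = len(tokens_a & tokens_b) / len(union) if union else 0.0  # Jaccard
    return {"prose": prose_sim, "code": code_sim}


if __name__ == "__main__":
    q1 = "<p>How do I sort a list?</p><pre><code>xs.sort()</code></pre>"
    q2 = "<p>How to order an array</p><pre><code>sorted(xs)</code></pre>"
    print(similarity(q1, q2))
```

In this toy example the two questions share almost no exact words in their prose, yet the embedding-based prose score is high because "sort"/"order" and "list"/"array" were given nearby vectors. Keeping the code score separate means code identifiers do not dilute the prose comparison.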