Two improvements to detect duplicates in Stack Overflow
Published in | 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 563-564 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.02.2017 |
Summary: | Stack Overflow is one of the most popular question-and-answer sites for programmers. However, it contains a large number of duplicate questions, which should be detected automatically and quickly. In this paper, we introduce two approaches to improve detection accuracy: splitting the body into different types of data, and using word embeddings to handle word ambiguities that are not covered by general corpora. The evaluation shows that these approaches improve accuracy compared with the traditional method. |
---|---|
DOI: | 10.1109/SANER.2017.7884678 |