The Importance of the Correlation in Crossover Experiments

Bibliographic Details
Published in: IEEE Transactions on Software Engineering, Vol. 48, No. 8, pp. 2802-2813
Main Authors: Kitchenham, Barbara; Madeyski, Lech; Scanniello, Giuseppe; Gravino, Carmine
Format: Journal Article
Language: English
Published: New York, IEEE (IEEE Computer Society), 01.08.2022
Summary: Context: In empirical software engineering, crossover designs are popular for experiments comparing software engineering techniques that must be undertaken by human participants. However, their value depends on the correlation (r) between the outcome measures on the same participants. Software engineering theory emphasizes the importance of individual skill differences, so we would expect the values of r to be relatively high. However, few researchers have reported the values of r. Goal: To investigate the values of r found in software engineering experiments. Method: We undertook simulation studies to investigate the theoretical and empirical properties of r. Then we investigated the values of r observed in 35 software engineering crossover experiments.
Results: The level of r obtained by analysing our 35 crossover experiments was small. Estimates based on means, medians, and random effect analysis disagreed but were all between 0.2 and 0.3. As expected, our analyses found large variability among the individual r estimates for small sample sizes, but no indication that r estimates were larger for the experiments with larger sample sizes that exhibited smaller variability. Conclusions: Low observed r values cast doubts on the validity of crossover designs for software engineering experiments. However, if the cause of low r values relates to training limitations or toy tasks, this affects all Software Engineering (SE) experiments involving human participants. For all human-intensive SE experiments, we recommend more intensive training and then tracking the improvement of participants as they practice using specific techniques, before formally testing the effectiveness of the techniques.
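Why the value of a crossover design depends on r can be illustrated with a standard textbook variance argument. This argument is not taken from the paper. Assuming equal outcome variance sigma^2 under both techniques and no period or carryover effects, a participant's difference between the two techniques has variance 2*sigma^2*(1 - r). With the same total number of observations, the crossover estimate therefore has (1 - r) times the variance of a between-participants (parallel-groups) comparison. The Python sketch below computes that ratio. It also simulates how widely the sample r scatters around a hypothetical true value of 0.25 at several sample sizes, which mirrors the small-sample variability the abstract reports. All function names are hypothetical, and the simulation is not the authors' own design.

```python
import math
import random


def crossover_var(sigma: float, rho: float, n: int) -> float:
    """Variance of the mean within-participant difference when each of n
    participants uses both techniques (assumption: equal variance sigma**2,
    no period or carryover effects): Var(y_A - y_B) = 2*sigma**2*(1 - rho)."""
    return 2.0 * sigma ** 2 * (1.0 - rho) / n


def parallel_var(sigma: float, n_per_group: int) -> float:
    """Variance of the difference in group means when n_per_group independent
    participants use each technique (same total observations as above)."""
    return 2.0 * sigma ** 2 / n_per_group


def pearson_r(xs, ys) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_r(rho: float, n: int, reps: int, seed: int = 1) -> list:
    """Draw `reps` samples of n bivariate-normal pairs with true correlation
    rho and return the sample r of each one."""
    rng = random.Random(seed)
    rs = []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            xs.append(z1)
            # Mixing z1 and z2 this way gives corr(x, y) = rho.
            ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        rs.append(pearson_r(xs, ys))
    return rs


def quantile(sorted_vals, q: float) -> float:
    """Nearest-rank quantile of an already sorted list."""
    idx = min(len(sorted_vals) - 1, max(0, math.ceil(q * len(sorted_vals)) - 1))
    return sorted_vals[idx]


if __name__ == "__main__":
    rho = 0.25  # hypothetical true r, inside the 0.2-0.3 range the abstract reports
    ratio = crossover_var(1.0, rho, 20) / parallel_var(1.0, 20)
    print(f"crossover/parallel variance ratio at r={rho}: {ratio:.2f}")  # prints 0.75
    for n in (10, 20, 50, 200):
        rs = sorted(simulate_r(rho, n, reps=1000))
        lo, hi = quantile(rs, 0.025), quantile(rs, 0.975)
        print(f"n={n:3d}: middle 95% of sample r in [{lo:+.2f}, {hi:+.2f}]")
```

Under these assumptions, a true r of 0.25 means the crossover estimate keeps 75% of the variance of a parallel-groups comparison with the same number of observations. The printed intervals show how far a single small experiment's r can wander from the true value.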
ISSN: 0098-5589, 1939-3520
DOI: 10.1109/TSE.2021.3070480