The Importance of the Correlation in Crossover Experiments

Bibliographic Details
Published in	IEEE Transactions on Software Engineering, Vol. 48, no. 8, pp. 2802-2813
Main Authors	Kitchenham, Barbara; Madeyski, Lech; Scanniello, Giuseppe; Gravino, Carmine
Format	Journal Article
Language	English
Published	New York IEEE 01.08.2022 IEEE Computer Society
Subjects	Atmospheric measurements; Correlation; crossover design; crossover experiments; Empirical analysis; Empirical software engineering; Estimates; Experiments; Mathematical model; Particle measurements; repeated measures correlation; Size measurement; Software engineering; Time measurement; Training; Within-subjects design

More Information
Summary:	Context: In empirical software engineering, crossover designs are popular for experiments comparing software engineering techniques that must be undertaken by human participants. However, their value depends on the correlation ($r$) between the outcome measures on the same participants. Software engineering theory emphasizes the importance of individual skill differences, so we would expect the values of $r$ to be relatively high. However, few researchers have reported the values of $r$. Goal: To investigate the values of $r$ found in software engineering experiments. Method: We undertook simulation studies to investigate the theoretical and empirical properties of $r$. Then we investigated the values of $r$ observed in 35 software engineering crossover experiments.
Results: The level of $r$ obtained by analysing our 35 crossover experiments was small. Estimates based on means, medians, and random effect analysis disagreed but were all between 0.2 and 0.3. As expected, our analyses found large variability among the individual $r$ estimates for small sample sizes, but no indication that $r$ estimates were larger for the experiments with larger sample sizes that exhibited smaller variability. Conclusions: Low observed $r$ values cast doubts on the validity of crossover designs for software engineering experiments. However, if the cause of low $r$ values relates to training limitations or toy tasks, this affects all Software Engineering (SE) experiments involving human participants. For all human-intensive SE experiments, we recommend more intensive training and then tracking the improvement of participants as they practice using specific techniques, before formally testing the effectiveness of the techniques.
ISSN:	0098-5589, 1939-3520
DOI:	10.1109/TSE.2021.3070480
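
Illustrative note: the Summary states that the value of a crossover design depends on the correlation r between outcome measures on the same participants. The sketch below is not the authors' analysis. It only applies the standard formula for the variance of a paired difference, under the simplifying assumptions that both outcomes have the same standard deviation and that period and carryover effects are absent. The function names are hypothetical and chosen for this illustration.

```python
def paired_difference_variance(sigma: float, r: float) -> float:
    """Variance of (outcome under technique A - outcome under technique B)
    measured on the same participant.

    Assumption: both outcomes share the standard deviation `sigma` and have
    correlation `r`; period and carryover effects are ignored.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    # Var(A - B) = Var(A) + Var(B) - 2*Cov(A, B) = 2 * sigma^2 * (1 - r)
    return 2.0 * sigma**2 * (1.0 - r)


def variance_ratio_vs_independent(r: float) -> float:
    """Paired-difference variance divided by the variance of a difference
    between two independent participants, which is 2 * sigma^2."""
    return paired_difference_variance(1.0, r) / 2.0


if __name__ == "__main__":
    # Smaller ratios mean a larger gain from using each participant
    # as their own control.
    for r in (0.0, 0.25, 0.5, 0.8):
        ratio = variance_ratio_vs_independent(r)
        print(f"r = {r:.2f}: variance ratio = {ratio:.2f}")
```

Under these assumptions, r between 0.2 and 0.3 (the range reported in the Summary) gives a ratio between 0.7 and 0.8, so the reduction in variance from within-participant comparison is modest, whereas r = 0.8 would give a ratio of about 0.2.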