Using simulation to evaluate prediction techniques [for software]

Bibliographic Details
Published in Proceedings Seventh International Software Metrics Symposium, pp. 349-359
Main Authors Shepperd, M., Kadoda, G.
Format Conference Proceeding
Language English
Published IEEE 2001

Abstract The need for accurate software prediction systems increases as software becomes larger and more complex. A variety of techniques have been proposed, but none has proved consistently accurate. The underlying characteristics of the data set influence the choice of the prediction system to be used. It has proved difficult to obtain significant results over small data sets; consequently, we required large validation data sets. Moreover, we wished to control the characteristics of such data sets in order to systematically explore the relationship between accuracy, choice of prediction system and data set characteristics. Our solution has been to simulate data, allowing both control and the possibility of large validation cases. We compared regression, rule induction and nearest neighbours (a form of case-based reasoning). The results suggest that there are significant differences depending upon the characteristics of the data set. Consequently, researchers should consider the prediction context when evaluating competing prediction systems. We also observed that the more "messy" the data and the more complex the relationship with the dependent variable, the more variability in the results. This became apparent since we sampled two different training sets from each simulated population of data. In the more complex cases, we observed significantly different results depending upon the training set. This suggests that researchers will need to exercise caution when comparing different approaches and utilise procedures such as bootstrapping in order to generate multiple samples for training purposes.
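
The evaluation design summarised in the abstract (simulate a population, draw a small training set, validate on a large simulated set, and use bootstrapping to see how much results vary with the training sample) can be illustrated with a minimal sketch. This is not the authors' code or their simulation model: the population `y = 5 + 2x + noise`, the sample sizes, the use of mean absolute error as the accuracy measure, the choice of `k = 3`, and all function names are assumptions made for illustration. Rule induction is omitted for brevity.

```python
import random
import statistics

def simulate(n, rng, noise_sd=1.0):
    """Draw n (x, y) cases from an assumed population: y = 5 + 2x + Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 5.0 + 2.0 * x + rng.gauss(0.0, noise_sd)
        data.append((x, y))
    return data

def fit_regression(train):
    """Ordinary least squares with one predictor; returns a predict function."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx if sxx else 0.0  # guard against a degenerate resample
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_knn(train, k=3):
    """k nearest neighbours: predict the mean y of the k closest training cases."""
    def predict(x):
        nearest = sorted(train, key=lambda case: abs(case[0] - x))[:k]
        return statistics.fmean([y for _, y in nearest])
    return predict

def mean_abs_error(predict, validation):
    """Accuracy measure assumed here: mean absolute error on the validation set."""
    return statistics.fmean([abs(predict(x) - y) for x, y in validation])

def bootstrap_errors(train, validation, fit, rng, n_resamples=20):
    """Refit on bootstrap resamples (drawn with replacement) of the training set."""
    errors = []
    for _ in range(n_resamples):
        resample = rng.choices(train, k=len(train))
        errors.append(mean_abs_error(fit(resample), validation))
    return errors

rng = random.Random(42)              # fixed seed so the run is repeatable
train = simulate(30, rng)            # small training set
validation = simulate(1000, rng)     # large simulated validation set
results = {
    "regression": bootstrap_errors(train, validation, fit_regression, rng),
    "knn": bootstrap_errors(train, validation, fit_knn, rng),
}
for name, errs in results.items():
    print(f"{name}: mean MAE={statistics.fmean(errs):.3f}, "
          f"sd={statistics.stdev(errs):.3f}")
```

The standard deviation across resamples gives a rough indication of how sensitive each technique is to the particular training sample, which is the variability the abstract cautions about. Changing `simulate` (for example, adding outliers or a non-linear relationship) would be one way to explore different data set characteristics under the same assumptions.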
Affiliation (Shepperd, M.) Empirical Software Eng. Res. Group, Bournemouth Univ., Poole, UK
DOI 10.1109/METRIC.2001.915542
ISBN 9780769510439
0769510434
ISSN 1530-1435
Subject Terms Accuracy; Computational modeling; Control systems; Design engineering; Predictive models; Software engineering; Software systems; Statistical analysis; Systems engineering and theory; Uncertainty
URI https://ieeexplore.ieee.org/document/915542