Champion-challenger based predictive model selection

Bibliographic Details
Published in Proceedings 2007 IEEE SoutheastCon, p. 254
Main Author Shyam Varan Nath
Format Conference Proceeding
Language English
Published 01.03.2007
Abstract Selecting appropriate data mining predictive models is a challenging task. It is easy to evaluate a model on historical data at a given point in time, using a confusion matrix and the misclassification rate; it is much harder to ensure that the selected model, once deployed, remains the most effective one as newer data comes in. Here we address how to keep striving for the best model even after a predictive model has been deployed for production use. In the champion-challenger model selection paradigm, the historical data is used to create the best, or champion, predictive model using criteria such as the misclassification rate for a given cost matrix. Apart from the champion model, a number of other models are selected that are not as good as the champion in predictive accuracy on the same data. These models are termed challengers to the current champion. They may differ from the champion in the underlying predictive algorithm, in the algorithm's tuning parameters, or in the model attributes used. Predictive modeling starts with the conventional steps: identifying the business problem that warrants predictive modeling, finding the significant attributes for modeling, and analyzing data quality, followed by the actual model building and evaluation. The emphasis, however, is not on finding only the top, or champion, model but also on finding the other models that come close to it in performance. The guiding principle is that selecting the best predictive model on the current set of historical data is not a stamp of approval for eternity. Real-world systems that use predictive modeling are complex, dynamic processes, and model selection needs a means of capturing that change. When the champion model is deployed in a production system and used for predictions, its results are saved in a table. Likewise, the challenger models are used to score a subset of the data, and their results are saved as well. The predictions of the challenger models do not affect the real-time predictive use of the system. Depending on the time interval the predictions cover, once that future time arrives the actual results are captured for the same data instances.
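The comparison described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the author's implementation: the model names, the cost values, the instance IDs, and the helper functions `average_cost` and `rank_models` are all hypothetical. It assumes a binary target, a cost matrix indexed as `cost[actual][predicted]`, and prediction logs stored per model as a mapping from instance ID to predicted class. Because challengers score only a subset of the data, models are compared on the instances that every model scored and for which the actual outcome is known.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical cost matrix: COST[actual][predicted].
# A missed positive (actual 1, predicted 0) is assumed to cost 5x a false alarm.
COST: List[List[float]] = [[0.0, 1.0], [5.0, 0.0]]


def average_cost(actuals: Dict[int, int], preds: Dict[int, int],
                 cost: List[List[float]], ids: Iterable[int]) -> float:
    """Mean misclassification cost of one model over the given instance IDs."""
    ids = list(ids)
    return sum(cost[actuals[i]][preds[i]] for i in ids) / len(ids)


def rank_models(actuals: Dict[int, int], logs: Dict[str, Dict[int, int]],
                cost: List[List[float]]) -> List[Tuple[str, float]]:
    """Rank models by ascending mean cost on the instances all of them scored."""
    shared = set(actuals)
    for preds in logs.values():
        shared &= set(preds)
    if not shared:
        raise ValueError("no instance was scored by every model")
    scored = [(name, average_cost(actuals, preds, cost, shared))
              for name, preds in logs.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Step 1: choose the champion on historical data (made-up values).
hist_actuals = {1: 1, 2: 0, 3: 1, 4: 0, 5: 1, 6: 0}
hist_logs = {
    "tree":        {1: 1, 2: 0, 3: 1, 4: 0, 5: 1, 6: 1},
    "naive_bayes": {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 0},
    "logistic":    {1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0},
}
hist_ranking = rank_models(hist_actuals, hist_logs, COST)
champion = hist_ranking[0][0]                         # "tree"
challengers = [name for name, _ in hist_ranking[1:]]

# Step 2: in production the champion scores everything; challengers score
# a subset (IDs 7-10 here) and their output is only logged, never served.
prod_logs = {
    "tree":        {7: 0, 8: 1, 9: 0, 10: 0, 11: 0, 12: 1},
    "naive_bayes": {7: 1, 8: 1, 9: 1, 10: 1},
    "logistic":    {7: 1, 8: 1, 9: 0, 10: 0},
}

# Step 3: once the actual outcomes are known, re-rank on the shared instances.
prod_actuals = {7: 1, 8: 1, 9: 0, 10: 1, 11: 0, 12: 1}
prod_ranking = rank_models(prod_actuals, prod_logs, COST)
print(prod_ranking)  # [('naive_bayes', 0.25), ('logistic', 1.25), ('tree', 2.5)]
```

In this made-up run the historical champion ("tree") is the costliest model on the newer data. What happens next, such as promoting the best challenger, is not spelled out in the abstract and would be a policy decision for the deploying system.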
Author Affiliation Oracle Corp
DOI 10.1109/SECON.2007.342897
EISBN 1424410290
9781424410293
EISSN 1558-058X
ISBN 1424410282
9781424410286
ISSN 1091-0050
Subject Terms Accuracy
Costs
Data analysis
Data mining
Iterative algorithms
Prediction algorithms
Predictive models
Production systems
Real time systems
Testing
URI https://ieeexplore.ieee.org/document/4147427