Champion-challenger based predictive model selection
Published in | Proceedings 2007 IEEE SoutheastCon p. 254 |
---|---|
Main Author | Shyam Varan Nath |
Format | Conference Proceeding |
Language | English |
Published | 2007-03-01 |
Abstract | Selecting appropriate data mining predictive models is a challenging task. It is easy to evaluate a model on historical data at a given point in time, using a confusion matrix and the misclassification rate, but it is much harder to ensure that the selected model remains the most effective one after deployment, as newer data comes in. Here we address how to keep striving for the best model even after a predictive model has been deployed for production use. In the champion-challenger model selection paradigm, the historical data is used to create the best, or champion, predictive model using criteria such as the misclassification rate for a given cost matrix. Apart from the champion model, a number of other models are selected that are not as good as the champion in predictive accuracy on the same data. These models are termed challengers to the current champion. They may differ from the champion in the underlying predictive algorithm, in the algorithm's tuning parameters, or in the model attributes used. Predictive modeling starts with the conventional steps: identifying the business problem that warrants predictive modeling, finding the significant attributes, analyzing data quality, and then building and evaluating the models. The emphasis, however, is not on finding only the top (champion) model but also on finding the other models that come close to it in performance. The guiding principle is that selecting the best predictive model on the current set of historical data is not a stamp of approval for eternity. Real-world systems that use predictive modeling are complex and dynamic, and they need a means of capturing that change. When the champion model is deployed in a production system and used for predictions, its results are saved in a table. Likewise, the challenger models are used to score a subset of the data, and their results are saved as well. The predictions of the challenger models do not affect the real-time predictive use of the system. Once the time interval covered by a prediction has elapsed, the actual results are captured for the same data instances. |
---|---|
Author | Shyam Varan Nath |
Author Affiliation | Oracle Corp |
DOI | 10.1109/SECON.2007.342897 |
Discipline | Engineering |
EISBN | 1424410290 9781424410293 |
EISSN | 1558-058X |
ISBN | 1424410282 9781424410286 |
ISSN | 1091-0050 |
SubjectTerms | Accuracy; Costs; Data analysis; Data mining; Iterative algorithms; Prediction algorithms; Predictive models; Production systems; Real time systems; Testing |
Title | Champion-challenger based predictive model selection |
URI | https://ieeexplore.ieee.org/document/4147427 |
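
The champion-challenger loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The threshold "models", the cost matrix values, the toy data, and all function names are hypothetical. Two simplifications are assumptions made here: the challengers score every new instance rather than a subset, and a challenger is promoted as soon as it shows a lower cost on the newly captured actuals, a rule the abstract as reproduced here does not spell out.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[float], int]  # maps one numeric attribute to class 0 or 1

# Hypothetical cost matrix, indexed COST[actual][predicted].
# A missed positive (actual 1, predicted 0) is assumed to cost five times a false alarm.
COST = [[0.0, 1.0],
        [5.0, 0.0]]


def expected_cost(model: Model, rows: Sequence[Tuple[float, int]]) -> float:
    """Average misclassification cost of `model` on (attribute, actual) rows."""
    return sum(COST[actual][model(x)] for x, actual in rows) / len(rows)


def rank_models(models: Dict[str, Model], rows: Sequence[Tuple[float, int]]) -> List[str]:
    """Model names ordered from lowest to highest cost; the first is the champion."""
    return sorted(models, key=lambda name: expected_cost(models[name], rows))


def score_and_log(models: Dict[str, Model], instances: Sequence[float]) -> List[dict]:
    """Score each instance with every model and return the saved results table."""
    return [{"x": x, "pred": {name: m(x) for name, m in models.items()}}
            for x in instances]


def cost_from_log(log: List[dict], actuals: Sequence[int], name: str) -> float:
    """Average cost of one model's logged predictions once the actuals are known."""
    return sum(COST[a][rec["pred"][name]] for rec, a in zip(log, actuals)) / len(log)


# Toy stand-ins for models that differ only in a tuning parameter (a threshold).
models: Dict[str, Model] = {
    "t30": lambda x: int(x > 0.3),
    "t50": lambda x: int(x > 0.5),
    "t70": lambda x: int(x > 0.7),
}

# Step 1: pick the champion and the challengers on historical data.
historical = [(0.1, 0), (0.2, 0), (0.4, 0), (0.45, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
ranking = rank_models(models, historical)
initial_champion, challengers = ranking[0], ranking[1:]

# Step 2: in production, log every model's predictions; only the champion's are served.
instances = [0.35, 0.4, 0.2, 0.45, 0.6, 0.1]
log = score_and_log(models, instances)
served = [rec["pred"][initial_champion] for rec in log]

# Step 3: when the actual outcomes arrive, re-rank all models on the logged predictions.
actuals = [1, 1, 0, 1, 1, 0]
new_ranking = sorted(models, key=lambda name: cost_from_log(log, actuals, name))
new_champion = new_ranking[0]  # assumed promotion rule: lowest cost on the new data wins

print("initial champion:", initial_champion, "challengers:", challengers)
print("champion after actuals are captured:", new_champion)
```

In this toy data the relationship between the attribute and the outcome shifts after deployment, so a model that was only a challenger on the historical data overtakes the original champion once the actuals are compared. In practice a promotion decision would more plausibly require a sustained or statistically significant difference rather than a single re-ranking.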