Champion-challenger based predictive model selection

Bibliographic Details
Published in	Proceedings 2007 IEEE SoutheastCon p. 254
Main Author	Shyam Varan Nath
Format	Conference Proceeding
Language	English
Published	01.03.2007
Subjects	Accuracy Costs Data analysis Data mining Iterative algorithms Prediction algorithms Predictive models Production systems Real time systems Testing
Abstract	Selecting an appropriate data mining predictive model is a challenging task. While it is easy to evaluate a model on historical data at a given point in time, using a confusion matrix and the misclassification rate, it is not easy to ensure that the selected model remains the most effective one after deployment, as newer data comes in. Here we address how to keep striving for the best model even after a predictive model has been deployed for production use. In the champion-challenger model selection paradigm, the historical data is used to build the best, or champion, predictive model using criteria such as the misclassification rate for a given cost matrix. Apart from the champion model, a number of other models are selected that are not as accurate as the champion on the same data. These models are termed challengers to the current champion model. They may differ from the champion in the underlying predictive algorithm, the algorithm's tuning parameters, or the model attributes used. Predictive modeling starts with the conventional steps: identifying the business problem that warrants predictive modeling, finding the significant attributes for modeling, and analyzing data quality, followed by the actual model building and evaluation. However, the emphasis is not on finding just the top, or champion, model but also on finding the other models that are close to it in performance. The guiding principle is that selecting the best predictive model on the current set of historical data is not a permanent stamp of approval. Real-world systems that use predictive modeling are complex, dynamic processes and need a means of capturing that change. When the champion model is deployed in a production system and used for predictions, its predictions are saved in a table.
Likewise, the challenger models score a subset of the data, and their results are saved as well. The challengers' predictions do not affect the real-time predictive use of the system. Once the time interval covered by each prediction has elapsed, the actual outcomes are captured for the same data instances.
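The champion-challenger cycle described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the cost values, the "closeness" tolerance used to pick challengers, the model names, the data, and the function names (`average_cost`, `select_champion`, `review`) are all hypothetical. The abstract in this record stops once actual outcomes are captured, so the promotion rule at the end (a challenger replaces the champion only if its cost on the shared, shadow-scored cases is strictly lower) is also an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical 2x2 cost matrix keyed by (actual, predicted); correct predictions cost 0.
# Illustrative assumption: a missed positive (false negative) costs 5x a false positive.
COST: Dict[Tuple[int, int], float] = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 5.0, (1, 1): 0.0}


def average_cost(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Cost-weighted misclassification rate: the sum of COST over all
    (actual, predicted) pairs divided by the number of cases (must be non-empty)."""
    total = sum(COST[(a, p)] for a, p in zip(actual, predicted))
    return total / len(actual)


def select_champion(
    predictions: Dict[str, Sequence[int]], actual: Sequence[int], tolerance: float
) -> Tuple[str, List[str], Dict[str, float]]:
    """Pick the lowest-cost model as champion; keep as challengers the other
    models whose cost is within `tolerance` (hypothetical) of the champion's."""
    costs = {name: average_cost(actual, preds) for name, preds in predictions.items()}
    ranked = sorted(costs, key=lambda m: costs[m])
    champion = ranked[0]
    challengers = [m for m in ranked[1:] if costs[m] - costs[champion] <= tolerance]
    return champion, challengers, costs


def review(
    champion: str,
    challengers: List[str],
    log: Dict[str, Dict[int, int]],
    actuals: Dict[int, int],
) -> Tuple[str, Dict[str, float]]:
    """After actual outcomes are captured, score every model on the cases that
    all of them predicted (challengers only scored a subset) and return the best.
    Assumed promotion rule: ties keep the current champion."""
    models = [champion, *challengers]
    shared = set(actuals)
    for m in models:
        shared &= set(log[m])
    if not shared:
        return champion, {}  # nothing to compare yet
    ids = sorted(shared)
    y = [actuals[i] for i in ids]
    costs = {m: average_cost(y, [log[m][i] for i in ids]) for m in models}
    # min() returns the first minimal key, and the champion was inserted first.
    best = min(costs, key=lambda m: costs[m])
    return best, costs


if __name__ == "__main__":
    # Made-up historical outcomes and each candidate model's predictions.
    actual = [1, 0, 1, 1, 0, 0, 1, 0]
    predictions = {
        "tree": [1, 0, 1, 0, 0, 1, 1, 0],
        "logit": [1, 1, 1, 1, 0, 1, 1, 0],
        "nbayes": [0, 0, 1, 1, 0, 0, 1, 1],
        "svm": [1, 0, 1, 1, 1, 1, 1, 1],
    }
    champion, challengers, _ = select_champion(predictions, actual, tolerance=0.2)
    print(champion, challengers)  # logit ['svm']

    # Made-up production log: the champion scores every case, the challenger a subset.
    log = {
        "logit": {101: 1, 102: 0, 103: 1, 104: 0, 105: 1},
        "svm": {101: 1, 103: 1, 104: 1},
    }
    actuals = {101: 1, 102: 0, 103: 0, 104: 1, 105: 1}  # captured after the fact
    best, costs = review(champion, challengers, log, actuals)
    print(best, costs)
```

With a plain misclassification rate, `tree`, `logit`, and `nbayes` would tie at 2 errors out of 8, while `svm` makes 3; the cost matrix is what separates them and makes `svm` (which only produces cheap false positives) the sole challenger. In the production step, the champion and the challenger are compared only on cases 101, 103, and 104, the subset both of them scored, so the comparison is like-for-like.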
Affiliation	Oracle Corp
DOI	10.1109/SECON.2007.342897
Discipline	Engineering
EISBN	1424410290 9781424410293
EISSN	1558-058X
ISBN	1424410282 9781424410286
ISSN	1091-0050
IsPeerReviewed	false
IsScholarly	true
PublicationTitleAbbrev	SECON
URI	https://ieeexplore.ieee.org/document/4147427