Stacked Ensemble Learning for Propensity Score Methods in Observational Studies

Propensity score methods account for selection bias in observational studies. However, the consistency of the propensity score estimators strongly depends on a correct specification of the propensity score model. Logistic regression and, with increasing popularity, machine learning tools are used to...

Bibliographic Details
Published in Journal of educational data mining Vol. 13; no. 1; pp. 24 - 189
Main Authors Autenrieth, Maximilian, Levine, Richard A, Fan, Juanjuan, Guarcello, Maureen A
Format Journal Article
Language English
Published International Educational Data Mining 30.06.2021
Abstract Propensity score methods account for selection bias in observational studies. However, the consistency of the propensity score estimators strongly depends on a correct specification of the propensity score model. Logistic regression and, with increasing popularity, machine learning tools are used to estimate propensity scores. We introduce a stacked generalization ensemble learning approach to improve propensity score estimation by fitting a meta learner on the predictions of a suitable set of diverse base learners. We perform a comprehensive Monte Carlo simulation study, implementing a broad range of scenarios that mimic characteristics of typical data sets in educational studies. The population average treatment effect is estimated using the propensity score in Inverse Probability of Treatment Weighting. Our proposed stacked ensembles, especially using gradient boosting machines as a meta learner trained on a set of 12 base learner predictions, led to superior reduction of bias compared to the current state-of-the-art in propensity score estimation. Further, our simulations imply that commonly used balance measures (averaged standardized absolute mean differences) might be misleading as propensity score model selection criteria. We apply our proposed model -- which we call GBM-Stack -- to assess the population average treatment effect of a Supplemental Instruction (SI) program in an introductory psychology (PSY 101) course at San Diego State University. Our analysis provides evidence that moving the whole population to SI attendance would on average lead to 1.69 times higher odds to pass the PSY 101 class compared to not offering SI, with a 95% bootstrap confidence interval of (1.31, 2.20).
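The weighting step named in the abstract can be illustrated with a minimal sketch. It assumes the propensity scores have already been estimated (for instance by a stacked ensemble) and are passed in as plain numbers. The function names, the use of normalized (Hájek-style) weights, and the odds ratio computed from weighted pass rates are illustrative assumptions, not the authors' implementation.

```python
def iptw_means(outcomes, treated, propensity):
    """Weighted mean outcome under treatment and under control.

    Treated units get weight 1/e and control units 1/(1-e), where e is the
    estimated propensity score. Weights are normalized within each arm
    (an assumption of this sketch, not stated in the abstract).
    """
    num1 = den1 = num0 = den0 = 0.0
    for y, t, e in zip(outcomes, treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t == 1:
            w = 1.0 / e
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - e)
            num0 += w * y
            den0 += w
    if den1 == 0.0 or den0 == 0.0:
        raise ValueError("both treated and control units are required")
    return num1 / den1, num0 / den0


def iptw_ate(outcomes, treated, propensity):
    """Population average treatment effect as a difference of weighted means."""
    m1, m0 = iptw_means(outcomes, treated, propensity)
    return m1 - m0


def iptw_odds_ratio(outcomes, treated, propensity):
    """Marginal odds ratio for a binary outcome from the weighted means."""
    m1, m0 = iptw_means(outcomes, treated, propensity)
    return (m1 / (1.0 - m1)) / (m0 / (1.0 - m0))


# Toy data (made up for illustration): 1 = passed, 1 = attended.
passed = [1, 0, 1, 1, 0, 0]
attended = [1, 1, 1, 0, 0, 0]
scores = [0.5, 0.5, 0.8, 0.5, 0.2, 0.5]

ate = iptw_ate(passed, attended, scores)
odds_ratio = iptw_odds_ratio(passed, attended, scores)
```

A confidence interval such as the bootstrap interval reported in the abstract would require resampling units and re-estimating the propensity scores in each replicate, which this sketch does not attempt.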
Audience Higher Education
Postsecondary Education
BackLink http://eric.ed.gov/ERICWebPortal/detail?accno=EJ1320634 (View record in ERIC)
DOI 10.5281/zenodo.5048425
DatabaseName ERIC
ERIC - Full Text Only (Discovery)
Discipline Education
EISSN 2157-2100
ERIC EJ1320634
GeographicLocations California (San Diego)
ISSN 2157-2100
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
ORCID 0000-0001-5279-1543
0000-0002-7553-4264
OpenAccessLink http://eric.ed.gov/ERICWebPortal/detail?accno=EJ1320634
PageCount 166
PublicationDate 2021-06-30
SubjectTerms Artificial Intelligence
College Students
Computation
Educational Research
Observation
Outcomes of Education
Probability
Statistical Bias