My Research
Posted: 1 May 2026 | PROM05 Week 1
My research sits at the intersection of causal inference and machine learning, specifically within the subfield known as uplift modelling. While conventional predictive modelling asks what is likely to happen, uplift modelling asks a more precise and consequential question: what is the causal effect of a specific intervention on this individual? Devriendt, Moldovan and Verbeke (2018, p.13) define uplift modelling in exactly these terms, distinguishing it from conventional classification by its orientation toward incremental rather than absolute outcomes. In practice, this distinction determines which individuals receive healthcare outreach, financial products, or educational support, so poor model selection carries direct real-world consequences (Olaya et al., 2020).
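The incremental quantity at the heart of uplift modelling can be illustrated with a minimal sketch: under a randomised trial, the uplift for a group is the difference in outcome rates between its treated and control units, P(Y=1 | T=1) − P(Y=1 | T=0). The records and segment labels below are invented purely for illustration, not drawn from the study data.

```python
def uplift_by_segment(records):
    """Estimate uplift per segment as the difference in outcome rates
    between treated and control units: P(Y=1 | T=1) - P(Y=1 | T=0)."""
    stats = {}  # segment -> [treated_successes, treated_n, control_successes, control_n]
    for segment, treated, outcome in records:
        s = stats.setdefault(segment, [0, 0, 0, 0])
        if treated:
            s[0] += outcome
            s[1] += 1
        else:
            s[2] += outcome
            s[3] += 1
    return {seg: s[0] / s[1] - s[2] / s[3] for seg, s in stats.items()}

# Toy randomised-trial records: (segment, treated?, outcome).
data = [
    ("young", 1, 1), ("young", 1, 1), ("young", 0, 0), ("young", 0, 1),
    ("older", 1, 0), ("older", 1, 1), ("older", 0, 1), ("older", 0, 1),
]
print(uplift_by_segment(data))  # one segment responds positively, the other negatively
```

The point of the example is that a model predicting absolute outcomes would rate both segments similarly, while the incremental view separates those helped by the intervention from those it harms.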
The motivation for this project stems from a recognised and unresolved problem in the field. The two most widely used evaluation metrics, the Qini coefficient (Radcliffe, 2007) and the Area Under the Uplift Curve (AUUC), are applied inconsistently across studies without systematic examination of whether they produce stable, reproducible model rankings (Devriendt, Guns and Verbeke, 2020; Bokelmann and Lessmann, 2024). Gubela and Lessmann (2021) provide the most direct evidence of the problem: conventional metrics including Qini and AUUC produce different model rankings for the same models on the same data. For practitioners making deployment decisions, this inconsistency means consequential choices may rest on an unreliable evidential basis.
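To make the first of these metrics concrete, here is a minimal sketch of how a Qini curve is built (Radcliffe, 2007): units are sorted by predicted uplift, and at each cutoff the cumulative treated outcomes are compared with the cumulative control outcomes rescaled to the treated count. The Qini coefficient is then the area between this curve and the straight line joining its endpoints. The scores and outcomes below are invented toy values.

```python
def qini_points(score, treat, y):
    """Cumulative incremental gains along a Qini curve: units are ranked by
    predicted uplift, and each point is treated outcomes minus control
    outcomes rescaled to the treated count at that cutoff."""
    order = sorted(range(len(score)), key=lambda i: -score[i])
    nt = nc = yt = yc = 0
    points = [0.0]  # the curve starts at the origin
    for i in order:
        if treat[i]:
            nt += 1
            yt += y[i]
        else:
            nc += 1
            yc += y[i]
        # Guard against an empty control group at the head of the ranking.
        points.append(yt - yc * (nt / nc) if nc else float(yt))
    return points

scores = [0.9, 0.8, 0.3, 0.1]   # predicted uplift, higher = target first
treat = [1, 0, 1, 0]            # treatment indicator from the trial
y = [1, 0, 0, 1]                # observed outcome
print(qini_points(scores, treat, y))
```

AUUC is built from a closely related cumulative-gain construction, which is precisely why the two metrics can be expected to agree often yet diverge in ways worth quantifying.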
The overall research aim is to investigate whether this divergence can be characterised and quantified through a structured comparative evaluation. The study will use the Criteo Uplift dataset (Criteo, 2018), comprising approximately 13.9 million observations from a randomised controlled trial, providing a benchmark in which the causal structure is known by design.
The practical component involves developing a modular Python framework to implement and benchmark three uplift estimators: the S-Learner and T-Learner (meta-learner approaches) and a tree-based uplift model using KL-divergence splitting, as surveyed by Devriendt et al. (2018), alongside a random targeting baseline. Models will be evaluated against both Qini and AUUC, with ranking consistency quantified using Kendall's tau and Spearman's rho across fifty bootstrap subsamples, following the statistical comparison framework established by Demšar (2006). Zhao and Harinen's (2020) work on meta-learner extensions informs the model selection rationale, though cost-heterogeneous variants are deferred as a future extension.
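The ranking-consistency step can be sketched as follows: on each bootstrap subsample, each metric induces a ranking of the three models, and Kendall's tau measures the agreement between the two rankings (tau = 1 for identical rankings, −1 for complete reversal). The per-model scores below are hypothetical placeholders, not results from the study.

```python
from itertools import combinations

def kendall_tau(r1, r2):
    """Kendall's tau between two rankings of the same items (assumes no ties):
    the proportion of concordant minus discordant pairs."""
    pairs = list(combinations(range(len(r1)), 2))
    concordant = sum(
        1 if (r1[i] - r1[j]) * (r2[i] - r2[j]) > 0 else -1
        for i, j in pairs
    )
    return concordant / len(pairs)

# Hypothetical per-model scores on a single bootstrap subsample.
qini = {"s_learner": 0.031, "t_learner": 0.045, "kl_tree": 0.038}
auuc = {"s_learner": 0.512, "t_learner": 0.508, "kl_tree": 0.530}

models = sorted(qini)
rank_q = [sorted(qini.values(), reverse=True).index(qini[m]) for m in models]
rank_a = [sorted(auuc.values(), reverse=True).index(auuc[m]) for m in models]
print(kendall_tau(rank_q, rank_a))  # below 1.0 whenever the metrics disagree
```

In the framework itself this comparison would more likely use `scipy.stats.kendalltau`, which also handles ties; the hand-rolled version above is only meant to show what "ranking consistency" operationally means across the fifty subsamples.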
The research objectives are to critically evaluate the existing uplift modelling literature with particular attention to evaluation methodology; implement a reproducible experimental pipeline; compare model rankings across Qini and AUUC under controlled conditions; and analyse the conditions under which metric divergence is most pronounced. The full analysis plan will be pre-registered on OSF, and the framework will be containerised using Docker to ensure computational reproducibility.
This blog will document the development of the project across the coming weeks, including the decisions made, the challenges encountered, and the evolving understanding that emerges as the work progresses.
References
Bokelmann, B. and Lessmann, S. (2024) 'Improving uplift model evaluation on randomized controlled trial data', European Journal of Operational Research, 313(2), pp. 691-707. Available at: https://doi.org/10.1016/j.ejor.2023.09.018 (Accessed: 8 April 2026).
Criteo (2018) Criteo Uplift Prediction Dataset. Available at: https://ailab.criteo.com/criteo-uplift-prediction-dataset/ (Accessed: 5 March 2026).
Demšar, J. (2006) 'Statistical comparisons of classifiers over multiple data sets', Journal of Machine Learning Research, 7, pp. 1-30. Available at: https://www.jmlr.org/papers/volume7/demsar06a/demsar06a.pdf (Accessed: 4 April 2026).
Devriendt, F., Guns, T. and Verbeke, W. (2020) 'Learning to rank for uplift modeling', arXiv preprint arXiv:2002.05897 [Preprint]. Available at: https://arxiv.org/pdf/2002.05897 (Accessed: 7 March 2026).
Devriendt, F., Moldovan, D. and Verbeke, W. (2018) 'A literature survey and experimental evaluation of the state-of-the-art in uplift modelling: a stepping stone toward the development of prescriptive analytics', Big Data, 6(1), pp. 13-41. Available at: https://doi.org/10.1089/big.2017.0104 (Accessed: 7 March 2026).
Gubela, R.M. and Lessmann, S. (2021) 'Uplift modeling with value-driven evaluation metrics', Decision Support Systems, 155, article 113648. Available at: https://doi.org/10.1016/j.dss.2021.113648 (Accessed: 9 March 2026).
Olaya, D., Vásquez, J., Maldonado, S., Miranda, J. and Verbeke, W. (2020) 'Uplift modeling for preventing student dropout in higher education', Decision Support Systems, 134, article 113320. Available at: https://doi.org/10.1016/j.dss.2020.113320 (Accessed: 11 April 2026).
Radcliffe, N.J. (2007) 'Using control groups to target on predicted lift: building and assessing uplift models', Direct Marketing Analytics Journal, 1, pp. 14-21.
Zhao, Z. and Harinen, T. (2020) 'Uplift modeling for multiple treatments with cost optimization', arXiv preprint arXiv:1908.05372v3 [Preprint]. Available at: https://arxiv.org/pdf/1908.05372 (Accessed: 7 March 2026).