Measuring What Actually Matters

Posted: 22 May 2026 | PROM05 Week 4

Author: Kemone S-G Brown

There is a well-known problem in project management: projects can fail while their metrics show no issues. This happens when the wrong things are being measured, or when measurements are chosen because they are easy to collect rather than because they reflect what genuinely matters for the project's success. Wingate (2025) is direct about this risk, noting that diligently monitoring metrics that give the appearance of progress, rather than identifying actual risks, wastes resources and makes it impossible to address real issues in a timely manner.

This week I have been thinking carefully about what success actually means for this project, and what measurements will tell me whether I am genuinely progressing toward it.

Defining Success

Wingate (2025) argues that success is subjective and must be explicitly defined and agreed upon before the project begins, because different stakeholders can assess the same outcome differently. For a research project, this complicates the picture in an interesting way: success is not simply delivering the dissertation on time. It has at least three distinct dimensions that must all be achieved simultaneously.

At the project management level, success means completing all deliverables within the PROM06 timeline to the required academic standard. At the research level, success means answering the primary research question with sufficient statistical precision to support a defensible conclusion about whether the Qini coefficient and AUUC produce consistent model rankings. At the contribution level, success means producing a reproducible, openly pre-registered evaluation framework that others can verify and extend.

The third dimension is the one most easily overlooked in a standard project management framework, but it is arguably the most important for the long-term value of the work. A dissertation that answers the research question but cannot be independently verified is a weaker contribution than one where every analytical decision is documented, every parameter is fixed before results are seen, and the entire pipeline can be re-run by any researcher with access to the Criteo dataset and the public repository.

What Gets Measured

Wingate (2025) distinguishes between measurements that track schedule adherence and measurements that track progress along the research trajectory. For R&D projects specifically, Wingate argues that the latter are more valuable. A metric that tracks whether the data ingestion module was completed by a particular date measures schedule adherence. A metric that tracks whether the module produces identical outputs across three independent runs on different machines measures the reproducibility commitment that is central to the project's entire research contribution.
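To make the second kind of metric concrete, here is a minimal sketch of a binary reproducibility check. It assumes each module can be called with a seed and returns JSON-serialisable output. The module `ingest_sample` is a hypothetical stand-in, not the project's actual ingestion code. The check hashes a canonical serialisation of each run's output and passes only if every hash matches. On a single machine this runs the module three times in-process; for the cross-machine version, each machine or container would log its fingerprint for comparison afterwards.

```python
import hashlib
import json
import random


def ingest_sample(seed, n_rows=1000):
    """Hypothetical stand-in for the data ingestion module: a seeded row sample."""
    rng = random.Random(seed)  # local generator, so no hidden global random state
    return [{"row_id": rng.randrange(10**6), "treatment": rng.randint(0, 1)}
            for _ in range(n_rows)]


def output_fingerprint(records):
    """SHA-256 of a canonical JSON serialisation: equal outputs give equal hashes."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reproducibility_check(module, seed, runs=3):
    """Binary KPI: True only if every run of `module` yields an identical output."""
    fingerprints = {output_fingerprint(module(seed)) for _ in range(runs)}
    return len(fingerprints) == 1


if __name__ == "__main__":
    print(reproducibility_check(ingest_sample, seed=42))
    # Each machine (or container) could also log output_fingerprint(...) so that
    # hashes from independent environments can be compared afterwards.
```

Hashing rather than comparing outputs directly keeps the check cheap to record and compare across machines. A module that draws from unseeded randomness fails the check immediately.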

Applying this distinction to the project, I am tracking four key performance indicators. The first is the mean Kendall's tau across fifty bootstrap subsamples, measured against a threshold of 0.7, a conservative, pre-registered decision rule that sits within Evans's (1996) strong correlation band of 0.60 to 0.79. The second is a binary reproducibility check confirming that re-running the analysis pipeline produces identical outputs. The third is the pre-registration status on OSF, which must be completed before any model is trained. The fourth is the width of the bootstrap confidence interval, which must be narrow enough to support a directional conclusion.
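The first and fourth indicators can be sketched together, under some stated assumptions. The sketch assumes the Qini and AUUC values for each candidate model have already been computed on each of the fifty subsamples; how those metrics are computed is outside its scope. It computes Kendall's tau-b, the tie-aware variant that SciPy's `kendalltau` uses by default, between the two metrics' scores on each subsample. It then reports the mean, a 95% percentile interval, and the interval's width. Treating a "directional conclusion" as an interval lying wholly above or below 0.7 is my own reading, and the helper names are hypothetical.

```python
import math
import random
import statistics
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length lists of metric scores (tie-aware)."""
    s = 0          # concordant pairs minus discordant pairs
    untied_x = 0   # pairs not tied on x
    untied_y = 0   # pairs not tied on y
    for i, j in combinations(range(len(x)), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
        dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
        s += dx * dy
        untied_x += dx != 0
        untied_y += dy != 0
    if untied_x == 0 or untied_y == 0:
        return float("nan")  # undefined when one ranking is entirely tied
    return s / math.sqrt(untied_x * untied_y)


def summarise_taus(qini_scores, auuc_scores, threshold=0.7):
    """qini_scores[b][m] and auuc_scores[b][m] hold model m's score on subsample b."""
    taus = [kendall_tau_b(q, a) for q, a in zip(qini_scores, auuc_scores)]
    cuts = statistics.quantiles(taus, n=40, method="inclusive")
    low, high = cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles (95% interval)
    if low > threshold:
        direction = "above"
    elif high < threshold:
        direction = "below"
    else:
        direction = "inconclusive"  # interval straddles the threshold
    return {"mean_tau": statistics.fmean(taus), "ci": (low, high),
            "ci_width": high - low, "direction": direction}


if __name__ == "__main__":
    # Synthetic, hypothetical scores for five models on fifty subsamples.
    # These are NOT project results; they only exercise the calculation.
    rng = random.Random(2026)
    true_quality = [0.1, 0.2, 0.3, 0.4, 0.5]
    qini = [[t + rng.gauss(0, 0.05) for t in true_quality] for _ in range(50)]
    auuc = [[t + rng.gauss(0, 0.05) for t in true_quality] for _ in range(50)]
    print(summarise_taus(qini, auuc))
```

A percentile interval is the simplest bootstrap interval to pre-register. Other constructions exist, so whichever is chosen would need to be fixed in the OSF record before any results are seen.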

None of these are schedule metrics. All of them reflect whether the research itself is being conducted with the integrity the question demands.

These KPIs are not simply measurement criteria; they are decision rules. If the confidence interval around the mean Kendall's tau is too wide at the fifty-subsample mark to support a directional conclusion, a pre-specified sensitivity analysis using one hundred subsamples is triggered before any interpretation proceeds. If a reproducibility check fails at any point during the implementation phase, the affected module returns to development regardless of what the schedule says. If the OSF pre-registration gate is not passed before the first model is trained, the analysis does not begin. These are commitments made now, before any results exist, so that the project-level response to deviation is not a judgement call made under pressure but a predetermined course of action. Wingate (2025) frames this as the difference between active management and reactive management: active management defines what it will do before problems arise; reactive management decides what to do after they have already cost time and credibility.
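Because these rules are fixed in advance, they can be written down as a small function that maps the project's state to its predetermined response. The sketch below is a hypothetical encoding, not a tool the project uses. The order in which the gates are checked is my assumption, as is reading "directional" as an interval lying wholly on one side of the 0.7 threshold.

```python
from dataclasses import dataclass

TAU_THRESHOLD = 0.7            # pre-registered decision threshold
SENSITIVITY_SUBSAMPLES = 100   # size of the pre-specified sensitivity analysis


@dataclass
class ProjectState:
    preregistered: bool        # has the OSF pre-registration gate been passed?
    reproducibility_ok: bool   # did the latest reproducibility check pass?
    n_subsamples: int          # bootstrap subsamples analysed so far
    ci_low: float              # lower bound of the interval around mean tau
    ci_high: float             # upper bound of that interval


def next_action(state):
    """Return the predetermined response to the current state (hypothetical encoding)."""
    if not state.preregistered:
        return "halt: pass the OSF pre-registration gate before training any model"
    if not state.reproducibility_ok:
        return "return the affected module to development"
    # Assumption: 'directional' means the interval lies wholly on one side of the threshold.
    directional = state.ci_low > TAU_THRESHOLD or state.ci_high < TAU_THRESHOLD
    if not directional and state.n_subsamples < SENSITIVITY_SUBSAMPLES:
        return "run the pre-specified 100-subsample sensitivity analysis"
    return "proceed to interpretation"


if __name__ == "__main__":
    # An interval of (0.62, 0.78) at fifty subsamples straddles 0.7,
    # so the sensitivity analysis is triggered before interpretation.
    print(next_action(ProjectState(True, True, 50, 0.62, 0.78)))
```

Writing the rules as code makes the point in the paragraph literal: the response to each deviation is looked up, not decided on the day.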

Planning at the Forefront of a Discipline

The module prompt for this week asks how a project plan would be tailored to research at the forefront of a discipline. For causal machine learning and uplift modelling, the answer is that standard project management frameworks need two significant adaptations.

The first is the pre-registration requirement. In a field where metric inconsistency is the research question, the analytical choices cannot be made after the results are visible. The pre-registration on OSF functions as a gate review in Wingate's (2025) terms: the project cannot proceed to the analysis phase until this gate has been passed. This is not a bureaucratic constraint but a methodological one; it is what makes the findings credible rather than convenient.

The second adaptation is the treatment of computational reproducibility as a project management concern rather than a technical afterthought. Docker containerisation, fixed random seeds, and version-controlled implementation are not finishing touches; they are success criteria that must be verified throughout the implementation phase, not only at submission. Phelps, Fisher and Ellis (2007) identify writing as the most difficult element to schedule because it does not fit neatly into a timeline; for computational research, reproducibility verification has the same character. It requires repeated validation at multiple points and cannot be rushed or deferred.
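One lightweight way to verify reproducibility repeatedly, rather than only at submission, is to record a run manifest each time the pipeline runs and compare it against a committed baseline. The sketch below assumes that the seed, the interpreter version, the platform string and a SHA-256 hash of the input data are the facts worth pinning. The field names and helpers are hypothetical. Inside a pinned Docker image, the interpreter and platform entries should stay constant, so any drift flags a change worth investigating.

```python
import hashlib
import json
import platform


def run_manifest(seed, data_bytes):
    """Record what a re-run must match: seed, interpreter, OS and input-data hash."""
    return {
        "seed": seed,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
    }


def manifest_drift(recorded, current):
    """Keys whose values differ between a recorded manifest and the current one."""
    keys = recorded.keys() | current.keys()
    return sorted(k for k in keys if recorded.get(k) != current.get(k))


if __name__ == "__main__":
    baseline = run_manifest(42, b"hypothetical dataset bytes")
    print(json.dumps(baseline, indent=2))  # e.g. committed alongside the code
    print(manifest_drift(baseline, run_manifest(42, b"hypothetical dataset bytes")))
```

An empty drift list does not prove the outputs match; it only confirms that the conditions match. It complements an output-level check rather than replacing one.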

Davis (2019) adds a leadership dimension to this that I find genuinely useful. The interdependent leader, in her framing, declares goals before having a plan and invites co-creation rather than presenting a polished outcome. Pre-registration is exactly this: it is a public declaration of what the project is trying to find out, before the answer is known, made to the research community rather than to a small group of stakeholders. The openness is not a vulnerability but a commitment.

References

Davis, L. (2019) A guide to collaborative leadership [TED Talk]. Available at: https://www.ted.com/talks/lorna_davis_a_guide_to_collaborative_leadership (Accessed: 10 May 2026).

Evans, J.D. (1996) Straightforward Statistics for the Behavioral Sciences. Pacific Grove, CA: Brooks/Cole.

Phelps, R., Fisher, K. and Ellis, A.H. (2007) Organizing and Managing Your Research: A Practical Guide for Postgraduates. London: SAGE Publications. Available at: https://ebookcentral.proquest.com/lib/sunderland/detail.action?docID=354865&pq-origsite=primo (Accessed: 3 May 2026).

Wingate, L.M. (2025) Project Management for Research and Development: Guiding Innovation for Positive R&D Outcomes. 2nd edn. Boca Raton, FL: Auerbach Publications. Available at: https://learning.oreilly.com/library/view/project-management-for/9781040326671/ (Accessed: 10 May 2026).