Book Review: The Tyranny of Metrics

The Tyranny of Metrics is a book by Jerry Z. Muller about the use and (mostly) abuse of metrics. It strikes a tone both radical and pragmatic, and is worth reading if you work in education. The opening section of the book makes a point – which seems less and less whimsical as the argument progresses – about how the word ‘accountable’ can mean ‘responsible to’ or ‘capable of being counted’.

The book touches on universities, but makes much broader points through a general discussion and a series of case studies on, for example, hospitals, schools, charities, armies, and police forces. I drew some wisdom from it, which I’ve tried to organise thematically below.

Whether to measure?

1. Metrics can be necessary in organisations too complex for their managers to fully comprehend.

2. Metrics are risky in organisations with multiple purposes. Focusing on metrics for one purpose will distort the way the other purposes are approached, and can even truncate the organisation’s mission: a focus on the measurable can lead managers and actors to neglect the unmeasurable.

3. Generally, performance metrics can help to allocate blame but not to encourage success.

4. Metrics cost something (usually staff time), and this cost is normally not recognised.

5. The tension between managerial metrics and professional behaviour can be actively productive if it helps align goals; it can be actively destructive if it is used to railroad one party.

6. Technology makes metrics easier to produce; it does not make them easier to use.

Descriptions or Targets?

1. Making a target out of a metric undermines the reliability of that metric (‘Campbell’s Law’, aka ‘Goodhart’s Law’). Actors respond to targets in subversive and unexpected ways, corrupting the interpretability of the metric. This is particularly true if metrics are linked to rewards (e.g. promotion, or a Gold rating in the Teaching Excellence Framework, TEF).

2. Commitment to metrics can promote short-termism and curb entrepreneurship.

3. The benefits of measuring something for the first time are not the same as the benefits of measuring it repeatedly: new metrics can identify weaknesses, but the marginal cost of creating rolling metrics for everyone in perpetuity is huge.

4. It is inappropriate to frequently measure everyone using tools designed to discover extreme misconduct.

What to measure?

1. Metrics can be useful where they help actors to measure the things they feel professionally invested in.

2. Metrics can help describe complex situations, when paired with careful interpretation.

3. Metrics handed down from above are demoralising and deprofessionalising, degrading the stimulation and pride which actors can take in their work.

4. Measuring inputs is not the same as measuring outputs.

How to interpret?

1. Standardisation is seductive for comparing different situations, but standardised numbers might have been shorn of the very context which makes them meaningful: creating a consistent data set can drastically restrict the quality of information derived from it.

2. Metrics are not a substitute for judgement. In fact, measurement demands judgement about whether, what and how to measure, how to evaluate, and how to attach significance to the evaluation (e.g. pay, promotion).

I feel that three sets of metrics affect me significantly in my professional life: the workload model my employer uses, the student evaluation of teaching format my institution uses, and the sector-wide Excellence Frameworks the government uses. This book has helped me frame some of my interactions with these metrics in a more critical (and probably more constructive) way.

Reflections

I often wonder how to evaluate teaching. For my own practice, I am fairly uninterested in metrics; the most useful evaluation usually comes from face-to-face conversations with my students. As programme director (and, increasingly, as I look for ways to evidence that I teach well), I have seen the value in having indicative metrics. Chemistry was hard to grapple with in this context; the disciplinary unit of delivery is much smaller than the University’s standard 20-credit module size, so standardised evaluation at the module level averages out students’ perceptions of (say) 5 lecturers and 8 lab demonstrators. Interpreting this data meaningfully is… difficult.

This problem is writ large in the National Student Survey (NSS), and therefore in the TEF. The ‘accountability’ instrument is misaligned with existing levers for improvement – what does it mean when students give a high Student Voice score or a low Assessment and Feedback score? How do we even interpret these numbers? How should we respond? Perhaps the increased interest in answering these questions is positive, but the distance between the metric evaluation and the point of teaching delivery is hard to bridge. I found it tough to interpret data on modules happening in my own department; developing a robust institution-level response must be an absolute nightmare.

Michael O'Neill, June 1, 2019