Monthly Archives: October 2014

Rapid, real-time feedback from evaluation, as well as programme flexibility, is vital to health service improvement

By Tom Woodcock


We should learn from Melanesian Islanders that feedback provides the deep understanding of interventions which is key to wide-scale, successful roll-out, says Tom Woodcock

Osmar White, the Australian war correspondent, told an amazing story about how, after the Second World War, when military bases had closed in the Pacific, Melanesian Islanders built crude imitation landing strips, aircraft and radio equipment, and mimicked the behaviour that they had observed of the military personnel operating them. White’s book, ‘Parliament of a Thousand Tribes’, explains how the islanders were trying to reproduce the glut of goods and products that Japanese and American combatants had brought to the region. They believed that they could actually summon these goods again.

In her recent paper published in Implementation Science*, Professor Mary Dixon-Woods of the University of Leicester highlights this extraordinary story as a graphic illustration of how an innovation can fail to be replicated successfully in different circumstances when the original intervention is poorly understood. It illuminates the difficulties that can arise when one tries to implement and roll out improvement programmes. Deep understanding of the intervention is vital.

How do we achieve that understanding? It’s a big issue for NIHR’s Collaboration for Leadership in Applied Health Research and Care (CLAHRC). In Northwest London, we’re funded for the next five years to accelerate the translation of health research into patient care. Our experience is that rapid and continuous real-time feedback from evaluation, combined with flexibility in programme adaptation, is vital to ensure rapid improvement of health service practice. It is also central to meeting the longer-term challenges of achieving sustainability and reproducibility of change.

Challenges of transferring successful interventions

The nature of the challenge was highlighted by the Michigan Central Line Project. This was a highly successful US quality improvement project designed to reduce central line infections. Mortality was reduced significantly. ‘Matching Michigan’ was a subsequent initiative in 200 English hospitals to replicate Michigan’s results. It didn’t work as well as hoped. Drawing parallels with the Melanesian story, Professor Dixon-Woods’ paper argues that the Michigan innovation transfer likewise demonstrated inadequate understanding of the true intervention.

How can real-time evaluation help to avoid these misunderstandings? First, it offers a better chance to optimise interventions in their original settings, as well as in subsequent roll-out sites. Secondly, it can lead to a richer, more realistic understanding of the system and how it works. This can lead, I believe, to a fuller evaluation and more successful transferability. The opportunity offered by real-time evaluation might arise at the level of a specific project, implementing an intervention in a particular setting, but its strengths are also useful at higher policy levels and in the support and training tiers that lie between policy and practice.

Why does testing an intervention in situ with real-time evaluative feedback produce a better eventual implementation? Partly because the intervention can be fitted to its context effectively. The project team gain much better insight into what is actually happening during implementation, which is often highly complex, making it easy to miss key aspects. There can also be early checks on the intended impacts: if an intervention is being implemented successfully but not improving outcomes, there are statistical approaches that allow evaluators to explore the reasons quickly and take appropriate action. Feedback also increases motivation and engagement within the initiative, encouraging reflective thought.
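The article does not name a specific statistical approach, but statistical process control (SPC) charts are widely used in improvement work to provide exactly this kind of rapid feedback. The sketch below (my illustration, using hypothetical monthly infection counts, not data from the article) computes c-chart control limits and flags months whose counts signal a genuine change rather than routine variation:

```python
import math

def c_chart_limits(counts):
    """Centre line and 3-sigma limits for a c-chart
    (counts of events per equal-sized period, e.g. infections per month)."""
    centre = sum(counts) / len(counts)
    sigma = math.sqrt(centre)           # Poisson assumption: variance = mean
    lcl = max(0.0, centre - 3 * sigma)  # lower control limit (floored at 0)
    ucl = centre + 3 * sigma            # upper control limit
    return centre, lcl, ucl

def signal_points(counts, lcl, ucl):
    """Indices of periods outside the control limits -- candidates for
    special-cause investigation rather than routine variation."""
    return [i for i, c in enumerate(counts) if c < lcl or c > ucl]

# Hypothetical baseline: monthly central-line infection counts
baseline = [18, 22, 19, 21, 17, 23, 20, 20]
centre, lcl, ucl = c_chart_limits(baseline)
print(f"centre={centre:.1f}, limits=({lcl:.1f}, {ucl:.1f})")
# centre=20.0, limits=(6.6, 33.4)

# A post-intervention month with 4 infections falls below the lower limit,
# signalling a genuine change worth investigating
print(signal_points(baseline + [4], lcl, ucl))  # [8]
```

In practice such charts are refreshed as each period's data arrive, which is what makes the feedback "real time": the team sees within a month whether a change has shifted the system, rather than waiting for an end-of-project evaluation.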

A closer working relationship between evaluators and the team can expose underlying assumptions within an intervention which might otherwise be obscured. Typically, members of the team also come to appreciate the value of evaluation more, leading them to produce higher-quality data. Team challenges to the data – observations that ‘this does not make sense to me’ – can be illuminating and help create consistency both between and within sites. In her ‘Matching Michigan’ study, Mary Dixon-Woods highlights huge inconsistencies between the data collected at the different sites, despite each site supposedly working to an agreed, common operational framework. Achieving such consistency is extremely difficult. Close working between the evaluation and implementation teams can help, and it provides greater access to the mechanisms by which an intervention works. It also offers a lot of information about the sensitivity and specificity of measures.

Challenges of real-time evaluation

Real-time feedback and evaluation does have problems: it is more resource-intensive and can blur the lines between an evaluation and the intervention itself. There are methodological challenges – if early feedback is followed by a working and responsive change, then the evaluation is, in theory, dealing with a different intervention from the one it began to examine. Inevitably, there are questions about the impartiality of evaluators who work very closely with the implementation team.

At CLAHRC Northwest London, we reckon that the increased costs of real-time feedback are more than outweighed by the benefits. It helps that interactive feedback implies starting on a smaller scale: an initial programme can build in the feedback, and the findings can then inform a wider roll-out.

It is vital to clarify the intervention. Laura J Damschroder’s 2009 paper** published in Implementation Science reviews the literature to articulate a framework distinguishing the core of an intervention from its periphery. The core comprises the defining characteristics that should be the same wherever the intervention is implemented; the periphery is flexible and sensitive to context.

Regarding concerns about compromising objectivity, this is essentially a matter of planning carefully, delivering against the protocol, and then justifying and accurately reporting any additional analyses or modifications, so that anyone reading an evaluation understands what was planned originally and what was added as part of the interactive feedback.

Typically, people tend to think of two distinct processes – implementation and evaluation. In CLAHRC NWL, there is much more overlap. The CLAHRC NWL support team essentially perform an evaluative role and attend implementation team meetings to provide real-time evaluation feedback on the project measures. Biannually, Professor James Barlow and his team at Imperial College London evaluate the CLAHRC NWL programme, predominantly at higher levels, but there is still an interactive process going on.

Clarity about interventions

Take, for example, our programme to improve the management of chronic obstructive pulmonary disease (COPD). There are some high-level factors that we wish to influence by implementing the intervention, including reduced patient smoking, increased capacity to use inhalers properly when patients are out of hospital, and better general fitness and levels of exercise. There is a whole series of interventions, ranging from general availability of correct inhaler advice to much more specific provision of specialist staff education sessions to improve staff’s ability to train patients in inhaler technique. This is a useful way of separating the core of the intervention from the periphery: the more one is discussing generalities, the closer one is to the core of the intervention, whereas detailed particular measures are more sensitive to local context. So, for example, it may be that in one hospital there is already an embedded staff training programme on inhaler technique, making it unnecessary to implement this peripheral intervention in that setting.

Implementation is clearly complex. Real-time feedback, I believe, can help improvement programmes develop and be implemented successfully. It can also make for a better evaluation, but that requires very particular approaches to ensure rigour.

Dr Tom Woodcock is Head of Information at NIHR CLAHRC Northwest London and Health Foundation Improvement Science Fellow. This piece is based on a presentation that Dr Woodcock gave at the meeting ‘Evaluation – making it timely, useful, independent and rigorous’ on 4 July 2014, organised by PIRU at the London School of Hygiene and Tropical Medicine, in association with the NIHR School for Public Health Research and the Public Health Research Consortium (PHRC).


* Dixon-Woods, M. et al. (2013) “Explaining Matching Michigan: an ethnographic study of a patient safety program”, Implementation Science 8:70.

** Damschroder, L.J. (2009) “Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science”, Implementation Science 4:50.

Let’s be honest that pilots are not just about testing: they’re also about engineering the politics of change

By Stefanie Ettelt

There is more to policy piloting than evaluation – piloting is a policy tool in itself, not only a means for conducting research, says Stefanie Ettelt


Pilot evaluation tends to frustrate and disappoint some or all of its stakeholders, be they policy-makers, local implementers or evaluators, according to a study I have been working on for PIRU. Policy makers typically want robust, defensible proofs of success, ideally peppered with useful tips to avoid roll-out embarrassments. But they are distinctly uncomfortable with potentially negative or politically damaging conclusions that can also spring from rigorous evaluation.

Meanwhile, implementers of pilots at a local level don’t welcome the ambivalence that evaluation suggests, particularly when randomised controlled trials (RCTs) are used, given the associated assumption of uncertain outcome (equipoise). Implementers understandably worry that all their hard work putting change into action might turn out to have been a waste of time, producing insufficient improvement and leading to a programme being scrapped.

The evaluators may prefer a more nuanced approach than either of the above want, in order to capture the complex results and uncertainties of change. But this approach might find little favour with those commissioning the work. Evaluators are often dissatisfied with the narrow time frames and limited sets of questions allowed for their investigations. They may feel tasked with gathering what they consider to be over-simplistic measures of success, as well as being disappointed to discover that a roll-out has begun regardless of – or even in advance of – their findings.

Keeping all of these stakeholders happy is a big ask. It’s probably impossible, not least because satisfying any one of them may rule out contentment among the others. Why do we find ourselves in such a difficult situation?

Why is it so hard to satisfy everyone about pilots?

Perhaps this tricky issue is linked to the particular way in which British policy-making is institutionalised. These days, policy-making in the UK seems to be less ideologically driven – or at least less supported by ideology – than it was in the past. With this loss of some ideological defences has gone some of the perceived – albeit sometimes flawed – certainties that may once have protected policies from criticism. As a result, there are sometimes overblown expectations of research evidence in the UK, and sometimes illusory beliefs that evidence can create new certainties.

The institutional design of the Westminster system perhaps invites excessive expectations that policy can be highly informed by evidence, because political centralisation means there seem to be fewer actors who can veto decisions than in some other countries, for example Germany. There are more regional players in Germany’s federal system who can veto, obstruct or influence a decision. Relatively minor coalition partners in Berlin also have a long-standing tradition of providing strong checks and balances on the larger governing party. So, in Germany, there is more need for consensus and agreement at the initial policy-making stage. This participative process tends to reduce expectations of what a policy can deliver and also, perhaps, the importance of evidence in legitimising that policy.

Britain compared with Germany

In contrast, the comparatively centralised Westminster system seems more prone to making exaggerated claims for policy development and more in need of other sources of legitimacy. Piloting may, thus, at times become a proxy for consensus policy-making and a means of securing credibility for decisions. It might help to reduce expectations, and thus avoid frustration, if policy makers were clearer about their rationale for piloting. So, for example, they might explain whether a pilot is designed to promote policy or to question if the policy is actually a good way forward. If the core purpose is to promote policy, then some forms of evaluation such as RCTs may be inappropriate.

Evaluators understandably find it difficult to accept that the purpose of piloting and evaluation might first and foremost be for policy-makers to demonstrate good policy practice and to confirm prior judgements (i.e. to be ‘symbolic’). But there should be recognition that piloting sometimes has such a political character, genuinely distinct from a purely evaluative role.

Of course, such a distinction is not made any easier by policy makers who tend to use rhetoric such as ‘what works’ and ‘policy trials/experiments’ when they already know that the purpose of the exercise is simply to affirm what they are doing. If policy makers – including politicians and civil servants – use such language, they really are inviting, and should be prepared to accept, robust evaluation and acknowledge that sometimes the findings will be negative and uncomfortable for them.

Improving piloting and evaluation

There are ways in which we can improve evaluation methods to make them more acceptable to all concerned. More attention could be given to identifying the purpose of piloting, to avoid disappointment and manage the expectations of evaluators, policy-makers and local implementers. If the intention is to promote local and national policy learning, more participation from local implementers in setting the objectives and design of pilot evaluations would be desirable, so that these stakeholders might feel less worried by the process. Evaluators might also be more satisfied with more extensive use of ‘realist evaluation’. This approach particularly explores how context influences the outcomes of an intervention or policy, which is useful information for roll-out.

I would like to see local stakeholders more directly involved in policy-making, and their role more institutionalised, so that their involvement would be ongoing and not abandoned if a different incoming government considered it unhelpful. These are roles that need time to grow, to become embedded and to develop skills. Such a change would enhance the localism agenda. It would also acknowledge that local implementers are already key contributors to national policy learning, through all the local trial and error that they employ.

Dr Stefanie Ettelt is a Lecturer in Health Policy at the London School of Hygiene and Tropical Medicine. She contributes to PIRU through her work on piloting and through participating in the evaluation of the Direct Payments in Residential Care Trailblazers. She is also currently exploring the role of evidence in health policy, comparing England and Germany, as part of the “Getting evidence into policy” project at the LSHTM.