
‘Research units are performing a difficult balancing act … but we’re still smiling.’

By Nicholas Mays

Our ambition to co-produce evidence with advisors and officials is fraught with challenges, but remains a worthy goal with valuable benefits, explains PIRU director, Nicholas Mays.

When PIRU was set up three and a half years ago, there was a great deal of ambition on all sides. The Department of Health, as funder, wanted us ‘to strengthen the use of evidence in the initial stages of policy making’. That was the distinctive, exciting bit for us. We were to support or undertake evaluation of policy pilots or demonstration initiatives across all aspects of the Department’s policy activity – public health, health services and social care.

We were also brave, seeking to ‘co-produce’ evidence by working closely with policy advisors and officials, aiming to break down conventional sequences in which evaluation tends to follow policy development. We wanted early involvement from horizon scanning to innovation design and implementation design, plus support work for evaluations or to do them ourselves. It was clear that if we could be engaged, flexible and responsive, officials would be more likely to work with us.

Some researchers prefer planned, longer term work. They see the responsive element as regrettably necessary to pay the mortgage. In fact, our more responsive work has often turned out to be the most interesting:  some of it we would probably have planned to do in any case; other parts have led to substantial pieces of research. It can be highly productive, not least because policy advisors are fired up about the findings.

Wide-ranging roles

In our first years, we have tried hard to work across all stages of policy development. To support the early stages of policy innovation, we did some rapid evidence syntheses.  We have advised on the feasibility of a number of potential evaluations – for example, we looked at the Innovation Health and Wealth Strategy to examine which of the strategy’s 26 actions could credibly be evaluated. We have advised on the commissioning and management of early stage policy evaluations. We have also helped define more precisely what the intervention is in a particular pilot because, in pilot schemes or demonstrations, the ‘what’ is often presumed, but can actually be rather unclear.

We had expected to guide roll-out, using the learning from evaluations, but that’s not always easy for academic evaluators. PIRU often works with different parts of the social care and health policy system, perhaps for quite short periods of time, which is a very different relationship from working, say, with clinicians for an extended period. Also, in policy and management, unlike the clinical world, people change jobs fairly frequently, making it difficult to sustain relationships.

We have also advised on modelling and simulation, which is useful for playing out possible effects of innovations and for debating potential designs. However, that work tends to happen within government rather than through outsiders such as PIRU.


Indeed, we have found it difficult to become involved in the early stages of policy development, partly because health and social policy decision-making in England has been restructured and become more complicated as a result of the Health and Social Care Act 2012. There are new agencies and new people, altering long-established relationships between policy makers and evaluators.

Engaging us early on is also demanding. It requires greater openness and communication within government, so that research managers actually know when an initiative is starting, and a willingness to share early intelligence with outsiders in the research community. Some policy makers also find that the perceived benefits of sharing new thinking with us fail to outweigh the perceived risks of having us at the table early on.


There have been other big issues. How close should evaluators get to those who commission an evaluation? How candid – and sometimes negative – should we be?  Should we refuse to do an impact evaluation because we know that too little time will be allowed to elapse to demonstrate a difference?  Should we actively create dissonance with customers who are also funders through a process of constructive challenge? Strangely, the researchers are sometimes the ones saying, ‘No, we should not be looking at outcomes. You are better doing a process evaluation or no evaluation at this stage.’ In some cases, the researchers are asking for less evaluation and the policy makers are asking for more.

Can it be predicted that certain pilots do not realistically lend themselves to being evaluated? For example, we conducted a study of a pilot scheme allowing patients to either visit or register with GP practices outside the area in which they live.  We highlighted in our report that we couldn’t look at the full range of impacts in the 12 months for which the pilot ran.  Nevertheless, critics of the policy were annoyed with the evaluation because it was seen to legitimise what was, in their minds, an inadequate pilot of a wrong-headed policy.

We frequently have to say that the policy pilot will take a lot longer than expected to be implemented. However, the commissioners of evaluation often have no time to wait and want the results right away. The danger is that lots of time is spent interviewing people and looking for implementation effects, only to discover that not very much has happened yet.

So we face many challenges. But that’s hardly surprising. In an ideal world, we would have closer sets of relationships with a defined set of potential users. In reality, we are working across a very wide range of policy issues with an overriding expectation that we should engage at an early stage and speedily. It’s a difficult but rewarding balancing act.

Nicholas Mays is Professor of Health Policy at the London School of Hygiene and Tropical Medicine and Director of PIRU. This piece is based on a presentation that Professor Mays gave at the meeting ‘Evaluation – making it timely, useful, independent and rigorous’ on 4 July 2014, organised by PIRU at the London School of Hygiene and Tropical Medicine, in association with the NIHR School for Public Health Research and the Public Health Research Consortium (PHRC).


Rapid, real-time feedback from evaluation, as well as programme flexibility, is vital to health service improvement

By Tom Woodcock


We should learn from Melanesian Islanders that feedback provides the deep understanding of interventions which is key to wide-scale, successful roll-out, says Tom Woodcock.

Osmar White, the Australian war correspondent, told an amazing story about how, after the Second World War, when military bases had closed in the Pacific, Melanesian Islanders built crude imitation landing strips, aircraft and radio equipment, and mimicked the behaviour that they had observed of the military personnel operating them. White’s book, ‘Parliament of a Thousand Tribes’, explains how the islanders were trying to reproduce the glut of goods and products that Japanese and American combatants had brought to the region. They believed that they could actually summon these goods again.

In her recent paper published in Implementation Science*, Professor Mary Dixon-Woods of Leicester University highlights this extraordinary story as a graphic illustration of how an innovation fails to be replicated successfully in different circumstances because there is poor understanding of the original intervention. It illuminates the difficulties that can arise when one tries to implement and roll out improvement programmes.  Deep understanding of the intervention is vital.

How do we achieve that understanding? It’s a big issue for NIHR’s Collaboration for Leadership in Applied Health Research and Care (CLAHRC). In Northwest London, we’re funded for the next five years to accelerate the translation of health research into patient care. Our experience is that rapid and continuous real-time feedback from evaluation, combined with flexibility in programme adaptation, is vital to ensure rapid improvement of health service practice. It is also central to meeting the longer-term challenges of achieving sustainability and reproducibility of change.

Challenges of transferring successful interventions

The nature of the challenge was highlighted by the Michigan Central Line Project. This was a highly successful US quality improvement project designed to reduce central line infections. Mortality was reduced significantly. ‘Matching Michigan’ was a subsequent initiative in 200 English hospitals to replicate Michigan’s results. It didn’t work as well as hoped. Drawing parallels with the Melanesian story, Professor Dixon-Woods’ paper argues that the Michigan innovation transfer likewise demonstrated inadequate understanding of the true intervention.

How can real-time evaluation help to avoid these misunderstandings? First, it offers a better chance to optimise interventions in their original settings, as well as in subsequent roll-out sites. Secondly, it can lead to a richer, more realistic understanding of the system and how it works. This can lead, I believe, to a fuller evaluation and more successful transferability. The opportunity offered by real-time evaluation might be at a specific project level, implementing an intervention in a specific setting, but its strengths are also useful at higher policy levels and in the support and training levels lying between policy and practice.

Why does testing an intervention in situ with real-time evaluative feedback produce a better eventual implementation? That’s partly due to being able to fit the intervention to its context effectively. The project team gain much better insight into what is actually happening during implementation, which is sometimes highly complex, making it easy to miss key aspects of what is occurring. There can also be early checks on the intended impacts – if an intervention is being implemented successfully but not improving outcomes, there are statistical approaches that allow evaluators to explore the reasons quickly and take appropriate action. Feedback also increases motivation and engagement within the initiative, encouraging reflective thought.

A closer working relationship between evaluators and the team can expose underlying assumptions within an intervention which might otherwise be obscured. Typically, members of the team also better appreciate the value of evaluation, leading them to develop higher quality data. Team challenges to the data – observations that ‘this does not make sense to me’ – can be illuminating and help create both between-site and within-site consistency. In her ‘Matching Michigan’ study, Mary Dixon-Woods highlights huge inconsistencies between the data collected in the different sites despite each site supposedly working to an agreed, common operational framework. Achieving such consistency is extremely difficult. Close working between the evaluation and implementation teams can help, and it provides greater access to the mechanism in which, and by which, an intervention works. It offers a lot of information about sensitivity and specificity of measures.

Challenges of real time evaluation

Real-time feedback and evaluation do have problems: they are more resource intensive and can blur the lines between an evaluation and the intervention itself. There are methodological challenges – if early feedback prompts a change to the intervention, then the evaluation is, in theory, dealing with a different intervention from the one it began to examine. Inevitably, there are questions about the impartiality of the evaluators if they work very closely with the implementation team.

At CLAHRC Northwest London, we reckon that the increased costs of real-time feedback are more than outweighed by the benefits. It helps that the very nature of the interactive feedback implies starting on a smaller scale, which allows an initial programme to build in the interactive feedback, with later findings then used to guide roll-out.

It is vital to clarify the intervention.  Laura J Damschroder’s 2009 paper** published in Implementation Science reviews the literature to articulate a framework distinguishing the core and the periphery of an intervention. The core represents the defining characteristics which should be the same wherever implemented, but there is also the flexible, context-sensitive periphery.

Regarding concerns about compromising objectivity, that is essentially a case of planning carefully, delivering against the protocol and then justifying and accurately reporting any additional analyses or modifications so that anyone reading an evaluation understands what was planned originally and what was added as part of the interactive feedback.

Typically, people tend to think of two distinct processes – implementation and evaluation. In CLAHRC NWL, there is much more overlap. The CLAHRC NWL support team essentially perform an evaluative role and attend implementation team meetings to provide real-time evaluation feedback on the project measures. Biannually, Professor James Barlow and his team at Imperial College London provide evaluation of the CLAHRC NWL programme, predominantly at higher levels, but there is still an interactive process going on.

Clarity about interventions

Take, for example, our programme to improve the management of chronic obstructive pulmonary disease (COPD). There are some high level factors that we wish to influence by implementing the intervention, including reduced patient smoking, increased capacity to use inhalers properly when patients are out of hospital, plus better general fitness and levels of exercise. There is a whole series of interventions, ranging from general availability of correct inhaler advice to much more specific provision of specialist staff education sessions, improving their ability to train patients in inhaler techniques. This is a useful way of separating the core of the intervention from the periphery – the more one is discussing generalities, the closer one is to the core of the intervention, whereas detailed particular measures are more sensitive to local context. So, for example, it may be that in one hospital there is already an embedded staff training programme on inhaler technique, so it is unnecessary to implement this peripheral intervention in that situation.

Implementation is clearly complex. Real-time feedback, I believe, can help improvement programmes to develop and to be implemented successfully. It can also make for a better evaluation, but that requires very particular approaches to ensure rigour.

Dr Tom Woodcock is Head of Information at NIHR CLAHRC Northwest London and Health Foundation Improvement Science Fellow. This piece is based on a presentation that Dr Woodcock gave at the meeting ‘Evaluation – making it timely, useful, independent and rigorous’ on 4 July 2014, organised by PIRU at the London School of Hygiene and Tropical Medicine, in association with the NIHR School for Public Health Research and the Public Health Research Consortium (PHRC).


* Dixon-Woods, M. et al (2013) “Explaining Matching Michigan: an ethnographic study of a patient safety program”, Implementation Science 8:70.

** Damschroder, L.J. et al (2009) “Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science”, Implementation Science 4:50.

Follow Africa’s lead in meticulous evaluation of P4P schemes for healthcare

By Mylene Lagarde

Working with researchers to evaluate the introduction of financial incentives in developed healthcare economies would yield vital knowledge, explains Mylene Lagarde

The jury is very much out on pay-for-performance (P4P) schemes in healthcare – at least as far as the research community is concerned. Lots of unanswered questions remain over their effectiveness and hidden costs, as well as potential unintended consequences and their merit relative to other potential approaches. Yet many policy makers seem to have made their minds up already. These schemes, which link financial rewards to healthcare performance, make sense intuitively. They are being introduced widely.

This disconnection between the research and policy-making worlds means that we are almost certainly not getting the best out of P4P initiatives. Perhaps more worrying, there is a danger that the tree will hide the forest – that the attractive, sometimes faddish, simplicity of pay-for-performance may obscure other, perhaps more complicated but possibly more cost-effective ways to improve healthcare. As systems struggle to configure themselves to address modern demographics and disease profiles, harnessing latest technologies, we need to know what works best to reshape behaviours.

There are three key issues that weaken the case for P4P in healthcare, as we set out in the PIRU report “Challenges of payment-for-performance in health care and other public services – design, implementation and evaluation”. These concern a lack of evidence about the costs of such schemes, about their effectiveness, and about which particular P4P designs may work better than others.

First, costs. P4P schemes are complex to design. They usually involve lots of preliminary meetings between the many participants. Yet studies have largely ignored these transaction costs and frequently also fail to track and record carefully the considerable costs of monitoring performance.

Second, the effectiveness of P4P is often impossible to assess with enough certainty. Typically, introduction of a new scheme does not include a control group. For example, if a scheme incentivises reduced hospital length of stay or emergency admissions for one hospital, it may be difficult to find a comparable hospital to serve as a counterfactual. That makes it harder to attribute a particular change to P4P – maybe it would have happened anyway.

Furthermore, only small groups of outcomes are usually monitored by P4P schemes, so evaluators may be left with a narrow, and thus weak, selection of effects. For example, reductions in hospital lengths of stay may be identified, but these may coincide with poorer outcomes elsewhere in the system, such as increased admissions to nursing homes. Data on these unintended effects, which perhaps reflect a shift rather than a reduction in costs and problems, are often not collected by the programme. That makes whole system analysis difficult.

Third, P4P is not a unique and uni-dimensional intervention. It is a family of interventions. They are all based on the premise that financial incentives can support change, but there are many variables: the size of the reward; how frequently it is offered; whether it is focussed on relative or absolute targets; whether it is linked to competition between providers or it is universally awarded. Very often, one type of intervention is used but another might equally well be employed. Each variation can produce different results, yet we still know little about the relative performance of alternative designs for these incentive schemes.

Researchers are not completely in the dark about P4P in healthcare. We are beginning to understand factors that characterise successful schemes. These typically involve a long lead-in time to plan, test and reflect carefully on the different elements of a programme. However, we must strengthen evaluation.

The first step would be to involve researchers at an early stage of the programme design. That’s the moment to spot where in the system you might need data to be collected. It’s also the time to identify control groups so that the causal impacts of these programmes can eventually be attributed more confidently.

Good evaluation requires political willingness to evaluate, which is sometimes lacking. When an initiative has a political breeze behind it, policy makers worry that researchers will let the wind out of the sails. But some Low and Middle Income Countries are taking the risk. There have been large numbers of randomised controlled trials over the last few years in African countries, looking at the effects of P4P schemes. Most are ongoing, but, so far, the evidence is promising. Rwanda was one of the first African countries to evaluate these financial incentives, mainly for increasing uptake of primary healthcare. Its programme is now being scaled up.

Why is Africa leading the way in setting high standards for P4P evaluation? Because the funders of these schemes, typically external donors (e.g. the World Bank, DfID, USAID), are well placed to demand meticulous evaluation by the receiving governmental authorities as a condition for the cash. Researchers, particularly in developed countries, rarely enjoy such firm leverage over national policy makers. And national policy-makers in these countries do not apply to themselves the degree of scrutiny they exercise with international aid recipients. Yet, if we are to get the best out of P4P – and not attach potentially false hopes to this healthcare innovation – we need more of the disciplined approach that is currently being used in Africa.

Dr Mylene Lagarde is a Senior Lecturer in Health Economics at the London School of Hygiene and Tropical Medicine. “Challenges of payment-for-performance in health care and other public services – design, implementation and evaluation” by Mylene Lagarde, Michael Wright, Julie Nossiter and Nicholas Mays is published by PIRU.

Don’t ditch evaluations just because pilots are hitch-free

By Nicholas Mays

PIRU’s experience with the ‘Choice of GP Practice Pilot’ suggests the need for continuing independent evaluation of policy roll-outs.

The greatest benefits – and potential disbenefits – of any piloted policy change are usually felt in the longer term and after roll-out. Yet evaluations are often quite short term, sometimes ending before really important issues emerge and possibly even cast a shadow over the enterprise. So we should think carefully before we ditch evaluations once initial pilots show few or no major hitches.

PIRU has evaluated a pilot that let patients register with a GP practice even if they lived outside the practice’s catchment area. Some 43 practices in three urban areas, half of them in Westminster, were involved in the 12 month pilot and just over a thousand patients registered ‘out of area’. About a third of those taking advantage were commuters, often young, working and in good health. About a quarter were moving house and keen to retain their GP practice, while another quarter had picked a local practice only to find that, though they were technically outside the practice catchment area, they were able to register. Finally, about one in seven used the option to register out-of-area for different reasons, such as wanting a practice that offered specialisation in a particular condition.

In short, the pilot revealed a small number of generally positive patients. There were a few practical problems but they did not seem insurmountable. Armed with these findings, the Government recently announced the scheme’s roll-out across the country on a voluntary basis. Our evaluation finished when the pilot ended.

Yet that is really just the beginning, rather than the end, of the story. Roll-out will affect not just a thousand but possibly hundreds of thousands of patients, as well as hundreds of practices – not just in the pilot areas of Westminster, Salford, Manchester and Nottingham City. The pilot was for 12 months, but some of the practices did not register any ‘out of area’ patients until six months and a quarter of the practices didn’t register any at all. The roll-out will carry on until further notice. It is likely to gather momentum as the option of ‘out of area’ registration becomes increasingly widely known. But we don’t really know the full consequences. Why? Because the roll-out is essentially an experiment. Yet the evaluation has ceased.

What should any further evaluation look at? It would be good to be able to look at the set up and running of this scheme on a national basis and to assess the overall impacts in terms of costs, usage and health outcomes. There are also some other important questions to answer.

First, will there be problems managing GP capacity in areas with large inward and outward flows of patients? For example, a GP from a rural area expressed concern to me about the potential flight of mainly young, relatively healthy commuters, who might prefer to register close to their work (as our evaluation suggested). These comparatively infrequent, fitter users of health services partly cross-subsidise older, more frequent users. The GP feared that their loss might challenge practice viability in rural areas.

At the other end, some GPs in London have expressed concerns about striking the right balance of care between residents and incomers. Some GPs feel their practices are already over-stretched by a high-need, elderly population with multiple long-term conditions. They worry about resources being diverted by an influx of younger commuters attending with mainly self-limiting conditions. Practices might end up not having the capacity to register local residents who would then have to travel further and register out-of-area themselves.  GPs also worry about the consequences of patients staying on their lists when they move house even short distances beyond the practice catchment, particularly if they are elderly and require home visits.  In a congested city, this could make a big difference to the number of patients that the doctor can see in a day.

More broadly, we have yet to see whether loosening the rules of registration may lead to lists becoming socio-economically segregated and how that shift might be managed in terms of the allocation of finance to different practices.

Second, there are also the unexplored issues of the challenges and costs to CCGs of funding diagnostics and hospital care for those registered with GPs far from their homes. It will be important that the numbers of out-of-area patients registered with practices within CCGs are kept up to date, so undercounting does not lead to underfunding of the CCG.

The system will also need to be sensitive to the possibly rapidly changing needs of patients registered out of area. For example, a pregnant woman might wish to receive her ante-natal care close to work in London, but access peri-natal, delivery, post-natal and paediatric care closer to home. Similar issues may arise with patients requiring continuing care. Will GP practices be flexible about de-registering and re-registering patients in such circumstances?  And how well will emergency primary care be provided to patients near where they live when they are registered with practices elsewhere?

We can expect that at least some of these issues will cause problems. The fact that the ‘pilot’ phase of this scheme has not been long enough to explore them raises questions about the purpose of pilots. Researchers tend to think of a pilot as an experiment before a programme’s adoption. Others see pilots simply as feasibility studies. More often than not, the fact that a pilot has been set up shows that it already has a lot of government support and roll-out is essentially a done deal, with the pilot designed to spot any big wrinkles and to deal with critics.

Whatever the truth about pilots – and it probably varies across government – we do need to appreciate that roll-outs often remain, as in this case, experiments just as much as the initial pilots. No doubt NHS England will monitor developments. However, knowledge and policy would benefit from further in-depth, independent evaluation of how things are working out.

Nicholas Mays is Professor of Health Policy at the London School of Hygiene and Tropical Medicine and Director of PIRU. He is lead author of “Evaluation of the choice of GP practice pilot, 2012-13: Final Report”, published by PIRU in March 2014.

Too much spin can seriously damage the health of spin-offs

By Nick Mays

The recent Department of Health report, ‘Innovation, Health and Wealth’* tells an intriguing story about the potential economic benefits of the NHS. It goes further than rehearsing how it helps to develop a healthy, productive and economically active population. The report also states that the NHS supports the life sciences industry. So far, so good. But more controversially, it contends that, ‘by exporting innovation, ideas and expertise’, the NHS provides new business opportunities abroad for UK-based companies.

This interesting argument for NHS innovation is on top of the report’s central case that innovation helps ‘deliver more health benefit for a given public resource’. The report engages directly with understandable concerns that commerce might overtake health considerations by stressing the importance of robust evaluations that do not bend to industry pressures. Indeed, it makes some sound proposals to strengthen the hand of NICE, whose rigour and independence are exemplary.

Nevertheless, I would sound a warning. Amid the talk of promoting UK plc, there remains a risk of getting carried away and heaping NHS praise on developments of dubious overall benefit – if only to help secure overseas markets. Such a step would not only be bad for the NHS, it could also damage UK business in the long run.

How might this happen? Surprisingly easily. All of us in research are aware of how studies and evaluations, however robust, can quickly lose their nuanced, shaded complexities in the hands of eager advocates. A study full of doubts, caveats and question marks can, once passed to an enthusiastic PR department, suddenly be transformed into a panacea.

Both the NHS R&D community – and UK business – should bear in mind the example of the US Food and Drug Administration (FDA). Although FDA approval is often refused, that’s not necessarily bad for business: the FDA’s rigour – and willingness to say ‘no’ – means that anything bearing its kite mark will be acceptable for sale anywhere. We should, likewise, be careful to maintain the NHS as a globally trusted brand.

So we should be wary not just about ensuring robust evaluation. We should also keep a close eye on the way those evaluations are presented publicly and, in particular, to the media. It is easier to lose a reputation than it is to gain one. In the long run, too much spin could seriously harm the Health Service’s valuable business spin-offs.

Nicholas Mays is Director of the Policy Research Unit in Policy Innovation Research (PIRU) and Professor of Health Policy at the London School of Hygiene and Tropical Medicine.

*To read ‘Innovation, Health and Wealth’ go to: