How to commission evaluations of national policy pilots


Evaluations of national policy pilots are often embarked on with high expectations and end with a sense of frustration on all sides.  Policy-makers often expect clearer, and more positive, verdicts from evaluation than researchers are able to provide; researchers hope for their findings to be more influential; and implementers in pilot sites struggle to put in place what they think they are being expected to deliver within the limited timescale of the pilot while wondering what they have gained from either the pilot programme or the national evaluation.

To ease some of these frustrations, we have developed guidance aimed primarily at national-level staff involved in policy-making and in initiating policy-relevant pilots and their evaluations. We think the guidance will also be helpful to evaluators. Our advice stems from both experience and analysis of the fate of policy pilots (Ettelt et al, 2015a; Ettelt et al, 2015b). Two observations, in particular, from evaluating policy pilots in health and social care have shaped our thinking.

The first observation is that it is often unclear what an evaluation is intended to contribute to policy development. This lack of clarity is frequently a symptom of a deeper problem which has more to do with confusion and conflicts over the reasons for piloting than with the evaluation itself. Indeed, the objectives of the evaluation can be perfectly clearly expressed, and yet it can entirely ‘miss the point’ if the purpose of piloting is not thought through. As we have argued elsewhere, policy pilots can serve different purposes, many of which have more to do with the realities of policy-making, and the dynamics of policy formulation and implementation, than with piloting for the purpose of testing effectiveness (Ettelt et al, 2015a). Different groups involved in a policy pilot can have different ideas about the purpose of piloting. These purposes also often change over time, for instance as a consequence of a ministerial decision to roll out the policy irrespective of whether the evaluation has been completed. The Direct Payments in Residential Care pilots, which PIRU is evaluating, are a case in point: they were rebranded early in the life of the programme to become ‘trailblazers’ as it was decided, ahead of the results of the pilots, that direct payments would be rolled out nationally in 2016 alongside other aspects of the 2014 Care Act. However, the policy context of the ‘trailblazers’ continues to change. As a result, the Department of Health is currently reconsidering whether direct payments should move forward at the speed previously expected.

We think it is important that the goals of such programmes are stated explicitly and that their implications are thought through carefully at the beginning of a pilot programme while it is still possible to make adjustments more easily than later in the process.  This is also the time to identify the target audience for the evaluation.  Whose knowledge is the evaluation aiming to contribute to?  There are likely to be important differences in the information needs and preferences of national policy-makers and local implementers that require some forethought if they are to be addressed adequately.

The second observation is that, under the influence of the mantra of ‘evidence-based policy’, policy-makers increasingly feel that they should prioritise specific research designs for the evaluations of policy pilots, especially experimental designs.  Yet, this consideration often comes too early in the discussion about pilot evaluations and is introduced for reasons that have more to do with the reputation of the design as producing particularly ‘valid’ evidence of policy effectiveness than with its appropriateness to generate insights given the objectives of the specific pilot programme.  The choice of research design does not make a programme more or less effective.  Conducting an RCT is pointless if the purpose of a pilot is to find out whether or not, and, if so, how, a policy can be implemented.  In such a situation, the ‘active ingredients’ of the intervention have not yet been determined and thus cannot be easily experimented with.  The Partnerships for Older People Projects (POPPs) pilots, conducted in the mid-2000s, are an example of a pilot programme that brought together a large number of local projects (of which about 150 were considered ‘core’), indicating an intention to foster diverse local innovations in care, with an evaluation commissioned and designed accordingly.  However, this did not stop national policy-makers subsequently changing direction and demanding a robust outcome analysis from a pilot programme and related evaluation which were both established to meet a different set of objectives.

A similar tension between piloting to encourage local actors to develop their own solutions to problems of service delivery and the desire for definitive (cost-)effectiveness evaluation of ‘what works’ can be seen in other pilot programmes. For example, the Integrated Care and Support Pioneers were selected as potential leaders in developing and implementing their own solutions to the barriers to integrating health and social care. Yet, the evaluation requirement includes a focus on assessing the cost-effectiveness of integrated care and support. This is extremely challenging in the face of such a diverse programme.

Beyond our two initial observations, the question of ‘evaluability’, which is relevant to all policy evaluation, is particularly pertinent in relation to RCTs and similar experimental designs.  RCTs require a substantial degree of researcher control over both the implementation of the pilots (e.g. a degree of consistency to ensure comparability) and the implementation of the evaluation (e.g. compliance with a randomised research protocol).  This level of control is not a given, and the influence of researchers on pilot sites is much more likely to be based on negotiation and goodwill than compliance.  This does not mean that conducting RCTs is impossible, but that pilot evaluations of this type require a significant and sustained commitment from pilot sites and policy-makers for the duration of the pilot programme to stick with the research protocol, and manage the added risk and complexity associated with the trial.

To help policy-makers make these decisions and better plan (national) pilot programmes and their evaluations, we have developed a guidance document, ‘Advice on commissioning external academic evaluations of policy pilots in health and social care’, which is available as a discussion paper. We are keen to receive comments.

This is an expanded version of an article written for the December 2015 edition of ‘Research Matters’, the quarterly magazine for members of the Social Research Association.


Ettelt, S., Mays, N. and P. Allen (2015a) ‘The multiple purposes of policy piloting and their consequences: Three examples from national health and social care policy in England’. Journal of Social Policy 44 (2): 319-337.

Ettelt, S., Mays, N. and P. Allen (2015b) ‘Policy experiments: investigating effectiveness or confirming direction?’ Evaluation 21 (3): 292-307.