Dream of Social Impact Bonds should not blind us to their dangers

By Mildred E. Warner

In the US, SIBs are the height of social policy fashion, but the risks are clear and numerous.

The dream of Social Impact Bonds (SIBs) sounds pretty exciting. It’s that you can invest in something good for society and at the same time get a market return so everybody wins. The investor wins, the client wins and, because it’s a better programme, society wins.

SIBs are growing around the world and mainly focus on prevention, which is good to see, because cure is often more costly than prevention. There is considerable enthusiasm in the US where 35 states are building programmes to encourage SIBs. Only ten SIB projects are active in the US, but there are probably hundreds in the pipeline.

Political support
In Congress, support is coming from both sides of the aisle. The US Government Accountability Office told me: “You know, Mildred, the Republicans like this because it lets the private sector get access to public social welfare dollars and the Democrats like it because it might increase public investment in social welfare”. So it’s a win-win politically.

Large organisations are lining up to act as intermediaries in establishing SIBs and evaluators are honing their skills to get the contracts to assess the outcomes. I’ve even talked to people on the street who have heard about SIBs. “I’d like to invest in that,” they’ll say. “I’d love to put my money in something that’s going to yield a good return for everyone.” But they’re not quite sure what this thing is.

Considerable concern
And there’s lots of concern as well. Providers are responding to this new landscape – some are nervous about payment. Academics are probably the most critical. Government managers feel cautious, worried about how much work is involved in putting these things together. Sometimes, officials feel forced reluctantly down the SIB pathway.

Take, for example, the SIB at Riker’s Island prison, designed to reduce the rate at which juvenile offenders return to jail. The Deputy Mayor of New York City told me she would not have done the Riker’s Island project as a SIB if she could have funded the programme directly. That would have been quicker and easier. But budget limits prevented that and she didn’t want to wait as, each year, more teenagers get caught in the prison system.

So we should take a long hard look at what’s going on. We should make sure that we’re not swept along by a tide of wishful thinking that could leave disappointment in its wake.

SIBs transform social services
This transformation is particularly important because SIBs represent a major upheaval in the design and delivery of social services. Typically, SIBs require intermediary management, private investment and some kind of outside evaluation which allows performance to be linked to financial returns.

SIBs take public management to a new place because government is no longer at the centre, as in the past, but an intermediary organisation is organising and running things. It’s true that the Government is at the top, calling the shots, defining the structure, defining the goals, but the intermediary is at the centre of the process making everything else happen, linking to the service provider, linking to the outside investors, commissioning the evaluator. The intermediary is becoming a very important actor.

Clients lack a voice
There are some real concerns about the way SIBs work. They tend to focus on areas where the client is pretty weak or voiceless or maybe despised by society. These are areas where we haven’t been able to motivate sufficient public investment in prevention because who cares, for example, about a prisoner and their re-entry into the community?

Yet, ironically, SIBs seem to leave the clients voiceless. These are homeless people, little kids, people who are vulnerable. Very little thought seems to be given to the fact that the consumer or client has no voice in SIBs. That’s deeply troubling.

I would sum up other concerns as relating to: the suitability of SIBs for complex social problems and solutions; the difficulties of contracting; the transparency of SIBs; whether private investment in SIBs is, or can ever genuinely be, a reality; and the potential impact of private investment on the core values of public services.

Do SIBs suit complex problems?
You’ve got to be careful because social problems are by definition very complex. People are complicated and we live in a very complex world. There’s a lot going on. So focussing on a simple, short-term intervention to deliver a single outcome may not be the best approach.

The Riker’s Island SIB, for example, funded a behaviour modification programme for young people in prison. Those teenagers were then sent home with very limited skills, often to distressed family and neighbourhood situations, to try to engage in an economy that had pretty much left them behind. But they had been taught how to be polite. That’s cheap and it certainly doesn’t hurt. But don’t we need a more comprehensive approach? Aren’t we simply treating the symptom and not the cause? Not surprisingly, some critics have said: “We really need some longer term structural change and this is just a Band Aid”.

Some SIBs are being developed in the US to fund preschool provision. Indeed, the roots of SIBs in the US lie in work on developing private investment funding models for preschool provision. I can see the appeal. Preschool is inexpensive compared with the costs of early childcare. It’s relatively short term and you can get a good measure of the investment’s efficacy when you see how kids perform in kindergarten. It’s linked with some wonderful long-term improvements in health, education and employment.

When all these savings are calculated, one model found that preschool offers a 17 per cent internal rate of return, which is better than anything the stock market could give. And three quarters of that return consists of savings to government, which offers the chance to get government to pay for SIBs out of savings in future programme costs. Nevertheless, it’s pretty heroic to assume that preschool provision can be credited with producing all the benefits you hope for when a child reaches adulthood.
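
For readers who want to see what sits behind a figure like that, here is a minimal sketch of how an internal rate of return is calculated from a stream of costs and benefits. The cash flows are hypothetical placeholders, not the data behind the 17 per cent estimate, which comes from detailed long-term cost-benefit studies.

```python
# Minimal sketch: internal rate of return (IRR) from a cost/benefit stream.
# The cash flows below are hypothetical placeholders, not real preschool data.

def npv(rate, cash_flows):
    """Net present value of cash flows indexed by year (year 0 = today)."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-6):
    """Discount rate at which NPV falls to zero, found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid   # NPV still positive, so the IRR must be higher
        else:
            hi = mid
    return (lo + hi) / 2

# Year 0: programme cost; years 1-20: annual benefits (earnings gains, savings to government).
cash_flows = [-10_000] + [1_800] * 20

print(f"Internal rate of return: {irr(cash_flows):.1%}")
```

The point of the exercise is only that the headline percentage is the output of a long chain of assumptions about costs and benefits stretching decades into the future.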

At this point, it’s also worth remembering that SIBs – like many social policy innovations – are created around the assumption that you can take model programmes that have been proven to work in one place and then scale them up. But have you ever tried to make brownies for 40 people instead of ten? You actually have to change the recipe. You don’t just quadruple the recipe, because it won’t work.

Contract complexity
I worry about the complexity of contracts that typically accompany SIBs. I’ve been studying contracting for 20 years in local government in areas like water and garbage collection. Those contracts can be quite complex, but they are simple compared with contracts for social services.

We also know that cost savings from contracting out water and garbage collection are at best ephemeral. In the longer term, markets require management, which can be expensive – especially when, at a local government level, there are not really markets for your public services. I measured how many alternative providers there are for any one of 72 different services that local governments provide in the US. On average, they have one alternative provider. This does not a market make. That’s why local governments do a lot of reverse privatisation or contracting back in – to ensure that there is some element of periodic competition.

High transaction costs
Transaction costs of setting up the contracts are also high. Most local officials would say: “If I’m not going to get a ten per cent saving, I’m not going to go out to tender because I’ll spend more than that just designing the contract.” Additionally, because public markets aren’t competitive, there needs to be monitoring, but monitoring is also expensive. That’s especially true for SIBs: they require high quality evaluation because the financial planning is based on delivery against metrics and, ideally, there should be a counterfactual, some sort of comparison group, not subject to the SIB.

Poor transparency
SIBs lack transparency – there is a lot of secrecy about discussions until the deal is done. People may not hear about a project until it has already been designed and, even then, documents are often unavailable to the public for scrutiny. This makes them hard to study and is a problem for open governance.

Inflexible innovation?
SIBs could actually promote inflexible innovation, because they typically involve a model programme for a process which has been proven and which is then locked into a contract. Fortunately, it’s usually only for three to five years, whereas public private partnerships for infrastructure might be 25 or 30 years. But that’s still a constraint on innovation.

Will the private sector invest?
The promise is that SIBs will engage the private sector as an investor in public services and so increase funding for effective programmes and build the political will for policy change. So the long-term outcome could actually be more public spending on these projects once private investment has demonstrated their value.

That is problematic in the US context because, in seven of the ten projects that currently exist in the US, more than 50 per cent of the finance is being guaranteed by private philanthropy. One of the key investors enjoying this underwritten status has been Goldman Sachs.

I had a student recently from the finance sector who did his Master’s thesis on SIBs. He concluded that the risk of SIBs is too high to attract private capital – a secondary financial market will be required to provide private sector funding. That’s something we should be watching for.

Public values at risk?
Some of the enthusiasm around SIBs is that we’re going to insert efficiency and investment logics to make the social sector better. That sounds interesting. But there is the risk that we will lose some of the values that have underpinned social service policies, such as social justice and citizen empowerment. If financial logics prevail over social values, that could undermine other social goals. And if there are savings from SIBs, why does government have to mortgage those savings to private investor profit, rather than spend them on future investment?

Then there are also those who worry that everything is priced in a SIB. I recently saw the play, “The Curious Incident of the Dog in the Night-Time”, about a child living with Asperger’s Syndrome. In the past, we would have justified spending on such a child as a matter of their right to an education and society’s obligation to accommodate all children in our world. But if we shift towards thinking about the “investible child”, justifying expenditure now by its returns in the future, will that child still get the investment s/he needs? I worry about social Darwinism creeping into social policy, especially as the clients’ voice is largely silent in SIBs.

Meanwhile, what role may intermediaries take in redefining the values of public services? The US already has a largely devolved social welfare state. SIBs may hand intermediaries even more power than they already have to determine what social policy looks like.

New political allies
So there are many challenges. But it’s also important to recognise that SIBs are making a difference, sometimes in unexpected ways. For a start, it’s clear to me that many city managers are trying to ride this wave of marketisation in social policy and get something positive out of it for their communities.

So, for example, all the discussion about investment returns on preschool provision is bringing in new political partners to the cause of developing better services for children. Important CEOs in the local communities are saying: “This is important. We want to see investment here”. I remember the head of one of the public social welfare agencies saying: “This is the first time I’ve ever been able to go to city council and tell them that their expenditures are investments, positives rather than negatives.” This was because she was able to talk about the return on investment. SIBs often focus on quite short-term returns. This style of language gains political attention, and motivates leaders more than long-term returns do, because they think: “In the long run, I’m not going to be in office”.

New policy tools
We are also seeing the development of new policy tools. People in the welfare arena are accustomed to rules and rigid regulations governing social services. In contrast, the language of economic development encourages entrepreneurship. In the social welfare world, people find it liberating to be in a culture in which they feel incentivised to deliver a policy rather than simply being governed by regulations and rules.

SIBs aren’t going away. They are growing. In the US they are likely to mushroom, but without the careful study and scrutiny that such a policy reform requires. We should not be blinded by the dream of greater investment and greater program effectiveness. Neither has been proven in the early SIBs. And there are clearly many issues about management and public values. We need to be honest about what really is happening. We should give special attention to how to maintain the important values that underpin our public services.

Dr Mildred E. Warner is a Professor in the Department of City and Regional Planning at Cornell University. Her published articles on SIBs and government services can be found on her website, www.mildredwarner.org.

‘We should model complex public health interventions before piloting them’

By Zaid Chalabi

Mathematical models can test multiple variables cheaply and quickly, giving early indications of what really matters. That helps in designing pilots and understanding how context can affect a policy’s success, argues Dr Zaid Chalabi.

I was part of a team that evaluated the implementation and cost-effectiveness of the government’s Cold Weather Plan (CWP) for England. The CWP is a guidance document which aims to reduce the thousands of additional deaths that typically occur in England when temperatures plummet. The fall in temperature can increase risks for elderly people as well as those with heart or breathing problems and other chronic conditions.

The plan’s principle is that, when cold weather is expected, the authorities are alerted and they can enact various measures, suggested in the CWP. They might, for example, contact or visit vulnerable patients, check that they have medication, that they are warm and have enough food to last a cold snap. The CWP guidance is quite general and it is up to each local authority to implement the plan in its own way.

Assessing the Cold Weather Plan
There is a lot that we do not know in assessing the CWP’s cost-effectiveness. How fully will each health authority implement the plan? Which aspects will they focus on and what impacts are made by each particular action?

Somehow, we need to know how the CWP, implemented in its various ways, might improve the health of the population and reduce hospital admissions, saving costs to the NHS. The costs of elective admissions – postponed if there are weather-induced surges in emergency admissions – must also be assessed. We must evaluate the quality of life, and the years of life, gained through the CWP as part of the final cost-effectiveness calculation.
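
To give a sense of the shape of that final calculation, here is a minimal sketch of an incremental cost-effectiveness ratio. All the figures are hypothetical placeholders; the point of the evaluation is precisely to estimate such quantities properly.

```python
# Minimal sketch of a cost-effectiveness calculation, with hypothetical inputs.
# A real evaluation would estimate each quantity from models and data.

cwp_cost = 2_000_000            # assumed cost of implementing the plan
admissions_avoided = 400        # assumed emergency admissions avoided
cost_per_admission = 3_000      # assumed NHS cost of one admission
qalys_gained = 150              # assumed quality-adjusted life years gained

net_cost = cwp_cost - admissions_avoided * cost_per_admission
icer = net_cost / qalys_gained  # incremental cost per QALY gained

print(f"Net cost to the system: £{net_cost:,.0f}")
print(f"Cost per QALY gained: £{icer:,.0f}")
```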

Clearly all of this is hard to evaluate because there are so many variables. Also the circumstances may not occur often: a decade of mild winters might pass before there is a harsh year. Because the CWP has been running for only a few years, we do not have much data, yet it is also a life and death scenario, so policy makers need advice on how best to implement the CWP in order to deploy limited resources well.

Options for evaluators
What should evaluators do? A meaningful analysis of the outcomes of the CWP would require the ability to compare vigilant health authorities with those less engaged, and to evaluate the CWP over several harsh winters. We cannot afford to wait that long, and it may be unwise to stop some authorities making preparations to protect their populations.

Modelling helps when data is missing
We adopted mathematical modelling as the best approach in such circumstances, where lots of important data are not yet available. It is possible to search the literature or seek expert opinion to make estimates for almost every scenario and action. We can estimate, for example, the benefit to patients’ health of being contacted during cold weather. Theoretical costs can also be factored in for the different options – be it a phone call or a more expensive actual visit. These figures can never be absolutely accurate. But, by building in as much data as possible, the model begins to reveal which variables are significant and which ones do not matter much.
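
As a rough illustration of that process, the sketch below runs a toy model of the plan many times, drawing each uncertain input from a plausible range, and then checks which inputs actually move the result. The model structure and the ranges are invented purely for illustration; they are not taken from the CWP evaluation.

```python
# Toy Monte Carlo sensitivity analysis; model structure and parameter ranges
# are invented for illustration only, not taken from the CWP evaluation.
import random

def toy_model(contact_effectiveness, uptake, cost_per_contact, n_contacts):
    """Net cost: spending on contacting people minus value of admissions avoided."""
    admissions_avoided = n_contacts * uptake * contact_effectiveness
    spending = n_contacts * cost_per_contact
    return spending - admissions_avoided * 3_000   # assume £3,000 per admission

random.seed(0)
runs = []
for _ in range(10_000):
    params = {
        "contact_effectiveness": random.uniform(0.005, 0.05),  # admissions avoided per contact
        "uptake": random.uniform(0.3, 0.9),                    # share of contacts that reach someone
        "cost_per_contact": random.uniform(5, 60),             # phone call vs home visit
        "n_contacts": random.uniform(10_000, 50_000),
    }
    runs.append((params, toy_model(**params)))

# Crude sensitivity check: how much does mean net cost differ between the
# lower and upper halves of each parameter's sampled values?
for name in runs[0][0]:
    ordered = sorted(runs, key=lambda r: r[0][name])
    half = len(ordered) // 2
    low_mean = sum(result for _, result in ordered[:half]) / half
    high_mean = sum(result for _, result in ordered[half:]) / half
    print(f"{name:22s} shifts mean net cost by £{high_mean - low_mean:,.0f}")
```

Even a toy exercise like this tends to show quickly that one or two parameters dominate the result, which is where data collection and piloting effort is best spent.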

Modelling also provides indications of where the extra costs of the CWP might occur (probably in social services) – and which areas are likely to enjoy savings (probably acute hospitals). In health and social care, implementation of cost-effective innovations is often held up because silo working means there are both losers and gainers. Because the losers may not be compensated, the policy may not be implemented, thus depriving the system of a net gain overall. Modelling can identify who wins and who loses, giving policy makers the chance to equalise the outcome between the silos.

This approach to evaluation, in identifying the many variables and assumptions, also helps us to see gaps in knowledge so we can focus research on gathering any important missing data.

Modelling and other public health interventions
The value of modelling goes beyond policies such as the Cold Weather Plan that can take years to bear fruit. Most public health interventions occur in complex environments and involve multiple variables. It is important to understand the role that context plays in their success.

Models provide an opportunity to vary contexts and see what matters and what does not. They are the obvious precursor to setting up a pilot that can then be designed around key factors that seem to be important, regardless of context. Inserting modelling into the process of early evaluation, before piloting, could be vital if we are to escape the scourge of successful pilots that are not rolled out, because they seem to work only in one place.

Dr Zaid Chalabi is Associate Professor in Mathematical Modelling at the London School of Hygiene and Tropical Medicine.

Ethical risks of marketising public services demand caution

By Julia Morley

Social Impact Bonds and similar financial vehicles may seem guaranteed to deliver key values, but they raise important moral dilemmas.

The impulse to use new private sector approaches to funding public services, via Social Impact Bonds (SIBs) and other social investment vehicles, is fraught with ethical dilemmas. We should be cautious before rushing into “marketising” some public services – packaging them up as commodities that can be provided for a price by any supplier judged appropriate.

It’s easy to miss these problems because many SIB advocates hold strong, high-minded and hopeful moral positions. They’re often tired of just trying to do the right thing: they want to achieve real, measurable improvement in people’s lives.

SIBs and Utilitarianism
The views of those supporting Social Impact Bonds are typically rooted in utilitarianism, which attaches moral value to the consequences of actions, not merely to the intentions behind them. This approach contrasts with deontological theories, which hold that no matter how morally good their outcomes, some choices are always wrong. The utilitarian tradition can also be contrasted with virtue ethics, which focusses on what kind of people we want to be in order to work out which actions are allowable.

In contrast to deontological theories and virtue ethics, utilitarianism, which underpins the SIBs’ approach, dwells on outcomes. One popular form of utilitarian theory is “prioritarianism”, which attaches higher moral value to improving things for particular groups of people, usually the neediest and worst off. The goodness of this outcome confers rightness on whatever was done to arrive at that outcome.

At first sight, adopting such a consequentialist approach may seem to be a no-brainer. Who could argue with a standpoint that’s focussed on using SIBs to provide concrete help for the most vulnerable? But this seemingly good sense can sometimes blind us to real dilemmas posed by SIBs, and similar examples of marketisation in social services.

SIBs dilemmas
For a start, what timeframe should be used to assess outcomes? SIBs are normally planned to deliver their benefits within two to seven years – the timeframes after which a private equity or venture capital fund would expect to exit a programme and realise a return on its investments. But there is concern that such timeframes may not capture all the main benefits – and the dis-benefits or negative externalities – that might manifest themselves subsequently.

Is it possible, in any case, to measure all the outcomes? And, if we can, is it always possible to establish a causal link between an intervention or service, provided via the SIB, and the outcomes measured subsequently? Such uncertainties make it difficult, sometimes, to assess the ethical value of the SIB approach.

Should some social problems be free of markets?
For reasons not to marketise social services, I draw on work by Debra Satz, a philosophy professor at Stanford University and author of “Why Some Things Should Not Be for Sale: The Moral Limits of Markets”. Satz is a critic of “noxious markets”, which, she says, tend to feature vulnerable people who have little say in what is happening to them and who may be damaged by being involved in such a market. These “noxious markets” are, she finds, often characterised by excessive profits. She worries about marketisation undermining our civic values and society in such cases.

I also draw on the work of Michael Sandel, a Harvard University politics professor, whose book, “What Money Can’t Buy”, studies the moral limits of markets. He invokes what I call an “icky feeling” about markets in certain goods on the grounds that the exchange, and the nature of the goods, may be devalued by using a market.

These commentators challenge us to consider whether we feel comfortable with a market for kidneys or for other human organs. Would we go along with a market for babies? Would we like to trade them? Would it devalue humankind if we were to do this? How about prisoners? Is it morally acceptable to profit from the misery of others? Is it OK if we “fancy a flutter” with someone else’s well-being, as the Economist asked in an article about the first Social Impact Bond, designed to reduce reoffending rates among former prisoners in Peterborough?

Even if one accepts that SIBs and other forms of marketised social investment are the best that can be done in a difficult world, there are other important issues to address.

The risks of marketisation
First, could this kind of marketisation exclude some deserving recipients? Take, for example, help for potential paedophiles. It might not be easy for this group to gain access to resources using these kinds of mechanisms. Which social investor would be prepared to risk their brand reputation through such an association?

Another important issue is whether marketising social interventions might affect staff motivation. Describing activities in terms of profit-making can undermine the reasons why people want to provide a service. For example, Richard Titmuss’ classic 1970 study compared voluntary blood donation in the UK with the system in the US that offered donors money in exchange for blood. It concluded that paying for blood reduced not only the quality but also the supply of blood, because some donors’ motivation was potentially damaged by marketisation.

And lastly, we should also consider possible ill-effects on recipients. How does it feel when a recipient’s problems are generating profits for someone else? How does it feel to be a profit-centre? In short, we should recognise that marketisation can create many potential issues affecting relationships for volunteers, staff and those who receive a social intervention or service.

There are also risks that market systems which replace public values can lead to behaviours that actually worsen social problems. Take, for example, private prisons in the US: providers have spent millions of dollars lobbying for increased rates of incarceration and extending custodial sentences.

Ways to tackle the risks
Where do all these risks leave the question of marketisation, particularly if it looks to be the only way to attract resources? We shouldn’t bury our heads in the sand and conclude that there is nothing that we can do. If some degree of marketisation is inevitable, we should address potential issues.

It may, for example, be better to view markets as complementing rather than supplanting non-profit social services. One can use them around the edges of traditional provision, maintaining an awareness of the social and political implications implicit in these market processes and controlling their worst excesses. This approach would militate against SIBs, and similar approaches, becoming more than an adjunct to more mainstream delivery systems.

One possibility is using experts to constrain profit-based decision making, thus placing moral limits on the market: in other words, regulation of approaches such as Social Impact Bonds. It’s also important to improve the accountability of providers as much as possible, particularly to those receiving services: regulation can reduce power differentials between providers and recipients. There should also be transparency about levels of profit and the structure of deals. Broader social impacts should be measured and monitored.

With these safeguards, the marketisation of social services may be justified even though problems remain, as demonstrated by the behaviour of the US providers of private prisons. Such experience ought to make us reflect about the kind of world in which we wish to live, before we rush into wholesale marketisation. We must not ignore the moral and political implications of actions that might previously have made us feel highly uncomfortable.

Dr Julia Morley is a lecturer in the Department of Accounting at the London School of Economics and Political Science.

SIBs may be overhyped but their focus on outcomes is a vital policy innovation

By Alex Nicholls

Some claimed benefits of Social Impact Bonds remain unproven. But they tackle long-term weaknesses in public service delivery by concentrating on outcomes, early intervention and collaboration.

Governments around the world are increasingly funding health care and other public services through Social Impact Bonds. SIBs claim to deliver better, more cost-effective public services by recruiting private investors who gain their returns by achieving pre-agreed outcomes. Earlier this month, for example, Prime Minister Theresa May promised to fund improved mental health services through this mechanism.

It’s easy to find serious flaws in Social Impact Bonds – particularly with respect to their main selling points. Like many social policy innovations, SIBs are championed for their efficiency and effectiveness, and for having superior – and better evidenced – impact on social problems. Experience so far doesn’t justify these assertions to any great extent.

Claims of greater effectiveness and rigour for SIBs remain unproven. Their supposedly enhanced efficiency – saving public money – looks largely spurious. However, we shouldn’t simply dismiss SIBs because of these doubtful claims. SIBs are a force for some considerable good, albeit in the ways that their champions tend to emphasise less. That’s because SIBs seem in practice to advance three important developments where conventional public services typically struggle: outcomes, prevention and collaboration.

SIBs and key public policy objectives

First, like Payment By Results (PBR), a central principle of SIBs is that they concentrate on the delivery of outcomes, on the consequences of public services rather than on their processes or outputs. This shift represents one of the most important – and potentially impactful – reforms in public service delivery globally. Utilitarianism may have been oversold as a moral philosophy but, as a general approach to deploying public expenditure, there’s a lot to be said for it.

Second – and this is connected to my first point – SIBs are focussed on shifting practice towards prevention and early intervention, because that’s how they can identify long-term savings.

Third, SIBs help increase collaboration in welfare delivery – they are designed to align incentives for multiple stakeholders to deliver a collaborative goal. Getting everyone to pull together has long bedevilled complex welfare systems. Yet such cooperation is needed by SIBs if they are to achieve more with less and thus free up savings that can be translated into investor repayments. As a result, their approach offers a breath of fresh air to public service delivery.

So, although the headline claims for SIBs are largely unsubstantiated, some emerging practice is encouraging and much needed. Nevertheless, the verdict for now on SIBs should be “undecided”, given what’s at stake: the supposed benefits of SIBs and similar financial vehicles underpin arguments for the use of private capital in the delivery of public welfare services. I’d advise policy makers to avoid leaping too quickly to adopting SIBs on a grand scale, given what we know so far. Much of this is examined in a book that I’ve co-edited, entitled “Social Finance,” in which leading scholars in the field examine the issues, and in work done by Oxford University’s Government Outcomes Lab.

In addition to the questionable hype about SIBs, there remain further, serious questions about making profits out of vulnerable people, particularly in developing countries. There are also worries about sustainability – that SIBs will fail providers and users once contracts are completed. Finally, as with all PBR approaches, there are doubts about whether SIBs will be value for money in the longer run.

Such a shaky verdict may seem dissonant with a policy option whose popularity is sweeping the world. SIBs are everywhere. If your home country is not on the SIB map, it will be soon. Take Japan for example. It’s poised to join the scores of countries that are laying the foundations for SIBs. At the end of the summer of 2016, its government announced $200m for Development Impact Bonds (DIBs) (see below). So lots of Japanese SIBs are on their way.

Four myths about Social Impact Bonds

Returning to the “hype” I mentioned initially, let’s consider, first, the four core claims, or myths, that underpin the popularity of SIBs: that they deliver superior social outcomes; that they save public money; that they make money for investors; and that they are more rigorous in their measurement of change.

1. Superior social outcomes?
As far as superior social outcomes are concerned, the most famous UK SIB – for reducing reconvictions among ex-prisoners in Peterborough – did well. Its early performance was promising, suggesting that – once completed – the SIB would exceed its targets and deliver the superior social outcomes that were expected. It was just unfortunate that the experiment was terminated early for other reasons.
Meanwhile, the New Horizons SIB, developed for the Department for Work and Pensions, was meant to reduce numbers of NEETs – young people “not in education, employment or training”. After a rocky start, the SIB outperformed its targets, the contract was paid and the follow-on contract was awarded, albeit at a much discounted price.
There was also qualitative evidence from service users who said things such as: “If it weren’t for that practitioner, I wouldn’t be here today and that’s actually the truth. I’d be dead. Everyone I knew thought I probably wouldn’t make it until the end of the year.” So, there was impact. Or it might come from a young person who had been excluded from school: “Somebody came to see me, worked with me, got me back into school and back into education”. So, the impact was superior at least to what had gone before. But, maybe, there was no help there before, which makes it hard to compare.
So, there is some evidence of SIB successes – exceeding pre-defined targets, around reoffending, educational attainment, the securing of stable accommodation and full time employment. But SIBs have also failed to hit targets in some key areas and there are problems where there has been no baseline.

Sometimes, the performance metrics also seem artificial. In the Peterborough SIB, it’s plain that the particular targets were created to be very achievable because people wanted the programme to succeed. In short, it is possible that improved social outcomes are being achieved. There certainly are evident social outcomes. But it’s more difficult to say that better social outcomes than those achievable by other methods have been proven.

2. Do SIBs save public money?
The claim that SIBs can save public money is important. The financial logic works like this: once a SIB is in place, the cost to government of the status quo falls by enough that, even after adding the cost of the new intervention – plus a nice payback to investors – there is still a saving to the public sector.
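
Written out as arithmetic, the logic looks something like the sketch below. The figures are hypothetical; the claim holds only if the fall in status quo costs exceeds the intervention cost plus the investor payback, which is exactly what the next few paragraphs question.

```python
# Minimal sketch of the saving-money logic, with hypothetical figures only.
status_quo_cost = 10_000_000   # what government would have spent anyway
cost_reduction = 0.25          # assumed fall in status quo costs due to the SIB
intervention_cost = 1_200_000  # cost of the SIB-funded programme
investor_payback = 300_000     # return paid to investors on success

saving = status_quo_cost * cost_reduction - intervention_cost - investor_payback
print(f"Claimed saving to the public sector: £{saving:,.0f}")
# The saving is only positive if the cost reduction outweighs the intervention
# plus the payback - and hidden transaction and monitoring costs eat into it.
```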

Indeed, it’s a fundamental assumption, in these times of austerity, that SIBs save money. This is the simplest explanation of why SIBs are proliferating: they appear to give you something for nothing, which is very attractive if you are an impoverished service provider or an impoverished government. SIBs seem to get somebody else to pay for something up front, so you pay nothing, or, if you do eventually pay, you pay less than you would have paid anyway. Also, with any luck, repayment will be over a five- or seven-year period by which time, at the very least, you’re no longer the minister. So the payback becomes somebody else’s problem.

There is certainly some technical truth in the saving-money argument: services delivered now – because someone else is paying upfront – probably have more value than those delayed five years until the Government has the cash. Yet, beyond what’s called the “net present value” argument, the claim that SIBs save public money is probably the most spurious myth of all. In truth, it’s likely that the net cost of a SIB will be higher than the alternative and, even if it is lower, realising the savings may be nearly impossible.

For a start, some SIBs actually increase costs. For example, the Greater London Authority Homelessness SIB aimed to deliver support to people who were outside the welfare system, so it inevitably increased direct costs to government.

SIBs also have very high transaction costs. We underestimate the amount of management time that’s required to run a SIB and we radically underestimate the demands for information that investors make. So there are hidden transaction costs, often borne by service providers.

Realising notional savings is also difficult. If a SIB reduces reoffending, how should that be expressed? Do you weigh up the numbers of sacked policemen, closed courts, shut prisons, prison officers put out of work? Realising those savings is clearly not easy. So it’s hard to see how savings estimates ever become a reality. Yet, saving money is one of the biggest sells to government regarding SIBs.

3. Making money for investors
It’s also claimed that SIBs make money for investors. There is some evidence of real returns, but a lot of this depends on how the SIB is structured. The Peterborough SIB, for example, required investors to wait a long time to get any payback. In other SIBs, investors get some payback much more quickly, once there are some evidenced results. And, of course, the investor, who might be putting in £1m, may never actually put all the money in up front. It’s typically staggered over a period. So if an investor has to put in £250,000 every three months, but starts getting back £100,000 after four months, the risk is much reduced and the returns are higher.
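
The effect of staggering can be made concrete with those figures. The sketch below assumes, purely for illustration, quarterly drawdowns of £250,000 over a year and £100,000 coming back each month from month four; real SIB schedules vary, but the point is that far less than the £1m headline commitment is ever at risk at once.

```python
# Minimal sketch: capital actually at risk under a staggered drawdown.
# Timing assumptions (quarterly drawdowns, monthly paybacks from month four)
# are illustrative only.
drawdowns = {0: 250_000, 3: 250_000, 6: 250_000, 9: 250_000}  # month -> paid in
paybacks = {month: 100_000 for month in range(4, 13)}         # month -> paid back

exposure, peak = 0, 0
for month in range(13):
    exposure += drawdowns.get(month, 0) - paybacks.get(month, 0)
    peak = max(peak, exposure)

print(f"Headline commitment: £{sum(drawdowns.values()):,}")
print(f"Peak capital at risk: £{peak:,}")
```

On these illustrative assumptions, only around half of the headline commitment is exposed at the peak, which is why returns measured against capital actually at risk look considerably better than returns measured against the headline figure.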

Financial risks in SIBs vary greatly. With Peterborough, investors took a 100 per cent risk – they would have lost everything if they had not hit their outcome targets. But this approach is less common than originally envisaged. The SIB at Riker’s Island prison in the US, which Mildred Warner discusses in her contribution to this series of blogs, involved zero risk to the investor, Goldman Sachs, whose exposure was underwritten by philanthropists.

A famous social investor, whom I won’t name, was so angry about the Riker’s Island SIB that he practically had steam coming out of his ears. “This is the worst thing that could possibly have happened,” he said, “because every single US SIB after this will say: ‘Why should I take any risk because Goldman’s didn’t?’ ” Elsewhere, in Australia for example, there is a two-tier risk model, with lower risk bonds underwritten by the Government and offering a low annual rate of return.

In short, there is evidence of some profitable returns from SIBs. But there is already a wide range of experience among investors: losing money, making money, making some money, making money under circumstances that feel uncomfortable, and so on.

4. Rigour
It is said that SIBs provide better evidence of impact. There is certainly no single approach to measuring impact across SIBs. There is a vast range from randomised controlled trials to SIBs where there is little or no comparison with other delivery options.

Peterborough was rigorous in that it measured reconviction events among its prisoner cohort, compared with the rest on the National Offenders Register. But the vast majority of SIBs lack such rigour. Most offer “validated administrative data”: somebody says this is what the intervention achieves and the commissioner says: “OK, I’ll pay on that” or “I won’t pay on that”. The metrics used also vary from quantitative to qualitative. So this claim about superior metrics is again not really proven.

The realities

SIBs do have some genuine strengths on the ground. They are part of a wider shift in programmes towards outcomes which should, if successful, deliver more meaningful public services.

The majority of SIBs that I’ve studied focus on prevention or on early intervention. Very often they put larger amounts of capital into programmes up front than could be found using conventional funding. This is because programmes, particularly those run by the third sector, are often small and don’t have the cash to upscale quickly. So, in speeding up investment and concentrating on early intervention, welfare benefits can potentially be achieved earlier than otherwise.

SIBs also allow collaboration across sectors in a way that is almost unique. In other words, if a SIB works, it should align the interests of the third sector, the government and private investors perfectly. All three should strive to achieve a particular outcome because each would benefit, in different ways, from its achievement. This is an important asset for SIBs because the fragmentation of public services is a well-rehearsed obstacle to tackling pernicious problems.

It’s also probably true that SIBs are contributing, at least, to the building of a social investment or finance market – for better or worse – although that market is probably too small at the moment to have a major impact.

Additional critiques

There is a big moral question, which Julia Morley discusses in her contribution to this blog series, of whether it is right to privatise, financialise or marketise – whichever language you prefer – social outcomes. This dilemma is perhaps best demonstrated in the case of Development Impact Bonds (DIBs), in which overseas development projects are debt-funded, like SIBs, with payments contingent on achievement of agreed targets. Making profits in this way from tackling the problems of the most vulnerable looks even starker in poor developing countries.

DIBs could be seen as representing a whole new wave of colonialism in which rich financiers say: “Not only can we solve your social problems, but we’re so clever we’re going to make money out of it, and take it back to London and Bonn and Frankfurt and New York”.

It’s also not clear how service providers will sustain their work once contracts for SIBs – or DIBs – are complete. What happens next? This issue affects service users who might say: “I was having this wonderful service and then the SIB stopped. Nobody’s picked up the slack and now I’m left with nothing.” For the service users, as well as providers, this lack of exit thinking is a real problem.

NAO question mark on value for money

But perhaps the biggest question mark brings me back to the beginning – value for money. The National Audit Office struck a note of caution recently in its report, entitled “Outcome-based payment schemes – government’s use of payment by results”. The NAO was mainly talking about PBR schemes, but SIBs sit within that field. It advised caution in adopting and embracing such financial approaches. It said the evidence of impact was, at best, mixed and warned about the risk that such approaches might not, in the long run, offer value for money.

My own overview of the subject suggests that there is very limited evidence for some of the claims made for SIBs. But they do, in reality, offer some innovative approaches – around outcomes, prevention and collaboration – to which public services have long aspired but which they have rarely delivered.

We shouldn’t allow these benefits to be lost. Likewise, we shouldn’t be seduced by the hype.

Dr Alex Nicholls is Professor of Social Entrepreneurship within the Skoll Centre for Social Entrepreneurship at Saïd Business School, University of Oxford and co-editor of the book, “Social Finance”.

How to commission evaluations of national policy pilots

By Stefanie Ettelt and Nicholas Mays

Evaluations of national policy pilots are often embarked on with high expectations and end with a sense of frustration on all sides.  Policy-makers often expect clearer, and more positive, verdicts from evaluation than researchers are able to provide; researchers hope for their findings to be more influential; and implementers in pilot sites struggle to put in place what they think they are being expected to deliver within the limited timescale of the pilot while wondering what they have gained from either the pilot programme or the national evaluation.

To ease some of these frustrations, we have developed guidance aimed primarily at national level staff involved in policy-making and in initiating policy-relevant pilots and their evaluations.  We think the guidance will also be helpful to evaluators. Our advice stems from both experience and analysis of the fate of policy pilots (Ettelt et al, 2015a; Ettelt et al, 2015b).  Two observations, in particular, from evaluating policy pilots in health and social care have shaped our thinking.

The first observation is that many times it is not clear what an evaluation is intended to contribute to policy development.  This lack of clarity is often a symptom of a deeper problem which has more to do with confusion and conflicts over the reasons for piloting than with the evaluation itself.  Indeed, the objectives of the evaluation can be perfectly clearly expressed, and yet it can entirely ‘miss the point’ if the purpose of piloting is not thought through.  As we have argued elsewhere, policy pilots can serve different purposes, many of which have more to do with the realities of policy-making, and the dynamics of policy formulation and implementation, than with piloting for the purpose of testing effectiveness (Ettelt et al, 2015a).  Different groups involved in a policy pilot can have different ideas about the purpose of piloting.  Also, these purposes often change over time, for example, as a consequence of a ministerial decision to roll out the policy irrespective of whether the evaluation has been completed or not.  For example, the Direct Payments in Residential Care pilots, which PIRU is evaluating, were rebranded early in the life of the programme to become ‘trailblazers’ as it was decided, ahead of the results of the pilots, that direct payments would be rolled out nationally in 2016 alongside other aspects of the 2014 Care Act.  However, the policy context of the ‘trailblazers’ continues to change.  As a result, the Department of Health is currently reconsidering whether direct payments should move forward at the same speed as expected earlier.

We think it is important that the goals of such programmes are stated explicitly and that their implications are thought through carefully at the beginning of a pilot programme while it is still possible to make adjustments more easily than later in the process.  This is also the time to identify the target audience for the evaluation.  Whose knowledge is the evaluation aiming to contribute to?  There are likely to be important differences in the information needs and preferences of national policy-makers and local implementers that require some forethought if they are to be addressed adequately.

The second observation is that, under the influence of the mantra of ‘evidence-based policy’, policy-makers increasingly feel that they should prioritise specific research designs for the evaluations of policy pilots, especially experimental designs.  Yet, this consideration often comes too early in the discussion about pilot evaluations and is introduced for reasons that have more to do with the reputation of the design as producing particularly ‘valid’ evidence of policy effectiveness than with its appropriateness to generate insights given the objectives of the specific pilot programme.  The choice of research design does not make a programme more or less effective.  Conducting an RCT is pointless if the purpose of a pilot is to find out whether or not, and, if so, how, a policy can be implemented.  In such a situation, the ‘active ingredients’ of the intervention have not yet been determined and thus cannot be easily experimented with.  The Partnerships for Older People Projects (POPPs) pilots, conducted in the mid-2000s, are an example of a pilot programme that brought together a large number of local projects (of which about 150 were considered ‘core’), indicating an intention to foster diverse local innovations in care, with an evaluation commissioned and designed accordingly.  However, this did not stop national policy-makers subsequently changing direction and demanding a robust outcome analysis from a pilot programme and related evaluation which were both established to meet a different set of objectives.

A similar tension between piloting to encourage local actors to develop their own solutions to problems of service delivery and the desire for definitive (cost-) effectiveness evaluation of ‘what works’ can be seen in other pilot programmes.  For example, the Integrated Care and Support Pioneers were selected as leaders in their potential ability to develop and implement their own solutions to overcoming the barriers to integrating health and social care.  Yet, the evaluation requirement includes a focus on assessing the cost-effectiveness of integrated care and support.  This is extremely challenging in the face of such a diverse programme.

Beyond our two initial observations, the question of ‘evaluability’, which is relevant to all policy evaluation, is particularly pertinent in relation to RCTs and similar experimental designs.  RCTs require a substantial degree of researcher control over both the implementation of the pilots (e.g. a degree of consistency to ensure comparability) and the implementation of the evaluation (e.g. compliance with a randomised research protocol).  This level of control is not a given, and the influence of researchers on pilot sites is much more likely to be based on negotiation and goodwill than compliance.  This does not mean that conducting RCTs is impossible, but that pilot evaluations of this type require a significant and sustained commitment from pilot sites and policy-makers for the duration of the pilot programme to stick with the research protocol, and manage the added risk and complexity associated with the trial.

To help policy-makers to make these decisions and plan (national) pilot programmes and their evaluations better, we have developed a guidance document. ‘Advice on commissioning external academic evaluations of policy pilots in health and social care’ is available as a discussion paper. We are keen to receive comments.

This is an expanded version of an article written for the December 2015 edition of ‘Research Matters’, the quarterly magazine for members of the Social Research Association.

References

Ettelt, S., Mays, N. and P. Allen (2015a) ‘The multiple purposes of policy piloting and their consequences: Three examples from national health and social care policy in England’. Journal of Social Policy 44 (2): 319-337.

Ettelt, S., Mays, N. and P. Allen (2015b) ‘Policy experiments: investigating effectiveness or confirming direction?’ Evaluation 21 (3): 292-307.

‘We need critical friends and robust challenge, not aloofness and separation’

By Anna Dixon

A strong relationship between policy-makers and academic evaluators is vital, particularly to support high quality implementation of change, says Anna Dixon, the Department of Health’s Director of Strategy and Chief Analyst.

There continues to be a view that policy making is a very neat process. An issue supposedly arises and there’s an option appraisal about how we might address it. Then, following some consultation, an implementation process is designed. After that, as good policy makers, we always evaluate what we did, how it worked and those insights feed back very nicely to inform future policy making.

Alas, it’s all a bit more complicated than that. However, my message is that in health policy – as well as in other areas of government – we are serious about commissioning evaluation, and ambitious about using the results. Evaluation matters to us. The conditions for it, albeit imperfect, are improving. Evaluations can either be formative, providing learning and feedback as a policy is rolled out, or summative, focused on learning retrospectively about impact. In practice, many cover both implementation and impact.

Strong support for evaluation

Enthusiasts will be relieved that aspiration for evidence-based policy is very much alive in government.  Sir Jeremy Heywood, the Cabinet Secretary, has said that an excellent civil service should be skilled in high quality evidence-based decision-making. The Treasury is a crucial driver, requiring the Department of Health to do process, impact and cost-benefit evaluations of policy interventions, particularly where they involve significant public expenditure and regulation.

However, delivering on good intentions can be difficult. The National Audit Office (NAO) recently defined best practice as ‘evaluations that can provide evidence on attribution and causality and whether the policy delivered the intended outcomes and impact and to what extent these were due to the policy’. Doesn’t that sound very simple and easy? If only it were so.

In reality, it is incredibly difficult in the messy world of policy implementation to tease out the isolated impacts of one policy compared with all the layering effects of many policies changing as they are implemented. It is far from easy to identify any neat causality between particular policy interventions and outcomes.

The NAO found that much more could be done to use previous evaluations in developing impact assessments of new policies. A survey of central government departments found that plans for evaluation are sometimes not carried out.

Large evaluations commissioned

The Department of Health commissioned a large-scale programme of evaluation of the Labour government’s NHS reforms, which was coordinated by Nicholas Mays (now director of PIRU). We are now commissioning an evaluation of the Coalition’s reforms of the English NHS and also thinking about evaluating impacts from policy responses to the Francis Inquiry. These are substantial evaluation programmes tackling many interventions, occurring simultaneously against a background where much else is changing. It will not be easy to tease out the ‘Francis effect’ in the current economic context, with many other policy initiatives taking place at the same time. As well as funding the NIHR and Policy Research Units such as PIRU, the Government recently developed the ‘What Works Centres’. These aim to help government departments and local implementers – schools, probation services and others – to access higher quality evidence of what works.

Policy-making misunderstood

Will all this activity make a difference? I feel confident that it can lead to more successful implementation of particular interventions and can contribute to better policy-making. But it is only one input into the process. Policy is often driven by evidence of a different kind. That may be the personal experience of the Minister, deliberative exercises, practical wisdom and so on. Insights about what can work on the ground – ‘implementability’ – are also rightly important. And there is the more political dimension – what is acceptable to the public? All these elements go into the mix along with more formal research evidence.

Benefits of implementation evaluation

The influence of evaluation on implementation is more compelling than its influence on policy, and it is here that evaluation demonstrates real value. We have seen this recently with the Care Quality Commission’s new hospital inspection programme. Researchers, led by Kieran Walshe, went out with hospital inspectors on the first wave, which immediately fed into the design of the second wave. That’s also been evaluated and is now feeding into the approach that will be rolled out for future hospital inspections and in other sectors of health and care. These pragmatic, real-time evaluations can be very useful. They are critical now for the Department of Health because its separation from NHS England means that many people who had experience of more operational roles are no longer working directly within the policy-making environment.

Implementation evaluation is beginning to be reflected in the language used by government. The Treasury continues to emphasise summative evaluation, focussing on outcomes and cost-benefit ratios, but the policy implementation ‘gap’ is now recognised as particularly important. We are in a phase where ‘policy pilots’ seem to be out and we have tried ‘demonstrators’. Now we have ‘pioneers’. The language is becoming clearer that the main goal is to understand how to make something work better.

Evaluation can be more effective

What can government and academia do to increase the influence and usefulness of evaluation? We share a challenge to create engagement at the earliest possible stage – ideally the policy design stage. This means building relationships so that academics understand the policy questions and policy makers can share their intentions. So, evaluators should make sure that they talk to the relevant officials and find out who’s working on what. Success can yield opportunities to help design policy or implementation in ways that will support better evaluation.

Academics should be willing to share interim findings and work in progress, even if it is not complete. Otherwise there is a risk that they will miss the boat. On the Government side, we need to be more honest and open about the high priority evaluation gaps at our end.

In terms of rigour, Government is trying to provide better access to data. For example, organisations implementing interventions in criminal justice are able to use large linked data sets, established by the Ministry of Justice, so it is much easier to see impacts of policy changes on reoffending rates. We must make sure that our routine data collections measure the most important outcomes and that these measures are robust. Clearly, one of the challenges for evaluators is to understand the messiness of context.

Independence

The one word I have avoided is the ‘independence’ of researchers. If independence means aloofness and separation, I don’t think the relationship works well. We need to know each other: academics need to know the policy world; the policy world needs to understand academia. In government, we need critical friends and robust challenge. The fruitful way forward for both sides is ongoing discussion and engagement, creating good relationships that mean, even in this messy world, we can make greater use of evaluation to inform decision-making.

Dr Anna Dixon is Director of Strategy and Chief Analyst at the Department of Health.  This blog is based on a presentation she gave at the meeting, ‘Evaluation – making it timely, useful, independent and rigorous’ on 4 July 2014 organised by PIRU at the London School of Hygiene and Tropical Medicine, in association with the NIHR School for Public Health Research and the Public Health Research Consortium (PHRC).

Policy process for implementing individual budgets highlights some of the tensions in public policy evaluation

By Gerald Wistow

A high profile initiative to transform social care delivery demonstrates how the demand for rigorous evaluation can be difficult to fulfil alongside enthusiastic policy advocacy, explains former government advisor, Gerald Wistow

Over 40 years ago, the eminent social psychologist, Donald T Campbell, complained that excessive commitment to policies had prevented proper evaluation of Lyndon Johnson’s ‘Great Society’ reforms. Campbell urged social scientists to engage with policy makers to ensure that they appreciated the value of evaluation and did not allow its political risks to preclude its thorough application. His comments are just as relevant today.

I am grateful to Stefanie Ettelt for drawing my attention to a quote from Campbell’s 1969 paper, ‘Reforms as experiments’. In it, he declares: ‘If the political and administrative system has committed itself in advance to the correctness or efficacy of its reforms, it cannot tolerate learning of failure. To be truly scientific we must be able to experiment. We must be able to advocate without that excess of commitment that blinds us to reality testing.’ 

These sentiments spring to mind when reflecting on the piloting of individual budgets for adult social care that took place from 2005. This process highlights the risk that powerful advocacy within government can still lead to what, from the perspective of evaluation, might be considered excessive commitment and so obscure the ‘reality testing’ that evaluation is supposed to provide.

From 2005, I was a scientific advisor to the individual budgets policy team at the Department of Health, providing advice and support through all stages of the evaluation. At the time, policy processes were being modernised and made more professional. The New Labour mantra, 'what matters is what works', meant policy makers were supposed to favour analysis over ideology, not least through experimentation and evaluation in advance of universal national roll-out. The Modernising Government White Paper (1999) emphasised that evaluation should have a clearly defined purpose with criteria for success established from the outset, that evaluation methods should be built into the policy-making process from the beginning and that learning from pilots should be interpreted and applied.

A key starting point for the formal introduction of individual budgets was the implementation of the ‘Valuing People’ White Paper (2001) which established the central importance of people with learning disabilities being treated as full citizens rather than being excluded from living normally in society. Its four key principles were rights, choice, independence and inclusion.

The Department of Health established a ‘Valuing People Support Team’ to help local authorities and the NHS to implement these principles. In 2003, the Team formed a partnership with Mencap, known as ‘In Control’, to implement a process of ‘self-directed support’ which was piloted with limited evaluation in six local authorities.  The pilots were designed to enable people with learning disabilities to assess their own needs, write their own care plans and organise their own support. The background to this initiative was the need for people with learning disabilities to have greater opportunities to secure more flexible and individualised services because of the low take-up of direct payments (one per cent of all community care packages in around 2003). At the time, some 75 per cent of all money on learning disabilities was still being spent on three traditional, institutional services – residential and nursing home care and day care.

In Control quickly became an organised movement which penetrated national and local government (almost every local authority in the country soon signed up to its programme). By 2005, it had also allied with the physical disability movement which had been working with the Cabinet Office to develop a national strategy that included proposals for a programme of individual budgets.  The concept envisaged that individuals would be able to combine into a single budget all the different funding streams to which an individual might be entitled – such as social security, housing, access to employment and social care.  Individuals would be able to use such a budget on the basis of their assessed needs to purchase the services that they thought most suited those needs. This fitted in with the principles of improving social care services, scoring high on choice, control and independent living.

So by 2005, proposals for individual budgets were coming from the heart of government: from the Prime Minister's Strategy Unit, the Department of Health and the Department for Work and Pensions. The policy was in the 2005 Labour Party manifesto and, during the General Election itself, Downing Street wrote a scoping paper on implementation. All of these champions envisaged that a process of piloting and evaluation would be necessary and appropriate. In January 2005, the Cabinet Office had described individual budgets as a radical initiative, which would take time to get right, but which would be progressively implemented and, subject to evaluation and resource availability, rolled out nationally by 2012. However, by March, the DWP was saying it would be rolled out nationally by 2010.

There remained in these narratives the possibility of failure – everything was subject to evidence that it worked. Evaluation was part of the Government’s risk management – the risk of introducing a radical change that some people strongly supported but whose workings remained unclear. It also appealed to sceptics by saying, ‘Let’s do it progressively, let’s evaluate, let’s make sure that it works’.

The Treasury also had considerable interest in what the programme would cost to introduce, its outcomes and cost effectiveness compared with conventional approaches to service delivery. This last requirement drove the evaluation design so that its core element was a randomised controlled trial. There was also a process evaluation of factors that facilitated and inhibited implementation but the central focus at the outset was to evaluate how the costs and outcomes of individual budget pilots would compare with standard service delivery arrangements.

Although RCTs were widely regarded in DH as the gold standard for evaluation methodologies, especially for clinical interventions, other government departments were less comfortable with the idea that trials were appropriate in the context of individual budgets. The DH implementation support team, and some local staff, shared these concerns and particularly questioned the ethics of denying some participants in the trial access to individual budgets in order to provide comparisons with those who received such budgets.

Meanwhile, the evaluators soon realised, as is often the case, that the intervention to be evaluated was poorly specified. With the policy team, they had to ask: What is an individual budget? How is it allocated? What’s the operating system? How is need to be assessed? How would an assessment of need be converted into a financial sum that someone had available to spend on their care and support? Fortunately, from one point of view, ‘In Control’ had developed a model in their earlier six pilots that not only filled the vacuum but effectively became the intervention to be piloted and evaluated.

Then, in 2006, a new Minister moved the goal posts and announced that, in his view, the inherent value of individual budgets was not in doubt and that he had decided that the initiative should be rolled out nationally from 2010. The evaluation still had an important role, but it would now advise on how best to implement that decision rather than provide evidence to inform whether such a decision should in fact be made. So the RCT continued, but it was undermined. Sites felt more reluctant to identify participants in the study who would not receive a service that had now been ministerially endorsed. Recruitment to the study was slow and, with systems change lagging behind the evaluation timetable, some participants had not received services for the full follow up period before the pilots ended.

The evaluation reported on time and found that people in receipt of budgets, and their carers, reported greater independence and control over how care was provided. Individual budgets were slightly more cost-effective for some (but not all) groups of people. In addition, the implementation of individual budgets had important implications for staff roles, training and the management of funding streams.

In practice, the evaluation was conducted at the intersection of politics, policy-making and implementation. Ministers wanted to prove they could deliver change during their frequently short periods in a particular post. They were also greatly influenced by their own informal networks and, in the case of the second minister, by his own previous experience of social care services and knowledge of the 'In Control' model.

The Department of Health implementation support team, who were helping the local sites to implement individual budgets, were also closely associated with 'In Control' and its operating model for individual budgets.

The experience of implementing the individual budget pilots demonstrated how the value base of health and social care competed with the arguments about technical rationality underlying the modernising government and public sector reform agendas. The former emphasised the rights of older people and people with disabilities to have greater control over their lives; the latter required evidence to demonstrate the benefits of such control, or at least the costs and effectiveness of an intervention which more anecdotal evidence already appeared to support before results were available from the DH-commissioned independent evaluation.

As Russell and colleagues (2008) argue – and the individual budgets example supports – policy-making in practice is more a ‘formal struggle over ideas and values’ than a systematically structured search to find and apply the best evidence of what works. As the same authors also underline, there is no single ‘right answer’ to be identified in the messy world of policy-making but only ‘more-or-less good reasons to arrive at more-or-less plausible conclusions’ (Russell et al 2008).

It is sometimes argued that policy makers need better understanding of evaluation but it is perhaps no less true that evaluators need better understanding of policy-making and political processes. There are, for example, some givens in public policy which inevitably and necessarily impact on the conduct and interpretation of evaluation. These givens include the impact of electoral and financial cycles as well as electoral and bureaucratic politics. There are also multiple actors and stakeholders, some of whose actions and influence within policy processes are less apparent than others. For example, for policy researchers there are fascinating questions about how the radical concept of individual budgets was developed and rolled out universally within less than a decade. How a small and newly established organisation such as ‘In Control’ was able to achieve the transformation of national social care policy and service delivery guidelines so rapidly and subsequently begin to extend its model into the NHS is, in itself, an evaluation topic of great interest and relevance to policy researchers.

As for social policy evaluators, these reflections underline the advice of Donald Campbell cited above from another era of social policy transformation. Moreover, in an inherently political clash between values and evidence, the roles of evaluators can perhaps usefully be summarised as being to provide challenge which is both rigorous and sustained; to serve as professional sceptics where others are the professional advocates of change; and, finally, to suspend belief in the absence of independent analysis.

Gerald Wistow is Visiting Professor in Social Policy at the London School of Economics. This piece is based on a presentation that Professor Wistow gave at the meeting ‘Evaluation – making it timely, useful, independent and rigorous’ on 4 July 2014, organised by PIRU at the London School of Hygiene and Tropical Medicine, in association with the NIHR School for Public Health Research and the Public Health Research Consortium (PHRC).

Modelling lets evaluators test-drive change safely and cheaply, using a diversity of non-RCT evidence

By Sally Brailsford

Enhanced decision-making, blue-skies thinking and quick trials of hypotheses are all much easier if modelling is in your evaluation tool kit, explains Sally Brailsford

Everyone thinks that they know what a model is. But we all have different conceptions. I like the definition from my colleague Mike Pidd, from Exeter University. He sees a model as ‘an external and explicit representation of a part of reality’.  People use it ‘to understand, to change, to manage, and to control that part of reality’.

We tend to acknowledge the limitations that models have, but fail to fully appreciate their potential.  ‘All models are wrong,’ as George Box said, ‘but some are useful’.

I work in Operational Research. It's a tool kit discipline. In one part, we make use of statistics, mathematics and highly complex algorithmic models. In another, we draw pictures and play games. I use these elements to create simulations – I build a model in a computer that replicates a real system, and then we can play 'what if' with it.

Models inform decision-making

I use models mainly for informing decision-making. Sometimes, they don’t actually need much data to be very useful. For example, there is a famous model about optimal hospital bed occupancy, created by Adrian Bagust and colleagues at Liverpool University’s Centre for Health Economics.  It includes some numbers but they are not based on any specific hospitals. It shows that if a hospital tried to keep all its beds fully occupied, then some patients would inevitably have to be turned away.

The model varies patient arrivals at random and demonstrates how often, as occupancy increases, the hospital has to turn away emergency patients. It shows that hospitals deemed inefficient, because they occasionally have empty beds, are actually operating effectively. The finding really influenced policy. It showed that, as a hospital reaches about 85 per cent occupancy, it is increasingly likely to have to turn emergency patients away. It is a simple model. It did not involve long-running, expensive randomised controlled trials. Yet it provided vital evidence and was powerful in influencing occupancy targets.
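To give a flavour of how such a model works, here is a minimal sketch in Python. It is purely illustrative and is not Bagust's actual model: the bed numbers, arrival rates and lengths of stay are invented for demonstration. It simply combines random emergency arrivals, a fixed stock of beds and a count of the patients turned away when no bed is free.

import numpy as np

def simulate(n_beds, mean_daily_arrivals, mean_stay_days, n_days=50_000, seed=1):
    """Return (average occupancy rate, proportion of emergency arrivals turned away)."""
    rng = np.random.default_rng(seed)
    stays = []                          # remaining length of stay for each occupied bed
    arrivals = refused = 0
    occupancy_total = 0.0
    for _ in range(n_days):
        stays = [d - 1 for d in stays if d > 1]            # discharge finished patients
        for _ in range(rng.poisson(mean_daily_arrivals)):  # today's emergency arrivals
            arrivals += 1
            if len(stays) < n_beds:
                stays.append(rng.exponential(mean_stay_days))
            else:
                refused += 1                               # no bed free: patient turned away
        occupancy_total += len(stays) / n_beds
    return occupancy_total / n_days, refused / arrivals

# Push demand up and watch refusals rise as average occupancy climbs.
for arrivals_per_day in (35, 40, 45, 48):
    occ, turned_away = simulate(n_beds=200, mean_daily_arrivals=arrivals_per_day, mean_stay_days=4)
    print(f"average occupancy {occ:.0%}, emergencies turned away {turned_away:.1%}")

Even in a toy version like this, refusals are negligible at moderate occupancy and climb steeply as average occupancy gets close to full, which is the qualitative pattern the original model used to such effect.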

30 year clinical trial in five minutes

In another model, we looked at patients with diabetes at risk of developing retinopathy. Everyone agreed that it was a good idea to screen patients with diabetes to prevent retinopathy before it leads to blindness. However, there was a whole range of screening practices. We used data from all over the place, from the US and from the UK. The model followed patients with diabetes through the life course and through different progression stages.

We had to draw data from very early studies because it would be unethical to conduct a clinical trial that did not treat people according to best practice. We then adapted the model for different populations, with varying ethnic mixes and incidence of diabetes. We superimposed on the model a range of different screening policies to see which was most cost-effective. In effect, once we felt confident that the model was valid, we could run a clinical trial on a computer in five minutes rather than running a real clinical trial for 30 years. As a result, the model yielded some really valuable findings.

The beneficial difference between all the various techniques and screening programmes proved to be minor compared with the large impact of more people being screened. We realised that raising attendance, perhaps by social marketing, offered much better value than buying expensive equipment.
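The technique behind this kind of study is a state-transition (Markov-type) cohort simulation. The sketch below is only a toy illustration of that general approach, not the published retinopathy model: the states, transition probabilities, treatment effect and attendance figures are all assumptions made up for the example. What it shows is how, once the structure is in place, alternative screening policies can be compared in seconds.

import numpy as np

# Illustrative disease states and annual transition probabilities (rows sum to 1) -- invented numbers.
STATES = ["no retinopathy", "background", "sight-threatening", "blind"]
P = np.array([
    [0.92, 0.08, 0.00, 0.00],
    [0.00, 0.90, 0.10, 0.00],
    [0.00, 0.00, 0.85, 0.15],
    [0.00, 0.00, 0.00, 1.00],
])

def simulate_cohort(screen_interval_years, attendance=0.8, years=30, cohort=10_000):
    """Return (expected cases of blindness, number of screens) over the time horizon."""
    counts = np.array([cohort, 0, 0, 0], dtype=float)    # everyone starts disease-free
    screens = 0.0
    for year in range(years):
        counts = counts @ P                              # one year of disease progression
        if year % screen_interval_years == 0:
            screens += attendance * counts[:3].sum()     # screen those not already blind
            # Assume treatment returns most detected sight-threatening cases to 'background'.
            treated = attendance * 0.9 * counts[2]
            counts[2] -= treated
            counts[1] += treated
    return counts[3], screens

for interval in (1, 2, 3):
    blind, screens = simulate_cohort(interval)
    print(f"screen every {interval} year(s): {blind:,.0f} blind, {screens:,.0f} screens")
print(simulate_cohort(1, attendance=0.95))               # what if attendance improves?

The numbers here mean nothing; the point is the speed with which 'what if' questions – a different screening interval, better attendance, a different population mix – can be asked and answered once the model has been validated.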

Guiding design of hypothetical systems

The next model is even more hypothetical. Three engineers had an exciting, blue skies idea for patients with bipolar disorder. What if, they asked, different sensors tracked a person’s behavioural patterns and, having established an individual’s ‘activity signature’, could spot small signs of a developing episode that would trigger a message that the person might need help?

We expected, rightly, that success depended on what monitoring individuals could tolerate – perhaps a bedside touch-sensor mat, a light sensor in their sitting room, sound sensors or GPS. We built these different possibilities into the model. We could also check how accurate the algorithms would have to be if this technology were developed. So we were guiding the design of a hypothetical system.
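Some of that design guidance can come from very rough arithmetic before any detailed simulation is built. The fragment below is a hypothetical back-of-envelope check, not a result from the project – the episode rate and accuracy figures are assumptions – but it illustrates why, for rare events, the specificity of a daily detection algorithm matters so much: even a small false-positive rate produces far more false alarms than true alerts.

def alerts_per_year(daily_onset_prob, sensitivity, specificity):
    """Expected true and false alerts per person-year for a once-a-day classifier."""
    true_alerts = 365 * daily_onset_prob * sensitivity
    false_alerts = 365 * (1 - daily_onset_prob) * (1 - specificity)
    return true_alerts, false_alerts

# Assume roughly two genuine episode onsets per year and 90 per cent sensitivity.
for specificity in (0.95, 0.99, 0.999):
    true_a, false_a = alerts_per_year(daily_onset_prob=2 / 365, sensitivity=0.9, specificity=specificity)
    print(f"specificity {specificity}: {true_a:.1f} true vs {false_a:.1f} false alerts per year")

Framing the question this way helps to define what 'accurate enough' would have to mean before anyone commits to building the technology.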

Many, particularly those from clinical backgrounds, find it hard to accept that modelling can provide evidence upon which to make a major decision. People often expect the same kind of statistical evidence as from randomised controlled trials. Modelling does not claim to provide that level of certainty. It is a decision-support tool, helping you understand what might happen if you do something.

Appreciate modelling advantages

We should recognise the advantages of models. They are quick and cheap – you can run a clinical trial that could last decades in a matter of minutes. If you lack statistical confidence in your model, there are solutions: expert opinion and judgement can help fill the gaps. A model allows people to talk about issues in a policy setting and to articulate their assumptions. Quite often the conversations along the road are more important than the eventual model, and the model is just a means to that end.

As in the bipolar project, you can model innovations that don't even exist. So I often use modelling with hospitals that are redesigning a system or a service. The new arrangement does not exist yet, so there are no data – you must gather all the available evidence you can and build it into your model. It lets you explore more than traditional methods allow because your assumptions can be more flexible.

Collecting primary data is hugely expensive, sometimes impossible.  You can consider all sorts of options that it would be unethical to explore in reality. As the bed occupancy model shows, the findings can be powerful and influential.

There is a saying that, if all you have is a hammer, then every problem is a nail. As researchers, we should avoid being confined by preferred methods, whatever our discipline. Modelling can be a valuable research tool.

Sally Brailsford is Professor of Management Science at the University of Southampton. Her blog is based on her presentation on 4 July 2014 at PIRU’s Conference: ‘Evaluation – making it timely, useful, independent and rigorous’.

‘Different contexts should not be allowed to paralyse wider roll-out – some differences don’t really matter.’

By Mark Petticrew

 Interventions that succeed in some instances may or may not work in other circumstances. You have to consider whether the contextual differences really are ‘significant’, says Mark Petticrew

How important is the particular context of a policy intervention in deciding whether that intervention can work elsewhere? The answer must lie in the significance of the context. Every place is different. Every time is different. Everybody is different. The important question must be: which differences really matter, which are actually significant? We should avoid mistakenly thinking that the inherent uniqueness of everything means that a particular intervention will never work elsewhere. It might still be generalisable and transferable elsewhere.

Similarity and uniqueness

It is, of course, highly implausible that interventions work the same way across different contexts. Nevertheless, it is equally implausible that evidence collected in one context has no value for another. These polar positions are unhelpful because neither is true. ('We are all individuals,' shouted the crowd to Brian in the Monty Python movie. 'I'm not,' said a lone dissenter.) Clearly, all individual study contexts are different, but there may be similarities.

Similarity and portability across apparently very different contexts were aptly illustrated to me when I was involved in housing research. The earliest controlled trial of a housing improvement intervention was done in Stockton on Tees in 1929. Families were moved out of the slums, which were then demolished, and moved into new housing. Unexpectedly, many people’s health deteriorated.

This type of intervention is common today.  Urban improvement accompanied by large-scale housing regeneration occurs frequently. However, the context is very different from 1929.  In those days, poverty was probably more widespread, as was slum housing. Yet, more recently the same unanticipated adverse effect has been found in one study, with a minority of people’s health deteriorating when their housing improves. Although the context looks very different, the underlying mechanisms seem to be the same, namely that, when the housing is improved, rents rise and so people scrimp on their diets and their health gets worse.

Another field where the same mechanisms apparently work across different contexts is smoke-free legislation which aims to restrict the impact of second-hand smoke in work and public places. This has been evaluated at least 11 times in very different contexts. When the issue reached the UK, critics, often in the hospitality industry, said this might have worked in these other countries but it wasn’t going to work in pubs in Glasgow, say, or in London. The same arguments were raised around the implementation of smoke-free legislation in Ireland, that these are very different contexts, that people’s drinking and smoking were wedded. Yet, in fact, the success of implementation has been broadly similar across many different states and countries.

Aspects of context that matter

In short, predicting the generalisability of an intervention is all about understanding the significance of context. So the first step must be to reflect on which aspects of context might really matter. Many checklists have been put together to help with this task. Dr Helen Burchett from the London School of Hygiene and Tropical Medicine has reviewed dozens of these frameworks, which are used to judge whether evidence collected in one setting might be applicable in another context. Her study found 19 categories of context that might be important, and a few more can probably be added.

Some of the work that we have been doing as part of the NIHR School for Public Health Research has been particularly enlightening around economic contexts. Local practitioners tell us that the current economic climate has been a big constraint not only on the use of evidence by, for example, local government, but also on evaluation itself, which is often seen as a luxury.

However, as I have tried to show, context always varies and simply pointing out the differences is not sufficient. You have to determine – or sometimes make assumptions about – which of these variations actually matter: which are likely to be clinically or socially significant. How do you do this? This assessment should be informed by at least three considerations. First, there is knowledge of the existing evidence, which helps one discover whether and how the intervention has worked in other settings. Second, understanding the underlying theory and assumptions about how the intervention works and is moderated can be helpful. Finally, one can draw on the judgement of experts, practitioners and policy makers who might have insights into whether one context is significantly different from another.

There is a lot more scope for research in this field. For example, there may be classes of interventions that are less context-dependent than others. Smoke-free legislation with its 11 evaluations would be a case in point, and suggests that perhaps regulatory interventions are less affected by context than interventions that require more individual behavioural change.

Context and interventions intertwined

We may also need to revise our sometimes simplistic view of the relationship between context and intervention. There is a tendency to see context merely as a moderator, something that interferes with an intervention in some way. Yet there are many situations and policies where the intervention is the context. The intervention changes the nature of the system in some way so that the intervention and the context are, in effect, the same thing.  This makes defining the start and the end of an intervention and its boundaries – and thinking about how you evaluate it – hugely challenging.

The significance of context in generalisability also places question marks against the culture of systematic reviews. During such reviews, researchers aim to put all the evidence together from interventions and attempt to discern a single effect based on everything that is known about an issue. It is an attempt to separate the ‘things that work’ from the ‘things that don’t work’ and identify an overall effect size. This may be problematic because, during this process, the context that produces that effect usually gets stripped away. As a result, in the process of producing evidence, we lose the context.
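A back-of-envelope example (with entirely made-up numbers) shows how this stripping-away happens in the arithmetic of a standard fixed-effect pooled estimate: two contexts with opposite results can average out to roughly nothing.

# Toy illustration with invented numbers: inverse-variance (fixed-effect) pooling
# of two equally precise studies whose effects point in opposite directions.
effects = {"context A": 0.40, "context B": -0.38}     # assumed effect sizes
variances = {"context A": 0.01, "context B": 0.01}    # assumed variances

weights = {k: 1 / v for k, v in variances.items()}    # inverse-variance weights
pooled = sum(weights[k] * effects[k] for k in effects) / sum(weights.values())
print(f"pooled effect: {pooled:+.2f}")                # about +0.01 -- 'no overall effect'

The pooled figure is technically correct, but it says nothing about the contexts in which the intervention helped and those in which it did harm.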

As researchers we also have a tendency to see the world in terms of studies of 'magic bullets' which tell us that, if things work, then they work everywhere. However, at least in public health, we are increasingly putting together assemblages of evidence from different contexts, showing what happened when those interventions were implemented in different places, to guide future decision makers. This is very different from saying simply that something always 'works'. It might be more helpful to see the wider goal of collecting evidence as being to inform decisions, rather than simply to test hypotheses. This may be one way forward to make proper sense of context, rather than trying either to eradicate it or allowing its uniqueness to rule out the possibility that an intervention can be transferred across time and space.

Dr Mark Petticrew is Professor of Public Health Evaluation at the London School of Hygiene and Tropical Medicine and a member of PIRU. He is also a co-director of the NIHR School for Public Health Research at LSHTM. (mark.petticrew@lshtm.ac.uk)


‘Research units are performing a difficult balancing act … but we’re still smiling.’

By Nicholas Mays

Our ambition to co-produce evidence with advisors and officials is fraught with challenges, but remains a worthy goal with valuable benefits, explains PIRU director, Nicholas Mays.

When PIRU was set up three and a half years ago, there was a great deal of ambition on all sides. The Department of Health, as funder, wanted us ‘to strengthen the use of evidence in the initial stages of policy making’. That was the distinctive, exciting bit for us. We were to support or undertake evaluation of policy pilots or demonstration initiatives across all aspects of the Department’s policy activity – public health, health services and social care.

We were also brave, seeking to 'co-produce' evidence by working closely with policy advisors and officials, aiming to break down conventional sequences in which evaluation tends to follow policy development. We wanted early involvement, from horizon scanning to innovation design and implementation design, as well as work to support evaluations or to do them ourselves. It was clear that if we could be engaged, flexible and responsive, officials would be more likely to work with us.

Some researchers prefer planned, longer term work. They see the responsive element as regrettably necessary to pay the mortgage. In fact, our more responsive work has often turned out to be the most interesting:  some of it we would probably have planned to do in any case; other parts have led to substantial pieces of research. It can be highly productive, not least because policy advisors are fired up about the findings.

Wide-ranging roles

In our first years, we have tried hard to work across all stages of policy development. To support the early stages of policy innovation, we did some rapid evidence syntheses.  We have advised on the feasibility of a number of potential evaluations – for example, we looked at the Innovation Health and Wealth Strategy to examine which of the strategy’s 26 actions could credibly be evaluated. We have advised on the commissioning and management of early stage policy evaluations. We have also helped define more precisely what the intervention is in a particular pilot because, in pilot schemes or demonstrations, the ‘what’ is often presumed, but can actually be rather unclear.

We had expected to guide roll-out, using the learning from evaluations, but that's not always easy for academic evaluators. PIRU often works with different parts of the social care and health policy system, perhaps for quite short periods of time, which is a very different relationship from working, say, with clinicians for an extended period. Also, in policy and management, unlike the clinical world, people change jobs fairly frequently, making it difficult to sustain relationships.

We have also advised on modelling and simulation, which is useful for playing out possible effects of innovations and to debate potential designs. However, that work typically tends to happen within government rather than through outsiders such as PIRU.

Challenges

Indeed, we have found it difficult to become involved in the early stages of policy development, partly because health and social policy decision-making in England has been restructured and become more complicated as a result of the Health and Social Care Act 2012. There are new agencies and new people, altering long-established relationships between policy makers and evaluators.

Engaging us early on is also demanding. It requires greater openness and communication within government, so that research managers actually know when an initiative is starting, and a willingness to share early intelligence with outsiders in the research community. Some policy makers also find that the perceived benefits of sharing new thinking with us fail to outweigh the perceived risks of having us at the table early on.

Dilemmas

There have been other big issues. How close should evaluators get to those who commission an evaluation? How candid – and sometimes negative – should we be?  Should we refuse to do an impact evaluation because we know that too little time will be allowed to elapse to demonstrate a difference?  Should we actively create dissonance with customers who are also funders through a process of constructive challenge? Strangely, the researchers are sometimes the ones saying, ‘No, we should not be looking at outcomes. You are better doing a process evaluation or no evaluation at this stage.’ In some cases, the researchers are asking for less evaluation and the policy makers are asking for more.

Can it be predicted that certain pilots do not realistically lend themselves to being evaluated? For example, we conducted a study of a pilot scheme allowing patients to either visit or register with GP practices outside the area in which they live.  We highlighted in our report that we couldn’t look at the full range of impacts in the 12 months for which the pilot ran.  Nevertheless, critics of the policy were annoyed with the evaluation because it was seen to legitimise what was, in their minds, an inadequate pilot of a wrong-headed policy.

We frequently have to say that the policy pilot will take a lot longer than expected to be implemented. However, the commissioners of evaluation often have no time to wait and want the results right away. The danger is that lots of time is spent interviewing people and looking for implementation effects, only to discover that not very much has happened yet.

So we face many challenges. But that’s hardly surprising. In an ideal world, we would have closer sets of relationships with a defined set of potential users. In reality, we are working across a very wide range of policy issues with an overriding expectation that we should engage at an early stage and speedily. It’s a difficult but rewarding balancing act.

Nicholas Mays is Professor of Health Policy at the London School of Hygiene and Tropical Medicine and Director of PIRU. This piece is based on a presentation that Professor Mays gave at the meeting ‘Evaluation – making it timely, useful, independent and rigorous’ on 4 July 2014, organised by PIRU at the London School of Hygiene and Tropical Medicine, in association with the NIHR School for Public Health Research and the Public Health Research Consortium (PHRC).