Research and analysis

Annexe 1

Published 28 July 2020

RQI

What are the relevant published estimates of the Value of a Life Year, and what are their strengths and weaknesses?

1. Introduction

1.1. Background and coverage of this RQ

The approach outlined in Franklin (2015), based on data taken from Carthy et al. (1999), generates a Value of a Life Year (VOLY)[footnote 1] of around £60,000 and provides the basis for current Her Majesty’s (HM) Treasury Green Book advice on appraisal of government projects that generate life expectancy gains. The purpose of this research question (RQ) is to review this in the context of estimates in the wider literature. Specifically, we will critically examine the literature with respect to convergence or divergence both amongst the existing literature and from this reference value, accounting, as far as is reasonably practical, for the timing and the range of methodologies and approaches used to elicit these valuations. A considered judgement will then be made as to whether there is sufficient consensus in study methodologies and results to generate a VOLY from existing literature and/or to support this reference value as a single ‘baseline’ VOLY without further research.

The review of the VOLY and the Willingness-To-Pay for a Quality Adjusted Life-Year (WTP-QALY) will together inform, where relevant, the exploration of the relationship between them (RQIII and RQV). For example, the findings on risk-based QALYs and methods of scaling health-state valuations to a QALY are likely to be particularly relevant. For clarity, however, we present these 2 reviews separately.

In what follows, we first set out in Section 2 the literature search strategies for the different measures and provide an outline of the information that was extracted from the literature in respect of them. Section 3 reports the range of VOLY estimates existing in the literature, followed by a critical analysis. Section 4 follows the same structure with respect to the WTP-QALY literature. Section 5 is a summary.

2. Literature search strategy

2.1. VOLY

Newcastle University LibSearch[footnote 2] and Google Scholar electronic databases were our primary sources of studies[footnote 3]. Following discussion (SC, JN, CB, MM), the following search strategy was employed, based on keywords and variations of them:

‘VOLY estimates’ OR ‘value of a life year’ OR ‘VOLY survey’ OR ‘VOLY study’ OR ‘value of a statistical life year’ OR ‘VSLY’ AND ‘contingent valuation[footnote 4]’ OR ‘empirical’.

Time and resource constraints meant searches were restricted to those studies written in English[footnote 5]. Dates were from 1990 until present day to ensure a comprehensive overview of VOLY development[footnote 6]. The search returned 1,120 results. The abstracts and titles were screened and anything that did not contain a value of life year estimated within that study was excluded. This means that any studies calculating only a Value of Statistical Life (VSL)[footnote 7] were not reviewed. This left us with 8 articles.

Our secondary source of studies arose from cross-referencing other literature cited in the primary papers, which generated a further 3 studies. A further study was identified at an academic conference. This generated 13 studies estimating a VOLY directly to take forward for data extraction.

A further 6 studies were identified which used secondary data (or an existing VSL) to calculate a VOLY indirectly.

Combining the 2 resulted in a total of 19 articles.

2.2. WTP-QALY

2 electronic bibliographic databases were searched: PubMed and Web of Science. Following a previous review on the WTP for a QALY (Ryen and Svensson 2015) the following search strategy[footnote 8] was employed:

“willingness to pay” OR WTP OR value AND QALY OR “quality adjusted life year” OR “life year” OR “quality-adjusted life year”

Time and resource constraints meant searches were restricted to those studies written in English. Searches were restricted by start date: 1 July 2013 to 1 July 2018 (PubMed) and 2013-2018 (Web of Science). These dates correspond to the previous literature review which had an end date of late 2013 (Ryen and Svensson, 2015). A total of 3,721 articles were identified. In addition, the 24 articles identified from Ryen and Svensson (2015) were added to this total: n= 3,745 articles. The titles and abstracts of these returned articles were screened by a member of the research team (NM) and 150 articles were extracted; removing duplicates resulted in 114 articles available for full text review.

At this stage further inclusion criteria were implemented[footnote 9]:

  • studies had to present a primary estimate of WTP-QALY
  • only studies eliciting preferences of the general public were included (studies of patient preferences were excluded)
  • WTP-QALY for disease specific QALYs were excluded[footnote 10]

Full texts were screened by members of the research team (NM, HMason) which resulted in 21 articles being retained for data extraction.

3. VOLY estimates

In this section, we present a summary of VOLY estimates in the literature. This is followed by a critical assessment of the studies generating these values. We note here that all studies in the literature use stated preference methods – either contingent valuation or some form of choice experiment. Our search did not generate any revealed preference studies, possibly because this would require data on inter-temporal choices over a particular hazard to be available.

With respect to VOLY estimates, we distinguish between VOLY estimates derived directly from a survey (“Primary”) in Table 1 and those derived indirectly from an existing VSL estimate (“Secondary”), reported in Table 2. This distinction is made since, a priori, an argument could be made to place more weight on the former, given that they are based on primary data valuing life expectancy gains and are free of any errors that may be introduced through using an indirect process. However, should the values from the 2 approaches converge, this distinction, for all practical purposes, would be unnecessary.

In both cases, the VOLY estimates were taken from the individual papers, either the sole estimate reported in the study or the main estimate (from a range) identified by the researchers. Max, min, mean, and median values are also reported where available. To facilitate comparability of these results, which were generated from studies conducted at different times and in different places with different sample sizes, we adjust them for purchasing power parity (PPP). Finally, for the purposes of this report, the figures were converted to British Pound (GBP) using the prevailing exchange rate[footnote 11]. Thus, all the values reported in the following tables and surrounding text are 2017 VOLYs and not the VOLYs reported in the original studies.
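As an illustration only, the conversion described above can be sketched in a few lines of Python. The exact procedure and rates are those referred to in footnote 11 and are not reproduced here, so the three-step chain below (uprate to 2017 prices, adjust for PPP, convert to GBP) is an assumption, and the function name, field names and every numerical factor are hypothetical placeholders rather than values used in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionFactors:
    """Hypothetical factors; none of these numbers come from the report."""
    price_index_ratio: float  # 2017 price level / survey-year price level (local currency)
    ppp_local_per_usd: float  # local currency units per international dollar
    gbp_per_usd: float        # GBP per US dollar


def to_2017_gbp(study_value: float, factors: ConversionFactors) -> float:
    """Convert a study-reported VOLY to 2017 GBP under the assumed three-step chain."""
    value_2017_local = study_value * factors.price_index_ratio    # uprate to 2017 prices
    value_intl_usd = value_2017_local / factors.ppp_local_per_usd  # PPP adjustment
    return value_intl_usd * factors.gbp_per_usd                    # express in GBP


# Placeholder inputs chosen only to make the arithmetic easy to follow.
hypothetical = ConversionFactors(price_index_ratio=1.25,
                                 ppp_local_per_usd=0.8,
                                 gbp_per_usd=0.75)
converted = to_2017_gbp(40_000, hypothetical)
print(round(converted))
```

Keeping the three factors separate makes it easy to see which step drives any difference between a study's reported value and its 2017 GBP equivalent.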

3.1. VOLY estimates: “Primary” VOLY

Table 1. VOLY estimates (“Primary”)

Authors | Country | Year of survey | Context | VOLY (Study) | VOLY (2017 GBP)
Nielsen, 2010 | Denmark | 2001 to 2002 | Air Pollution | EUR 30,000 to 90,000 | 15,734 to 47,202
Johannesson and Johansson, 1996 | Sweden | 1995 | New medical programme | $400 to $1500 | 216 to 810
Desaigues et al, 2007 | France | 2001 to 2002 | Air Pollution | EUR 21,000 to 206,000 | 14,700 to 144,200
Vlachokostas et al, 2010 | Greece | 2009 | Air Pollution | EUR 41,000 | 35,272
Desaigues et al, 2011 | Survey in 9 countries to determine a VOLY for the EU: France, Spain, UK, Denmark, Germany, Switzerland, Czech Republic, Hungary, Poland | | Air Pollution | 41,000 euros for EU16, 33,000 for New Member Countries. Headline figure 40,000 euros. | 36,470
Chilton et al, 2004 | UK (41 postcode sectors) | 2002 to 2003 | Air Pollution | 1 Month: 27,630; 3 Month: 9,430; 6 Month: 6,040 (Value of a person year in normal health). In poor health 1 month is 7,280. Value of avoiding respiratory hospital admission 1,310. Value of avoiding a day’s breathing discomfort 1,280. All GBP | 29,674
Ara and Tekesin, 2017 | Turkey | 2014 | Respiratory | 41,750 TLL for VHLL in Ankara (min of 30,185) and 10,258 TLL for VOLY at 0 discount rate | 35,374
Alberini et al, 2006 | UK, France and Italy | 2002 | Mortality risk | 147,720 and 53,760 EUR (mean and median) could be due to Weibull distribution. Data pooled (not country specific VOLY). | 44,500 (median), 122,275 (mean)
Chanel and Luchini, 2008 | France | 2000 to 2001 | Air Pollution | 150,497.7 mean and 147,994.9 median EUR. Also used a second model and got a value of 160,700. If perfect health, VOLY = 206,808 | 141,605
Chanel and Luchini, 2014 | France | 2000 to 2001 | Air Pollution | Mean: 160,151 (Simple model) and 142,279 (Full model). Median: 133,410 (simple model). All EUR | 141,605
Hammar and Johansson-Stenman, 2004 | Sweden | 2000 | Risk-free Cigarettes | 100% replacement: 3600 to 6700 using OE questions and 7800 to 9300 using DC. 50% replacement: 7900 to 12700 using OE and 16200 to 18600 using DC. USD. 100% replacement headline figure 23891 | 1,846
Grisolia, 2018 | UK (Northern Ireland) | 2012 | Cardio-vascular Disease | 63024. 1% reduction in risk of death over 10 years gives VSL of 814,000. Life expectancy model (outcome B) gives 63024. For outcome C (full life span), 56% said they would want to be compensated (negative WTP), so it is not used to calculate a VOLY. | 65,015
Ortiz et al., 2009 | Brazil | 2002 to 2003 | Air pollution | mean $159.456, median $61.392 | Mean 230,113, median 88,596

The data in Table 1 show the considerable range of reported VOLYs[footnote 12],[footnote 13], from a lowest value of £216 (Johannesson and Johansson, 1996) to a highest of £230,113 (Ortiz et al., 2009). The majority estimate VOLYs in the context of air pollution. Focussing on the central tendency measures, though, and setting aside the 2 studies reporting very low values (Johannesson and Johansson, 1996; Hammar and Johansson-Stenman, 2004) and the 3 studies reporting very high values (Alberini et al., 2006; Chanel and Luchini, 2008; Ortiz et al., 2009), the estimates appear to cluster around £35,000. This is somewhat lower than the £65,015 reported in the most recent study in the table (Grisolia, 2018).
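The clustering can be checked roughly from Table 1. The sketch below is illustrative only: the choice of which studies to include is our assumption for the purposes of the example. It takes the studies that report a single 2017 GBP figure, leaves out the studies flagged in the text as very low or very high (and Chanel and Luchini, 2014, which shares the 2008 value), and omits the 2 studies that report only a range (Nielsen, 2010; Desaigues et al., 2007).

```python
from statistics import median

# Single-valued 2017 GBP VOLYs from Table 1.
# The selection of studies is an assumption made for this illustration.
central_values = {
    "Vlachokostas et al, 2010": 35_272,
    "Desaigues et al, 2011": 36_470,
    "Chilton et al, 2004": 29_674,
    "Ara and Tekesin, 2017": 35_374,
    "Grisolia, 2018": 65_015,
}

cluster_median = median(central_values.values())
gap_to_most_recent = central_values["Grisolia, 2018"] - cluster_median

print(f"Median of selected studies: £{cluster_median:,}")
print(f"Grisolia (2018) exceeds the median by: £{gap_to_most_recent:,}")
```

The median is used rather than the mean because it is less sensitive to the single higher value in the selection.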

There are a number of possible causes of this variation, but the study by Desaigues et al. (2007) is useful since it highlights one potential reason of relevance to the scoping study in general, and RQII in particular. The researchers reported a VOLY (in 2017 GBP) of £14,700 when the gain was expressed in terms of life expectancy, £144,200 when it was expressed as a 1/1000 risk reduction and £39,410 when it was expressed as a 5/1000 risk reduction. Clearly, the way in which the life expectancy gain is presented matters.

We now present the “Secondary” VOLY estimates reported in the literature (Table 2).

3.2. VOLY estimates: “Secondary” VOLY

Table 2. VOLY estimates (“Secondary”)

Author | Country | Year of Study | VOLY (study) | VOLY (2017 GBP)
Mason et al 2008 | UK | 2008 | Undiscounted: 1: £44018; 2a: £23199; 2b: £40029. Discounted: £56,331; £26,070; £57,749 | 58,110
Abelson 2003 | Australia | 2003 | AUS$108,000 | 61,646
Dolan et al 2008 | UK | 1991 | £48,000 | 57,051
Narain and Sall 2016 | USA | 2013 | US$189,706 | 135,260
Abelson 2008 | Australia | 2007 | AUS$151,000 | 77,826
Aldy and Viscusi | US | 2007 | US$296,000 (cross-section); US$302,000 (cohort adjusted) | 211,048; 215,306

As above, reported VOLYs were converted into 2017 GBP. As in the case of directly elicited VOLYs, a wide range of central tendency values is reported, from £57,051 (Dolan et al., 2008) to £215,306 (Aldy and Viscusi, 2008). It is notable, though, that the figures reported are generally higher than the “Primary” VOLYs in Table 1 and certainly higher than £35,000. There are 2 UK-based studies (Dolan et al., 2008; Mason et al., 2008), both of which report VOLYs in the £57,000 to £59,000 range, close to the Franklin (2015) £60,000 VOLY, a finding which we comment on further in the next sub-section.

However, there are methodological arguments to suggest that the 2 types of VOLYs may differ significantly and that these features can have either inflationary or deflationary effects on empirical values (see RQII) rendering any comparison of “Primary” and “Secondary” VOLY values somewhat unreliable. Instead, in Section 3.4 we focus on what might lead to differences in indirect VOLY values.

In summary, a wide range of VOLYs is reported in the literature. At least some of this variation will be due to the timing of the study: in general, older studies will have used less developed methods, both for value elicitation and risk communication, than more recent studies. It is not a priori obvious that a ‘core body’ of studies of sufficient quality and reliability exists on which to base any firm recommendations without further analysis. It is to this issue we now turn.

3.3. A critical assessment: VOLY values

3.3.1. A note on meta-analysis

Due consideration was given to the possibility of carrying out a meta-analysis to indirectly derive a VOLY from all the available literature, similar to the OECD (2012) meta-analysis of VSL studies. This is because a meta-analysis can be a useful mechanism to account quantitatively for the links between the studies and to generate an overall value based on all the information available to the analyst.

Carrying out a meta-analysis would require a substantive body of literature. The meta-analysis in Viscusi (2018), for example, included 68 wage-risk studies. There are no equivalent wage-rate databases in the UK and far fewer VPF studies. In the context of a VOLY, if we constrain an analysis to the UK (which is the criterion adopted by Robinson and Hammitt (2016) in the US context), we would be able to include:

  • 3 VOLY studies (Desaigues (2011), Chilton (2004), Grisolia et al. (2018))
  • 3 WTP/QALY studies (Robinson et al. (2013), Pennington et al. (2015), Baker et al. (2010))
  • 5 VPF studies (Lindhjem et al. (2011))

Overall, we do not believe a meta-analysis should be carried out on the very few UK studies available. In addition, as noted by the OECD (2012), whilst there is no agreement in the literature about what the limit of variation between studies can be, there must be a level of consistency across studies. Unfortunately, this is very subjective. There is a general consensus in the literature that the more heterogeneity there is between studies, the less weight any particular meta-analysis will carry (e.g. Nelson and Kennedy, 2009). Many differences in methodology and in VOLY estimates do exist amongst our studies (see below). Following Borenstein et al. (2009), it is therefore highly likely that we would have insufficient power to construct a meta-analysis that would yield reliable information. This is particularly so because the VOLY estimates available to us are both presented and derived in different ways (for example, in different country currencies, and either directly elicited or indirectly generated from a VSL), making them very difficult to compare.

Other criteria for a reliable meta-analysis include high response rate, new studies and sample size (Lindhjem and Navrud 2011). However, there is no clear benchmark for the exclusion or inclusion of studies. Moreover, had we followed the precedent set out in the OECD (2012) framework and excluded studies on these criteria, we would have further reduced a sample of VOLY studies that has already been judged too small (on the grounds of heterogeneity). In addition, the US Science Advisory Board (SAB) advises that a meta-analysis can really only be considered useful and sound if the focus is on identifying factors that influence VSL, and not on deriving an estimate itself.

Finally, Johnston and Rosenberger (2010) note the lack of structural theoretical foundations underpinning a meta-analysis as well as the fact that there is limited evidence to show that meta-analysis is more robust compared to alternative methods.

Based on this, we set aside the possibility of a meta-analysis and derived and applied the following approaches to the VOLY and WTP-QALY values. By necessity, the assessments are qualitative and are informed by our value-judgements, particularly in their interpretation. However, to ensure that each study was treated and assessed equally we designed the data extraction tools outlined above.

3.3.2. Assessment Approach: VOLY Values (“Primary”)

Due to the high degree of heterogeneity present in the studies (Appendix 1), we eschewed a systematic review approach based on a study-by-study reporting narrative, which would have aimed to derive a small group of ‘core studies’ with sufficient commonalities, a high degree of best practice and ‘similar’ values, in favour of a characteristics-based approach. From this perspective, we take heterogeneity in values as given and assess the variation across studies with respect to major characteristics. In doing so, the aim is primarily to try to narrow down the reasons for this observed divergence, as noted in Section 1.1.

The data extraction tool was based on 2 potential causes of difference in estimates:

  • the timing and location and representativeness of the studies
  • the surveys and, in particular, the way in which the life expectancy gain is described to respondents and the elicitation procedures

By also collecting information on the economic consistency of the resulting estimates (for example, responsiveness of elicited WTP to income and passing a scope test) as well as any data cleaning activities, we are able to pass some judgement on the relative quality of the various studies.

To assist in this overall assessment, we[footnote 14] also developed a traffic light system for those characteristics in the data extraction tool where such an approach was possible. The criteria for each characteristic are described below but, in principle, ‘green’ signifies good practice, ‘amber’ some ambiguity and ‘red’ poor practice and/or missing information in terms of how they might influence the resulting VOLY estimates. The primary purpose of this approach was to allow us to broadly assess each study in a consistent, albeit subjective, manner both with respect to each other and to the aims and objectives of our study. It should not be viewed as implying a ranking of one study over another from an academic perspective, although it is of course reasonable to conclude that a study assessed as red on all dimensions is unlikely to generate a reliable VOLY for policy purposes.

Overall study characteristics

At the most general level, the studies demonstrate a wide variety of practice with respect to overall design. The mechanism by which these feed through, if at all, to affect convergence or divergence with other VOLYs or bias the value in a particular direction is not identifiable, but it is possible that a more recent study with a large sample size and age range carried out in the UK would generate more reliable values than others that did not meet these criteria, either as a whole or in part. With respect to assessment of:

  • timing: studies reporting survey results[footnote 15] from 2010 onwards were assessed as green (Johannesson and Johansson, 1996; Ara and Tekesin, 2017; Grisolia et al., 2018), whilst the 9 other studies, carried out before 2010, were assessed as amber on the grounds that the more recent a study is, the more likely the generated VOLY reflects current preferences amongst the population
  • sample size: no study employs a truly random, nationally representative sample, so if we based our assessment on this all studies would be assessed as red. However, from a more pragmatic perspective, on the grounds that a larger sample size is to be preferred to a smaller one, all other things equal, we classify the 6 studies with sample sizes of greater than or equal to 800 as green (Nielsen, 2010; Johannesson and Johansson, 1996; Vlachokostas et al., 2010; Ara and Tekesin, 2017; Chanel and Luchini 2008; 2014[footnote 16]) and those with a smaller sample size (or sample size not reported) as red. Amber is reserved for studies where the sample size is ostensibly large (for example, Desaigues et al. 2011; Grisolia et al., 2018) but which cannot be assessed as green because of study design, namely the size of the study area (Europe) in the former and the split sample design employed in the latter (reducing the sample size to 336 for the reported VOLY)
  • age range: age might be expected to influence the value of a VOLY (see RQIV), hence studies that include a broad spread of ages in the sample are probably to be preferred, although there may be a statistical efficiency cost to this[footnote 17]. Setting that to one side, we assess studies that sample across all or most decades, from the 20s through to the 80s, as green (Nielsen, 2010; Johannesson and Johansson, 1996; Desaigues et al., 2011; Chilton et al., 2004) and those that restrict themselves to a subset of those decades, i.e. from the 40s onwards (Desaigues et al., 2007; Grisolia et al., 2018; Ortiz et al., 2009), or report a low mean age implying older people are excluded or under-represented (Vlachokostas et al., 2010) as amber, whilst those that do not report on this feature are assessed as red
  • country: all other things equal, a UK-based survey with a broad geographical coverage is more likely to reflect UK citizens’ preferences over life expectancy gains than surveys of people in other countries, although if the countries were very similar in terms of demographic and cultural factors this might not hold true. As an in-depth examination of country characteristics is beyond the scope of this review, we assess UK studies (Chilton et al., 2004; Grisolia, 2018[footnote 18]) as green and all other studies as red

In summary, no study is assessed as green across all 4 of these characteristics. Having said that, it is not easy to judge the relative importance of each, and it may well be the case that studies meeting one or 2 criteria are just as likely to generate reliable VOLYs as a study that meets more. Not least, the study that meets the most (timing, sample size and age) generates by far the lowest VOLY (Table 1). This is probably driven, at least in part, by the fact that the researchers asked about valuing a year of life over the age of 75, which people may value less than they would at a younger age (this issue is considered in more detail in RQIV). Thus, whilst these overall characteristics provide a useful starting point, it is also necessary to investigate the characteristics of the surveys themselves in more detail, as poor practice will almost certainly lead to less reliable estimates from a particular study and more divergence in estimates across studies.
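The green assessments listed above can be tallied mechanically. The sketch below is illustrative only and is not the data extraction tool itself: the dictionary layout and the shortened study labels are our own, and only the green assessments named in the text are encoded (amber and red are not distinguished).

```python
from collections import Counter

# Studies assessed as green on each overall characteristic, as listed in the text.
# The structure and the short labels are hypothetical conveniences.
GREEN = {
    "timing": {"Johannesson and Johansson 1996", "Ara and Tekesin 2017",
               "Grisolia 2018"},
    "sample_size": {"Nielsen 2010", "Johannesson and Johansson 1996",
                    "Vlachokostas 2010", "Ara and Tekesin 2017",
                    "Chanel and Luchini 2008", "Chanel and Luchini 2014"},
    "age_range": {"Nielsen 2010", "Johannesson and Johansson 1996",
                  "Desaigues 2011", "Chilton 2004"},
    "country": {"Chilton 2004", "Grisolia 2018"},
}

# Count how many characteristics each study is green on.
green_counts = Counter(study for studies in GREEN.values() for study in studies)

all_green = [s for s, n in green_counts.items() if n == len(GREEN)]
top_study, top_count = green_counts.most_common(1)[0]

print("Green on all characteristics:", all_green or "none")
print(f"Most greens: {top_study} ({top_count} of {len(GREEN)})")
```

A simple count treats the 4 characteristics as equally important, which is a simplification given that their relative importance is not easy to judge.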

Survey: explanation of life expectancy gains

The mechanism by which different explanations of life expectancy gains feed through, if at all, to affect convergence or divergence with other VOLYs or bias the value in a particular direction is again difficult to identify directly, but a strong argument can be made in favour of studies that pay particular attention to explaining to respondents that the gains to life expectancy will not be enjoyed for certain and, in particular, that they are not an ‘add-on’ at the end of life (in poor health). In addition, studies that communicate to respondents the method by which these gains are generated (by small changes in survival probabilities over their remaining lifetime) should be given more weight, particularly if some effort has been made to take account of the cognitive challenge faced by respondents in processing such information. An ‘ideal’ study would present clear explanations of both the mechanism (risk changes) and the outcome (changes in life expectancy). With respect to assessment of:

  • context: assessing this characteristic is not particularly appropriate, since no natural criteria exist other than that it must be realistic to respondents and be one in which life expectancy gains can be generated. The issue of context is discussed in detail in RQIV, as is the issue of whether a context-less or a contextual VOLY is to be preferred for UK policy purposes. Here, we simply note that only one study elicits a context-less VOLY. Whilst the remaining studies use a range of ‘one-off’ contexts, the most common is air pollution (Nielsen, 2010; Desaigues et al., 2007; Vlachokostas et al., 2010; Desaigues et al., 2011; Chilton et al., 2004; Chanel and Luchini, 2008, 2014; Ortiz et al., 2009)
  • risk information-baseline risk: studies were assessed as green if they provided respondents with explicit, quantitative information on the baseline risk of dying (Desaigues et al., 2007; Alberini et al., 2006; Grisolia et al., 2018; Ortiz et al., 2009), in 2 cases supplemented by additional information with respect to the impact of age, gender and lifestyle (Desaigues et al., 2007; Grisolia et al., 2018). Studies were assessed as amber if they provided a more descriptive, qualitative information set in which the risk was implicit rather than explicit e.g. Vlachokostas et al. (2010) informed respondents of average life expectancy for men and women whilst Chilton et al. (2004) and Chanel and Luchini (2008) provided fairly general descriptions pertaining to a baseline of a healthy lifestyle with no discomfort and living in a town with pollution equivalent to that in Marseilles, respectively. Studies not specifying a baseline risk were assessed as red
  • risk information-risk change: studies were assessed as green if they provided respondents with explicit, quantitative information on the risk change e.g. Desaigues et al. (2007)[footnote 19] specified risk changes of 1 in 1,000 and 5 in 1,000 while Alberini et al. (2006) elicited WTP for a range of risk reductions over varying time periods. Studies were assessed as amber if changes in life expectancy only were valued, the dominant approach in the literature to date (Nielsen, 2010; Vlachokostas et al., 2010; Johannesson and Johansson, 1997; Chilton et al., 2004; Ara and Tekesin, 2017; Grisolia et al., 2018). Studies that valued life expectancy through other mechanisms (for example, a relatively lower pollution level than Marseilles, or the minutes or hours of life lost per cigarette or cigarette pack) were assessed as red. Whilst not incorrect, it is not clear why such mechanisms would be preferred without further evidence. In addition, these studies have generated one of the highest and one of the lowest VOLYs respectively (Table 1)

Given that these 2 aspects in combination with the explanation of how life expectancy is generated (see below) are such a crucial part of the communication of life expectancy gains, it is striking that only 2 studies were assessed as green on both aspects (Desaigues et al, 2007; Alberini et al., 2006). The fact that the remaining studies were assessed as amber on both or a mix of amber and red suggests that it might be inappropriate to base a new VOLY value on these studies and/or use them as a comparator for a £60,000 VOLY.

  • explanation: since a ‘gold standard’ explanation has yet to be developed, this is a particularly difficult characteristic to assess. We chose therefore to assess as red only those studies providing no explanation at all (either quantitative or qualitative) as to the link between mortality risk reductions and the generation of life expectancy gains. It is notable that these 2 studies are amongst the earliest VOLY valuation studies (Johannesson and Johansson, 1997; Chilton et al., 2004) and it appears that lessons learned there were adopted in subsequent studies, suggesting a general wide-scale acceptance and adoption of the principle of trying to ensure respondents do not perceive this life expectancy gain as an add-on at the end of life. Thus, for the purposes of this RQ, we assess as green any study attempting explicitly to do this. We do note, however, that these explanations vary greatly in their depth and content. For example, a number of studies (Nielsen, 2010; Desaigues et al., 2011 and Ara and Tekesin, 2017) presented survival curve diagrams and emphasised that it was the change in life expectancy being valued. The primary purpose of the visual aid was to convey the on-going nature of a gain in life expectancy. An alternative technique is the use of risk grids as in Desaigues et al. (2007) who showed respondents a 10,000 square grid, with a small proportion of shaded squares to convey visually a respondent’s risk of dying (Desaigues et al. 2011). Some studies chose to communicate this risk more indirectly e.g. Chanel and Luchini (2008) who described the life expectancy gain as being generated by living in a town with 25% or 50% to 100% less pollution than Marseilles. It will certainly be the case that not all explanations are equally good, a point we return to in RQII in the context of methodological issues surrounding the valuation of life expectancy
  • private/ public: this is concerned with whether the life expectancy gain is generated through a public good (e.g. by the government through the policy process) or a private good (for example, medical care) in the valuation scenario. Whilst it initially might be thought that a public good context would be preferred if the VOLY value is to be used in public policy appraisal, it might well be the case that the VOLY elicited in a private good context is less affected by potential biases introduced by some public provision scenarios[footnote 20]. In general, it would seem that the private-based valuations are slightly higher than public-based valuations but it is unclear why. This is an unresolved issue and, as such, we do not assess these studies, instead noting that the studies split 50/50 on this aspect of the scenario
  • acute/chronic: this was recorded primarily to check that all studies based their valuations on small changes in the risk of dying rather than changes in the risk of a health impact

Survey: elicitation procedures

These characteristics are reflective of stated preference surveys in general as opposed to the monetary valuation of a VOLY per se. It is nevertheless the case that, all other things equal, a good stated preference survey is to be preferred to a bad one. We chose not to assess these features but include them to illustrate the point that any new primary research would need to ensure current ‘good practice’ is followed with respect to these characteristics. It is beyond the scope of this study to consider this aspect of VOLY valuation, although it will have a clear impact on resulting values[footnote 21]. To an extent, the ‘best’ approach with respect to these characteristics will depend ultimately on the particular scenario and questions in any new primary research. Trade-offs and judgements will have to be made in the context of that study. It would be virtually impossible to isolate the impact of each of these features on the reliability or magnitude of the resulting VOLYs, so instead we compare VOLY practice with broader stated preference practice and development with respect to:

  • survey mode (for example, face-to-face, internet, telephone, mail): the relative advantages and disadvantages of different survey modes are considered in more detail in RQII. In the past, a major concern was the degree of control that the survey administrator had in ensuring that only one person completed the survey and also in ensuring due cognisance is given by the respondent to the information presented[footnote 22]. This led to an increasing tendency within stated preference studies in general to favour face-to-face or telephone over other media. This can be seen in the context of the VOLY e.g. studies utilising face-to-face (Vlachokostas et al 2010; Desaigues 2011; Chilton et al., 2004; Ara and Tekesin, 2017; Alberini et al., 2006; Chanel and Luchini, 2008; Ortiz et al., 2009) or telephone modes (Johannesson and Johansson, 1996; Alberini et al., 2006; Chanel and Luchini, 2008). The early study by Hammar and Johansson-Stenman (2004) was the exception (mail survey). More recently, the wider literature is characterised by studies utilising the internet (Nielsen, 2010; Grisolia et al., 2018). The most obvious advantage of the latter is the potential for much larger sample sizes, although there are some potential disadvantages, as with all survey modes (see RQII)
  • value elicitation mechanism: the potential strengths and weaknesses of different elicitation mechanisms (e.g. open-ended, dichotomous choice etc.) have been considered at length in the wider stated preference literature and will not be repeated here. We chose to use this data to identify whether or not a consensus mechanism had evolved in the literature, one that we could give further consideration to in RQII. This was not the case and instead we note that, once again, the studies utilised a wide variety. For example, Desaigues et al. (2007) used closed-ended questions whilst Vlachokostas et al. (2010) and Chilton et al. (2004) used open-ended questions. The dichotomous choice method was employed by Ara and Tekesin (2017) and Alberini et al. (2006) whilst others utilised randomised card sorting (Nielsen, 2010; Desaigues et al., 2011; Chilton et al., 2004). Grisolia (2018) carried out a choice experiment. This variation in practice is problematic in the sense that, if VOLY values are sensitive to the elicitation mode used, then inter-study comparisons of both the reliability and magnitude of the resulting VOLYs become very difficult
  • payment vehicle: a consensus appears to have evolved in the literature to use a payment vehicle that utilises ongoing payments (10 out of the 12 VOLY studies reviewed), which captures the fact that an extension in life expectancy is generated over a person’s lifetime (regardless of the perturbation in hazard rates that applies in any one particular case). This also has the advantage that a person’s budget constraint is less likely to bite relative to a one-off payment. This feature is discussed in the context of methodological issues in RQII and will not be considered in any further detail here
  • individual/household: this is a wider issue relating to survey representativeness in general and is beyond the scope of this review, except to note that arguments can be made in support of each
  • order effects: it is good practice to either control for (in the design and/or econometrics) or test for order effects in valuation studies. In the context of the VOLY, order effects were tested for in Desaigues et al. (2007) and controlled for in Chilton et al. (2004) and Chanel and Luchini (2008; 2014)

In summary, a face-to-face mode combined with an ongoing payment appears to have been the most prevalent approach. However, the variability introduced by the differing elicitation mechanisms and the general lack of information on order effects in the reviewed studies make it difficult to draw definitive conclusions as to whether this is the preferred combination with respect to best practice.

Economic consistency and data reliability

Along with scope sensitivity, in order to have any confidence in the economic validity of VOLY values from a particular study, WTP must be shown to be sensitive to income, in particular to increase with a person’s income. WTP might also be expected to be affected by time preferences, although in the case of a VOLY the nature of any such relationship is an empirical question and is considered elsewhere in this report (RQV).

  • scope test: a key test of reliability and validity is whether WTP values are scope sensitive i.e. that they increase (significantly) in line with the size of the life expectancy gain. Further, if this rise in WTP is directly proportional to the increase in life expectancy, then this might be considered to demonstrate strong scope sensitivity (for example, Vlachokostas et al. (2010) in which WTP increased by a ratio of 1.9 when life expectancy gain was doubled). In most cases, WTP rises less than in proportion to the life expectancy gain, suggesting weak scope sensitivity (Nielsen, 2010; Desaigues et al., 2011; Ara and Tekesin, 2017). Studies were assessed as green if they reported statistically significantly different values (Nielsen, 2010; Chilton et al., 2004; Chanel and Luchini, 2014), amber if they reported differing scope sensitive values with no statistical information such as p-values (Desaigues et al., 2007; Vlachokostas et al., 2010; Desaigues et al., 2011; Ara and Tekesin, 2017) and red if no scope tests were carried out or insignificantly different values were reported
  • income: as mentioned, WTP must be shown to be sensitive to income, in particular to increase with a person’s income. This was the case in all studies (assessed as green) except for Desaigues et al. (2007) and Hammar and Johansson-Stenman (2004[footnote 23]), which are assessed as red
  • age: as noted, the relationship between age and WTP is an empirical issue and in general a number of different relationships have been found to date. Although some studies find no significant link between age and WTP amount (Nielsen, 2010; Desaigues et al., 2011; Alberini et al., 2006), others find a positive relationship between age and WTP, such as Johannesson and Johansson (1996), Vlachokostas et al. (2010) and Desaigues et al. (2007). It is not possible to disentangle genuine preferences from wealth effects if it is the case that older people may be wealthier on average than younger people in a particular country of study. Chanel and Luchini (2008) find an inverse U-shaped relationship, and Johannesson and Johansson (1997) find that people value younger lives more. In general, the age of respondents spanned all adult ages, with the exception of Alberini et al. (2006) and Desaigues et al. (2007), who restricted their samples to 40+.
    Studies cannot be assessed with respect to the type of relationship they uncover. We chose to assess them simply on whether they report on this issue or not since, a priori, the age at inception of the policy might be expected to affect value, and without this information it becomes impossible to compare findings on age-related VOLYs. 3 studies do not report on it (Chanel and Luchini, 2014; Hammar and Johansson-Stenman, 2004; Grisolia et al., 2018) and are therefore assessed as red, green otherwise. The issue of the relationship between age and WTP is discussed further in RQIV and will not be expanded on here
  • latency/discounting: we did not assess this particular characteristic, simply noting that a range of discount rates are reported and/or applied, with implications for the subsequent comparability of resulting VOLYs. There does not seem to be any link between high discount rates and low VOLYs. We would expect time preferences to play a key role in VOLY valuation and this issue is considered further in RQIV
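
The scope test described above can be illustrated with a minimal sketch. The function name and the WTP amounts are hypothetical and chosen only so that WTP rises by a ratio of 1.9 when the life expectancy gain doubles, as reported for Vlachokostas et al. (2010); the sketch says nothing about statistical significance, which the traffic light assessment also considers.

```python
def scope_ratio(wtp_small, wtp_large, gain_small, gain_large):
    """Return the WTP ratio divided by the gain ratio.

    A value of 1.0 means WTP rises exactly in proportion to the life
    expectancy gain; values below 1.0 mean less than proportional.
    """
    if min(wtp_small, wtp_large, gain_small, gain_large) <= 0:
        raise ValueError("all inputs must be positive")
    wtp_ratio = wtp_large / wtp_small
    gain_ratio = gain_large / gain_small
    return wtp_ratio / gain_ratio


# Hypothetical amounts: WTP rises from 100 to 190 when the gain doubles
# from 3 to 6 months (units are illustrative only).
ratio = scope_ratio(wtp_small=100.0, wtp_large=190.0,
                    gain_small=3.0, gain_large=6.0)
print(round(ratio, 2))
```

A ratio close to 1 would be read as near-proportional (strong) scope sensitivity, while a ratio well below 1 corresponds to the weak scope sensitivity found in most of the studies.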

Turning to data reliability, this set of characteristics reflects some tests and/or procedures that can be used to either assess the overall validity of the data and/or to make the dataset less noisy:

  • protest responses: the proportion of protest responses ranged from around 1% (Chanel and Luchini, 2008) to 47% (Johannesson and Johansson, 1996). In addition to this, they were also dealt with in different ways. Since the inclusion of protest zeros in a value estimate will have a deflationary effect, it is usual practice to remove at least those that can be clearly identified as protestors and reporting a zero WTP. Hence, studies were assessed as green if protestors were identified and excluded (Nielsen, 2010; Vlachokostas et al., 2010; Desaigues et al., 2011; Chilton et al., 2004; Ara and Tekesin, 2017; Chanel and Luchini, 2008; Hammar and Johansson-Stenman, 2004), amber if they were identified but not excluded and red otherwise[footnote 24]
  • outliers: the variation in how outliers are treated again makes direct comparisons between VOLYs more difficult, and the inclusion of unreasonably high or very low WTP values has the potential to significantly influence central tendency measures (to differing degrees). For example, when Nielsen (2010) excluded outliers, mean WTP fell by 15%. Whether to exclude or include outliers is a currently unresolved issue in the academic literature, as is which outliers to exclude i.e. trim out very large bids only (Nielsen, 2010) or trim from both extreme ends of the distribution (Chilton et al., 2004). However, we note that including all responses may cause difficulties in policymaking – for example, if including outliers was shown to increase the central tendency measure by a “large” amount e.g. more than 10%, this might render its use in policymaking to all intents and purposes impossible, regardless of the academic view. Given this, we did not assess this characteristic and simply note that some studies excluded outliers (Nielsen, 2010; Vlachokostas et al., 2010; Desaigues et al., 2011; Chilton et al., 2004) whilst others did not (Desaigues et al., 2007; Chanel and Luchini, 2008; Hammar and Johansson-Stenman, 2004)
  • response rate: in general, a higher response rate is likely to generate responses that better reflect society’s preferences over life expectancy than a lower one, although direct comparison across studies is difficult since recruitment practices vary quite significantly across modes[footnote 25]. Reported response rates ranged from as low as 35% (Nielsen, 2010) to as high as 82% (Johannesson and Johansson, 1996). No attempt was made to control for sample selection bias in any of the studies, except for Desaigues et al. (2007). Studies were assessed as green if they achieved a response rate higher than 50% (Johannesson and Johansson, 1996; Chilton et al., 2004; Chanel and Luchini, 2008, 2014; Hammar and Johansson-Stenman, 2004; Grisolia et al., 2018), amber if lower than 50% (Nielsen, 2010) and red otherwise
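
The sensitivity of mean WTP to outlier treatment noted above can be shown with a short sketch. The data and the `trim` helper are hypothetical and are not taken from any of the reviewed studies; the sketch simply contrasts trimming very large bids only with trimming both ends of the distribution.

```python
from statistics import mean


def trim(values, lower_frac=0.0, upper_frac=0.0):
    """Drop the given fractions of observations from each end of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    lo = int(n * lower_frac)          # number of smallest values to drop
    hi = n - int(n * upper_frac)      # index after the last value kept
    return ordered[lo:hi]


# Hypothetical WTP responses with one very large bid.
wtp = [0, 5, 10, 10, 15, 20, 20, 25, 30, 500]

full_mean = mean(wtp)                           # all responses included
top_trimmed = mean(trim(wtp, upper_frac=0.1))   # largest 10% removed
both_trimmed = mean(trim(wtp, 0.1, 0.1))        # 10% removed from each end

print(full_mean, top_trimmed, both_trimmed)
```

In this made-up sample a single extreme bid accounts for most of the untrimmed mean, which is why the choice of trimming rule matters for cross-study comparisons of central tendency.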

We now turn to the “Secondary” VOLY estimates.

3.4. Assessment approach: VOLY values (“Secondary”)

There are a priori reasonable grounds to give less weight to VOLYs generated from secondary data, particularly if the secondary data comes from a different location or time to the population of interest. The more this varies, the less likely is any VOLY to reflect that population’s preferences. It is also the case that, because the secondary data is generated from a one-period risk reduction, the preferences of people who might value life expectancy gains generated from on-going risk reductions may not be properly represented by values generated from one-period valuations (VSLs), alongside the fact that there are a number of potential relationships between the VSL and VOLY. This issue is examined in RQII. The very small number of studies precludes a meta-analysis.[footnote 26]

Returning to the first point, it is indeed the case that there is little consistency across the studies in terms of country of study, the timing and in overall practice, including the sourcing of secondary data. We did not utilise a traffic light system in assessing these studies, since its value to us was less obvious and, in most cases, it is not possible to pass judgement on good or bad practice with respect to the secondary data or the methodological approach since no obvious criteria exist.

The studies were carried out in the UK (Mason et al., 2008; Dolan et al., 2008), USA (Narain and Sall, 2016; Aldy and Viscusi, 2008) or Australia (Abelson, 2003; 2008). Whilst these might be considered broadly similar since they are all developed countries, set against this is the timing, in that they took place between 1991 and 2013. However, despite the fact that all used the standard approach to calculate a VOLY (dividing a population-level VSL by the population average remaining life expectancy), the primary factor that makes comparison very difficult is the different secondary data sources for the VSL.
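
The standard calculation referred to above can be sketched as follows. The VSL and life expectancy figures are hypothetical, and the optional discounted variant (treating the VSL as the present value of a constant annual VOLY) is a commonly used formulation offered here as an assumption, not necessarily the one applied in any of the reviewed studies.

```python
def voly_from_vsl(vsl, remaining_life_years, discount_rate=0.0):
    """Derive a constant annual VOLY from a VSL.

    With discount_rate=0 this is simply VSL / remaining life years.
    With a positive rate, the VSL is treated as the present value of a
    constant VOLY received in each remaining year (assumed formulation).
    """
    if remaining_life_years <= 0:
        raise ValueError("remaining_life_years must be a positive integer")
    annuity_factor = sum((1 + discount_rate) ** -t
                         for t in range(remaining_life_years))
    return vsl / annuity_factor


# Hypothetical inputs: VSL of 2,000,000 and 40 remaining life years.
undiscounted = voly_from_vsl(2_000_000, 40)
discounted = voly_from_vsl(2_000_000, 40, discount_rate=0.035)
print(round(undiscounted), round(discounted))
```

Because the discounted annuity factor is smaller than the undiscounted number of years, a positive discount rate yields a higher VOLY from the same VSL, which is one reason differing discounting practice hampers comparison.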

For example, Abelson (2003) sourced data from the United States Environmental Protection Agency (USEPA), a range of European studies and NSW Roads & Traffic Authority (2002). To calculate an Australian VOLY he multiplied the ratio of the proposed Australian 2002 VOLY to the US 2001 VOLY[footnote 27] under strict assumptions with respect to the transferability of underlying Quality-of-Life values. His 2008 study was based on a meta-analysis of research into VSL and VOLY and of international guidelines for life and health from 1991 to 2005. Narain and Sall (2016) also carried out a meta-analysis on 40 primary studies (quality-screened studies from the OECD study (2012)). Aldy and Viscusi (2008) present a wage-risk study focusing in particular on deriving an age-dependent VOLY from a VPF, controlling for birth-year cohort effects.

Considering the UK-based studies, Dolan et al. (2008) carried out an exploratory Social Well Being (SWB) Analysis on British Household Panel Survey data sourced from Oswald and Powdthavee (2008). Mason et al. (2008) based their VPF (VSL) on Department for Transport prices and their estimates of age-specific life expectancy gains on data from the Office for National Statistics. The similarity between the Franklin (2015) VOLY and the latter study arises at least in part because both studies sourced their WTP values from the Carthy et al. (1999) study, so it is unsurprising that the 2 values converge. Regarding the former, it is beyond the scope of this study to discuss the differences between SWB and WTP-based values but their very different conceptual underpinnings make a direct comparison very difficult. Essentially, the similarities in values could be completely coincidental or truly reflective of underlying preferences; it is impossible to determine which.

As in the case of the ‘Primary’ VOLYs, no standard approach to discounting has emerged, increasing the chance of divergence between studies, all other things equal. Where studies report age-specific VOLYs (Abelson, 2003, 2008; Mason et al., 2008; Aldy and Viscusi, 2008) these are shown to vary. There are obvious implications for this with respect to the current application of a constant VSL in UK policy, which we expand on in RQV.

Given the small number of studies and the divergent sources of the VSLs within, it would be difficult to recommend a UK VOLY based on these studies. The question is whether, together, the 2 types of studies yield information in which we have enough confidence to either recommend a primary value or conclude that, taken as a whole, they provide sufficient support for the current HMT recommended £60,000. We address this question in Section 5 below but now turn to the WTP-QALY literature.

4. WTP-QALY estimates

To recap, the aim here is 2-fold. First, to review the WTP-QALY literature to establish whether estimates are sufficiently consistent to generate or identify a VOLY from this literature. Second, to assess the methods used and findings from WTP-QALY studies to inform the conceptual relationship between the VOLYs and QALYs (RQIII). This section presents a summary of WTP-QALY estimates found in the literature. This is followed by a narrative review of the main features of the studies generating these values.

4.1. WTP for a QALY estimates

For each study, the range of WTP-QALY estimates is presented, including mean and median values. Following the method used in the VOLY section, QALY estimates are adjusted for PPP and converted to GBP using the prevailing exchange rate. This enables comparison between studies conducted at different times and in different countries. Thus all values reported in Table 3, and referred to in the text, are 2017 QALY estimates.

The data in Table 3 shows a large range of reported WTP-QALY estimates, from £970 (Bobinac et al. 2012) to £912,835 (Sund and Svensson 2018). The review article by Ryen and Svensson (2015) estimated a trimmed mean WTP-QALY of £63,221 from studies published up to the end of 2013. It is also apparent that multiple WTP-QALY estimates were elicited per study, with ten studies having ten estimates or more, including 2 studies with over 30 estimates (Pinto-Prades et al. 2009; Ahlert et al. 2016).

The reason behind the range of WTP-QALY estimates relates to a number of methodological considerations; this is the focus of RQIV. As the QALY is comprised of both quality of life and length of life, many of the studies examined possible combinations of these 2 items, resulting in multiple estimates. For example, the study by Pennington et al. (2015) examined 5 different ways of presenting a one QALY gain to participants, each resulting in a different WTP-QALY estimate. Studies also set out to test different elicitation procedures; for example, Pinto-Prades et al. (2009) set out to test for biases during the elicitation of estimates, including order effects and sensitivity to the duration of the payment period. The studies with the very highest WTP-QALY values are those in which the questions used a risk framing, which in combination with small changes in utility values generally results in significantly higher WTP estimates (van de Wetering et al., 2015; Attema et al., 2018).

Table 3: Summary of WTP-QALY studies

| Authors (publication date) | Study location | Study year | Health state utility approach | Type of health benefit | Elicitation approach | Risk change | No. QALY estimates | QALY estimates range (converted to GBP 2017) |
|---|---|---|---|---|---|---|---|---|
| Sund and Svensson (2018) | Sweden | 2014 | VAS | QoL (small, medium, large, small scope, large scope) | Contingent valuation | Uncertain | 15 | £8,867 to £912,835 |
| Attema et al. (2018) | Netherlands | 2013 | VAS | Trading life years | TTO for income & QoL | Certain | 15 | £3,923 to £529,316 |
| Lim et al. (2017) | Malaysia | 12/12 to 12/14 | EQ-5D-3L and VAS | QoL and LE | Contingent valuation | Certain | 4 | £2,278 to £3,974 |
| Ahlert et al. (2016) | Germany | 06/12 to 02/14 | VAS | QoL | Contingent valuation | Certain | 32 | £3,660 to £40,495 |
| Soeteman et al. (2017) | Netherlands | 2010 | VAS | QoL | Contingent valuation | Uncertain | 2 | £181,450 to £322,969 |
| Tilling et al. (2016) | Netherlands | 2009 | VAS | Trading life years avoid income loss/income gain | TTO – trading length of life & income | Certain | 12 | £2,267 to £53,299 |
| van de Wetering et al. (2015) | Netherlands | 2013 | VAS | Quality of life (95, 15, 25, 35) and length of life (5, 10, 15, 20) | DCE | Uncertain | 14 | £34,010 to £486,386 |
| Nimdet & Ngorsuraches (2015) | Thailand | 12/13 to 02/14 | EQ-5D-3L and VAS | Life extension | Contingent valuation | Certain | 2 | £14,206 to £14,300 |
| Bobinac et al. (2014) | Netherlands | 2010 | EQ-5D and VAS | QoL | Contingent valuation | Uncertain | 8 | £44,244 to £200,058 |
| Bobinac et al. (2013) | Netherlands | 2010 | EQ-5D and VAS | QoL | Contingent valuation | Uncertain | 6 | £41,689 to £150,862 |
| Shafie et al. (2014) | Malaysia | 2010 | EQ-VAS | Life extension | Contingent valuation | Certain | 3 | £15,122 to £27,888 |
| Bobinac et al. (2012) | Netherlands | 2008 | VAS | QoL | Contingent valuation | Certain | 29 | £970 to £172,930 |
| Gyrd-Hansen & Kjaer (2012) | Denmark | 2005 | TTO on either one or 2 health states | QoL | Contingent valuation | Certain | 14 | £1,434 to £50,796 |
| Gyrd-Hansen (2003) | Denmark | 2001 | EQ-5D-3L using Danish Tariffs | QoL | DCE | Certain | 2 | £5,072 to £6,022 |
| Baker et al. (2010) | UK | 2006 | Chained standard gamble | QoL | Contingent valuation | Certain & uncertain | 2 | £18,815 to £23,806 |
| Bobinac et al. (2010) | Netherlands | 2008 | EQ-VAS and EQ-5D Dutch Tariff | QoL | Contingent valuation | Certain | 6 | £7,758 to £19,798 |
| Shiroiwa et al. (2010) | International (Japan, Republic of Korea, Taiwan, Australia, UK, US) | 2007 to 2008 | Assumed one year in full health | Life extension of 1 QALY from immediate death | Contingent valuation | Certain | 12 | £23,195 to £108,577 |
| Shiroiwa et al. (2013) | Japan | 2011 | EQ-5D Japanese tariffs | QoL and end of life scenarios (0.2 or 0.4 QALY gain) | Contingent valuation | Certain | 14 | £11,132 to £55,658 |
| Pennington et al. (2015) | Europe (9 European countries); all values reported in USPPP$ | 2009 to 2010 | VAS 0-100 | QoL, LE and End of Life scenarios (0.25/4yrs; 0.1/10yrs, LE END, LE Coma, LE Terminal) | Contingent valuation | Certain | 5 | £7,636 to £20,721 |
| Robinson et al. (2013) | Europe (9 European countries); all values reported in USPPP$ | 2009 to 2010 | Standard gamble and time trade-off of EQ-5D-3L health states | QoL (0.05 or 0.1 gain) | Contingent valuation | Uncertain and certain | 8 | £13,010 to £23,697 |
| Pinto-Prades et al. (2009) | Spain | 2002 to 2003 & 2005 to 2006 | Standard gamble using EQ-5D | QoL | Contingent valuation | Certain | 37 | £3,907 to £246,687 |

4.2. Assessment of WTP-QALY estimates

The same characteristics-based approach used to assess VOLY values is used to structure the assessment of the studies detailed in Table 3. However, unlike for the ‘primary’ VOLY studies, the traffic light system is not used to critically assess studies as there is no agreed upon best practice for the majority of these characteristics.

Overall study characteristics

Generally, WTP-QALY studies were undertaken over a narrow time period and in countries with a health technology assessment (HTA) agency where cost-effectiveness plays a role in the recommendations for new interventions. This reflects the interest that WTP-QALY thresholds generated amongst policymakers and academics following the establishment of HTAs. More detail below:

  • timing: strikingly, all studies were conducted after 2000, with all but one study (Gyrd-Hansen, 2003) undertaken between 2005 and 2014. This reflects the creation of many HTA agencies, such as (the now named) National Institute for Health and Care Excellence (NICE) in 1999, and the use of WTP-QALY thresholds in recommendations
  • country: single country studies were located in 9 different countries, with the Netherlands being host to the most with 8 studies. 3 studies present results of multi-country studies (Shiroiwa et al., 2010; Pennington et al., 2015; Robinson et al., 2013). Of the single country studies 14 are located in Europe and 4 in Asia. All single country studies were conducted in countries that are home to an HTA agency; Japan recently introduced a pilot HTA agency in 2016 (Mahlich et al., 2017)
  • sample size: the sample size should be adequate to be representative of the general population of the country and allow for statistical analysis. 3 studies had a sample size <500 respondents (Tilling et al., 2016; Shafie et al., 2014; Baker et al., 2010) with 14 studies having a sample >1,000 respondents. Additionally, in one of the multi-country studies (Shiroiwa et al., 2010) the samples in 5 of the 6 countries comprised 1,000 or more respondents

Eliciting the value of a QALY

QALYs are a multidimensional measure of health that combines any survival gains that may arise from an intervention with an assessment of the “quality” of life during those years. QALYs are calculated by multiplying a health state utility by the number of life years spent in that health state (see RQIII for further details on the theoretical and empirical foundations of QALYs). From a theoretical perspective ‘a QALY is a QALY is a QALY’, meaning any combinations of health state utility and life years that equal one are considered to be of equal value. There is huge variation in the methods used to elicit the value of a QALY. This not only relates to the type of health state utility approach used but also to the type, size and structure of health benefit and the number of scenarios used. More detail below:

  • health state utility approach: different approaches exist for eliciting respondents’ preferences for health states, the 3 most common being a rating scale, standard gamble (SG) or time trade-off (TTO) (see Appendix to RQIII for a full explanation of these methods). The Visual Analogue Scale (VAS) is a common rating scale approach. It involves respondents rating health states on a 0-100 scale, where 0 can indicate death or the worst health outcome and 100 can indicate perfect or optimal health. SGs present respondents with a choice between a certain outcome (e.g. a length of time in a particular health state) and a gamble (a probability (p) of the best health outcome and the complementary probability (1-p) of the worst health outcome). The probability p is varied until indifference is reached between the certain outcome and the gamble. TTOs generally involve respondents trading off remaining life expectancy in a particular health state (A) with a shorter life expectancy in normal or a better health state (B). The length of time in B is varied until a point of indifference is reached. While there is no agreed upon best practice, different approaches are associated with different health state utilities. 8 studies used VAS only and 6 used VAS with EQ-5D-3L. This latter approach involves respondents measuring their current and/or hypothetical health states using the 3 level EQ-5D instrument then rating the health states on a VAS. Only one study used a TTO (Gyrd-Hansen & Kjaer, 2012) and 2 studies utilised a SG (Pinto-Prades et al., 2009; Baker et al., 2010). Robinson et al. (2013) used a SG and a TTO, 2 studies described health states using EQ-5D-3L descriptions (Shiroiwa et al., 2013; Shafie et al., 2014) and one study had no need to use health state utilities as the health state presented was one year in full health (Shiroiwa et al., 2010)
  • type of health benefit: health benefits are presented in different ways and can relate to the type of stated preference approach utilised. The majority of studies are concerned with quality of life changes – 12 studies use quality of life only health benefits and a further 4 studies present different scenarios including quality of life benefits and life-extending benefits. 3 studies are concerned with life extensions only and in 2 studies life years are traded off (using a TTO – discussed in the stated preference approaches sub-section). Life extension generally relates to an ‘add-on’ of life years when death is imminent. This could either be at the end of respondents’ stated life expectancy (Pennington et al., 2015) or for an end of life/terminal situation, for example, receiving one year in current health followed by death as opposed to current health followed by immediate death (Nimdet & Ngorsuraches, 2015). A more unusual situation involved avoiding time spent in a coma, which was assumed to equate to a shortening of life equivalent to one QALY (Pennington et al., 2015)
  • size and structure of health benefit: while the QALY gain is always adjusted to 1 QALY to calculate the WTP-QALY, the size and structure of the health gains respondents value vary widely across studies. For example, Pennington et al. (2015) presented 5 scenarios (of quality of life gains only and life extensions only) of a 1 QALY gain. These scenarios included a 25% quality of life gain over 4 years or a 10% quality of life gain over 10 years as well as a 1 QALY extension to life at the end of life. QALY gains could also be much smaller, such as 0.05 or 0.1 gains (involving changes from 11111 to 22222 and 21121 as measured by the UK EQ-5D-3L tariff, respectively) (Robinson et al., 2013). Within studies a variety of different health gains were also used. For example, in Sund and Svensson (2018) 3 types of quality of life gains are valued over one year: 12121 to 11121 (small gain); 22222 to 11121 (medium gain); and 12223 to 11121 (large gain) (the size of the health gains expressed as a QALY gain was not explicitly stated)
  • number of scenarios: studies vary from presenting one scenario (Sund and Svensson, 2018) to those which utilised 29 choices comprised of 42 different EQ-5D states (Soeteman et al., 2017; Bobinac et al., 2010; 2012; 2013; 2014). In these latter studies respondents did not value all 29 choices; rather, they were randomised to one scenario or a sub-set of scenarios
  • risk change: the probability of respondents being affected by an illness or treatment could be certain or uncertain. 14 studies used a context of certainty, 5 used uncertainty and 2 studies used both. In all but one study, uncertainty related to the probability of becoming ill. For example, in Soeteman et al. (2017) 4 probability levels were used: 2, 4, 10 and 50%. In Sund and Svensson (2018), by contrast, uncertainty related to the probability of attaining the better health state i.e. a 1% chance of improvement by natural causes, rising to a 5% chance if they pay for the treatment. This implies the probability of improvement is increased by 4 percentage points; another scenario presented a 40% probability of improvement
  • latency: studies rarely considered latency or used discount rates as generally health benefits occurred immediately or over the near future rather than over the respondents’ lifetime. Only 3 studies reported using discount rates (Attema et al., 2018; Bobinac et al., 2012; Shiroiwa et al., 2010); discount rates ranged from 3% to 23%
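
The QALY arithmetic described above can be sketched as follows. The function names and WTP amounts are hypothetical; the linear scaling of a stated WTP up to one QALY, and the treatment of a risk change as an expected QALY gain, are simplifying assumptions made for illustration and may differ from the procedures used in individual studies.

```python
def qaly_gain(utility_gain, years):
    """QALYs gained = gain in health state utility x years it is enjoyed."""
    return utility_gain * years


def expected_qaly_gain(prob_without, prob_with, qaly_gain_if_improved):
    """Expected QALY gain when payment only raises the chance of improvement."""
    return (prob_with - prob_without) * qaly_gain_if_improved


def wtp_per_qaly(wtp, qalys_valued):
    """Scale a stated WTP for a (possibly small) gain linearly to 1 QALY."""
    if qalys_valued <= 0:
        raise ValueError("QALY gain must be positive")
    return wtp / qalys_valued


# Two of the 1 QALY framings described for Pennington et al. (2015).
print(qaly_gain(0.25, 4))    # 25% quality of life gain over 4 years
print(qaly_gain(0.10, 10))   # 10% quality of life gain over 10 years

# Hypothetical: WTP of 500 for a certain 0.05 QALY gain.
certain_value = wtp_per_qaly(500, 0.05)

# Hypothetical: the same WTP when the chance of a 0.5 QALY improvement
# rises from 1% to 5% (a 4 percentage point change).
risky_value = wtp_per_qaly(500, expected_qaly_gain(0.01, 0.05, 0.5))
print(round(certain_value), round(risky_value))
```

Dividing by a small expected gain inflates the implied value per QALY, which is consistent with the observation that risk-framed questions tend to produce the highest WTP-QALY estimates.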

Elicitation procedures

While an online contingent valuation approach using an individual perspective was most favoured, there was significant divergence in elicitation procedures, particularly within the contingent valuation approach. More detail below:

  • stated preference approach: 3 approaches were used: contingent valuation, discrete choice experiments (DCEs) and variants of a TTO

Contingent valuation:

  • the most favoured approach to eliciting WTP-QALY was contingent valuation; 17 studies utilised this approach. However, within this approach there was divergence regarding the elicitation procedure and the payment vehicle. There were 3 main elicitation procedures: payment card followed by a choice (11 studies), dichotomous choice (5 studies) and open-ended (one study). The payment card approach involved respondents being presented (usually randomly) with different monetary amounts and indicating the highest amount they would certainly pay and the lowest amount they would certainly not pay (in some studies respondents could indicate values they were unsure about, see for example Ahlert et al. (2016)). Within this range, respondents then stated the maximum they would be willing to pay. Where dichotomous choice was used, a double-bounded approach via a bidding game was generally utilised (except in Sund and Svensson (2018), which only presented dichotomous choices). Respondents were shown monetary amounts (bid values) and asked if they would be willing to pay the bid value (yes or no). Respondents were then shown further bid values which corresponded to their initial response (if initially yes, a higher bid value is shown and vice versa). An open-ended approach involved respondents simply stating the maximum they would be willing to pay (see Nimdet & Ngorsuraches, 2015). It is important to note that different payment scales/bid values were generally used across studies and even within studies (see Soeteman et al., 2017). The specificity of the payment vehicle varied from increases to an insurance premium (see Soeteman et al., 2017) to an out of pocket payment (see Pennington et al., 2015). Also, the payment could be a one-off (see Shiroiwa et al., 2010) or a monthly payment over one, 2 or 10 years (see Gyrd-Hansen & Kjaer, 2012 and Pinto-Prades et al., 2009)
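
The double-bounded bidding game described above places each respondent’s WTP in an interval. A minimal sketch, assuming a single follow-up bid and non-negative WTP (the function name and bid values are hypothetical, and individual studies may have used further rounds):

```python
def wtp_interval(first_bid, higher_bid, lower_bid, first_yes, second_yes):
    """Return (lower, upper) bounds on WTP implied by two yes/no answers.

    upper is None when the respondent accepted both bids, so WTP is only
    bounded from below. WTP is assumed to be non-negative.
    """
    if not lower_bid < first_bid < higher_bid:
        raise ValueError("bids must satisfy lower_bid < first_bid < higher_bid")
    if first_yes:
        # A higher follow-up bid is shown after an initial 'yes'.
        return (higher_bid, None) if second_yes else (first_bid, higher_bid)
    # A lower follow-up bid is shown after an initial 'no'.
    return (lower_bid, first_bid) if second_yes else (0.0, lower_bid)


# Hypothetical bid design: start at 100, follow up with 200 or 50.
print(wtp_interval(100, 200, 50, first_yes=True, second_yes=False))
print(wtp_interval(100, 200, 50, first_yes=False, second_yes=False))
```

Because the resulting intervals depend on the bid values offered, differences in payment scales across and within studies feed directly into the estimated WTP.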

DCEs:

  • 2 studies used a DCE. DCEs delineate the relative importance of attributes comprising a good or service and can provide an indication of the relative overall value of discrete scenarios (Lancaster 1966; Lancsar and Louviere, 2008). Both van de Wetering et al. (2015) and Gyrd-Hansen (2003) included a cost attribute which enabled an indirect estimate of WTP to be calculated
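
The indirect WTP estimate from a DCE is commonly obtained as the ratio of an attribute coefficient to the cost coefficient. The sketch below assumes a linear-in-parameters utility model; the coefficient values are hypothetical and the exact model specifications of the two reviewed studies are not reproduced here.

```python
def marginal_wtp(attribute_coef, cost_coef):
    """Marginal WTP for one unit of an attribute: -beta_attribute / beta_cost.

    Assumes utility is linear in the attribute and in cost, with a
    negative cost coefficient (higher cost lowers utility).
    """
    if cost_coef >= 0:
        raise ValueError("cost coefficient is expected to be negative")
    return -attribute_coef / cost_coef


# Hypothetical coefficients: utility per QALY gained and per unit of cost.
wtp_qaly = marginal_wtp(attribute_coef=0.8, cost_coef=-0.00004)
print(round(wtp_qaly))
```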

TTOs:

  • 2 studies used a variant of a TTO. Respondents were either trading years or quality of life to achieve an income gain (Attema et al., 2018; Tilling et al., 2016) or trading years to avoid an income loss (Tilling et al., 2016)
  • perspective: the vast majority of studies asked respondents to take an individual perspective (18 studies), valuing changes in their own health. This approach aligns with welfare economic theory as respondents are trading off their own individual consumption to improve their own health. The other approach is to take a societal perspective (2 studies take this approach only (Bobinac et al., 2013; van de Wetering et al., 2015) and one study uses both perspectives (Shiroiwa et al., 2010)). Individuals again trade off their individual consumption but this time it is for societal health gains which may or may not include themselves (Bobinac et al., 2013). Of these societal perspective studies 2 use contingent valuation (Bobinac et al., 2013; Shiroiwa et al., 2010) and one uses a DCE (van de Wetering et al., 2015)
  • survey mode: surveys were either delivered online (13 studies) or face-to-face (7 studies). One study delivered the survey using both approaches to test for framing effects (Ahlert et al., 2016); WTP values were considerably higher when elicited face-to-face

Economic consistency and data reliability

A range of criteria were used to test for economic consistency and data reliability:

  • protestors: 12 studies made an attempt to deal with potential protestors. One study defined protestors as those stating £0 and selecting either the reason “I am not willing to pay out of ethical considerations” or “other”, but decided to retain these respondents in the analysis as there were only a small number of them (Bobinac et al., 2014); 3 studies recognised there could have been protestors in their study but did not explicitly define them or try to account for them in their analyses (Attema et al., 2018; Bobinac et al., 2012; Gyrd-Hansen, 2003); 8 studies defined protestors using a rule (such as stating £0 and “government should pay for health care” (Ahlert et al., 2016) or being extreme non-traders in the case of TTOs (Tilling et al., 2016)) and excluded them from analysis (Ahlert et al., 2016; Tilling et al., 2016; Shafie et al., 2014; Gyrd-Hansen & Kjaer, 2012; Bobinac et al., 2010; Pennington et al., 2015; Robinson et al., 2013; Lim et al., 2017)
  • outliers: 11 studies made no mention of outliers. 3 studies excluded the top 1% of responses (Ahlert et al., 2016; Pennington et al., 2015; Robinson et al., 2013); one study trimmed the 5% highest and lowest WTP ratios (Attema et al., 2018); one study used Heckman analysis to account for skewed distributions (Shafie et al., 2014); one study replaced an extreme value with a smaller amount (Baker et al., 2010); one used respondents’ responses to the TTO health state utility task to judge whether to exclude respondents (Gyrd-Hansen & Kjaer, 2012); and one TTO study truncated negative WTP values (Tilling et al., 2016)
  • scope effects: scope sensitivity relates to whether the monetary value of a QALY gain changes in proportion to the size of the QALY gain; if this occurs then studies have strong sensitivity to scope. Of the 9 studies testing for scope sensitivity, 3 found some suggestion of sensitivity to scope (Ahlert et al., 2016; Tilling et al., 2016; Bobinac et al., 2014) and 6 found weak sensitivity or insensitivity to scope (Sund and Svensson, 2018; Attema et al., 2018; Soeteman et al., 2017; Baker et al., 2010; Robinson et al., 2013; Lim et al., 2017). Of these studies, Sund and Svensson (2018) examined scope sensitivity through changes to the probability of improvement through treatment (4% to 40%) and Baker et al. (2010) also examined scope sensitivity through changes to risk as well as to duration. The TTO studies used different tests of scope sensitivity, examining whether larger or smaller changes to income affected years of life traded (Tilling et al., 2016; Attema et al., 2018)
  • certainty effects: 7 studies included a certainty calibration question asking respondents how confident or sure they were about their response, generally using a 5-point Likert scale (Sund and Svensson, 2018; Soeteman et al., 2017; Bobinac et al., 2010; 2012; 2013; 2014; Gyrd-Hansen & Kjaer, 2012). It is suggested that more certain WTP estimates are better at predicting real consumption behaviour and that analysing only certain WTP values could help mitigate range bias (Blumenschein et al., 2008; Shackley and Dixon, 2014). A variety of results were found, including more certain respondents indicating lower WTP per QALY estimates (Bobinac et al., 2014), more certain respondents indicating higher WTP per QALY estimates (Bobinac et al., 2010) and more certain respondents being slightly more sensitive to scope (Bobinac et al., 2012)
  • income sensitivity: it would be expected that WTP-QALY would be sensitive to income. Of the 16 studies examining the effect of income, 12 found a positive association between income and WTP-QALY, 2 found an unclear relationship (Tilling et al., 2016; Bobinac et al., 2013) and 2 noted budget constraints did not appear to affect results (Pinto-Prades et al., 2009; Bobinac et al., 2012). The 5 papers that did not provide results were: Attema et al. (2018); van de Wetering et al. (2015); Gyrd-Hansen & Kjaer (2012); Baker et al. (2010); and Robinson et al. (2013)
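Two of the checks in the list above lend themselves to a short sketch: percentile-based outlier exclusion and the proportionality test behind scope sensitivity. The code is illustrative only and does not reproduce any study's analysis. The function names, the floor-based rounding of the number of dropped responses and the example figures are all assumptions.

```python
from typing import List, Sequence


def drop_top_percent(values: Sequence[float], percent: int) -> List[float]:
    """Exclude the highest `percent` of responses (e.g. 1 for the top 1%)."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be between 0 and 99")
    ordered = sorted(values)
    n_drop = len(ordered) * percent // 100  # assumption: round down
    return ordered[: len(ordered) - n_drop]


def trim_both_tails(values: Sequence[float], percent: int) -> List[float]:
    """Exclude the lowest and highest `percent` of responses."""
    if not 0 <= percent < 50:
        raise ValueError("percent must be between 0 and 49")
    ordered = sorted(values)
    n_drop = len(ordered) * percent // 100  # assumption: round down
    return ordered[n_drop : len(ordered) - n_drop]


def scope_ratio(
    wtp_small: float, wtp_large: float, gain_small: float, gain_large: float
) -> float:
    """Ratio of the relative increase in WTP to the relative increase in gain.

    1.0 means WTP rises in proportion to the QALY gain (strong scope
    sensitivity); values below 1.0 mean WTP rises less than proportionally.
    """
    if min(wtp_small, gain_small) <= 0 or gain_large <= gain_small:
        raise ValueError("need positive values and gain_large > gain_small")
    return (wtp_large / wtp_small) / (gain_large / gain_small)


# Hypothetical responses: 100 WTP values and a pair of mean WTPs.
responses = list(range(1, 101))
print(len(drop_top_percent(responses, 1)), len(trim_both_tails(responses, 5)))
print(scope_ratio(wtp_small=1000, wtp_large=1500, gain_small=0.25, gain_large=0.5))
```

A ratio below 1.0, as in the hypothetical pair above, implies that the WTP per QALY falls as the size of the gain increases, so the resulting estimate depends on the size of gain presented to respondents.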

4.3. Summary

This review identified 21 papers reporting WTP-QALY estimates calculated from primary research studies. The search strategy was restricted to estimates elicited only from the general public. Studies reporting estimates from patient samples were excluded. There was a wide range of reported WTP-QALY estimates, from £970 to £912,835, which is indicative of the variety of study designs and often the methodological nature of the studies. In the final section of this RQ we draw on the above reviews to inform a response to RQI.

5. Overall summary

This review concluded that the significant variation in values (for example, £216 to £144,200 for a VOLY; £970 to £912,835 for WTP-QALY) and heterogeneity of methods precluded the identification of a VOLY value robust and reliable enough for future policymaking. Significant differences in timing, value elicitation and risk communication methods, amongst other things, meant that insufficient evidence prevailed to support any particular study value over another. 3 studies (Mason et al., 2009; Grisolia et al., 2018 and Ryen and Svensson, 2015) were observed to generate a value close to £60,000, which is also broadly in line with the estimation provided in Franklin (2015), but some fundamental concerns were raised with respect to their reliability for policy purposes. It was also noted that whilst an additional study by Dolan et al. (2008) generated a VOLY in the range of £57,000, a direct comparison is inappropriate given the very different conceptual underpinning of the study (Social Wellbeing Analysis[footnote 28]). Similarly, it was noted that whilst there were a few primary studies converging around a value of £30,000 to £40,000, these were too few in number and varied too much in terms of timing and/or methodology to provide a reliable corpus of studies as a whole.

Studies were also compared qualitatively with each other to establish whether a particular study or studies could be considered to generate a reliable value from a methodological point of view. Under this scenario, such a value need not map to either a reference value or values from other studies. An assessment framework was devised, the primary purpose of which was to allow each study to be assessed in a consistent manner across a range of relevant factors, as opposed to generating an (implied) ranking of one study over another. Nevertheless, any study judged to perform well across all (or many) of the factors would clearly be preferred to a study performing poorly across these same factors.

Thus, studies were compared across timing and location, elicitation procedures and standard economic consistency tests and data handling procedures (for example, responsiveness of WTP to income; scope sensitivity; data cleaning). At the most general level, it was noted that this assessment identified a wide variety of practices with respect to overall design and that it was not really possible to assess how these differences might affect convergence or divergence of any resulting VOLY, either with respect to VOLYs from other studies and/or any ‘reference value’.

RQI also reviewed the WTP-QALY literature using similar procedures with a view to establishing the degree of consistency amongst estimates and procedures and/or establishing whether a VOLY could be derived from this literature instead. For reasons similar to those outlined already, it was concluded that a VOLY robust enough for future policymaking could not be derived from this literature.

References

Abelson, P. (2003). The value of life and health for public policy. Economic Record, 79 (Special Issue). S2-S13.

Abelson, P. (2008). Establishing a monetary value for lives saved: issues and controversies. Canberra: Office of Best Practice Regulation, Department of Finance and Deregulation. Retrieved 2012.

Abelson, J., Eyles, J., McLeod, C.B., Collins, P., McMullan, C. and Forest, P.G. (2003). Does deliberation make a difference? Results from a citizens panel study of health goals priority setting. Health Policy, 66(1), 95-106.

Ahlert, M., Breyer, F. and Schwettmann, L. (2016). How you ask is what you get: Framing effects in willingness-to-pay for a QALY. Social Science & Medicine 150, 40-48.

Alberini, A., Cropper, M., Krupnick, A. and Simon, N.B. (2004). Does the value of a statistical life vary with age and health status? Evidence from the US and Canada. Journal of Environmental Economics and Management, 48(1), 769-792.

Alberini, A. (2017). Measuring the economic value of the effects of chemicals on ecological systems and human health. OECD Environment Working Papers, 116, OECD Publishing, Paris.

Alberini, A. and Šcasný, M. (2011). Mortality Risk Reductions or Life Expectancy Gains. In A 3-Country Comparison of Approaches to Mortality Benefits Estimation, paper submitted to the annual European Association of Environmental and Resource Economics meeting, to be held in Rome Italy.

Alberini, A., Hunt, A. and Markandya, A. (2006). Willingness to pay to reduce mortality risks: evidence from a 3-country contingent valuation study. Environmental and Resource Economics, 33(2), 251-264.

Aldy, J. E., Viscusi, W. K. (2008). Adjusting the Value of a Statistical Life for Age and Cohort Effects. Review of Economics and Statistics, 90 (3), 573-581.

Ara, S. and Tekeşin, C. (2017). The Monetary Valuation of Lifetime Health Improvement and Life Expectancy Gains in Turkey. International journal of Environmental Research and Public Health, 14(10), 1151.

Attema, A. E., Krol, M., van Exel, J. and Brouwer, W.B. (2018). New findings from the time trade-off for income approach to elicit willingness to pay for a quality adjusted life year. European Journal of Health Economics 19(2), 277-291.

Baker, R., Bateman, I., Donaldson, C., Jones-Lee, M., Lancsar, E., Loomes, G., Mason, H., Odejar, M., Prades, J.L.P., Robinson, A. and Ryan, M. (2010). WEIGHTing and valuing quality-adjusted life-years using stated preference methods: preliminary results from the Social Value of a QALY Project. Health Technology Assessment 14(27), 1+.

Bateman, I., Carson, R., Day, B., Hanemann, M., Hanley, N., Hett, T., Jones-Lee, M., Loomes, G., Mourato, S., Ozdemiroglu, E., Pearce, D., Sugden, R. and Swanson, J. (2002). Economic valuation with stated preference techniques. A manual. 177-178. Edward Elgar: Cheltenham, UK.

Bergstrom, T. C. (2006). Benefit-cost in a benevolent society. American Economic Review 96(1), 339-351.

Bobinac, A., van Exel, J., Rutten, F., Brouwer, W. (2014). The Value of a QALY: Individual Willingness to Pay for Health Gains Under Risk. Pharmacoeconomics 32(1), 75-86.

Bobinac, A., van Exel, J., Rutten, F., Brouwer, W. (2013). Valuing QALY gains by applying a societal perspective. Health Economics 22(10), 1272-1281.

Bobinac, A., van Exel, J., Rutten, F., Brouwer, W. (2010). Willingness to Pay for a Quality-Adjusted Life-Year: The Individual Perspective. Value in Health 13(8), 1046-1055.

Bobinac, A., van Exel, J., Rutten, F., Brouwer, W. (2012). Get more, pay more? An elaborate test of construct validity of willingness to pay per QALY estimates obtained through contingent valuation. Journal of Health Economics 31(1), 158-168.

Borenstein, M., Cooper, H., Hedges, L. and Valentine, J. (2009). Effect sizes for continuous data. The handbook of research synthesis and meta-analysis, 2, 221-235.

Blumenschein, K., Blomquist, G.C., Johannesson, M., Horn, N. and Freeman, P. (2008). Eliciting Willingness to Pay Without Bias: Evidence from a Field Experiment. The Economic Journal 118(525), 114-137.

Carthy, T., Chilton, S., Covey, J., Hopkins, L., Jones-Lee, M., Loomes, G., Pidgeon, N. and Spencer, A. (1999). On the contingent valuation of safety and the safety of contingent valuation: Part 2 – The CV/SG Chained Approach, Journal of Risk and Uncertainty, 17(3), 187-213.

Chanel, O. and Luchini, S. (2008). Monetary values for air pollution risk of death: A contingent valuation survey. https://halshs.archives-ouvertes.fr/halshs-00272776/document.

Chanel, O. and Luchini, S. (2014). Monetary values for risk of death from air pollution exposure: a context-dependent scenario with a control for intra-familial altruism. Journal of Environmental Economics and Policy, 3(1), 67-91.

Chilton, S., Covey, J., Jones-Lee, M., Loomes, G. and Metcalf, H. (2004). Valuation of health benefits associated with reductions in air pollution. Defra publication PB, 9413.

Desaigues, B., Ami, D., Bartczak, A., Braun-Kohlová, M., Chilton, S., Czajkowski, M., Farreras, V., Hunt, A., Hutchison, M., Jeanrenaud, C., Kaderjak, P., Máca, V., Markiewicz, O., Markowska, A., Metcalf, H., Navrud, S., Nielsen, J.S., Ortiz, R., Pellegrini, S. and Rabl, A. (2011). Economic valuation of air pollution mortality: A 9-country contingent valuation survey of value of a life year (VOLY). Ecological Indicators, 11(3), 902-910.

Desaigues, B. M., Rabl, A. A., Ami, D., Kene, B., Masson, S., Salomon, M., & Santoni, L. (2007). Monetary value of a life expectancy gain due to reduced air pollution: Lessons from a contingent valuation in France. Revue D’Economie Politique, 117(5), 674-698.

Dolan, P., Metcalfe, R., Munro, V. and Christensen, M.C. (2008). Valuing lives and life years: anomalies, implications, and an alternative. Health Economics, Policy and Law, 3(3), 277-300.

Franklin, D. (2015). Derivation of the monetary value of a QALY or SLY.

Grisolía, J.M. (2018). Lifestyle and Heart Diseases in Choice Experiments. Lifestyle in Heart Health and Disease, 163-173. Academic Press.

Grisolía, J.M., Longo, A., Hutchinson, G. and Kee, F. (2018). Comparing mortality risk reduction, life expectancy gains, and probability of achieving full life span, as alternatives for presenting CVD mortality risk reduction: A discrete choice study of framing risk and health behaviour change. Social Science & Medicine, 211, 164-174.

Gyrd-Hansen, D. (2003). Willingness to pay for a QALY. Health Economics 12(12), 1049-1060.

Gyrd-Hansen, D. and T. Kjær (2012). Disentangling WTP per QALY data: different analytical approaches, different answers. Health Economics 21(3), 222-237.

Gyrd-Hansen, D., Kjaer, T. and Nielsen, J.S. (2016). The value of mortality risk reductions. Pure altruism – a confounder? Journal of Health Economics 49, 184-192.

Hammar, H. and Johansson-Stenman, O. (2004). The value of risk-free cigarettes – do smokers underestimate the risk? Health Economics, 13(1), 59-71.

Hammitt, J.K. and Haninger, K. (2017). Valuing nonfatal health risk as a function of illness severity and duration: Benefit transfer using QALYs. Journal of Environmental Economics and Management, 82, 17-38.

Haninger, K. and Hammitt, J.K. (2011). Diminishing willingness to pay per Quality-Adjusted life year: Valuing acute foodborne illness. Risk Analysis: An International Journal, 31(9), 1363-1380.

Hirth, R.A., Chernew, M.E., Miller, E., Fendrick, A.M. and Weissert, W.G. (2000). Willingness to pay for a quality-adjusted life year: in search of a standard. Medical Decision Making, 20(3), 332-342.

Johannesson, M., & Johansson, P. (1996). To be, or not to be, that is the question: An empirical study of the WTP for an increased life expectancy at an advanced age. Journal of Risk and Uncertainty, 13(2), 163-174.

Johannesson, M., Johansson, P.O. and Löfgren, K.G. (1997). On the value of changes in life expectancy: blips versus parametric changes. Journal of Risk and Uncertainty, 15(3), 221-239.

Johansson, P.O. (1994). Altruism and the value of statistical life: empirical implications. Journal of Health Economics, 13(1), 111-118.

Johnston, R.J. and Rosenberger, R.S. (2010). Methods, trends and controversies in contemporary benefit transfer. Journal of Economic Surveys, 24(3), 479-510.

Lancaster, K. J. (1966). A New Approach to Consumer Theory. Journal of Political Economy 74(2), 132-157.

Lancsar, E. and J. Louviere (2008). Conducting Discrete Choice Experiments to Inform Healthcare Decision Making. PharmacoEconomics 26(8), 661-677.

Lim, Y.W., Shafie, A.A., Chua, G.N. and Hassali, M.A.A. (2017). Determination of Cost-Effectiveness Threshold for Health Care Interventions in Malaysia. Value in Health 20(8), 1131-1138.

Lindhjem, H. and Navrud S. (2011). Valuing mortality risk reductions in regulatory analysis of environmental, health and transport policies: Policy implications. ENV/EPOC/WPIEEP (2011).

Lindhjem, H., Navrud, S., Braathen, N.A. and Biausque, V. (2011). Valuing mortality risk reductions from environmental, transport, and health policies: A global meta-analysis of stated preference studies. Risk Analysis: An International Journal, 31(9), 1381-1407.

Mahlich, J., Kamae, I. and Rossi, B. (2017). A new health technology assessment system for Japan? Simulating the potential impact on the price of simeprevir. International Journal of Technology Assessment in Health Care 33(1), 121-127.

Mason, H., Jones-Lee, M. and Donaldson, C. (2009). Modelling the monetary value of a QALY: a new approach based on UK data. Health Economics, 18(8), 933-950.

Mason, H., Baker, R. and Donaldson, C. (2008). Willingness to pay for a QALY: past, present and future. Expert review of Pharmacoeconomics & Outcomes Research, 8(6), 575-582.

Narain, U. and Sall, C. (2016). Methodology for valuing the health impacts of air pollution: discussion of challenges and proposed solutions. World Bank.

Nelson, J.P. and Kennedy, P.E. (2009). The use (and abuse) of meta-analysis in environmental and natural resource economics: an assessment. Environmental and Resource Economics, 42(3), 345-377.

Nielsen, J.S. (2010). Approaching the value of a life year: empirical evidence from a Danish contingent valuation survey. Nationaløkonomisk Tidsskrift, 148(1), 67-85.

Nimdet, K. and S. Ngorsuraches (2015). Willingness to pay per quality-adjusted life year for life-saving treatments in Thailand. Bmj Open 5(10).

OECD. (2012). Mortality Risk Valuation in Environment, Health and Transport Policies. OECD (Paris).

Ortiz, R.A, Markandya, A., Hunt, A. (2009). Willingness to Pay for Mortality Risk Reduction Associated with Air Pollution in São Paulo, RBE Rio de Janeiro v. 63(1), 3–22.

Oswald, A.J. and Powdthavee, N. (2008). Death, happiness, and the calculation of compensatory damages, Journal of Legal Studies, 37, 217-251.

Pennington, M., Baker, R., Brouwer, W., Mason, H., Hansen, D.G., Robinson, A., Donaldson, C. and EuroVaQ Team. (2015). Comparing WTP values of different types of QALY gain elicited from the general public. Health Economics 24(3), 280-293.

Pinto-Prades, J.L., Loomes, G. and Brey, R. (2009). Trying to estimate a monetary value for the QALY. Journal of Health Economics 28(3), 553-562.

Robinson, A., Gyrd-Hansen, D., Bacon, P., Baker, R., Pennington, M., Donaldson, C. and Team, E. (2013). Estimating a WTP-based value of a QALY: The ‘chained’ approach. Social Science & Medicine 92, 92-104.

Robinson, L.A. and Hammitt, J.K. (2016). Valuing reductions in fatal illness risks: Implications of recent research. Health Economics, 25(8), 1039-1052.

Ryen, L. and M. Svensson (2015). The willingness to pay for a Quality Adjusted Life Year: a review of the empirical literature. Health Economics 24(10), 1289-1301.

Shackley, P. and S. Dixon. (2014). The random card sort method and respondent certainty in contingent valuation: an exploratory investigation of range bias. Health Economics 23(10), 1213-1223.

Shafie, A.A., Lim, Y.W., Chua, G.N. and Hassali, M.A.A. (2014). Exploring the willingness to pay for a quality-adjusted life-year in the state of Penang, Malaysia. Clinicoecon Outcomes Res 6, 473-481.

Shiroiwa, T., Igarashi, A., Fukuda, T. and Ikeda, S. (2013). WTP for a QALY and health states: More money for severer health states? Cost Effectiveness and Resource Allocation, 11, 22.

Shiroiwa, T., Sung, Y.K., Fukuda, T., Lang, H.C., Bae, S.C. and Tsutani, K. (2010). International survey on willingness-to-pay (WTP) for one additional QALY gained: what is the threshold of cost effectiveness? Health Economics 19(4), 422-437.

Soeteman, L., van Exel, J. and Bobinac, A. (2017). The impact of the design of payment scales on the willingness to pay for health gains. The European Journal of Health Economics 18(6), 743-760.

Sund, B. and M. Svensson (2018). Estimating a constant WTP for a QALY-a mission impossible? The European Journal of Health Economics 19(6), 871-880.

Tilling, C., Krol, M., Attema, A.E., Tsuchiya, A., Brazier, J., van Exel, J. and Brouwer, W. (2016). Exploring a new method for deriving the monetary value of a QALY. The European Journal of Health Economics 17(7), 801-809.

Tolley, G., Kenkel, D. and Fabian, R. (1994). State of the Art Health Values, Valuing Health for Public Policy, an Economic Approach, 323-44. University of Chicago Press, Chicago.

Vlachokostas, Achillas, Slini, Moussiopoulos, Banias, & Dimitrakis. (2011). Willingness to pay for reducing the risk of premature mortality attributed to air pollution: A contingent valuation study for Greece. Atmospheric Pollution Research, 2(3), 275-282.

van de Wetering, L., van Exel, J., Bobinac, A. and Brouwer, W.B. (2015). Valuing QALYs in Relation to Equity Considerations Using a Discrete Choice Experiment. Pharmacoeconomics 33(12), 1289-1300.

Viscusi, W.K. (2018). Best estimate selection bias in the value of a statistical life. Journal of Benefit Cost Analysis 9(2), 205–246.

Appendix 1: Data Extraction Tool(s)

VOLY studies:

For VOLYs elicited from primary data (surveys):

  • study overview (Survey Year; Sample Size; Country; Mean Age; Age Range)
  • results (Study VOLY; Converted VOLY (GBP))
  • Life Expectancy Communication Mechanism (Context; Baseline risk; Risk change; Explanation; Private/Public; Acute/Chronic)
  • Elicitation Procedures (Survey Approach; Survey Mode; Elicitation; Payment Vehicle; Individual/Household; Order effect; Order Effect Test)
  • Age/Income Effects and Discounting (Scope Test; Impact of Respondent Income; Impact of Age on WTP; Latency/Discounting)
  • Quality (Reliability Measures (Protest Responses; Outliers; Response Rate))

For VOLYs calculated from secondary data:

  • study overview (Survey Year; Sample Size; Country; Mean Age; Age Range)
  • results (Study VOLY; Converted VOLY (GBP))
  • method for calculating VOLY (Value of VSL; VPF/Remaining Life Expectancy; Data Source; Data Set)
  • Life Expectancy Related (Health State; Context)
  • Age and Discounting (Discount Rate; Age Dependent VOLY)

WTP-QALY studies:

  • study overview (Study Year; Sample Size; Study Location)
  • results (Study WTP-QALY; Converted WTP-QALY (GBP 2017 prices))
  • QALY Communication Mechanism (Type of Health Benefit; Baseline-state; End-state; Risk change)
  • Elicitation Procedures (Survey Approach; Elicitation; Survey Mode; Payment Vehicle; Latency; Health State Utility Approach)
  • Age/Income Effects and Discounting (Income-Significant; Age-Significant; Latency)
  • Quality (Reliability Measures (Scope Test; Protestors; Outliers; Certainty Test; Direct/Indirect; Analysis))
  1. Termed value of a Statistical Life Year (SLY) in Franklin (2015). 

  2. LibSearch includes the following databases (among others): Compendex, EBSCO, JSTOR, Medline, Ovid, ProQuest, Scopus, Web of Science. 

  3. In addition, leading international scholars (James Hammitt; Lisa Robinson; Alan Krupnick; Emily Lancsar; Dorte Gyrd-Hansen; Milan Scazny; Stale Navrud; Andrea Leiter-Scheiring; Alberto Longo; Henrik Andersson) were contacted with a draft list and asked to indicate if they were aware of any VOLY valuation studies that had not been uncovered in the initial search. 

  4. A ‘revealed preference’ Key Word did not result in any identified studies. 

  5. With the exception of one study in German. Leiter-Scheiring kindly translated the VOLY value and its derivation. 

  6. Whilst the studies are to be assessed later on in respect to quality, reliability etc., at this stage we treat all VOLY estimates equally and do not screen any out. 

  7. We use the term Value of a Prevented Fatality (VPF) elsewhere in the report but in this RQ we respect the original source literature and use the term “VSL” throughout. 

  8. This search strategy enabled us to capture common expressions, such as “value of a QALY”. 

  9. These criteria meant we found fewer studies than in the review by Ryen and Svensson (2015). For example, Mason et al. (2009) and Hirth et al. (2000) were excluded here as WTP-QALY was calculated through conversions of VSL estimates; see Mason et al. (2008) for a review of this method. Likewise, the paper by Hammitt and Haninger (2017) was classified as ‘indirect’. 

  10. This includes among other studies the paper by Haninger and Hammitt (2011). 

  11. VOLY estimates were standardised in US$ (following the reporting convention in OECD, 2012) by dividing by the PPP for the relevant currency and year and then multiplying by a ‘time conversion ratio’. This ratio was calculated as the 2017 PPP figure for the currency divided by the year-of-study PPP figure. This procedure adjusted for inflation to 2017. The figures were then multiplied by the prevailing exchange rate (1USD=0.713GBP; OECD website) in order to convert the USD2017 into GBP2017 by PPP. All conversion data was taken from the OECD website. 

  12. We do not adjust these estimates to try and account for any differences in the marginal utility of income and/or elasticity of health expenditure. This implies that both we and the study authors assume that the marginal utility of income is constant over time. The elasticity of health expenditure is found to be less than 1 in all cases except Hammar and Johansson-Stenman (2004), ranging from 0.16 in low incomes (Alberini et al., 2004) to a mean of 0.7554 (Chanel and Luchini, 2014). The Abelson et al. (2003; Table 1) study reported similar evidence. 

  13. An argument could be made that, to truly compare reported VOLYs, they should all be recalculated to the same discount factor. The reason we have not done this is that the papers that do discount their VOLYs (see 4.1.3 below) do not make it clear whether this discounting has taken place at the individual or sample level. We therefore could not be confident in re-inflating them correctly. Furthermore, some papers elicit a discount rate by comparing VOLYs from samples at different ages and there is no clear method available to re-inflate such VOLYs. Finally, some studies do not mention discounting, therefore we must assume individuals used their own implicit discount rate, which cannot be quantified. 

  14. (SC, CB, agreed with subsequently by HMetcalf, NM, HMason, RM). 

  15. This explains why, for example, the study by Vlachokostas et al. (2011) is assessed as amber: their survey was conducted in 2009. 

  16. These 2 studies are based on the same dataset, so in effect constitute only one study. 

  17. Sampling only one age cohort or a narrow range of cohorts such as 40+ (e.g. Alberini et al., 2006; Grisolia et al., 2018) increases the reliability of any estimate by decreasing the heterogeneity introduced by varying ages (and hence preferences). 

  18. Noting that this is restricted to a very narrow geographical coverage (i.e. Northern Ireland). 

  19. This study also included treatment variants valuing life expectancy gains. 

  20. For example, pure altruism, which can lead to double-counting (Bergstrom, 2006), or free-riding, which would inflate or deflate a VOLY, although this (could) depend on how the question is asked (see Johansson, 1994; Gyrd-Hansen et al., 2016). 

  21. We make the same observation in relation to econometric analysis. This will undoubtedly have an effect on the robustness of any VOLY value, a point noted by Alberini (2017) in her critique of the Desaigues et al. (2011) study. 

  22. A necessary, but not sufficient, condition for a reliable valuation. 

  23. Not tested for in this particular study. 

  24. This criterion does not apply to choice experiments (for example, Grisolia, 2018). 

  25. For example, mail surveys generate a final sample via a number of survey waves (stopping either when the sample is considered “large” enough and/or, pragmatically, when responses stop coming in) and thus would never be expected to achieve a 100% response rate. On the other hand, internet survey participants are recruited from a large panel by a survey firm and recruitment stops only when the quota is fulfilled. 

  26. The research team considered a meta-analysis combining “Direct” and “Indirect” VOLYs but ruled it out on the grounds of the even larger degree of heterogeneity that such a procedure would introduce, not least because the VOLYs reported are themselves outcomes from a meta-analysis (or similar), over and above the obvious heterogeneity within studies of the same VOLY type. 

  27. Tolley et al. (1994), also based on secondary sources. 

  28. This would require the (future) development of a preference-based framework establishing how a VOLY elicited under the assumptions of expected utility theory maps to that elicited under the assumptions of SWB, which is beyond the scope of this report.
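The standardisation steps set out in footnote 11 can be expressed as a short calculation. The sketch below follows the wording of that footnote literally and is not the research team's own code. The function name and the PPP figures in the example are hypothetical; only the exchange rate of 0.713 is taken from the footnote.

```python
USD_TO_GBP_2017 = 0.713  # exchange rate quoted in footnote 11


def convert_to_gbp_2017(
    value_local: float,
    ppp_study_year: float,
    ppp_2017: float,
    usd_to_gbp: float = USD_TO_GBP_2017,
) -> float:
    """Convert a VOLY in local currency to GBP (2017), as read from footnote 11.

    ppp_study_year and ppp_2017 are PPP conversion factors (local currency
    units per US$) for the study year and for 2017 respectively.
    """
    if ppp_study_year <= 0 or ppp_2017 <= 0:
        raise ValueError("PPP figures must be positive")
    # Step 1: divide by the PPP for the relevant currency and year.
    value_usd = value_local / ppp_study_year
    # Step 2: multiply by the time conversion ratio (2017 PPP / study-year PPP).
    time_conversion_ratio = ppp_2017 / ppp_study_year
    value_usd_2017 = value_usd * time_conversion_ratio
    # Step 3: multiply by the exchange rate to express the value in GBP.
    return value_usd_2017 * usd_to_gbp


# Hypothetical example: a VOLY of 50,000 local units with illustrative PPPs.
print(round(convert_to_gbp_2017(50_000, ppp_study_year=0.8, ppp_2017=0.9), 2))
```

Because the study-year PPP enters both the first and second steps, the converted value is sensitive to the PPP series used, so any re-analysis would need to draw on the same OECD data.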