Research and analysis

Annex A: supplementary physical activity analysis

Published 1 August 2025

Applies to England

Background

Context

The Better Health: Rewards pilot was designed as a randomised controlled trial to test whether offering financial incentives to adult residents of Wolverhampton could:

The Department of Health and Social Care (DHSC) commissioned the Behavioural Insights Team (BIT), an independent consultancy, to carry out a mixed-methods evaluation of the pilot. This included an impact evaluation, and an implementation and process evaluation.

Impact analysis in evaluation report

BIT carried out analysis to assess the impact of being offered any financial incentive (compared with being offered no incentive) on primary outcomes for both physical activity and diet. This followed an analysis plan, which was pre-agreed with DHSC and set out in the trial protocol. Full findings and methodology can be found in the evaluation report on BIT’s website.

To assess the impact on physical activity outcomes, analyses were conducted on the sample of participants who wore a wearable fitness tracker for 6 or more hours per day. This was defined as a ‘valid’ day of data in the trial protocol and was informed by preliminary data provided by HeadUp (the company that developed the app through which the financial incentives were earned) at the protocol drafting stage.
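
The ‘valid’ day rule can be illustrated with a minimal sketch. The record structure and field names below are hypothetical and are not taken from the trial protocol. Only the threshold of 6 or more hours comes from the text above.

```python
from dataclasses import dataclass

# Threshold stated in the trial protocol: 6 or more hours of wear time
VALID_DAY_MIN_WEAR_HOURS = 6


@dataclass
class DayRecord:
    # Hypothetical field names, used for illustration only
    participant_id: str
    wear_hours: float
    mvpa_minutes: float
    steps: int


def is_valid_day(record, min_wear_hours=VALID_DAY_MIN_WEAR_HOURS):
    """Return True when the tracker was worn for the minimum number of hours."""
    return record.wear_hours >= min_wear_hours


def split_by_validity(records):
    """Split daily records into those kept and those excluded by the rule."""
    valid = [r for r in records if is_valid_day(r)]
    excluded = [r for r in records if not is_valid_day(r)]
    return valid, excluded
```

Under this rule, a day with 5.9 hours of wear time would be excluded even if it contained recorded activity.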

Using this sample, the analysis showed that offering financial incentives did not statistically significantly impact participants’ physical activity outcomes.

At an early stage of data collection, BIT became aware of evidence to suggest that removing participants who wore a wearable fitness tracker for less than 6 hours per day was excluding a large amount of valid data. Section 8.2.1 of the evaluation report notes that 11% of participants wore a wearable fitness tracker for less than 6 hours per day and that it was also possible for users to capture physical activity data using mobile phones. As a result, BIT deemed that the time the wearable fitness tracker was worn for was not an accurate indicator of data validity.

Following discussion with the Trial Steering Committee and DHSC, BIT conducted additional analyses on the primary outcomes for physical activity that included observations with less than 6 hours of wearable time. These analyses are published in the full evaluation report under ‘Sensitivity analysis’.

Section 4.2.1.1 of the evaluation report notes that this change resulted in a larger and more balanced sample across several demographic characteristics. It included:

  • more men
  • more participants who were younger
  • a greater number of participants from more deprived areas

Aims of supplementary analysis

Following BIT’s completion of the evaluation, DHSC chose to rerun the analysis for secondary and exploratory physical activity outcomes on the larger sample that includes data from wearable fitness trackers worn for less than 6 hours. This decision was taken, in agreement with BIT, so that policy recommendations can be informed by a larger and potentially more reliable set of data.

Definitions

A p-value is a statistical measure that helps determine the strength of the evidence against the null hypothesis. The null hypothesis states that there is no statistical difference between groups as tested by the research hypothesis[footnote 1].

A confidence interval is a statistical concept that provides a range of values around an estimate. This gives an indication of the level of uncertainty of an estimate and helps to describe how precise the estimate is[footnote 2].

Findings

The following section outlines the findings of the impact of financial incentives on physical activity for primary, secondary and exploratory outcomes:

  • primary outcomes include analysis of impacts on moderate-to-vigorous physical activity (MVPA) and step count
  • secondary outcomes include analysis of impacts on primary outcomes (measured at 1, 3 and 5 months), and the impact of different reward levels
  • exploratory outcomes include subgroup analyses

This analysis uses the larger sample of participants, including those who wore wearable fitness trackers for less than 6 hours.

Overall, although the differences that are found are small, they do suggest that financial incentives can have a meaningful impact on participants’ activity levels. These results are aligned with the results and policy recommendations made by BIT to DHSC in the full evaluation report, with some additional insights.

Primary outcomes: how offering financial incentives impacts physical activity outcomes after 5 months

As outlined in the ‘sensitivity analysis’ in BIT’s report, when considering the larger, more valid sample, offering a financial incentive resulted in statistically significant increases in both MVPA and step count compared with the control group:

  • MVPA: +1.93 minutes per day (confidence interval (CI): 1.01 to 2.85), adjusted p-value <0.001
  • steps: +256 steps per day (CI: 71 to 442), adjusted p-value <0.01
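
As a rough illustration of how an estimate, its confidence interval and a p-value relate, the sketch below backs out a standard error from an interval. It assumes a 95% confidence level and a symmetric interval based on the normal distribution, neither of which is stated in this annex. It also produces an unadjusted p-value, so it will not reproduce the adjusted p-values from the mixed effects model exactly.

```python
from statistics import NormalDist


def ci_to_z_and_p(estimate, ci_low, ci_high, level=0.95):
    """Approximate the standard error, z statistic and 2-sided p-value.

    Assumes a symmetric, normal-based interval at the given level.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    standard_error = (ci_high - ci_low) / (2 * z_crit)
    z = estimate / standard_error
    p_value = 2 * (1 - normal.cdf(abs(z)))
    return standard_error, z, p_value


# Reported primary outcome estimates at month 5
mvpa_se, mvpa_z, mvpa_p = ci_to_z_and_p(1.93, 1.01, 2.85)
steps_se, steps_z, steps_p = ci_to_z_and_p(256, 71, 442)
```

For MVPA, this gives a standard error of roughly 0.47 minutes per day. Under these assumptions, both approximate p-values fall below the thresholds reported above.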

Secondary outcomes: how offering financial incentives affects physical activity outcomes after 1, 3 and 5 months

This analysis explored the effects of offering financial incentives (compared with being offered no incentive) on physical activity outcomes at 3 points in time: 1 (M1), 3 (M3) and 5 months (M5) after randomisation.

MVPA

Treatment effects were statistically significant across all 3 points in time, with the largest effects at month 3:

  • M1: +0.82 minutes per day (CI: 0.08 to 1.57), adjusted p-value <0.05
  • M3: +2.72 minutes per day (CI: 1.76 to 3.67), adjusted p-value <0.001
  • M5: +1.93 minutes per day (CI: 1.01 to 2.85), adjusted p-value <0.001

Steps

Treatment effects were statistically significant across all 3 points in time, with effects increasing over time between months 1 and 5:

  • M1: +149 steps per day (CI: 13 to 286), adjusted p-value <0.05
  • M3: +185 steps per day (CI: 15 to 355), adjusted p-value <0.05
  • M5: +256 steps per day (CI: 71 to 441), adjusted p-value <0.01

When compared to the findings of the full evaluation report:

  • this analysis showed that effect sizes are statistically significant across all points in time for both MVPA and steps
  • the treatment effect for MVPA remains largest at month 3, while the treatment effect for steps increases over time, rather than decreasing
  • the headline remains that this analysis does not offer a clear interpretation of how effect sizes change over time across physical activity outcomes. This is likely because the sample of participants reduced over time

Table 1 in the supporting data tables includes point estimates, confidence intervals and sample sizes.

Secondary outcomes: how different levels of financial incentives impact physical activity outcomes after 5 months

This analysis explored the impact of different levels of financial incentives (compared with being offered no incentive) on physical activity outcomes 5 months after randomisation.

MVPA

Treatment effects were statistically significant in those receiving low, medium and high rewards, with the highest effects for medium rewards:

  • low: +1.88 minutes per day (CI: 0.57 to 3.18), adjusted p-value <0.01
  • medium: +2.10 minutes per day (CI: 0.68 to 3.53), adjusted p-value <0.01
  • high: +1.82 minutes per day (CI: 0.42 to 3.22), adjusted p-value <0.05

Steps

Treatment effects were statistically significant in those receiving medium rewards, but not for low and high rewards:

  • low: +106 steps per day (CI: −154 to 366), adjusted p-value >0.05
  • medium: +736 steps per day (CI: 454 to 1018), adjusted p-value <0.001
  • high: −4 steps per day (CI: −272 to 264), adjusted p-value >0.05

When compared to the findings of the full evaluation report:

  • the analysis for MVPA found statistically significant treatment effects across all incentive levels, whereas the full evaluation report found none, with the largest effect for medium rewards
  • the treatment effect for steps remains statistically significant only for medium rewards, as in the main analysis
  • the headline remains that this analysis does not show a clear relationship between incentive level and physical activity outcomes

Table 2 in the supporting data tables includes point estimates, confidence intervals and sample sizes.

Exploratory outcomes: how offering financial incentives impacts specific population subgroups after 5 months

This analysis explored the effects of offering any financial incentives (compared with being offered no incentive) on physical activity outcomes among subgroups 5 months after randomisation.

This analysis suggests that financial incentives are more effective at improving physical activity outcomes among certain subgroups. Important findings include:

  • female participants saw a statistically significant increase in MVPA and steps compared with the control group:
    • MVPA: +1.81 minutes per day (CI: 0.72 to 2.91), p-value <0.01
    • steps: +355 steps per day (CI: 120 to 550), p-value <0.01
  • participants from more deprived areas (areas with an Index of Multiple Deprivation decile of 1 or 2) saw a statistically significant increase in MVPA and steps compared with the control group:
    • MVPA: +2.68 minutes per day (CI: 1.37 to 3.99), p-value <0.001
    • steps: +444 steps per day (CI: 162 to 726), p-value <0.01
  • participants who were initially inactive (less than 30 minutes of MVPA during the baseline week) saw a statistically significant increase in MVPA and steps compared with the control group:
    • MVPA: +2.30 minutes per day (CI: 1.27 to 3.32), p-value <0.001
    • steps: +285 steps per day (CI: 52 to 518), p-value <0.05
  • participants who initially consumed lower amounts of fruit and vegetables (fewer than 3 portions of 80 grams (g) during the baseline week) saw a statistically significant increase in MVPA and steps compared with the control group:
    • MVPA: +1.96 minutes per day (CI: 0.95 to 2.98), p-value <0.001
    • steps: +288 steps per day (CI: 79 to 497), p-value <0.01

When compared to the findings of the evaluation report:

  • this analysis found similar insights to the full evaluation report, with additional statistically significant findings for impact on MVPA and steps for female participants, and those who initially consumed lower amounts of fruit and vegetables
  • this analysis also found additional statistically significant findings for steps for those who were initially inactive and those from more deprived areas
  • these additional statistically significant treatment effects could be due to larger sample sizes across subgroups including more people who typically wore their wearable fitness tracker for shorter periods of time

Table 3 in the supporting data tables includes point estimates, confidence intervals and sample sizes. This includes further analysis by age group and ethnic group.

This analysis has not been adjusted for multiple comparisons. Subgroups with fewer than 100 people with valid data at M5 were excluded.

Methodology

Pilot design

The Better Health: Rewards pilot was launched in Wolverhampton on 17 February 2023 and closed on 13 October 2023.

Participants were given free fitness trackers that were linked to the free Better Health: Rewards app.

Following registration, participants completed a baseline period to record their usual physical activity and diet behaviour. They were then randomised into 4 research arms:

  • control
  • low reward
  • medium reward
  • high reward

Over 5 months, participants took on challenges to improve their physical activity and diet and collected points for each goal they completed.

While all participants received the app and, if they needed it, the wearable fitness tracker, participants received different levels of financial rewards in exchange for their points depending on which research arm they were randomised to. The control arm received no financial rewards for completing health challenges. This made it possible to evaluate the impact of the financial incentives alone.

For evaluation purposes, all participants, regardless of treatment arm, were asked to submit diet and physical activity data at months 1, 3 and 5, and were rewarded equally for doing so. 

All participants were able to spend the money they earned through challenges or by providing their physical activity or diet data in the in-app e-store, which offered a range of rewards. 

Sample

Wolverhampton residents aged 18 and over were eligible to take part in the pilot.

Data collection

Physical activity data was recorded through wearable fitness trackers for the full duration of the pilot - all participants could order a free tracker if they did not already own one.

For data analysis purposes, participants were asked to sync their device with the app at months 1, 3 and 5 after randomisation, and were rewarded for doing so.

Approach for supplementary analysis

This supplementary analysis extends the ‘sensitivity analysis’ carried out within the full evaluation report to consider the impact of financial incentives on secondary outcomes and exploratory outcomes for physical activity. This analysis was carried out by mirroring BIT’s approach to the ‘sensitivity analysis’.

Analysis plans are set out in trial protocols before a study begins. They are sometimes altered after the analysis begins, and such a change is called a deviation. Both the ‘sensitivity analysis’ in the evaluation report and this supplementary analysis follow the analysis plan set out in the trial protocol with one deviation: the analysis was conducted on a larger sample of participants, which included those who wore the wearable fitness tracker for less than 6 hours. This was because evidence showed that the time the wearable fitness tracker was worn for was not a good indicator of data validity. This change to the analysis was designed and agreed before data from month 5 was accessible.

Missing physical activity data was replaced with observed values on the same day within the 2 weeks before or after.
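
The replacement rule can be sketched as follows. This is an illustration only. It assumes that ‘the same day’ means the same day of the week, and that the nearest week is tried first with the earlier week preferred. Neither assumption is confirmed in this annex, and the function name and data layout are hypothetical.

```python
from datetime import date, timedelta

# Assumed donor order: same weekday 1 week before, 1 week after,
# 2 weeks before, then 2 weeks after
DONOR_OFFSETS_DAYS = (-7, 7, -14, 14)


def impute_missing_days(daily_values):
    """Fill missing daily values using observed values from donor days.

    daily_values maps a date to an observed value, or None when missing.
    Only originally observed values are used as donors.
    """
    filled = dict(daily_values)
    for day, value in daily_values.items():
        if value is not None:
            continue
        for offset in DONOR_OFFSETS_DAYS:
            donor = daily_values.get(day + timedelta(days=offset))
            if donor is not None:
                filled[day] = donor
                break
    return filled
```

A missing day with no observed value on the same weekday within 2 weeks stays missing in this sketch.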

The analysis uses an intention-to-treat approach and a linear mixed effects model - this means it can assess the impact of being offered any financial incentive, compared with being offered no financial incentive.

The analysis accounts for individual characteristics of:

  • age
  • sex
  • ethnicity
  • education
  • body mass index at baseline
  • attrition (drop-out rate) across trial arms
  • the brand of wearable fitness tracker

It uses inverse probability weighting, as described in Annex E of the evaluation report.
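
The general idea of inverse probability weighting can be shown with a simplified sketch. It is not the method set out in Annex E, which is not reproduced here. It assumes that the probability of having outcome data is estimated as the observed response rate within a stratum (for example, a trial arm), and the field names are hypothetical.

```python
from collections import defaultdict


def inverse_probability_weights(participants):
    """Return one weight per participant, in the same order as the input.

    participants is a list of dicts with keys 'stratum' and 'observed'.
    Observed participants are weighted by 1 / (response rate in stratum).
    Participants without outcome data receive a weight of 0.
    """
    totals = defaultdict(int)
    observed = defaultdict(int)
    for person in participants:
        totals[person["stratum"]] += 1
        observed[person["stratum"]] += int(person["observed"])

    weights = []
    for person in participants:
        if person["observed"]:
            stratum = person["stratum"]
            weights.append(totals[stratum] / observed[stratum])
        else:
            weights.append(0.0)
    return weights
```

In this sketch, participants from strata with higher drop-out count for more, so that the weighted sample resembles the sample that was originally randomised.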

P-values for secondary outcomes were adjusted for multiple comparisons using the Benjamini-Hochberg correction. P-values for exploratory analysis of subgroups were not adjusted for multiple comparisons.
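
The Benjamini-Hochberg correction can be sketched as below. This is a minimal implementation of the standard procedure, not the code used in the evaluation, and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Work from the largest p-value down, so that adjusted values
    # never decrease as the raw p-values increase
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical unadjusted p-values for 4 comparisons
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each adjusted p-value is then compared with the chosen significance threshold in the usual way.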

Full details of the methodology can be found in the evaluation report on BIT’s website.

Limitations

The limitations of this analysis are the same as those reported in chapter 8 of the full evaluation report.

The specific location and time-period in which the pilot took place may limit the generalisability of the findings.

Outcome data for physical activity was collected through a range of wearable fitness trackers. Although the analysis attempts to control for variation between wearable fitness trackers, differences in the functionality may have played a role in how participants used them, which may have impacted wear time and data collection.

All participants, including the control group, received the app and were offered a free wearable fitness tracker as part of the pilot - it is possible that the wearable fitness tracker and/or app alone played a role in encouraging healthier behaviour.

Feedback

For further information or to provide any feedback on the publication, contact us at: statistics@dhsc.gov.uk

  1. National Center for Biotechnology Information. Hypothesis Testing, P Values, Confidence Intervals, and Significance. Jacob Shreffler and Martin R Huecker. 

  2. Office for National Statistics. Uncertainty and how we measure it for our surveys. Accessed 20 November 2024.