
Factorial randomised controlled trial: comparative studies

How to use a factorial randomised controlled trial to evaluate your digital health product.

This page is part of a collection of guidance on evaluating digital health products.

A factorial randomised controlled trial (RCT) is a specific type of RCT. It lets you carry out 2 or more comparisons at the same time.

What to use it for

Use a factorial RCT to assess several different elements of a product or service. You would generally use it when developing your product (formative or iterative evaluation).

Factorial RCTs are less useful for finding out whether your final product achieves its aims (summative evaluation). This is because participants are spread across many combinations of features, so relatively few of them receive any one version of the product. Usually, a summative trial comparing the final version of the product with a control is carried out afterwards.

Pros

Benefits include:

  • you can test several different options relatively efficiently, which can save time and resources
  • as with any RCT, it can produce robust evidence, because randomisation means that participants in each group are similar on average

Cons

Drawbacks include:

  • it works on the assumption that the effects of different elements of the product don’t interact. If they do interact, the results will be less accurate and may be biased
  • it is often expensive to do well
  • choosing an appropriate comparison (control) can be difficult

How to carry out a factorial RCT

Imagine you have a basic app to promote physical activity and you’ve developed 3 new features you want to evaluate:

  • an activity tracker (A)
  • a boxercise feature with videos demonstrating exercises (B)
  • a chat function that lets people talk to other users (C)

You could run a trial comparing the basic version of the app with a new version that has all 3 features, but then you wouldn’t know which of the features are effective. Instead, you can run a factorial RCT.

You have 3 new features and you can either include or exclude each feature. This means there are 8 possible versions of the app:

  • the basic version
  • with feature A
  • with feature B
  • with feature C
  • with features A+B
  • with features A+C
  • with features B+C
  • with all 3 features

In a factorial trial, you test all of these.
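A minimal Python sketch, assuming each feature is simply switched on or off, shows where the 8 versions come from: every combination of included and excluded features is one version. The labels A, B and C are the feature letters used above; the function name `app_versions` is hypothetical.

```python
from itertools import product

# Feature letters from the example: activity tracker (A), boxercise (B), chat (C)
FEATURES = ["A", "B", "C"]

def app_versions(features):
    """List every version of the app: each feature is either excluded or included."""
    versions = []
    for included in product([False, True], repeat=len(features)):
        chosen = [f for f, inc in zip(features, included) if inc]
        versions.append("+".join(chosen) if chosen else "basic")
    return versions

versions = app_versions(FEATURES)
print(len(versions))  # 8, that is 2 x 2 x 2
print(versions)
```

Adding a fourth on/off feature would double the number of versions to 16, which is why factorial designs grow quickly.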

If you had 200 participants, there would be 25 participants in each group. However, if you wanted to estimate the effectiveness of feature A, you would compare 100 people who had it (in groups A, A+B, A+C, A+B+C) to 100 who did not (in groups B, C, B+C, and just with the basic app). You can do the same with the other 2 features. This achieves good statistical power to measure the effect of all 3 features without having to run 3 trials.
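The arithmetic above can be checked with a small Python sketch. It assumes equal allocation to the 8 groups and shuffles a list of group assignments with a fixed seed. The function `allocate` is hypothetical, and a real trial would normally use a dedicated randomisation service. Each feature ends up with 100 participants who have it and 100 who do not.

```python
import random
from itertools import product

def allocate(n_participants, n_features, seed=2020):
    """Allocate participants equally, in random order, across all 2**n_features groups.

    Each participant gets a tuple such as (True, False, True), meaning
    features A and C but not B.
    """
    groups = list(product([False, True], repeat=n_features))
    if n_participants % len(groups):
        raise ValueError("this sketch assumes equal group sizes")
    per_group = n_participants // len(groups)
    allocation = [g for g in groups for _ in range(per_group)]
    random.Random(seed).shuffle(allocation)  # random order of assignments
    return allocation

allocation = allocate(200, 3)
for i, name in enumerate(["A", "B", "C"]):
    with_feature = sum(1 for p in allocation if p[i])
    print(f"Feature {name}: {with_feature} with, {len(allocation) - with_feature} without")
# Feature A: 100 with, 100 without (and the same for B and C)
```

Every participant contributes to the estimate for every feature, which is where the efficiency of the design comes from.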

Factorial RCTs rely on there being no interaction between the effects of different elements of the product. If you expect an interaction effect, another design may be better. For example, the activity tracker might be more effective at motivating people when they also have the boxercise feature to show them what activity to do, and the boxercise feature might be more effective when people can also easily track their activity. In that case, the effect of both features together is greater than the sum of the effects of each feature on its own (a positive interaction, or synergistic effect).
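To make the idea of a positive interaction concrete, the sketch below uses invented average outcomes (extra minutes of activity per week compared with the basic app). None of these numbers come from a real trial. The interaction is the part of the combined effect that is not explained by adding the 2 separate effects.

```python
# Hypothetical average extra minutes of activity per week, compared with the
# basic app. These numbers are invented purely for illustration.
mean_outcome = {
    "basic": 0.0,
    "A": 20.0,    # activity tracker only
    "B": 10.0,    # boxercise only
    "A+B": 45.0,  # both features together
}

effect_a = mean_outcome["A"] - mean_outcome["basic"]
effect_b = mean_outcome["B"] - mean_outcome["basic"]
effect_both = mean_outcome["A+B"] - mean_outcome["basic"]

# Interaction = what the combination adds beyond the sum of the separate effects
interaction = effect_both - (effect_a + effect_b)
print(interaction)  # 15.0 -> positive (synergistic) interaction
```

If the combined effect had been exactly 30, the interaction would be 0 and the features would simply add up, which is the situation a factorial design assumes.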

Even if you don’t expect there to be an interaction effect, you should test whether there is one. If there is an interaction effect and you don’t correctly assess it, your results may be wrong. You should account for the interaction effect in the analysis.
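One simple way to check for an A×B interaction is to compare the effect of A among people who had B with the effect of A among people who did not, and ask whether that difference is larger than chance would explain. The Python sketch below does this on simulated data for the 200-participant example, using a normal-approximation (Wald-type) test on the difference of differences. The true effects, noise level and seed are invented assumptions. In practice a statistician would usually fit a regression model that includes all features and their interaction terms.

```python
import random
import statistics
from itertools import product

rng = random.Random(42)

# Hypothetical true effects (extra minutes of activity per week), invented for illustration
TRUE_A, TRUE_B, TRUE_C, TRUE_AB = 20.0, 10.0, 5.0, 15.0
NOISE_SD = 30.0

def simulate_outcome(a, b, c):
    """Simulated outcome for one participant given which features (0 or 1) they received."""
    mean = TRUE_A * a + TRUE_B * b + TRUE_C * c + TRUE_AB * a * b
    return rng.gauss(mean, NOISE_SD)

# 25 participants in each of the 8 groups
data = [(a, b, c, simulate_outcome(a, b, c))
        for a, b, c in product([0, 1], repeat=3)
        for _ in range(25)]

def cell(a, b):
    """Outcomes for everyone with the given A and B status, pooled over feature C."""
    return [y for (ai, bi, _, y) in data if ai == a and bi == b]

cells = {(a, b): cell(a, b) for a, b in product([0, 1], repeat=2)}
means = {k: statistics.mean(v) for k, v in cells.items()}

# Difference of differences: effect of A with B present minus effect of A with B absent
interaction = (means[1, 1] - means[0, 1]) - (means[1, 0] - means[0, 0])

# Pooled within-cell variance (pooling over C slightly inflates it), then the SE of the contrast
pooled_var = statistics.mean(statistics.variance(v) for v in cells.values())
se = (pooled_var * sum(1 / len(v) for v in cells.values())) ** 0.5

z = interaction / se
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
print(f"interaction = {interaction:.1f}, SE = {se:.1f}, z = {z:.2f}, p = {p_value:.3f}")
```

Changing the seed or the assumed effects shows how often an interaction of this size would be missed with 200 participants.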

However, this means that you have to treat different combinations of features as having distinct effects, so you do not get the benefit of this design. The value of the factorial design depends on there being no interaction effect. In digital health trials, there will often be an interaction. If this is the case, multi-arm designs are better. One approach is multi-arm multi-stage (MAMS) trials.

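The factorial trial can also estimate the interaction effect (often written A×B), but it does so less precisely than the main effects. This is because the interaction is estimated as a difference between 2 differences: the effect of feature A among people who had feature B, minus the effect of feature A among people who did not. Each of those comparisons is between 2 groups of 50 people, so with 200 participants the interaction estimate has roughly twice the standard error of a main effect, which compares 2 groups of 100.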
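The difference in precision can be checked by writing each estimate as a weighted combination (contrast) of the 8 group means. This Python sketch assumes 25 participants per group and the same outcome standard deviation in every group; `contrast_se` is a hypothetical helper. It finds that the A×B interaction estimate has twice the standard error of the main effect of A.

```python
from itertools import product

def contrast_se(coefficients, group_size, sd=1.0):
    """Standard error of a linear contrast of independent, equal-sized group means.

    Var = sd**2 * sum(c**2) / group_size, assuming a common outcome SD.
    """
    return (sd ** 2 * sum(c ** 2 for c in coefficients) / group_size) ** 0.5

groups = list(product([0, 1], repeat=3))  # (A, B, C) status for each of the 8 groups
n_per_group = 25                          # 200 participants / 8 groups

# Main effect of A: mean of the 4 groups with A minus mean of the 4 groups without
main_a = [(1 if a else -1) / 4 for a, _, _ in groups]

# A×B interaction (difference of differences), averaged over C:
# each A/B cell mean averages 2 groups, so each group gets a weight of +1/2 or -1/2
inter_ab = [(1 if a == b else -1) / 2 for a, b, _ in groups]

se_main = contrast_se(main_a, n_per_group)
se_inter = contrast_se(inter_ab, n_per_group)
print(round(se_inter / se_main, 6))  # 2.0
```

Because the standard error doubles, detecting an interaction of the same size as a main effect needs about 4 times as many participants.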

Example: Drink Less

See Crane and others (2018): A smartphone app to reduce excessive alcohol consumption: Identifying the effectiveness of intervention components in a factorial randomised control trial.

This factorial design considered 5 app components for an alcohol reduction app called Drink Less, so it had 2 × 2 × 2 × 2 × 2 = 32 groups. There were 672 study participants. The primary outcome was the number of units of alcohol consumed. Secondary outcomes included app usage.

The components were Self-monitoring and Feedback, Action Planning, Identity Change, Normative Feedback, and Cognitive Bias Re-training. Each component had a minimal version and an enhanced version.
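The size of the design can be reproduced with a short Python sketch. It enumerates the 32 combinations of minimal and enhanced component versions and counts the 10 two-way interaction terms that a model for 5 components can include. The equal split of the 672 participants across groups is an assumption for illustration, not a figure reported by the trial.

```python
from itertools import combinations, product

# Component names as listed for the Drink Less trial
COMPONENTS = [
    "Self-monitoring and Feedback",
    "Action Planning",
    "Identity Change",
    "Normative Feedback",
    "Cognitive Bias Re-training",
]
LEVELS = ["minimal", "enhanced"]

# One group per combination of component versions
groups = list(product(LEVELS, repeat=len(COMPONENTS)))
print(len(groups))  # 32

# Assumption for illustration: participants split equally across groups
print(672 / len(groups))  # 21.0 per group
print(672 // 2)           # 336 would get the enhanced version of each component

# Pairs of components: the two-way interaction terms in a model
two_way = list(combinations(COMPONENTS, 2))
print(len(two_way))  # 10
```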

The analysis considered only the main effects of each component and two-way interactions (interactions between pairs of components). They found that:

  • the combination of enhanced Normative Feedback and Cognitive Bias Re-training produced a greater reduction in alcohol consumption (the primary outcome measure) than the basic app
  • the combination of enhanced Self-monitoring and Feedback and Action Planning produced greater improvements in AUDIT score (a measure of problem drinking) than the basic app
  • use of enhanced Self-monitoring and Feedback was associated with using the app more frequently, and with people rating the app more positively on helpfulness, likelihood to recommend, and satisfaction

The team were cautious in interpreting the importance of the two-way interactions they found, because the study was not specifically designed to test them. They concluded that, while no definitive evidence was found for the effectiveness of the individual components, an app with the enhanced versions of the Normative Feedback, Cognitive Bias Re-training, Self-monitoring and Feedback, and Action Planning components might produce good results.

More information and resources

Jaki T., Vasileiou D.: Factorial versus multi-arm multi-stage designs for clinical trials with multiple treatments. Stat Med. 2017 Feb 20;36(4):563–580. This article argues that multi-arm multi-stage designs are generally a better approach.

Collins L.M., Murphy S.A., Strecher V.: The Multiphase Optimization Strategy (MOST) and the Sequential Multiple Assignment Randomized Trial (SMART): New Methods for More Potent eHealth Interventions. American Journal of Preventive Medicine 2007, 32(5): S112–S118. This article argues for the use of factorial and fractional factorial designs in evaluating digital health interventions as part of the Multiphase Optimization Strategy (MOST).

Examples of factorial RCTs in digital health

Healthy Campus Trial: a multiphase optimization strategy (MOST) fully factorial trial to optimize the smartphone cognitive behavioral therapy (CBT) app for mental health promotion among university students: study protocol for a randomized controlled trial. An example of a protocol for a factorial RCT using the Multiphase Optimization Strategy (MOST). Researchers plan to assess 5 app components intended to increase well-being and reduce stress.

Adams and others (2017): Adaptive goal setting and financial incentives: a 2 × 2 factorial randomized controlled trial to increase adults’ physical activity. Researchers used a factorial RCT to test the main effects of 2 goal-setting strategies (adaptive vs static goals) and 2 reward schedules (immediate vs delayed) on step count.

Published 20 March 2020