Official Statistics

Winter Coronavirus (COVID-19) Infection Study: quality and methodology information

Updated 14 March 2024

Applies to England and Scotland

About this report

This report outlines the quality and methodology information (QMI) for the Winter Coronavirus (COVID-19) Infection Study official statistics published by UK Health Security Agency (UKHSA). This QMI report helps users understand the strengths and limitations of these statistics, ensuring UKHSA is compliant with the quality standards stated in the Code of Practice for Statistics.

About these statistics

The Winter Coronavirus (COVID-19) Infection Study (Winter CIS) is a joint UKHSA and Office for National Statistics (ONS) study. Winter CIS provides vital information on the epidemiology of SARS-CoV-2, the virus that causes COVID-19, and will help us understand the potential winter pressures on our health services.

Sample size

Winter CIS is a survey involving around 150,000 participants. Participants use lateral flow devices (LFDs) to test for SARS-CoV-2 and the results are reported to ONS who then make this data available to UKHSA for analysis.

Geographical coverage

Winter CIS includes participants from England and Scotland, reported by region, sex and age group.

Measures published

The prevalence, incidence rate and infection hospitalisation risk of SARS-CoV-2, the coronavirus that causes COVID-19.

Prevalence is the proportion of the population infected with the virus.

The incidence rate is the number of new SARS-CoV-2 infections that occur each day per 100,000 individuals.

The infection hospitalisation risk (IHR) is a measure of the risk of hospital admission given that an individual has been infected with the SARS-CoV-2 virus.

Publication frequency

The publication will be produced fortnightly except during public holidays. The first report was released on 21 December 2023 and the publication will run until March 2024.

Quality summary

Winter CIS provides timely insight into the population level epidemiology of the SARS-CoV-2 virus. Estimates are published fortnightly and are made available as near to real time as possible. UKHSA will usually publish about a week after the data was collected to allow for production and quality assurance.

The statistics generated from the data are estimates based on a sample from a survey and as with all estimates, there is inherent uncertainty. We quantify that uncertainty using credible intervals and publish the intervals so that users can understand the precision of the estimates. As we are unable to sample everyone in the population, we have recruited a cohort that would enable the production of reliable and timely estimates of prevalence and the infection incidence rate.

How to use these statistics

You can use these statistics:

  • to understand the proportion of the population in England and Scotland infected with SARS-CoV-2 (prevalence) at a particular point in time
  • to understand the rate of new SARS-CoV-2 infections per day (incidence rate) in England and Scotland at a particular point in time
  • to understand how prevalence varies over time by nation, region, and age groups
  • to understand how the incidence rate varies over time by nation, region, and age groups
  • to understand the average risk of hospitalisation given that an individual has been infected with the SARS-CoV-2 virus by age group

You should not use these statistics:

  • to compare this study with the previous COVID-19 Infection Survey (CIS), which ran from 2020 to 2023; the 2 studies used different diagnostic tests and statistical methods, so the results are not comparable
  • for unweighted positivity rates of SARS-CoV-2; instead use the ONS statistics on positivity

What you need to know about the statistics

These statistics rely on data collected from a survey. There is always a degree of uncertainty when using surveys to estimate the characteristics of the population. UKHSA will report the median, with corresponding credible intervals. This approach indicates the level of uncertainty associated with an estimate and helps users decide how precise it is. For further information, the ONS has published more details on the subject of uncertainty in surveys.

Small differences in the results might not reflect actual changes in SARS-CoV-2 prevalence or incidence rate in the population and could just represent the inherent variation due to sampling. Subgroups (geographic areas, age groups, and sex) may be under-represented or over-represented in temporal samples of the cohort. Estimates are therefore reweighted using a temporal Bayesian Multilevel Regression and Post-stratification (MRP) approach. This allows the model to gather information across different sub-strata to improve precision and reduce the error due to under-representation.

Methods

Data collection

Winter CIS is a survey involving around 150,000 participants. Participants in the Winter CIS also took part in the previous CIS. As part of the previous survey, participants had agreed to be approached about other research studies with ethical approval. This included adults aged 16 years and over who are willing and able to consent, as well as children aged between 3 years and 15 years, for whom a parent or carer is willing and able to consent to their participation. The study, therefore, does not include participants aged under 3 years.

Participants are sent 14 LFDs to test for the SARS-CoV-2 virus. Every 4 to 5 weeks they are asked to take an LFD test and then answer a 10 to 15 minute questionnaire. Participants have a 7-day window to perform their test. If a participant tests positive, they are asked to complete a short 2 to 3 minute follow-up questionnaire, and to continue to test themselves every other day until 2 negative results are observed. Data from the questionnaires is submitted to ONS who make this available to UKHSA for analysis.

The ONS has published a separate QMI report for their Winter CIS report. This has more information on how participants were selected, and how the data is managed before they make it available to UKHSA for analysis.

Quality assurance process and data flows 

The Winter CIS is conducted in collaboration between the ONS and UKHSA. To ensure consistent, high-quality analytical outputs from both organisations, additional quality assurance checks were put in place. The ONS has separately published further detail on the quality characteristics of the study in its QMI, which outlines the checks conducted on the data before the transfer to UKHSA.

The consent, main and follow-up survey responses are sent from the ONS to UKHSA via secure data transfer. Upstream issues with data quality and processing are flagged by ONS and reported to UKHSA. To ensure consistent processing of the data, UKHSA and ONS agree an approach to resolve any issues that arise. UKHSA performs additional checks on the data, including ensuring that reported tests fall within appropriate time frames.

Before the modelling of the survey data, descriptive analyses of the data are conducted to check for recording errors and changes to the response rates and cohort subgroup distributions. The code used to run the analysis and each model is version controlled and approved by multiple modellers within the production team. Model diagnostics that show how well the model fits (posterior predictive checks), as well as key parameters (such as sensitivity), are reviewed by members of the modelling team. Approximate estimates of parameters are calculated and used in conjunction with the academic literature to check that the parameters estimated are appropriate.

Checks on both the report and data tables are performed to ensure publication outputs match calculated estimates. Dates and estimated values are compared across different outputs, geographies, ages and nations to ensure consistent results. Following modelling checks, separate analysts (who did not run the models) review each output checking for manual errors in the production of the report and data tables. Finally, this is signed off by a senior analyst within the team before being shared with ONS.

Estimating prevalence, incidence and the infection hospitalisation risk

Incidence is calculated by estimating the rate at which new infections of SARS-CoV-2 occur within the subgroups analysed over time, reported as a rate per 100,000 individuals. Prevalence is a measure of the proportion of the population infected with SARS-CoV-2. An infected individual’s exposure date to SARS-CoV-2 occurs prior to testing positive. Incidence is a measure of new infections by exposure date; it is therefore reported with a temporal delay relative to prevalence. The prevalence at a given point in time can be expressed in terms of the number of recently infected individuals that have not yet cleared the virus. Using the repeat testing data collected in the survey, we estimate how long an individual is likely to test positive for, known as the duration of positivity. The model uses the duration of positivity, obtained from follow-up testing, to estimate the time series of incidence rates, by demographic group, that is most credible in generating the observed positivity in the survey cohort. This allows the model to then estimate the prevalence at each point in time from the temporal pattern of incidence rates.
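
The link between incidence, the duration of positivity and prevalence can be sketched as a sum over recent infections that have not yet cleared. This is an illustrative simplification, not the model used for these statistics: the function name, the flat incidence series and the `still_positive` probabilities are hypothetical, and the delay between exposure and first testing positive is ignored.

```python
def prevalence_from_incidence(daily_incidence, still_positive):
    """Prevalence on each day as the sum of recent infections not yet cleared.

    daily_incidence[t]: new infections per person on day t.
    still_positive[s]:  probability that an infection is still test-positive
                        s days after it began (hypothetical values below).
    """
    prevalence = []
    for t in range(len(daily_incidence)):
        total = 0.0
        for s, p in enumerate(still_positive):
            if t - s < 0:
                break  # no data before the start of the series
            total += daily_incidence[t - s] * p
        prevalence.append(total)
    return prevalence


# Hypothetical inputs: 100 new infections per 100,000 people per day.
incidence = [0.001] * 10
still_positive = [1.0, 1.0, 0.8, 0.5, 0.2]

prev = prevalence_from_incidence(incidence, still_positive)
```

With a constant incidence rate, prevalence settles at the incidence rate multiplied by the mean duration of positivity (here 3.5 days), which is why a longer duration of positivity implies a lower incidence rate for the same observed prevalence.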

The proportion of SARS-CoV-2 positive tests in the study is known as the test positivity. Prevalence is calculated by adjusting positivity to account for imperfect test performance. To do this a statistical model was developed that adjusts for the diagnostic performance of the LFD test.

The model reweights both the estimated prevalence and incidence rate to account for subgroups that might be under-represented or over-represented in the survey sample.

The infection hospitalisation risk (IHR) measures the risk of hospitalisation given that an individual has been infected with the SARS-CoV-2 virus. The IHR can be estimated using the incidence rate and the number of new hospitalisations through time. It is modelled by temporally matching the incidence time series to the time series of hospitalisations, which adjusts for the delay from an individual becoming infected to going to hospital, and then estimating the proportion of newly infected individuals that went on to be hospitalised. Daily hospitalisations data come from the NHS England COVID-19 Hospital Statistics. The time delay was informed using data from the Emergency Care Data Set linked to testing data from the Second Generation Surveillance System (SGSS). An estimate of the IHR is provided for England only, with an estimate for each age group, and a re-weighted overall figure. No data from Scotland were included due to differing definitions of a COVID-19 hospitalisation between the two countries. A separate IHR for Scotland will be provided in a future release.
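
The temporal matching described above can be sketched as follows. This is a minimal illustration that assumes a single fixed delay between infection and admission; the function name, the delay of 2 days and all counts are hypothetical and are not taken from the study.

```python
def infection_hospitalisation_risk(daily_infections, daily_admissions, delay_days):
    """Share of new infections that led to admission, after shifting by a delay.

    Admissions on day t are matched to infections on day t - delay_days.
    A single fixed delay is an assumption made for this sketch.
    """
    if len(daily_infections) != len(daily_admissions):
        raise ValueError("both series must cover the same days")
    if not 0 <= delay_days < len(daily_infections):
        raise ValueError("delay must be shorter than the series")

    matched_infections = daily_infections[:len(daily_infections) - delay_days]
    matched_admissions = daily_admissions[delay_days:]

    total_infections = sum(matched_infections)
    if total_infections == 0:
        raise ValueError("no infections in the matched period")
    return sum(matched_admissions) / total_infections


# Hypothetical daily counts for one age group.
infections = [1000, 1200, 1500, 1400, 1100, 900]
admissions = [8, 9, 10, 12, 15, 14]

ihr = infection_hospitalisation_risk(infections, admissions, delay_days=2)
```

In this example the 51 admissions from day 3 onwards are matched to the 5,100 infections in the first 4 days, giving an IHR of 1%.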

Statistical model

A Bayesian multi-level regression with post-stratification (MRP) model was used to produce estimates of prevalence and incidence rates. UKHSA reports prevalence and incidence rates over time, for England and Scotland combined, England, Scotland, by age group, and by English region.

Positivity describes the proportion of the survey that reported a positive LFD result. However, some of those sampled who are infected will get a negative test result. These are known as false negatives. False negatives determine the sensitivity of the test, which is the proportion of infected individuals who receive a positive result (true positives). Similarly, false positives, where uninfected individuals receive positive test results, determine the specificity of the test, which is the proportion of uninfected individuals who receive a negative result (true negatives).

The statistical model used accounts for the infections that were missed (false negative tests) and the small number of people who are incorrectly identified as having a SARS-CoV-2 infection (false positive tests). The analysis adjusts positivity to give an estimate of prevalence.
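
The idea behind this adjustment can be illustrated with the standard Rogan-Gladen point estimator. This is a simpler stand-in for the Bayesian model used for these statistics, not that model itself. The function names are hypothetical, and the sensitivity (65%) and specificity (99.9%) are example values chosen from within the ranges quoted in this report.

```python
def expected_positivity(prevalence, sensitivity, specificity):
    """Positivity expected from true positives plus false positives."""
    return sensitivity * prevalence + (1 - specificity) * (1 - prevalence)


def adjust_positivity(positivity, sensitivity, specificity):
    """Rogan-Gladen estimate of prevalence from observed positivity."""
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("the test must perform better than chance")
    estimate = (positivity + specificity - 1) / denominator
    # Sampling noise can push the raw estimate outside the range 0 to 1.
    return min(max(estimate, 0.0), 1.0)


# Example values only: 3% true prevalence, 65% sensitivity, 99.9% specificity.
observed = expected_positivity(0.03, sensitivity=0.65, specificity=0.999)
recovered = adjust_positivity(observed, sensitivity=0.65, specificity=0.999)
```

Because sensitivity is well below 100%, observed positivity (about 2.0% in this example) understates prevalence (3%), and the adjustment scales it back up.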

The MRP approach estimates the prevalence and incidence in each survey group by modelling age, regional, and sex-related differences in a multi-level regression. These prevalence and incidence estimates for each group are then reweighted by their true population size to give a more representative estimate by accounting for the demographic composition of different groups (post-stratification).

To capture changes in prevalence over time, the model first estimates the trend over time of SARS-CoV-2 incidence rate using a second-order random walk model. Given the incidence rate, the model is then able to estimate prevalence and positivity. Time trends are allowed to vary by:

  • geographies (defined as Scotland, North East, North West, Yorkshire and Humber, East Midlands, West Midlands, East of England, London, South East and South West)
  • sex (male and female)
  • age groups (defined as 3 to 17 years, 18 to 34 years, 35 to 44 years, 45 to 54 years, 55 to 64 years, 65 to 74 years, 75 years and over)
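
A second-order random walk assumes that the change from one time point to the next is similar to the previous change, which produces smooth trends. The sketch below simulates one such path. It is illustrative only: the function name and parameter values are hypothetical, and this report does not state the scale on which the random walk is applied.

```python
import random


def simulate_rw2(n_steps, start, initial_slope, innovation_sd, seed=0):
    """Simulate x[t] = 2 * x[t-1] - x[t-2] + noise, a second-order random walk."""
    rng = random.Random(seed)
    path = [start, start + initial_slope]
    for _ in range(n_steps - 2):
        noise = rng.gauss(0.0, innovation_sd)
        # The second difference of the path equals the noise term.
        path.append(2 * path[-1] - path[-2] + noise)
    return path[:n_steps]


# With no noise the path is a straight line that continues the initial slope.
straight = simulate_rw2(5, start=0.0, initial_slope=1.0, innovation_sd=0.0)

# With a small noise term the slope drifts gradually over time.
smooth = simulate_rw2(30, start=0.0, initial_slope=0.1, innovation_sd=0.05)
```

A smaller `innovation_sd` gives a smoother trend, and a larger value lets the trend change direction more quickly.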

Surveys need to collect data from different ages, regions, and sexes in the same proportions as they are in the population to be representative. However, some groups were under-represented or over-represented in this survey due to unequal recruitment, retention, and response rates. To address this, post-stratification is used, where the model first estimates the prevalence and incidence rate in each subgroup, and then reweights to be more representative of the overall population in terms of age, sex, and geography. To do this, population estimates broken down by age, sex, and geography from the 2023 to 2024 ONS projections are used.
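
The post-stratification step can be sketched as a population-weighted average of subgroup estimates. The function name, subgroup estimates and population sizes below are hypothetical and are shown only to illustrate the arithmetic.

```python
def poststratify(cell_estimates, cell_populations, keep=None):
    """Population-weighted average of subgroup estimates.

    Both dictionaries are keyed by (geography, sex, age_group).
    keep: optional function selecting the cells to include, for example
          to produce an estimate for a single region.
    """
    weighted_sum = 0.0
    population = 0
    for cell, estimate in cell_estimates.items():
        if keep is not None and not keep(cell):
            continue
        weighted_sum += estimate * cell_populations[cell]
        population += cell_populations[cell]
    if population == 0:
        raise ValueError("no population in the selected cells")
    return weighted_sum / population


# Hypothetical modelled prevalence and population size for 3 subgroups.
estimates = {
    ("London", "female", "18 to 34 years"): 0.04,
    ("London", "male", "18 to 34 years"): 0.03,
    ("Scotland", "female", "75 years and over"): 0.01,
}
populations = {
    ("London", "female", "18 to 34 years"): 1_000_000,
    ("London", "male", "18 to 34 years"): 1_000_000,
    ("Scotland", "female", "75 years and over"): 2_000_000,
}

overall = poststratify(estimates, populations)
london = poststratify(estimates, populations, keep=lambda cell: cell[0] == "London")
```

The weights come from the population sizes rather than from the number of survey responses, so a subgroup that is over-represented in the sample does not dominate the combined estimate.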

The models are run using all data from 14 November 2023.

Specificity model

The specificity of the LFD tests used in the Winter CIS has been shown to be very high, with estimates over 99.9%. In the analysis, specificity is assumed to lie between 99.89% and 99.98%, based on the existing evidence.

Sensitivity model

The sensitivity of LFD tests has been found to be lower than that of PCR tests, which were used in the previous ONS Coronavirus (COVID-19) infection survey (CIS). This means data from the previous survey cannot be used to inform sensitivity. Other peer-reviewed studies have reported sensitivity between 55% and 73% for LFD tests.

In the survey, once a participant tests positive for SARS-CoV-2, they are asked to take repeat tests every other day until they return 2 negative tests. This repeat testing data is used to estimate the sensitivity of LFDs for the Winter CIS cohort. The estimated LFD test sensitivity varies over time, due to the epidemic phase, and across age groups. Epidemic phase affects the temporal estimates of test sensitivity, which will be higher when an epidemic is growing, and lower when the epidemic is declining.

Duration of positivity

Incidence was calculated by estimating the rate at which new infections of SARS-CoV-2 occur within the subgroups analysed over time, reported as a rate per 100,000 individuals. Prevalence is a measure of the proportion of the population infected with SARS-CoV-2. The prevalence at a given point in time can be expressed in terms of the number of recently infected individuals that have not yet cleared the virus. Using the repeat testing data collected in the survey, we estimated how long an individual was likely to test positive for, known as the duration of positivity.

Once a participant had tested positive, they were asked to take repeat tests every other day until they returned 2 consecutive negative tests. This repeat testing data was used to estimate the length of time that cases test positive. A participant was defined as negative if their subsequent test result was negative. Participants were considered to be positive at the time of symptom onset. If cases did not report a symptom onset date, or did not report 2 consecutive negative tests, then they were treated as partial observations, and were adjusted for by using a censoring approach.
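
One common way to handle partial observations of this kind is the Kaplan-Meier estimator, sketched below. This report does not specify which censoring approach was used, so this is a stand-in rather than the study method. It covers only cases whose end of positivity was not observed (right-censoring), not cases with a missing symptom onset date. The function name and data are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Estimate the probability of still testing positive after each duration.

    durations[i]: days from start of positivity to the last recorded test.
    observed[i]:  True if the end of positivity was seen, False if the case
                  was censored (follow-up stopped while still positive).
    Returns a list of (day, probability still positive after that day).
    """
    event_days = sorted({d for d, seen in zip(durations, observed) if seen})
    survival = 1.0
    curve = []
    for day in event_days:
        at_risk = sum(1 for d in durations if d >= day)
        events = sum(1 for d, seen in zip(durations, observed) if seen and d == day)
        survival *= 1 - events / at_risk
        curve.append((day, survival))
    return curve


# Hypothetical follow-up data for 5 cases, 2 of them censored.
durations = [4, 6, 6, 8, 10]
observed = [True, True, False, True, False]

curve = kaplan_meier(durations, observed)
```

Censored cases still count towards the number at risk up to their last test, so they inform the estimate without being treated as having cleared the virus.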

Testing window effects

Each participant cohort has a 7-day window in which to perform their test. Participants who test earlier in their window appear to be more likely to test positive than those who test later in the window. Participants who test earlier in their window may have a reason to do so, for example a symptomatic household member. We perform a statistical adjustment based on when participants tested within their window. Over the winter bank holiday period, it is possible that patterns of when participants tested changed, and so a separate adjustment is included for this period.

Uncertainty

Uncertainty in the model comes from 2 main sources. First, the positivity data used in this analysis relies on a sample from the total population, and so there is uncertainty around the positivity estimates used in the model. Secondly, there is uncertainty in the parameters used to convert positivity to prevalence. Because a Bayesian model is used, central estimates are presented alongside credible intervals. These credible intervals can be interpreted as a probabilistic range of values for the estimated prevalence. A wider interval indicates more uncertainty in the estimate.
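
A central estimate and credible interval are typically read off the posterior draws produced by a Bayesian model, as sketched below. The 95% level, the function names and the draws are assumptions made for illustration; this report does not state the interval level here.

```python
def quantile(sorted_values, q):
    """Quantile of pre-sorted values using linear interpolation."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarise_posterior(draws, level=0.95):
    """Return the median and an equal-tailed credible interval."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    return {
        "median": quantile(ordered, 0.5),
        "lower": quantile(ordered, tail),
        "upper": quantile(ordered, 1 - tail),
    }


# Hypothetical posterior draws of prevalence, evenly spaced from 0% to 10%.
draws = [i / 1000 for i in range(101)]

summary = summarise_posterior(draws)
```

A wider gap between `lower` and `upper` corresponds to the greater uncertainty described above.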

Reweighting prevalence and incidence rate estimates

The headline figure for the report is the reweighted prevalence for England and Scotland combined. The reported incidence rate estimates are also reweighted. To calculate this, the prevalence and incidence rate estimates for the nation, region, age group, and sex, are weighted using the true population sizes of these groups nationally. This is the post-stratification step of the modelling. The combined England and Scotland estimate is therefore more representative of the true population composition and corrects for the non-representativeness of the survey sample.

For the England national estimates, the same reweighting logic is applied across English region, sex, and age group. Likewise, for the Scottish national estimates, it is applied across sex and age group only. Age group estimates are reweighted according to nation, region, and sex. Estimates for English regions are reweighted based on age group and sex.

Statistics release schedule

The first release of these statistics, published on 21 December 2023, contained estimates of SARS-CoV-2 prevalence. From 15 February 2024 onwards, releases of these statistics also contain estimates of the SARS-CoV-2 incidence rate. On 14 March 2024, estimates of the infection hospitalisation risk were released.

Future releases may also contain statistics on:

  • the infection fatality risk (IFR) which measures the risk of death given that you have been infected with the SARS-CoV-2 virus
  • how effectively vaccinations protect people from health outcomes such as infection, symptomatic disease, hospitalisation, and mortality, which is known as vaccine effectiveness

Most health protection functions in the UK are devolved to the other UK nations’ public health teams. Published data and statistics for COVID-19 are available from Public Health Wales for Wales, Public Health Scotland for Scotland, and the Department of Health for Northern Ireland.

National flu and COVID-19 surveillance reports

The national flu and COVID-19 surveillance reports contain information from surveillance systems which are used to monitor COVID-19, influenza, and diseases caused by seasonal respiratory viruses in England. They include weekly data on the number of laboratory confirmed COVID-19 cases, and positivity of specimens, from a wide range of clinical surveillance systems.

Some of these statistics are also published on the UKHSA data dashboard.

You should use these statistics if you need the latest data on laboratory confirmed COVID-19 cases and the positivity of specimens.

Please note that the positivity in the national flu and COVID-19 surveillance report is not comparable to what is reported in the Winter CIS report, and requires knowledge and experience to interpret. The Winter CIS report is the only source of positivity (through the ONS publication) and prevalence (through the UKHSA publication) for the population of England and Scotland.

ONS publication: Winter Coronavirus (COVID-19) Infection Study, England and Scotland

The ONS publishes data tables reporting both the unweighted and weighted test positivity rate over time, including breakdowns by region, age and sex. These results reflect the actual data collected via the survey and are published to ensure transparency around the data collected.

Please be aware that the methods used by the ONS are not the same as those applied by UKHSA. You should read through the publications explained document for more information on how the UKHSA and ONS statistics differ.

COVID-19 hospital activity

The COVID-19 hospital activity reports contain information on COVID-19 hospital activity. They provide measures of total beds occupied by confirmed COVID-19 patients, the number of patients admitted with COVID-19, the number of inpatients diagnosed with COVID-19, and the number of COVID-19 related absences of staff, either through sickness or self-isolation.

You should use these statistics if you need the latest data on hospital admissions due to COVID-19, and to understand the impact of COVID-19 on hospital staffing.

Further information

Changes to this document

21 December 2023: QMI report first published.
11 January 2024: Addition of testing window effects section.
15 February 2024: Addition of incidence modelling section.
14 March 2024: Addition of infection hospitalisation risk section.

Contact information: sipmo@ukhsa.gov.uk

Authors

The lead analysts for this publication are:

Andre Charlett
Christopher Overton
Jonathon Mellor
Martyn Fyles
Robert Paton
Steven Riley
Thomas Ward