Quality and methodology information (QMI) for healthcare-associated infections (HCAI) reports
Updated 25 September 2025
About this report
This report outlines the quality and methodology information (QMI) relevant to the healthcare-associated infections (HCAI) statistics which are published by the UK Health Security Agency (UKHSA). HCAI statistics are published monthly, quarterly and annually and include:
Accredited official statistics:
- HCAI monthly data sets
- Quarterly epidemiological commentary: Mandatory Gram-negative bacteraemia, MRSA, MSSA and C. difficile infections: hereafter ‘Quarterly epidemiological commentary’
- Annual epidemiological commentary: Gram-negative, MRSA, MSSA bacteraemia and C. difficile infections: hereafter ‘Annual epidemiological commentary’
- HCAI annual data sets
Official statistics in development:
- Annual commentary on MRSA, MSSA and Gram-negative bacteraemia and Clostridioides difficile infections from independent sector healthcare organisations in England: hereafter ‘Independent sector report’
This QMI report supports users in understanding the strengths and limitations of these statistics, ensuring UKHSA is compliant with the quality standards stated in the Code of Practice for Statistics. The report covers:
- the strengths and limitations of the data used to produce the statistics
- the methods used to produce the statistics
- the quality of the statistical outputs
About the statistics
These statistics present trends in the counts and rates of the 6 infections which comprise the mandatory surveillance of bacteraemia and Clostridioides difficile (C. difficile) infections (CDI): Escherichia coli (E. coli) bacteraemia, Pseudomonas aeruginosa (P. aeruginosa) bacteraemia, Klebsiella species (Klebsiella spp.) bacteraemia, meticillin-resistant Staphylococcus aureus (MRSA) bacteraemia, meticillin-susceptible Staphylococcus aureus (MSSA) bacteraemia and CDI. The data is broken down by various key epidemiological and clinical characteristics.
Geographical coverage: England
Publication frequency:
- monthly (monthly data tables)
- quarterly (quarterly epidemiological commentary)
- annual (annual epidemiological commentary, annual data tables, independent sector report)
Change log
25 September 2025: Revised the standard population for age- and age-sex standardised rates which now use ONS mid-year estimate populations (for ethnicity and deprivation analyses, respectively) rather than the 2013 European Standard Population. Revised CDI HOHA definition in the AEC in line with bacteraemia and the QEC. New QMLR blood culture and CDI stool specimen positivity metrics added to AEC. Revised sampling rates which now use England population denominators for England-level metrics instead of bed-days denominator for AEC.
10 July 2025: Added calculations for rate of blood culture sets, overall stool specimens examined, rate of stool specimens examined for CDI diagnosis, pooled E. coli, Klebsiella spp., P. aeruginosa, MRSA and MSSA blood culture positivity and CDI positivity in the ‘Data set production’ subsection.
10 April 2025: Acknowledged a change of definition in date of admission in the ‘Sound methods’ section.
5 February 2025: Revised the definition of CDI HOHA, reviewed unlocks and HCAI DCS-SGSS sections with analysis in 2023/2024.
9 January 2025: Added the link to a relevant R package and seasonality considerations to the ‘Sound methods’ section.
18 December 2024: QMI report first published.
Contact
To contact the team responsible for producing these statistics, please email mandatory.surveillance@ukhsa.gov.uk
Suitable data sources
Statistics should be based on the most appropriate data to meet intended uses.
This section describes the data used to produce the statistics.
Data sources
The primary data source for the numerator data is the mandatory surveillance of bacteraemia and CDI which is collected through UKHSA’s HCAI DCS (data capture system) covering 6 data collections. Mandatory surveillance began in response to increasing rates of MRSA bacteraemia across NHS trusts and was subsequently rolled out for other HCAIs when concern arose. Independent Sector Providers were also mandated to submit data from April 2009. There is a complete timeline of changes to mandatory surveillance available.
The inclusion criteria for reporting each bacteraemia or infection to the surveillance system are:
- for MRSA bacteraemia, positive blood cultures caused by S. aureus resistant to meticillin, oxacillin, cefoxitin or flucloxacillin
- for MSSA bacteraemia, positive blood cultures caused by S. aureus which are susceptible to meticillin, oxacillin, cefoxitin, or flucloxacillin, and not subjected to MRSA reporting
- for E. coli bacteraemia, all cases of E. coli bacteraemia laboratory-confirmed by positive blood culture
- for CDI (patients two years and older):
  - diarrhoea stools (Bristol Stool types 5 to 7) where the specimen is C. difficile toxin positive
  - toxic megacolon or ileostomy where the specimen is C. difficile toxin positive
  - pseudomembranous colitis revealed by lower gastro-intestinal endoscopy or computed tomography
  - colonic histopathology characteristic of CDI (with or without diarrhoea or toxin detection) on a specimen obtained during endoscopy or colectomy
  - faecal specimens collected post-mortem where the specimen is C. difficile toxin positive, or tissue specimens collected post-mortem where pseudomembranous colitis is revealed or colonic histopathology is characteristic of C. difficile infection
For the annual epidemiological commentary, the mandatory surveillance data is linked to the data sources below to obtain key epidemiological characteristics.
Additional patient characteristics such as patient postcode, GP code, fact of death and date of death, are obtained from linkage with the NHS Spine Summary Care Records data set.
NHS acute trust-level population data does not currently exist in England as NHS acute trusts do not treat patients within defined geographical boundaries. Therefore, a suitable proxy for population is required to calculate hospital-onset and hospital-onset-healthcare-associated (HOHA) rates. The occupied overnight bed-days from the national KH03 data set provides the daily average overnight bed occupation for a specific time period: annually from financial year 2007 to 2008 to financial year 2009 to 2010 and quarterly from financial year 2010 to 2011 onwards. This data set is an open access return published by NHS England and provides a measure of clinical activity in each trust, which is used as a proxy measure of the patient population. The latest published KH03 data may include revisions to previously published data.
The high-level ethnic group for each case is identified by linking our surveillance records using NHS number and date of birth to NHS England Hospital Episode Statistics Admitted Patient Care (HES APC) records. This approach aligns with the COVID-19 Health Inequalities Monitoring for England (CHIME) tool, except that it does not link to the Accident and Emergency (A&E) and outpatient (OP) data sets; these are not used by existing methods but are expected to be incorporated in future. HES APC records since financial year 2003 to 2004 are considered and, where multiple ethnicities are found for an individual, the most frequently occurring ethnicity is kept.
The Office for National Statistics (ONS) mid-year population estimates up to and including 2022 (the latest at the time of query) are used at national and integrated care board (ICB)-level for the calculation of total, community-onset and community-onset-community-associated (COCA) incidence rates. For 2023, 2024 and 2025, the population is assumed to be unchanged from 2022, which is used as a proxy year.
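The rule of keeping the most frequently occurring ethnicity can be illustrated with a minimal sketch. The function name, the use of None or an empty string for missing values, and the tie-break (first value encountered) are assumptions made for illustration; the source does not describe how ties are resolved.

```python
from collections import Counter

def most_frequent_ethnicity(recorded):
    """Return the most frequently recorded ethnic group, or None if none is recorded."""
    # Drop missing values before counting (assumption: missing is None or "").
    known = [value for value in recorded if value]
    if not known:
        return None
    # most_common orders equal counts by first occurrence; using that as the
    # tie-break is an assumption, not a documented rule.
    return Counter(known).most_common(1)[0][0]

# Hypothetical values taken from one patient's linked hospital records.
print(most_frequent_ethnicity(["White", "Asian", "White", None]))
```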
Data quality
The data that we use to produce statistics must be fit for purpose. Poor quality data can cause errors and can hinder effective decision making.
We have assessed the quality of the source data against the data quality dimensions in the Government Data Quality Framework.
This assessment covers the quality of the data that was used to produce the statistics, not the quality of the final statistical outputs. The quality summary assesses the quality of the final statistical outputs.
Strengths and limitations of the mandatory surveillance data
The strengths of the data
The surveillance is at patient-level and in real-time, including both risk factor data and information on date of positive specimen, date of inpatient admission, and date of recent discharge, which allow for onset location and prior trust exposure to be ascertained. These enhanced data provide a platform to identify potential interventions, which could not be garnered from other surveillance schemes in England.
In addition, the surveillance scheme is a census of all microbiologically-confirmed episodes of bacteraemias and CDI, which provides up to 2% greater ascertainment than comparative voluntary surveillance schemes (excluding CDI cases, because of the issues with voluntary laboratory surveillance data described in the ‘Routine SGSS-DCS audit’ sub-section). Of note, the financial penalty structure on trusts for incomplete data has not been a major factor in the reduction of CDI in England (Gerver et al., 2015).
Well-completed patient identifiers allow for direct linkage with other data sources which make fuller data sets and reduce data entry burden for trusts. For example, data can be linked from the mandatory surveillance scheme with data from the voluntary laboratory reports to access antimicrobial susceptibility information, HES APC for comorbidity information and prior healthcare interactions.
Reporting from the live mandatory surveillance database, HCAI DCS, for registered users such as healthcare professionals provides real-time statistics and other tabulations or graphical representations of their data.
The limitations of the data
Despite the ability to link the mandatory surveillance data with other data sets, the completion of the data return takes time which leads to variable field completion for the non-mandatory fields and restricts the data’s utility.
There is a potential conflict between the use of these data for epidemiological purposes by UKHSA and performance management or audit by others.
While the effect on data validity is not currently of great concern, as discussed in ‘Mandatory HCAI Surveillance Data in NHS performance management’, the emphasis on performance management surrounding reductions in MRSA bacteraemia and CDI could lead to an emphasis on the infection prevention and control of these infections over others which have not had similar attention.
Summary
The mandatory data completion requirements, combined with long-term surveillance and enriched data, make the mandatory surveillance data the most reliable and suitable source for these statistics.
Accuracy
Accuracy is about the degree to which the data reflects the real world. This can refer to correct names, addresses or represent factual and up-to-date data.
The accuracy of the case-level data submitted to the mandatory surveillance of healthcare associated infections scheme is assured by the chief executive officer (CEO) of each reporting acute trust via the monthly sign-off process, which was mandated by the Chief Medical Officer (CMO) from October 2005. To add or delete cases after sign-off, the reporting organisation’s CEO needs to request an unlock from the mandatory surveillance team through a formal process described in the ‘Accuracy and reliability’ sub-section.
Completeness
Completeness describes the degree to which records are present.
For a data set to be complete, all records are included, and the most important data is present in those records. This means that the data set contains all the records that it should and all essential values in a record are populated.
Completeness is not the same as accuracy as a full data set may still have incorrect values.
Routine SGSS-DCS audit
We undertake routine comparison and quality assurance of HCAI DCS data with voluntary laboratory surveillance data and the Second-Generation Surveillance System (SGSS), which is used by laboratories to report cases of microbial infection from various samples like blood, urine and faeces. Information on antibiotic and antifungal susceptibility is also submitted where relevant. Although primarily an internal system used by healthcare professionals, the data reported via this system is routinely compared to the mandatory data collected via HCAI DCS. This routine comparison between surveillance systems provides a data quality check of case ascertainment on the HCAI DCS.
It is not currently possible to include C. difficile data in the routine HCAI DCS and SGSS comparison as information on these cases is not comparable due to data quality and reporting issues in the SGSS. C. difficile testing is a two-stage process where the second stage identifies the C. difficile toxin. As only C. difficile toxin-positive cases are reportable to the mandatory surveillance system, it is not currently possible to differentiate reported C. difficile cases which have tested positive for C. difficile toxins from those which have not with an acceptable degree of accuracy from the SGSS.
In general, more cases are captured via the HCAI DCS than the SGSS. Meticillin resistance in the mandatory surveillance is reported by NHS acute trusts after susceptibility testing but meticillin resistance in the SGSS is determined by selecting the most severe susceptibility results from patients’ blood cultures within a 14-day period. This difference explains some of the apparent over-ascertainment of the voluntary MRSA reports in some financial years, and therefore this should be considered when comparing the case numbers for MRSA and MSSA bacteraemia for SGSS versus HCAI DCS.
Not all cases in the SGSS would be reported to the HCAI DCS. SGSS cases for each bacteraemia and infection reported to the HCAI DCS are defined as:
- for MRSA bacteraemia, the earliest S. aureus blood isolate per patient within a 14-day period with a resistant or indeterminate result to meticillin, oxacillin, cefoxitin or flucloxacillin within the 14-day period
- for MSSA bacteraemia, the earliest S. aureus blood isolate per patient within a 14-day period with a susceptible result to meticillin, oxacillin, cefoxitin or flucloxacillin within the 14-day period
- for E. coli bacteraemia, the earliest E. coli blood isolate per patient within a 14-day period
- for Klebsiella spp. bacteraemia, the earliest Klebsiella spp. (including Enterobacter aerogenes) blood isolate per patient within a 14-day period
- for P. aeruginosa bacteraemia, the earliest P. aeruginosa blood isolate per patient within a 14-day period
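The ‘earliest isolate per patient within a 14-day period’ rule used in the definitions above can be sketched as follows. This is an illustration only: the function name and input shape are hypothetical, and it assumes the 14-day window is anchored on the earliest isolate of each episode and that an isolate taken 14 or more days later opens a new episode. The source does not state how the window boundaries are handled.

```python
from datetime import date, timedelta

def earliest_isolates(isolates, window_days=14):
    """Keep the earliest isolate per patient in each window.

    isolates: iterable of (patient_id, specimen_date) for one organism.
    """
    kept = []
    window_start = {}  # patient_id -> specimen date that opened the current window
    for patient_id, specimen_date in sorted(isolates):
        start = window_start.get(patient_id)
        # Assumption: a new episode starts once window_days have elapsed
        # since the isolate that opened the previous episode.
        if start is None or specimen_date - start >= timedelta(days=window_days):
            kept.append((patient_id, specimen_date))
            window_start[patient_id] = specimen_date
    return kept

# Hypothetical isolates: patient A has three, two of which fall in one window.
rows = [
    ("A", date(2024, 1, 5)),
    ("A", date(2024, 1, 1)),
    ("A", date(2024, 1, 20)),
    ("B", date(2024, 1, 3)),
]
print(earliest_isolates(rows))
```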
Ordered matching between cases from the HCAI DCS and the SGSS shows that only about 5% of infections captured by the HCAI DCS cannot be found in the SGSS.
As part of routine laboratory data checks, laboratories with cases reported to the SGSS but not identified in the HCAI DCS are contacted for feedback on the discrepancy. The cases are closed if:
- the unmatched case is subsequently identified in the HCAI DCS
- the unmatched case is added to the HCAI DCS as a new record
- there is a legitimate reason for not reporting it to the HCAI DCS
Accounting for the open cases identified in the SGSS, the HCAI DCS captures an estimated 98%, 96%, 96% and 97% of E. coli, Klebsiella spp., P. aeruginosa and S. aureus bacteraemia cases, respectively, which are eligible for mandatory reporting. This suggests the HCAI DCS provides an accurate national picture of the overall burden of infection which is under mandatory surveillance in England.
Uniqueness
Uniqueness describes the degree to which there is no duplication in records. This means that the data contains only one record for each entity it represents, and each value is stored once.
Some fields, such as National Insurance number, should be unique. Some data is less likely to be unique, for example geographical data such as town of birth.
The HCAI DCS has a de-duplication algorithm where cases reported by the same reporting organisation with matching NHS number and date of birth are flagged to the reporting trust to determine whether the case is a true duplicate. The CEO of the reporting organisation is required to sign off data monthly, which provides an additional verification of the uniqueness of the data.
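A minimal sketch of this kind of duplicate flagging is shown below. It is not the HCAI DCS algorithm: the field names are hypothetical, and any further conditions the real system may apply (for example, restricting to the same data collection or a time window) are not described in this report and are not modelled.

```python
from collections import defaultdict

def flag_possible_duplicates(cases):
    """Group case IDs that share reporting organisation, NHS number and date of birth.

    cases: list of dicts with hypothetical keys
    'case_id', 'org_code', 'nhs_number' and 'date_of_birth'.
    """
    groups = defaultdict(list)
    for case in cases:
        key = (case["org_code"], case["nhs_number"], case["date_of_birth"])
        groups[key].append(case["case_id"])
    # Only groups with more than one case are flagged for review by the trust.
    return [ids for ids in groups.values() if len(ids) > 1]

cases = [
    {"case_id": 1, "org_code": "R1", "nhs_number": "111", "date_of_birth": "1950-01-01"},
    {"case_id": 2, "org_code": "R1", "nhs_number": "111", "date_of_birth": "1950-01-01"},
    {"case_id": 3, "org_code": "R2", "nhs_number": "111", "date_of_birth": "1950-01-01"},
]
print(flag_possible_duplicates(cases))
```

Flagged groups are candidates only; as described above, the reporting trust decides whether a flagged case is a true duplicate.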
Consistency
Consistency describes the degree to which values in a data set do not contradict other values representing the same entity. For example, a mother’s date of birth should be before her child’s.
Data is consistent if it does not contradict data in another data set. For example, if the date of birth recorded for the same person in 2 different data sets is the same.
The HCAI DCS includes various validation rules which prevent the entry of invalid dates such as disallowing a specimen date preceding a patient’s date of birth. In such cases, a meaningful validation error message will be displayed to the data-entry user to correct the input before proceeding.
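The date rule given as an example above might look like the following sketch. The function name and the wording of the error message are hypothetical and are not taken from the HCAI DCS.

```python
from datetime import date

def validate_specimen_date(specimen_date, date_of_birth):
    """Return a list of validation error messages (empty if the input is consistent)."""
    errors = []
    # Rule described in the text: a specimen cannot pre-date the patient's birth.
    if specimen_date < date_of_birth:
        errors.append("Specimen date cannot be before the patient's date of birth.")
    return errors

print(validate_specimen_date(date(2024, 1, 1), date(2024, 6, 1)))
```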
Timeliness
Timeliness describes the degree to which the data is an accurate reflection of the period that it represents, and that the data and its values are up to date.
Some data, such as date of birth, may stay the same whereas some, such as income, may not.
Data is timely if the time lag between collection and availability is appropriate for the intended use.
Cases entered in the HCAI DCS require sign-off by the CEO of the reporting organisation on the 15th of each month for the previous month’s data. Hence there is minimal delay between the data collection and availability.
Validity
Validity describes the degree to which the data is in the range and format expected. For example, date of birth does not exceed the present day and is within a reasonable range.
Valid data is stored in a data set in the appropriate format for that type of data. For example, a date of birth is stored in a date format rather than in plain text.
HCAI DCS enforces data validation for many fields to prevent users from entering incorrect information. For example, the “specimen date” must be in the correct date format; otherwise, an error message is displayed and must be resolved before the user can proceed. Wherever possible, drop-down lists help minimise data entry errors.
Sound methods
Statistical outputs should be made using the best available methods and recognised standards.
This section describes how the statistics were produced and quality assured.
Data set production
All cases of bacteraemia and CDI originate in the hospital (hospital-onset (HO)) or community (community-onset (CO)).
A case of bacteraemia or CDI is classified as hospital-onset if it meets all the following criteria:
- the patient is an in-patient, day-patient or emergency assessment patient, or the patient’s location is unknown,
- the specimen was taken at an acute trust or at an unknown location and
- the specimen was taken on or after day 3 of the admission (admission date is considered day 1).
Cases that do not meet all the above criteria are categorised as community-onset.
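The three criteria above can be expressed as a short sketch. The category labels and function name are hypothetical, and treating a missing admission date as community-onset is an assumption; the classification in the HCAI DCS itself may be implemented differently.

```python
from datetime import date

# Hypothetical labels for the patient categories and specimen locations listed above.
HO_PATIENT_CATEGORIES = {"in-patient", "day-patient", "emergency assessment", "unknown"}
HO_SPECIMEN_LOCATIONS = {"acute trust", "unknown"}

def onset_type(patient_category, specimen_location, admission_date, specimen_date):
    """Return 'HO' if all three hospital-onset criteria are met, otherwise 'CO'."""
    if admission_date is None:
        # Assumption: without an admission date the day-3 criterion cannot be met.
        return "CO"
    # The admission date counts as day 1, so add 1 to the difference in days.
    day_of_admission = (specimen_date - admission_date).days + 1
    is_hospital_onset = (
        patient_category in HO_PATIENT_CATEGORIES
        and specimen_location in HO_SPECIMEN_LOCATIONS
        and day_of_admission >= 3
    )
    return "HO" if is_hospital_onset else "CO"

# Admitted 1 January, specimen taken 3 January: day 3 of the admission.
print(onset_type("in-patient", "acute trust", date(2024, 1, 1), date(2024, 1, 3)))
```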
Any publications prior to 23 January 2025 will use the previous hospital-onset definition for CDI which included specimens taken on or after day 4 of the admission (admission date is considered day 1).
It is not possible for UKHSA to change the onset status of a case as it is determined by the above criteria based on the data provided by the reporting organisation. A case may change from one category to another only if the relevant case details are incorrect and require amendment by the trust. Reports published before September 2017 used the term ‘trust-apportioned’ for hospital-onset cases and ‘not trust-apportioned’ for community-onset cases; this was simply a change in terminology.
All cases are also attributed to an ICB. ICBs, which cover a specific geographical area, are NHS organisations responsible for planning health services for their local population.
A sub-ICB for each case is attributed in the following order:
- If the patient’s GP practice code is available (and is based in England), the case will be attributed to the ICB at which the patient’s GP is listed, or
- If the patient’s GP practice code is unavailable but the patient is known to reside in England, the case is attributed to the ICB catchment area in which the patient resides, or
- If both the patient’s GP practice code and patient post code are unavailable or if a patient has been identified as residing outside England, then the case is attributed to an ICB based on the postcode of the headquarters of the acute trust that reported the case.
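The order of attribution above is a simple fallback chain, sketched below. The function and parameter names are hypothetical, and each argument is assumed to have already been derived from the traced GP practice code, the patient postcode and the trust headquarters postcode.

```python
def attribute_icb(gp_icb, residence_icb, resides_in_england, trust_hq_icb):
    """Pick the ICB for a case using the order of precedence described above.

    gp_icb: ICB of the patient's GP practice, or None if the practice code is
            unavailable or the practice is not in England.
    residence_icb: ICB covering the patient's postcode, or None if unavailable.
    resides_in_england: True only if the patient is known to reside in England.
    trust_hq_icb: ICB covering the reporting trust's headquarters postcode.
    """
    if gp_icb is not None:
        return gp_icb
    if resides_in_england and residence_icb is not None:
        return residence_icb
    # Fallback: no usable GP or residence information, or resident outside England.
    return trust_hq_icb

# Hypothetical ICB labels.
print(attribute_icb(None, "ICB-B", True, "ICB-C"))
```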
For ICBs, all cases of bacteraemia and CDI are attributed to an ICB regardless of onset. UKHSA’s HCAI DCS does not currently request NHS organisations to record patient ICB details for any bacteraemia or CDI case, such as patient GP registration details and patient residential postcode. However, to obtain this data, an extract comprising patient NHS number, date of birth, patient forename, patient surname and sex is submitted daily to NHS Digital via the Demographics Batch Services (DBS) tracing service and matched in a two-stage algorithm using a combination of the provided patient details.
Cases are categorised into one of the following 6 groups for CDI:
- Hospital-onset-healthcare-associated (HOHA): date of onset is 2 or more days after admission (day 3 or greater, where day of admission is day 1).
- Community-onset-healthcare-associated (COHA): is not categorised HOHA and the patient was most recently discharged from the same reporting trust in the 28 days prior to the specimen date.
- Community-onset-indeterminate-association (COIA): is not categorised HOHA and the patient was most recently discharged from the same reporting trust between 29 and 84 days prior to the specimen date.
- Community-onset-community-associated (COCA): is not categorised HOHA and the patient has not been discharged from the same reporting organisation in the 84 days prior to the specimen date.
- Unknown: the reporting trust answered ‘Don’t know’ to the question regarding previous discharge in the 3 months prior to CDI case.
- No Information: the reporting trust did not provide any answer for questions on prior admission.
Cases are categorised into one of the following 5 groups for each bacteraemia:
- HOHA: date of onset is 2 or more days after admission (day 3 or greater, where day of admission is day 1).
- COHA: is not categorised HOHA and the patient was most recently discharged from the same reporting trust in the 28 days prior to the specimen date.
- COCA: is not categorised HOHA and the patient has not been discharged from the same reporting organisation in the 28 days prior to the specimen date.
- Unknown: the reporting trust answered ‘Don’t know’ to the question regarding previous discharge in the month prior to the current case.
- No Information: the reporting trust did not provide any answer for questions on prior admission.
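The two groupings above differ only in that CDI has an additional COIA category and a longer look-back (84 rather than 28 days). A minimal sketch of the decision logic follows. The function name, the answer codes ('yes', 'no', 'dont_know') and the handling of a 'yes' answer with no discharge date (treated here as ‘No Information’) are assumptions made for illustration.

```python
def prior_trust_exposure(is_hoha, discharge_answer, days_since_discharge, infection):
    """Assign a prior trust exposure category.

    is_hoha: True if onset is on day 3 or later of the admission.
    discharge_answer: 'yes', 'no', 'dont_know' or None (question not answered).
    days_since_discharge: days between the most recent discharge from the same
        reporting trust and the specimen date, or None.
    infection: 'CDI' or 'bacteraemia'.
    """
    if is_hoha:
        return "HOHA"
    if discharge_answer is None:
        return "No Information"
    if discharge_answer == "dont_know":
        return "Unknown"
    if discharge_answer == "yes":
        if days_since_discharge is None:
            # Assumption: a prior discharge without a usable date cannot be categorised.
            return "No Information"
        if days_since_discharge <= 28:
            return "COHA"
        # The indeterminate category (29 to 84 days) exists for CDI only.
        if infection == "CDI" and days_since_discharge <= 84:
            return "COIA"
    return "COCA"

print(prior_trust_exposure(False, "yes", 40, "CDI"))          # COIA
print(prior_trust_exposure(False, "yes", 40, "bacteraemia"))  # COCA
```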
Prior trust categories use the following denominator data when calculating rates:
- HOHA: the infection occurred within hospital and is healthcare-associated. Hospital overnight bed-days are used as a denominator as the patient has already been admitted to hospital.
- COHA: the infection occurred within the community but is healthcare-associated. Hospital overnight bed-days and hospital day-only are used as a denominator. The addition of ‘day only’ accounts for community cases who have not been admitted and may initially present as day-only.
- COCA: the infection occurred within the community and is community associated. Population data is used in the rate calculation.
An R package for working with data downloaded from the HCAI DCS can be found on GitHub.
From 1 April 2024, following discussions with NHSE partners, trusts were required to enter the decision-to-admit date as the admission date, rather than the inpatient admission date, for patients who were admitted to A&E. This accounts for patients spending increasingly long periods in A&E, which increases their risk of infection; following a positive specimen, these patients would otherwise be categorised as CO cases rather than HOHA cases.
Monthly data tables
These data tables include a monthly count of total reported cases for each data collection as well as a breakdown by prior trust exposure for the last 13 months. The counts are reported at national (England), ICB, NHS acute trust, UKHSA centre and NHS region levels. The data tables also record whether the data was signed off.
Quarterly Epidemiological Commentary (QEC)
The incidence rate of total and CO cases is calculated using their quarterly count and the mid-year population for England. It is converted to an annualised incidence rate to allow comparisons with annual incidence.
Its calculation is: the count of reported episodes in England in a given quarter divided by the mid-year population of England in that year, multiplied by the number of days in that year, divided by the number of days in that quarter and multiplied by 100,000.
The incidence rate of HO cases is calculated using their quarterly count and the KH03 average bed-day activity for England.
Its calculation is: the count of reported episodes in a given quarter in England divided by the daily average number of occupied overnight beds in that quarter in England, then divided by the number of days in the same quarter and multiplied by 100,000.
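The two quarterly calculations described above can be written out as a short sketch. The function names are hypothetical and the input values in the example are made up purely to show the arithmetic; they are not published figures.

```python
def annualised_rate_per_100k(cases_in_quarter, mid_year_population,
                             days_in_year, days_in_quarter):
    """Annualised incidence rate per 100,000 population for total or CO cases."""
    return (cases_in_quarter / mid_year_population
            * days_in_year / days_in_quarter * 100_000)

def hospital_onset_rate_per_100k_bed_days(cases_in_quarter, avg_daily_occupied_beds,
                                          days_in_quarter):
    """HO incidence rate per 100,000 bed-days using the KH03 daily average."""
    return cases_in_quarter / avg_daily_occupied_beds / days_in_quarter * 100_000

# Illustrative (made-up) inputs only.
print(annualised_rate_per_100k(1000, 50_000_000, 365, 91))
print(hospital_onset_rate_per_100k_bed_days(500, 100_000, 90))
```

Dividing by the daily average of occupied beds and then by the number of days is equivalent to dividing by the total occupied overnight bed-days in the quarter.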
Percentage changes in rates presented in the commentary are rounded to one decimal place. Graphs included in this report use unrounded numbers, which are included in the Quarterly Epidemiological Commentary’s accompanying data.
Annual Epidemiological Commentary (AEC)
The Office for National Statistics (ONS) mid-calendar year population estimates are used to calculate the financial year population. For instance, for financial year 2023 to 2024 mandatory surveillance data, we use the mid-calendar year 2022 population estimate. For the latest year the mid-2024 population was unavailable at the small area population (SAPE) level, so the most recent (mid-2022) estimate was used. Most analyses rely on the SAPE level extracted at sub-ICB level which is then aggregated up to England level.
Rates at the England, ICB or sub-ICB level are calculated by number of new cases that fall into that area, divided by their population and multiplied by 100,000, expressed as ‘per 100,000 population’.
Bed occupancy data (KH03) from NHS England is used as an indicator of the total activity in each trust during the relevant periods and forms the denominator of acute trust rates. KH03 has been published quarterly since April 2010.
The denominator of acute trust rates for all cases, hospital-onset healthcare-associated (HOHA), hospital-onset (HO) or community-onset (CO) cases uses the total ‘overnight beds’ KH03 metric. However, for community-onset healthcare-associated (COHA) cases, the denominator is the sum of the ‘overnight beds’ and ‘day-only admissions’ KH03 metrics, multiplied by the number of days in the relevant period.
The acute trust rate is then the number of new cases reported by the trust, divided by the relevant denominator and multiplied by 100,000. The rates of all, HOHA, HO or CO cases are expressed as ‘per 100,000 bed-days’, while the COHA rate is expressed as ‘per 100,000 bed-days and day admissions’.
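A minimal sketch of the acute trust rate follows. The function and parameter names are hypothetical. Multiplying the daily average overnight beds by the number of days for the non-COHA categories is an assumption made by analogy with the COHA description and the ‘per 100,000 bed-days’ unit.

```python
def trust_rate_per_100k(cases, avg_overnight_beds, days_in_period,
                        avg_day_only=0.0, coha=False):
    """Acute trust rate per 100,000 bed-days (or bed-days and day admissions for COHA)."""
    # COHA adds the KH03 'day-only admissions' metric to the 'overnight beds' metric.
    daily_activity = avg_overnight_beds + (avg_day_only if coha else 0.0)
    denominator = daily_activity * days_in_period
    return cases / denominator * 100_000

# Illustrative (made-up) trust: 500 overnight beds, 100 day-only, over 365 days.
print(trust_rate_per_100k(10, 500, 365))
print(trust_rate_per_100k(10, 500, 365, avg_day_only=100, coha=True))
```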
Prior to trust apportioning, the rates for all cases were calculated per acute trust. Therefore, to retain the historical time series, an all-cases rate per acute trust is also calculated. ‘All reported cases’ refers to all bacteraemias or C. difficile infections that are detected by the acute trust that processed the specimen. It does not necessarily imply the infection was acquired there.
To calculate time-to-onset of an episode (bacteraemia or CDI) among inpatients, the number of days between the date of admission to an NHS acute trust and the date of positive specimen are used. This was performed for only patients who were admitted to an acute trust and for those whose specimen was taken on or after the date of admission also at an NHS acute trust. The number of days between the date of admission and the date of specimen is then grouped into meaningful categories by the number of days.
To assess seasonal trends, case data were aggregated by financial quarter, organism and onset type (HO, CO and total cases) while observations from years before financial year 2010 to 2011 were excluded. Percentages were calculated by dividing the number of cases for each organism, financial quarter and onset type by the total cases for that organism and onset type within the corresponding financial year and multiplied by 100.
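The seasonal percentage described above is each quarter's share of that financial year's cases for the same organism and onset type. A minimal sketch, assuming counts are already aggregated and keyed by a hypothetical (organism, onset type, financial year, quarter) tuple, with years before 2010 to 2011 already removed:

```python
from collections import defaultdict

def quarterly_percentages(counts):
    """Convert quarterly counts into percentages of the financial-year total.

    counts: dict mapping (organism, onset_type, financial_year, quarter) -> cases.
    """
    year_totals = defaultdict(int)
    for (organism, onset_type, financial_year, _quarter), cases in counts.items():
        year_totals[(organism, onset_type, financial_year)] += cases
    return {
        key: 100 * cases / year_totals[key[:3]]
        for key, cases in counts.items()
        if year_totals[key[:3]] > 0  # skip years with no cases to avoid dividing by zero
    }

# Illustrative (made-up) counts for one organism, onset type and year.
counts = {
    ("E. coli", "HO", "2023/24", "Q1"): 20,
    ("E. coli", "HO", "2023/24", "Q2"): 30,
    ("E. coli", "HO", "2023/24", "Q3"): 25,
    ("E. coli", "HO", "2023/24", "Q4"): 25,
}
print(quarterly_percentages(counts))
```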
To calculate the age-sex stratified denominator of rates by deprivation, ONS mid-year populations of 1-year age-sex bands by the lower layer super output area (LSOA) (2021 version) total during years 2018 to 2022 are used and linked to IMD (Index of Multiple Deprivation) deciles. IMD deciles are then converted into quintiles. The IMD quintile for each case is identified by using the postcode of residence at the time of infection and linked to the LSOA (2011 version) of residence and its 2019 IMD decile.
If the postcode was unavailable on the HCAI DCS then HES APC was searched for the patient’s postcode of a HES APC record closest to the time of specimen. Populations are then converted to financial year-level.
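The conversion of IMD deciles to quintiles mentioned above is not spelled out in this report. The sketch below assumes the common convention of pairing adjacent deciles (deciles 1 and 2 become quintile 1, and so on); the function name is hypothetical.

```python
def imd_decile_to_quintile(decile):
    """Map an IMD decile (1 to 10) to a quintile (1 to 5) by pairing adjacent deciles."""
    if not 1 <= decile <= 10:
        raise ValueError("IMD decile must be between 1 and 10")
    return (decile + 1) // 2

print([imd_decile_to_quintile(d) for d in range(1, 11)])
```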
To calculate the age-stratified denominator of rates by ethnicity, 2021 Census populations using 5-year age bands are used. As population data by ethnic group are not available for 2018 to 2020 and 2022 to 2024, the 2021 Census populations by ethnic group and age were used as a proxy. The proportions of each age and ethnic group observed in 2021 were applied to the respective mid-year ONS populations for 2018 to 2020 and 2022 to 2024. These estimates were then converted from calendar year to financial year.
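One way to apply the 2021 proportions to another year's population is sketched below. It assumes the proportion of each ethnic group is taken within each age band and applied to that age band's mid-year population; the report does not state whether proportions are taken within age bands or across the whole population, so this is an interpretation. The function name and data shapes are hypothetical, and the calendar-to-financial-year conversion is not shown.

```python
def scale_census_to_year(census_2021, mid_year_by_age):
    """Estimate age-by-ethnic-group populations for a year without census data.

    census_2021: dict mapping (age_band, ethnic_group) -> 2021 Census count.
    mid_year_by_age: dict mapping age_band -> ONS mid-year population for the target year.
    """
    # Total 2021 Census population in each age band across all ethnic groups.
    age_totals = {}
    for (age_band, _ethnic_group), count in census_2021.items():
        age_totals[age_band] = age_totals.get(age_band, 0) + count
    # Apply each group's 2021 share of its age band to the target year's population.
    return {
        (age_band, ethnic_group): mid_year_by_age[age_band] * count / age_totals[age_band]
        for (age_band, ethnic_group), count in census_2021.items()
    }

# Illustrative (made-up) figures for a single age band.
census = {("0-4", "Group A"): 80, ("0-4", "Group B"): 20}
print(scale_census_to_year(census, {"0-4": 200}))
```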
For the ethnicity analysis, ONS populations up to mid-2023 were used because this analysis uses England-level estimates, whereas the SAPE-level estimates are only available up to mid-2022. Downscaling and upscaling the 2021 census population to achieve earlier and later years’ age-stratified ethnic populations, respectively, assumes that England’s age and ethnic distributions have not changed substantially since 2018. As C. difficile infection numerators only include people aged two years and over, denominators are similarly restricted to this age group, but only in the mortality, age-sex pyramid figures, ICB rates, deprivation and ethnicity analyses. For all other analyses, CDI denominators retain under-twos for temporal consistency, as national rates are likely to be used for target setting.
The observed (crude) incidence rates were calculated for each organism and financial year as the number of infections in a financial year in each ethnic group or IMD quintile divided by the population in that ethnic group or IMD quintile, then multiplied by 100,000.
National population estimates provide age-sex stratification for deprivation denominators and age stratification for ethnicity denominators, which is a prerequisite for computing directly standardised rates. The ONS and the former Department for Levelling Up, Housing and Communities contributed to the IMD estimates (up to and including 2022, with 2022 serving as a proxy for 2023 and 2024), while the 2021 Census provides the ethnicity estimates (with 2021 serving as a proxy for years 2018 to 2020 and 2022 to 2024, assuming the ethnicity distribution remained constant during this period).
Directly standardised rates (DSRs) by deprivation (age-sex standardised) and ethnicity (age standardised only) are estimated using national guidance.
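As an illustration of direct standardisation in general, the sketch below computes the point estimate only: each stratum's observed rate is weighted by the standard population for that stratum. It is not UKHSA's production code; the function name and input shape are hypothetical, and confidence intervals are not shown. The threshold of 10 cases reflects the restriction on DSRs stated in this report.

```python
def directly_standardised_rate(strata, per=100_000, min_cases=10):
    """Directly standardised rate from (cases, population, standard_population) strata.

    Returns None when the total count is below min_cases.
    """
    total_cases = sum(cases for cases, _population, _standard in strata)
    if total_cases < min_cases:
        return None
    total_standard = sum(standard for _cases, _population, standard in strata)
    # Weight each stratum-specific rate by its share of the standard population.
    weighted = sum(standard * cases / population
                   for cases, population, standard in strata)
    return weighted / total_standard * per

# Illustrative (made-up) strata, each with an observed rate of 1 per 100.
print(directly_standardised_rate([(10, 1000, 2000), (20, 2000, 1000)]))
```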
The confidence intervals for these DSRs are calculated with Byar’s method, using the PHE indicator methods R package.
Age-sex or age standardised populations for each financial year were computed using ONS populations from the same respective years. Neither age-sex standardisation nor one-year age bands were used for ethnicity, because the publicly available ONS populations had heavy censoring in older age groups for some ethnic groups. This censoring could bias estimates more than the benefits that age-sex standardisation or narrower age bands would provide. Like previous years, we replaced censored values in age-ethnic group denominator data with the value 5.
DSRs are not calculated for counts of less than 10, where the method becomes unreliable, whereas crude rates (labelled as the ‘Observed’ rate in the deprivation and ethnicity analyses) do not have this restriction. For statistical consistency, confidence intervals for crude rates in deprivation and ethnicity analyses use Byar’s method throughout the time series. Most data points in these time series involve sufficiently large counts to warrant this method, although a few data points have low counts that are normally incompatible with Byar’s method, or for which it cannot be computed at all.
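Byar’s approximation for the confidence limits of an observed count can be sketched as below. This is a hedged illustration in Python of the commonly published formula, not the R package implementation. The function names and example figures are assumptions. It also shows why a count of zero has no computable lower limit under this method.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def byars_limits(count: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Approximate Poisson confidence limits for an observed count."""
    if count <= 0:
        # The lower limit divides by the count, so zero cannot be handled.
        raise ValueError("Byar's lower limit is undefined for a count of 0")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = count * (1 - 1 / (9 * count) - z / (3 * sqrt(count))) ** 3
    upper_count = count + 1
    upper = upper_count * (
        1 - 1 / (9 * upper_count) + z / (3 * sqrt(upper_count))
    ) ** 3
    return lower, upper


def crude_rate_ci(
    count: int, population: float, multiplier: int = 100_000
) -> Tuple[float, float]:
    """Scale the count limits to a rate per `multiplier` population."""
    lower, upper = byars_limits(count)
    return lower / population * multiplier, upper / population * multiplier


# Hypothetical example: 100 cases in a population of 500,000
low, high = crude_rate_ci(100, 500_000)
print(f"95% CI: {low:.2f} to {high:.2f} per 100,000 population")
```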
Cases without a known IMD quintile or ethnic group (1.2% and 2.9% of all cases, respectively) are excluded from the calculation of deprivation and ethnicity rates. This means that rates stratified by deprivation or ethnicity are slightly underestimated.
A missing IMD quintile value may be because:
- the patient’s residence was not in England
- the patient’s residence was in a new area that has not been assigned an IMD decile yet
- the patient was homeless
A missing ethnic group value may be because:
- the patient had opted not to state their ethnic group on admission to hospital
- the trust did not record a valid NHS number or date of birth on the HCAI DCS
- HES APC did not contain an ethnicity value.
Age-sex directly standardised rates are also provided at the ICB level using the same methods as described for the deprivation analysis above.
All case-level data (not just the most recent year) is sent for retracing on the DBS tracing service to correct date of birth, sex and date of death values in case they have been corrected since initial trace. This will cause small changes in historical age-sex estimates, mortality, deprivation and ethnicity analyses, and CDI counts (if the revised age would drop a patient due to being aged under 2 years on the specimen date).
Mortality rate is used for assessing risk of death and is calculated by dividing the number of deaths by the population at risk. This reflects the incidence of all-cause deaths following these infections in the population.
Case fatality rate is a measure for comparing survivability of different infections and is expressed as the number of deaths as a percentage of all reported cases.
Data is presented on all-cause mortality, and therefore includes deaths that may not be directly attributable to the infections.
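The difference between the two measures can be illustrated with a short sketch. The function names and figures are hypothetical, and expressing the mortality rate per 100,000 population is an assumption made for this example.

```python
def mortality_rate(
    deaths: int, population_at_risk: float, multiplier: int = 100_000
) -> float:
    """Deaths divided by the population at risk (assumed per 100,000)."""
    return deaths * multiplier / population_at_risk


def case_fatality_rate(deaths: int, cases: int) -> float:
    """Deaths as a percentage of all reported cases."""
    return deaths * 100 / cases


# Hypothetical example: 150 all-cause deaths among 1,000 reported cases
# in a population of 1,000,000
example_mortality = mortality_rate(deaths=150, population_at_risk=1_000_000)
example_cfr = case_fatality_rate(deaths=150, cases=1_000)
print(f"Mortality rate: {example_mortality:.1f} per 100,000 population")
print(f"Case fatality rate: {example_cfr:.1f}%")
```

The same number of deaths gives a small mortality rate but a much larger case fatality rate, because the denominators (whole population versus reported cases) differ.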
When using mortality sub-analyses such as by age and sex, onset or region (accompanying tables S10 to S12), aggregate totals should not be calculated as they are expected to differ from the official value quoted in the main analysis (accompanying table S9). This is because the calculation of thirty-day all-cause deaths inherently has rounding error, as a ceiling function is used to provide an integer result. The error is further amplified by subsequent aggregation.
Following the debut year of CDI ribotype information in financial year 2023 to 2024, we have updated the analysis to focus on the perspective of the HCAI DCS case as the subject and thus common denominator of most of the metrics featured in accompanying tables S17 and S18. For example, in table S18 we produce row percentages instead for the ribotype-specific proportion; this supports the data structure since an HCAI DCS case linked to a CDRN sample can in rare situations return multiple ribotypes per sample.
Independent sector (IS) report
Counts and rates (per 100,000 bed-days and discharges) of MRSA, MSSA, E. coli, Klebsiella spp., P. aeruginosa bacteraemia and CDI are presented by IS organisation for the latest 12-month period with comparison of rates to the previous year.
An IS organisation can comprise a group of private hospitals owned by one company or a single private hospital. It is possible to identify a group versus a hospital using the ‘number of hospitals in organisation’ field in the HCAI DCS.
The modified inpatient bed-days (bed-days plus discharges) are provided for the most recent financial year available as an indication of the size of each facility.
Hospitals are categorised as ‘large’ (50 beds or more) or ‘small’ (fewer than 50 beds). NHS treatment centres and diagnostic centres seeing mainly day case patients are also listed for the hospitals within a group. All types are listed where a group comprises more than one hospital type.
IS organisations are requested to submit their bed-day plus discharge denominators. The calculation for the bed-day plus discharge denominator for shorter stay hospitals is the sum of the number of bed days in a year and the number of discharges in a year.
Instead of counting the number of midnights the patient was resident for, this counts the number of different days on which they were in the hospital. A day case will count as 1, a one-night stay in the year will count as 2.
Bed-days in a financial year (for example, April 2023 to March 2024) is the sum of the number of beds occupied at each midnight during the year. This sum starts with the number of bed occupants at midnight for the day ending 1 April 2023 and ends with the number of bed occupants at midnight on 31 March 2024.
Alternatively, if the bed-days for that financial year are being derived from admission dates and discharge dates, the calculation for each patient is the discharge date or 1 April 2024 (whichever is earlier) minus the admission date or 1 April 2023 (whichever is later).
Only patients who are admitted to hospital before 1 April 2024 and discharged on or after 1 April 2023 are counted towards a bed-day in that financial year. That is, the latest date they could have been admitted was 31 March 2024 and the earliest date they could have been discharged was 1 April 2023. If the patient is still in hospital and does not yet have a discharge date, then 1 April 2024 should be used as the discharge date. The sum of the days for all the patients then provides the total number of bed-days.
Discharges in that financial year include the number of patients with a discharge date between 1 April 2023 and 31 March 2024. It is the sum of the number of patients discharged on 1 April 2023 and the number discharged for each subsequent day up to and including 31 March 2024. It should include any day cases that took place during the year.
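The date-based derivation of the bed-day plus discharge denominator can be sketched as follows for the financial year April 2023 to March 2024. This is an illustrative outline only. The function names, the `Stay` structure and the example stays are hypothetical, and IS organisations may derive their figures differently.

```python
from datetime import date
from typing import List, Optional, Tuple

FY_START = date(2023, 4, 1)          # first day of the financial year
FY_END_EXCLUSIVE = date(2024, 4, 1)  # day after the last day of the year

Stay = Tuple[date, Optional[date]]   # (admission date, discharge date or None)


def bed_days_in_year(admitted: date, discharged: Optional[date]) -> int:
    """Bed-days a single stay contributes to the financial year."""
    # Patients still in hospital are given 1 April 2024 as discharge date.
    end = discharged if discharged is not None else FY_END_EXCLUSIVE
    # Only stays admitted before 1 April 2024 and discharged on or after
    # 1 April 2023 count towards this year.
    if admitted >= FY_END_EXCLUSIVE or end < FY_START:
        return 0
    return (min(end, FY_END_EXCLUSIVE) - max(admitted, FY_START)).days


def discharged_in_year(discharged: Optional[date]) -> bool:
    """True if discharged between 1 April 2023 and 31 March 2024."""
    return discharged is not None and FY_START <= discharged < FY_END_EXCLUSIVE


def bed_days_plus_discharges(stays: List[Stay]) -> int:
    """Sum of bed-days and discharges across all stays."""
    bed_days = sum(bed_days_in_year(a, d) for a, d in stays)
    discharges = sum(discharged_in_year(d) for _, d in stays)
    return bed_days + discharges


# Hypothetical stays
stays: List[Stay] = [
    (date(2023, 5, 10), date(2023, 5, 10)),  # day case: 0 bed-days, 1 discharge
    (date(2023, 5, 10), date(2023, 5, 11)),  # one night: 1 bed-day, 1 discharge
    (date(2023, 3, 30), date(2023, 4, 3)),   # spans year start: 2 bed-days, 1 discharge
    (date(2024, 3, 30), None),               # still in hospital: 2 bed-days, no discharge
    (date(2023, 3, 1), date(2023, 3, 31)),   # discharged before the year: not counted
]
total = bed_days_plus_discharges(stays)
print(total)
```

A day case contributes 1 and a one-night stay contributes 2, which matches the description of the bed-day plus discharge denominator given earlier in this section.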
Figures provided are aggregated for each organisation (which could own more than one hospital or facility) or for the individual hospital if an organisation comprises one hospital or facility.
Three Quarterly Mandatory Laboratory Returns (QMLR) indicators measuring blood culture sets examined, overall stool specimens examined, and stool specimens examined for diagnosis of CDI are included and feature in both the QEC and AEC reports mentioned above.
The quarterly population of England is the mid-year population of England in that year, divided by the number of days in that year and multiplied by the number of days in that quarter.
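As a sketch of this apportionment, the following function scales a mid-year population by the share of days that fall in the quarter. The function name and the population figure are hypothetical, and treating ‘that year’ as the calendar year containing the quarter is an assumption.

```python
import calendar
from datetime import date


def quarterly_population(
    mid_year_population: float, quarter_start: date, quarter_end: date
) -> float:
    """Mid-year population scaled by the share of days in the quarter."""
    # Assumption: 'that year' is the calendar year containing the quarter.
    days_in_year = 366 if calendar.isleap(quarter_start.year) else 365
    days_in_quarter = (quarter_end - quarter_start).days + 1  # inclusive
    return mid_year_population / days_in_year * days_in_quarter


# Hypothetical population of 36,600,000 for January to March 2024
# (a leap year, so the quarter has 91 of 366 days)
q_pop = quarterly_population(36_600_000, date(2024, 1, 1), date(2024, 3, 31))
print(f"{q_pop:,.0f}")
```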
The blood culture sampling rate per 1,000 population is calculated as the number of blood culture sets examined divided by the quarterly or annual population of England and multiplied by 1,000.
The overall stool specimens examined rate per 1,000 population is calculated as the number of stool specimens examined, divided by the quarterly or annual population of England and multiplied by 1,000.
The stool specimens examined for CDI diagnosis rate per 1,000 population is calculated as the number of stool specimens examined for diagnosis of CDI divided by the quarterly or annual population of England and multiplied by 1,000.
The pooled blood culture positivity percentage is calculated by dividing the total number of cases of E. coli, Klebsiella spp., P. aeruginosa, MRSA and MSSA bacteraemia reported via the HCAI DCS by the total number of blood cultures examined in the respective quarter or financial year, then multiplying by 100.
The CDI positivity percentage is calculated by dividing the total number of CDI cases reported via the HCAI DCS by the total number of stool specimens examined for CDI diagnosis in the respective quarter or financial year, then multiplying by 100.
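The QMLR sampling rates and positivity percentages described above all follow two simple patterns, sketched here. The function names and every figure are hypothetical and are not published values.

```python
def rate_per_1000(specimens_examined: int, population: float) -> float:
    """Specimens (or blood culture sets) examined per 1,000 population."""
    return specimens_examined * 1_000 / population


def positivity_percentage(cases: int, specimens_examined: int) -> float:
    """Reported cases as a percentage of specimens examined."""
    return cases * 100 / specimens_examined


# Hypothetical quarter: 250,000 blood culture sets examined for a
# quarterly population of 10,000,000, with 12,500 pooled bacteraemia cases
sampling_rate = rate_per_1000(250_000, 10_000_000)
positivity = positivity_percentage(12_500, 250_000)
print(f"Sampling rate: {sampling_rate:.1f} per 1,000 population")
print(f"Positivity: {positivity:.1f}%")
```

The same two functions apply to the stool specimen indicators by substituting stool specimens examined (overall or for CDI diagnosis) and CDI cases.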
Quality assurance
Statistical processing for official statistics is performed independently by two scientists and the final data is cross-checked to verify that it is correct. In addition, when rates are calculated for our QEC and AEC reports, infographics and data tables, we also cross-check the processing of NHS England’s KH03 data (occupied overnight bed days and day admissions) and ONS’ population denominators, and any errors are promptly fed back to NHS England.
Confidentiality and disclosure control
Personal and confidential data is collected, processed, and used in accordance with the UKHSA privacy notice. All UKHSA staff with access to personal or confidential information must complete mandatory information governance training, which must be refreshed every year. Information is stored on computer systems that are kept up-to-date and regularly tested to make sure they are secure and protected from viruses and hacking. UKHSA staff do not store data on their own laptops or computers. Instead, data is stored centrally on UKHSA servers.
No personally identifiable information is included in the published data. The structure of the published tables prevents them from being broken down in ways that could compromise individual privacy through cross-referencing. Additionally, when small numbers are reported in the data, a careful assessment is conducted to balance the need for detailed reporting with the potential risk of secondary disclosure, ensuring privacy is maintained without compromising the usefulness of the data.
Geography
Mandatory surveillance includes data from all NHS trusts in England. Each report contains data for overall counts and rates (except for monthly tables which include counts only) at different geographic levels. Monthly tables are published at national (England), ICB, UKHSA centre, NHS region, and NHS trust levels. The QEC is published at national (England) level. The AEC is published at national (England), ICB, sub-ICB and NHS trust level and features crude and age-sex standardised rates at the ICB level.
Quality summary
The Code of Practice for Statistics defines quality in statistics as:
- fitting their intended uses
- based on appropriate data and methods
- not materially misleading
Quality requires skilled professional judgement about collecting, preparing, analysing, and publishing statistics and data in ways that meet the needs of people who want to use the statistics.
This section assesses the statistics against the European Statistical System dimensions of quality.
Relevance
Relevance is the degree to which the statistics meet user needs in both coverage and content.
These mandatory surveillance outputs are critical to tracking progress towards controlling key HCAIs. In particular, the National Action Plan for AMR 2024–2029, and the 2019–2024 plan before it, set out the ambition to control Gram-negative bacteraemia including the 3 infections covered by this surveillance (E. coli, Klebsiella spp. and P. aeruginosa).
The data also allows for NHS acute trusts to monitor their infection rates, and benchmark against peers and nationally.
The different statistics published are used in the following ways:
- mandatory HCAI surveillance outputs help monitor progress on controlling key HCAIs and provide epidemiological evidence to inform action to reduce them
- mandatory surveillance outputs are routinely used to appraise local/regional NHS management of infection levels within their area
- data provide unique case-level information
- data are used to support the NHS objective of improving the quality and safety of health services and promoting patient choice by providing access to information on NHS performance
- data are used nationally for benchmarking purposes and for the performance management of MRSA bacteraemia and CDI objectives set by NHS Improvement
- data and outputs are routinely used to answer relevant Parliamentary Questions
- data are used to inform patient choice via the NHS Choices website
- NHS acute trusts and sub-ICB locations use these data to monitor progress against these objectives and to inform action on reducing these infections locally
- the E. coli, Klebsiella spp. and P. aeruginosa bacteraemia surveillance outputs are an integral part of NHS Improvement’s strategy to prevent an increase in Gram-negative bloodstream infections by 2029 compared to the 2019 to 2020 financial year baseline, as part of the UK National Action Plan for AMR 2024 to 2029, which superseded the previous UK National Action Plan for AMR 2019 to 2024.
We have continued to make changes to the publications to meet user needs:
- From the financial year 2022 to 2023 AEC, we have added age-standardised incidence rates by deprivation and ethnicity.
- From the financial year 2023 to 2024 AEC, we have also included analysis of two QMLR indicators: total blood culture sets and total CDI toxin tests.
- From the financial year 2024 to 2025 AEC, we have included age-sex standardised deprivation rates, improved linkage for deprivation and ethnicity analyses and updated our standard population from the European Standard Population of 2013 to England standard populations by financial year. We have added positivity metrics for blood culture and CDI. The latter will be a useful additional metric during CDI outbreaks.
Accuracy and reliability
Accuracy is the proximity between an estimate and the unknown true value. Reliability is the closeness of early estimates to subsequent estimated values.
Infection cases are reported by NHS acute trusts. As part of the verification process, the CEO of the acute trust signs off infection data reported each month by day 15 of the following month. This sign-off process provides formal assurance that the data are accurate and complete.
Published statistics should include details of all cases for the reported period. However, on occasion, an amendment is required for the following reasons:
- full laboratory results are pending at the time of original sign-off
- deletions are required as an acute trust has entered case information incorrectly or it is a duplicate of a case reported from another trust. In that scenario, a CEO must request the deletion of the incorrect information so that it can be replaced with the correct information. NHS acute trusts or external agencies such as the Care Quality Commission may also perform audits of local infection data. This can result in requests to add infection episodes that had not previously been entered.
- NHS acute trusts may request to alter their data to improve the sub-ICB location (SICBL) attribution of a given infection record. This process is undertaken via an ‘unlock’ of the HCAI DCS. A log of the number of unlocked cases by data collection and unlock reason is maintained.
A total of 72 (53%) acute trusts requested an unlock of at least 1 case across all organisms affecting data in the financial year 2023 to 2024, which totalled 269 unlocked cases. Of those unlocks, 38.8% were additions, 5.5% were amendments and 61.7% were deletions to a locked period. Comparing the previous financial year 2022 to 2023 with 2023 to 2024, there was an increase of 8.3% in the number of trusts which requested unlocks to change their data but a 9.4% decrease in the total number of unlocks. The number of unlock requests to add a case and to delete a case declined by 22.4% and 38.0% respectively, while the number of requests to amend a case increased by 63.5% (63 to 103 requests).
The HCAI DCS includes functionality for acute trusts to identify duplicate infection episodes within their trust. A pop-up for potential duplicates appears at case entry to prompt the trust user to check that no duplicates have been entered for a designated period. Following sign off, as the CEO of an acute trust has verified their data as being accurate, data used for statistical publications is not altered by the UKHSA mandatory HCAI surveillance team to remove potential duplicate records. This may result in multiple listings of the same infection episode in the data set.
There is a possibility that some cases may not be reported to the HCAI DCS, resulting in under-coverage. To ascertain the level and to rectify this, a consistency study is performed comparing voluntary reported laboratory information for England with the mandatory surveillance scheme data set.
Data changes between releases are highlighted in each publication, so that users are made aware of any changes to historical data between publications. Further information on this process is available on the caveats page of each routine publication.
Not all IS organisations have signed off their data or submitted data for the reporting period, potentially leading to unfinalised and inaccurate data.
Measurement error
All mandatory HCAI surveillance data is collected via the HCAI DCS. The appendices of the mandatory HCAI surveillance protocol detail definitions and guidance on each field in the data collection. Therefore, there should be little concern over the interpretation of the questions by different users, although it should be noted that some questions are subjective in nature such as asking the clinical opinion of the treating physicians.
There is a low item non-response error as the bulk of data used to produce the mandatory HCAI surveillance outputs are from mandatory questions in the HCAI DCS, i.e. a response is required to save the infection episode. The exceptions are in the data collected on risk factors for bacteraemias presented in the AEC because the risk factor or source of bacteraemia questions are not mandatory fields. However, there are accompanying statements in the relevant sections of the AEC on the level of response for these data.
However, unit non-response does exist, where individual NHS acute trusts have not entered data or have not signed off data. All trust-level outputs highlight such non-responders. Consistent non-responders are further referred to NHS England for follow-up.
Processing error
Processing errors may occur during the data entry stage. The data collected via the HCAI DCS is either entered by hand or partially uploaded (key responses to questions required to save an infection episode) using the HCAI DCS data upload wizard. Data entry errors may occur because the source data at the acute trust is incorrect or missing, or during the transcription process.
While it is not possible to provide a level or direction of bias through processing errors for the entire data collection, it is possible to estimate the collective level of processing errors for two key variables - date of birth and NHS number. These can be used as an indicator for the full data collection. Assessing the percentage of all cases which could not be attributed via a match with the NHS Spine provides an indication of data entry errors.
There is the potential for bias in the statistics as organisations aim to meet performance targets. Therefore, there is a conflict between the use of statistics for both epidemiology and public health, and for performance management.
Timeliness and punctuality
Timeliness refers to the time gap between publication and the reference period. Punctuality refers to the gap between planned and actual publication dates.
Mandatory HCAI surveillance data is published in as timely a manner as possible. Data is signed off by acute trusts’ Chief Executives 15 days after the end of each month, meaning that sign off for each month is required by day 15 of the following month. Data is published on a monthly, quarterly and annual basis and publications are pre-announced at least 28 days in advance, in line with the Code of Practice for Statistics.
The UKHSA official statistics publication calendar includes mandatory HCAI surveillance-specific announcements.
Monthly data tables
Monthly data is processed and analysed before being published on the first Wednesday of the following month. This occurs between 2 and 6 weeks following the end of a given month, depending on how the month falls. For example, January 2017 data was signed off on 15 February 2017 and then published on 1 March 2017. This is 2 weeks from sign-off to publication.
QEC
The QEC is published approximately 2 months following sign-off of the last full month of data for inclusion. For the April 2019 to March 2020 publications, this was increased to 4 months. The increase is to allow for the inclusion of the most recent hospital admissions data which would otherwise be unavailable at the time of the QEC’s production. This change is relevant due to the lower than usual levels of hospital admissions in April 2019 to March 2020 due to the COVID-19 pandemic. Publication of this report occurs on the first Thursday of the fourth month after the quarter covered in the report. For example, data up to and including December 2021 were signed off on 15 January 2022 and published on 7 April 2022.
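The example publication dates quoted in this section can be checked with a small date calculation. The helper below is a hypothetical sketch for finding the first given weekday of a month. It is not part of any UKHSA scheduling tool.

```python
from datetime import date, timedelta

WEDNESDAY, THURSDAY = 2, 3  # date.weekday() values, where Monday is 0


def first_weekday_of_month(year: int, month: int, weekday: int) -> date:
    """Return the first date in the month that falls on `weekday`."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset)


# Monthly tables: January 2017 data, signed off on 15 February 2017
monthly_publication = first_weekday_of_month(2017, 3, WEDNESDAY)
days_from_sign_off = (monthly_publication - date(2017, 2, 15)).days

# QEC: quarter ending December 2021, published in the fourth month after
qec_publication = first_weekday_of_month(2022, 4, THURSDAY)

print(monthly_publication, days_from_sign_off, qec_publication)
```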
Annual data tables and AEC
Annual data tables and the accompanying AEC are usually published in September each year. The annual data tables include counts and rates for both acute trusts and clinical commissioning groups (CCGs). The AEC represents the most substantial HCAI mandatory surveillance output produced or published each financial year. The lead time necessary for analysis and compilation of data should not be underestimated. Decreasing the amount of time between sign off and publication of these reports has been considered. However, doing so would not allow enough time to undertake relevant data quality checks on either the data used for preparing the report or the report itself. Hence the benefit of using the current publication schedule far outweighs any minor benefit that might be achieved in reducing the lead time for the QEC publication.
Furthermore, the changes to the publication schedule for 2020 to 2021 were due to those periods having atypical levels of hospital admission, making it necessary to wait for and use published admission data.
Accessibility and clarity
Accessibility is the ease with which users can access the data, also reflecting the format in which the data is available and the availability of supporting information. Clarity refers to the quality and sufficiency of the metadata, illustrations and accompanying advice.
All HCAI outputs have been reviewed for accessibility requirements, with several changes made to ensure they are accessible. Since QEC 2021 financial quarter 4 and AEC financial year 2021 to 2022, they have been published in HTML format which provides the accessibility features mentioned in the GOV.UK accessibility statement. This format enhances accessibility by supporting screen readers and allowing easy navigation using a keyboard, ensuring the content is accessible to a wider audience. Additionally, HTML allows for text resizing and media alternatives such as alt text which help to improve the overall user experience. The publications have also been reviewed for clarity, incorporating plain English language, main messages and data visualisations.
The reports include data visualisations which help users to understand the data. These have been reviewed and updated to ensure colours used provide sufficient contrast to be distinguished and are colour-blind friendly. The accompanying data tables are published in ODS format and follow accessibility guidelines. Each sheet contains only one table and no nested tables. Each data table contains a Contents worksheet and a Notes worksheet to describe the subsequent data tables.
Coherence and comparability
Coherence is the degree to which data that are derived from different sources or methods, but refer to the same topic, are similar. Comparability is the degree to which data can be compared over time and domain.
The mandatory HCAI surveillance scheme aligns closely with surveillance processes and definitions of the European Centre for Disease Prevention and Control (Europe) and the Centers for Disease Control and Prevention (USA), to allow comparability where possible.
There are, however, some differences between the English mandatory HCAI surveillance programme and the surveillance undertaken by others, including the UK devolved administrations and internationally. These include some case definitions and protocols for diagnosing the infections, definitions on inpatient episode versus trust apportioned or assigned episodes, age groups included in the surveillance schemes and the way in which data are presented by periods. As the population sizes of the devolved administrations differ from that of England, crude counts of infections cannot be compared amongst countries in the UK. Furthermore, as the population demographics amongst the devolved administrations differ, the denominators used to calculate infection rates are not directly comparable. Our introduction of age-sex standardised rates by ICB since financial year 2024 to 2025 will help to account for demographic differences.
Uses and users
Users of statistics and data should be at the centre of statistical production, and statistics should meet user needs.
This section explains how the statistics are used, and how we understand user needs.
Appropriate use of the statistics
These statistics present information on cases reported to mandatory surveillance since the start of surveillance for each infection. Data is presented for transparency and to allow tracking of these mandated infections across multiple settings.
The onset algorithm and, since 2017, the prior trust exposure algorithm have helped us align more closely with the European Centre for Disease Prevention and Control (Europe) and the Centers for Disease Control and Prevention (USA). These algorithms seek to attribute cases to either a hospital or community setting, and to determine whether a case was healthcare-associated, to identify where the infection may have occurred. Prior healthcare exposure definitions require trusts to enter the patient’s exposure in only the reporting trust, which may lead to an underestimation of healthcare association, as patients may have had contact with other healthcare settings which is not captured. Despite this, this definition has been consistently applied across all surveillance years. Also, when comparing data with other UK nations or countries, it is important to consider any differences in infection definitions, onset and prior trust exposure algorithms, and the deduplication window used.
The IS report does not provide a basis:
- for comparisons between different IS organisations due to their variable size and range (case mix) of patients admitted
- for reliable comparison of these infections between the NHS and IS organisations
Known uses
We are aware that the statistics have been used for:
- monitoring progress on controlling key HCAIs and for providing epidemiological evidence to inform reduction actions
- education and training
- strategy and resource allocation
- benchmarking purposes and for the performance management of MRSA bacteraemia and CDI objectives
- research
- informing patient choice.
Known users
National users
UKHSA use the data to:
- undertake epidemiological analyses at national, regional and local level
- provide, on request, relevant response to parliamentary questions.
Department of Health and Social Care (DHSC) use the data to:
- routinely brief ministers on national and regional incidence of MRSA, MSSA, E. coli, Klebsiella spp. and P. aeruginosa bacteraemia and CDI
- inform and identify national-level targets for interventions and reduction strategies.
NHS England and NHS Improvement use the data to:
- identify and establish performance management
- set national and local-level performance management targets
- assess performance against objectives
Regional or local users
ICBs use the data to assess NHS trust and SICBL performance against targets and objectives at a local level.
UKHSA Field Service and UKHSA Regions use the data to:
- assist in outbreak investigations
- inform public health initiatives at a local level.
NHS acute trusts use the data to:
- inform trust boards of their position on key HCAIs (MRSA, MSSA and E. coli bacteraemia and CDI)
- monitor progress against performance management objectives.
SICBL use the data to:
- monitor progress against performance management objectives
- assist in the commissioning of services from relevant acute level providers.
User engagement
Since AEC 2023 to 2024 we have provided an online survey form at the top of the report to collect readers’ feedback.
A routine ‘Stakeholder Engagement Forum’ is held every 6 months. This meeting includes representation from a wide range of national and local level stakeholders, such as SICBLs and acute trusts. The meeting’s agenda includes recent publications, experiences, improvements and future developments.
Following the meeting, a summary of the stakeholder engagement forum discussion is available.
Meeting feedback is used to improve ongoing engagement. It is also used to inform future development and to ensure that data users remain central to the process.
Related statistics
Most health protection functions in the UK are devolved to the other UK nations’ public health agencies. Here are examples of their output including from neighbouring European countries: