Quality and methodology information: Legionellosis in residents of England and Wales
Published 20 November 2025
Applies to England and Wales
About this report
This report outlines the quality and methodology information (QMI) relevant to the ‘Legionellosis in residents of England and Wales’ official statistics release published by the UK Health Security Agency (UKHSA). This QMI report supports users in understanding the strengths and limitations of these statistics, ensuring that UKHSA is compliant with the quality standards stated in the Code of Practice for Statistics.
The report covers the following areas:
- Strengths and limitations of the data used to produce the statistics.
- Methods used to produce the statistics.
- Quality of the statistical outputs.
About the statistics
Background
Legionellosis is a spectrum of diseases caused by Legionella bacteria. Illness can range from mild through to Legionnaires’ disease, a form of atypical pneumonia that can be severe and potentially fatal. Legionella typically inhabits natural water systems such as streams, rivers and lakes. However, Legionella bacteria are also able to survive in artificial water systems, for example cooling towers, evaporative condensers, spa pools and hot and cold-water systems. Such man-made water systems mimic the organism’s natural habitat thereby providing an ideal environment for growth.
In the UK, the principal route of infection is through direct exposure to aerosols generated and dispersed from colonised man-made sources. Inhalation of these aerosols can result in legionellosis. A colonised water system which is not appropriately managed has the potential to be a source of major outbreaks. However, in many cases, the source of the infection is not identified.
While anyone can be infected by the bacteria, certain underlying conditions and risk factors can make some groups more susceptible. These include being aged 50 years and over, being male, smoking, having immunosuppressive conditions, or having chronic respiratory, liver or kidney diseases.
Statistics
Legionellosis is a notifiable disease, meaning that registered medical practitioners must report confirmed and probable cases to the UK Health Security Agency in accordance with the Public Health (Control of Disease) Act 1984 and the Health Protection (Notification) Regulations 2010.
The National Enhanced Legionnaires’ Disease Surveillance Scheme (NELSS) for residents of England and Wales was established in 1980 to collect enhanced surveillance data on all cases of Legionnaires’ disease. The scheme is managed by UKHSA’s Acute Respiratory Infections (ARI) team.
The primary objectives of NELSS are to:
- understand the epidemiology of Legionnaires’ disease
- monitor trends in incidence, clinical features, risk factors and mortality
- detect clusters and outbreaks of Legionella infection
- identify sources of infection to aid colleagues to apply control measures and prevent further cases
- disseminate Legionella surveillance data and intelligence to stakeholders involved in the investigation and management of cases in the course of their duty to protect public health
The official statistics report of Legionellosis in England and Wales presents the findings from cases of Legionellosis reported to NELSS.
The data in the statistics is provisional and is subject to revision.
Geographical coverage: England and Wales
Publication frequency: annual
Contact
Lead analyst: Hanna Squire
Contact information: legionella@ukhsa.gov.uk
Suitable data sources
Statistics should be based on the most appropriate data to meet intended uses.
This section describes the data used to produce the statistics.
Data sources
The data presented in this report is extracted from the National Enhanced Legionnaires’ Disease Surveillance Scheme (NELSS) database, which holds records of all reported cases of Legionellosis among residents of England and Wales. Cases are reported through the submission of a national surveillance scheme reporting form and entered into the NELSS database by the national ARI team.
The NELSS reporting form captures a wide range of data, including case details, demographic information, social risk factors, clinical and microbiological data, as well as detailed information about the activities of each case in the 10 days prior to symptom onset. This includes information on potential environmental exposures. Once submitted, the reported data is assessed and verified to ensure that the case definition is met. A confirmed case is defined as one that has a clinical or radiological diagnosis of pneumonia with laboratory evidence of one or more of the following:
- isolation (culture) of Legionella species from a clinical lower respiratory tract specimen
- detection of Legionella pneumophila antigen in a urine specimen
- detection of Legionella species nucleic acid (such as via PCR) in a lower respiratory tract specimen (such as sputum, bronchoalveolar lavage (BAL))
The Legionellosis in residents of England and Wales report only includes confirmed cases. It does not include probable cases.
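The confirmed case definition above can be expressed as a simple rule. The following is a minimal sketch in Python for illustration only. It is not the NELSS implementation, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseReport:
    # Field names are hypothetical, chosen for illustration only.
    pneumonia: bool  # clinical or radiological diagnosis of pneumonia
    culture_positive: bool = False  # Legionella isolated from a lower respiratory tract specimen
    urinary_antigen_positive: bool = False  # L. pneumophila antigen detected in urine
    pcr_positive: bool = False  # Legionella nucleic acid detected in a lower respiratory tract specimen


def is_confirmed(case: CaseReport) -> bool:
    """Return True when a report meets the confirmed case definition."""
    has_lab_evidence = (
        case.culture_positive
        or case.urinary_antigen_positive
        or case.pcr_positive
    )
    # Both pneumonia and at least one type of laboratory evidence are required.
    return case.pneumonia and has_lab_evidence
```

Under this rule, a report with laboratory evidence but no diagnosis of pneumonia would not be counted as a confirmed case.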
Verified cases are then analysed against the national data set for risk factors and potential links to previously reported cases. Additional information is provided by regional health protection teams (HPTs), Field Services, and the Respiratory and Vaccine Preventable Bacteria Reference Unit (RVPBRU). The national ARI team also conducts further data cleaning and review processes to identify and address duplicate records, inconsistencies, and missing data.
The latest report covers cases with symptom onset between 1 January 2024 and 31 December 2024 among residents of England and Wales. Data from previous years (2020 to 2023) is also included for comparative purposes. Please note that some historical data may differ from earlier publications due to updates made following the receipt of new information and ongoing efforts to modernise NELSS.
All population estimates used in this report have been sourced from the Office for National Statistics (ONS).
Data quality
The data that we use to produce statistics must be fit for purpose. Poor quality data can cause errors and can hinder effective decision making.
We have assessed the quality of the source data against the data quality dimensions in the Government Data Quality Framework.
This assessment covers the quality of the data that was used to produce the statistics, not the quality of the final statistical outputs. The Quality summary section assesses the quality of the final statistical outputs.
Strengths and limitations of the data
The strengths of the data are that:
- reporting of cases is mandatory, ensuring that NELSS provides a comprehensive and timely record of Legionellosis cases in England and Wales
- NELSS is a live system, meaning cases are available as soon as they are entered, enhancing the speed of public health responses
- regular data cleaning and review of processes ensure data quality, with validation rules requiring essential fields to be completed
- national surveillance scheme reporting forms and communication with health protection teams help ensure sufficient and accurate information is collected for each case
The limitations of the data are that:
- counts of people with Legionnaires’ disease by region are based on the cases’ residence, which can differ from where people were exposed, diagnosed and treated
- the NELSS database does not include people diagnosed or managed with Legionnaires’ disease in Scotland, therefore some people who are normally resident in England, but diagnosed or managed in Scotland, will not appear in the data
- milder or asymptomatic cases may go untested or unnoticed, leading to potential underreporting as not all cases are diagnosed and reported
- Legionella infections may be misdiagnosed as other respiratory conditions, particularly in settings where Legionella testing is not routinely conducted, leading to missed cases
- most cases are identified using urinary antigen tests (UATs), which mainly detect L. pneumophila serogroup 1. This limits the detection of other Legionella serogroups or species, leading to potential underrepresentation of certain strains
- difficulty in obtaining lower respiratory samples for cultures reduces the number of cultures which can be performed. More samples sent to reference labs would improve the identification of novel strains and help trace environmental sources
- data on sex was collected in the surveillance form using the question ‘Sex: Male / Female’. This is intended to be sex as registered at birth, but there is the potential for different local interpretation on whether this captures sex registered at birth or gender identity, and this should be borne in mind when interpreting the data
- age and age groups were derived from the date of notification and date of birth. Age groupings were chosen to reflect the increased risk of LD, and of more severe outcomes, in older populations. The specific grouping captures more granular insights while also providing disclosure control
NELSS is the most appropriate source of data for these statistics. Legionnaires’ disease (LD) is a notifiable disease, which means that UKHSA holds a comprehensive record of LD cases.
Accuracy
Accuracy is about the degree to which the data reflects the real world. This can refer to correct names and addresses, or to data that is factual and up to date.
Registered medical practitioners must report confirmed and probable cases to the UK Health Security Agency within 3 days of a suspected or confirmed diagnosis. Local teams then report these cases to the national ARI team using the case report form. Data is reported to NELSS as soon as reasonably practical.
The NELSS database contains live data, and during the data input and cleaning stages any errors, inconsistencies or missing data are rectified. All data will have had all checks and validation completed by the time of analysis for publication.
Completeness
Completeness describes the degree to which records are present.
For a data set to be complete all records are included and the most important data is present in those records. This means that the data set contains all the records that it should and all essential values in a record are populated.
Completeness is not the same as accuracy as a full data set may still have incorrect values.
The national surveillance scheme reporting form contains mandatory fields that must be completed. In addition to personal and demographic details, mandatory fields include date of symptom onset, pneumonia status, whether the patient died, clinical history, risk factors, exposure-related information, and microbiology results. This ensures that the necessary information is recorded for each case. There is also a series of non-mandatory fields that can be completed where relevant, for example recent travel history and occupation information.
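A completeness check of this kind can be sketched as follows. This is an illustrative Python example, not the validation used in NELSS. The field names are hypothetical and cover only a subset of the mandatory fields described above.

```python
# Hypothetical field names standing in for some of the mandatory fields on the form.
MANDATORY_FIELDS = ["onset_date", "pneumonia", "died", "microbiology_result"]


def missing_mandatory_fields(record: dict) -> list:
    """List the mandatory fields that are absent or empty in a record."""
    return [
        field
        for field in MANDATORY_FIELDS
        # A value of False (for example died=False) counts as completed.
        if record.get(field) in (None, "")
    ]
```

A record is treated as complete for these fields when the returned list is empty.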
Uniqueness
Uniqueness describes the degree to which there is no duplication in records. This means that the data contains only one record for each entity it represents, and each value is stored once.
Some fields, such as National Insurance number, should be unique. Some data is less likely to be unique, for example geographical data such as town of birth.
To create a new case, NELSS users must first perform a search to check whether the case has already been recorded based on NHS number, Case and Incident Management System (CIMS) number, and date of birth. Data in this report was further inspected for duplicates before publication based on case number and date of birth.
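The pre-publication duplicate inspection can be illustrated with a short Python sketch. It assumes each record is a dictionary with hypothetical `case_number` and `date_of_birth` keys. The published statistics are produced in R, so this is an illustration of the idea rather than the code used.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group the positions of records that share a case number and date of birth."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = (record["case_number"], record["date_of_birth"])
        groups[key].append(index)
    # Keep only keys that appear more than once.
    return {key: indices for key, indices in groups.items() if len(indices) > 1}
```

Each entry in the result points to a set of records that would need review before one is kept.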
Consistency
Consistency describes the degree to which values in a data set do not contradict other values representing the same entity. For example, a mother’s date of birth should be before her child’s.
Data is consistent if it doesn’t contradict data in another data set. For example, the date of birth recorded for the same person in 2 different data sets should be the same.
The NELSS team conducts routine checks of data fields for consistency to identify potential errors and return queries to the relevant case managers for resolution to ensure data consistency.
Thorough quality checks are routinely conducted on the data set to detect and interrogate inconsistencies across fields or any data anomalies as the data from national surveillance scheme reporting forms are entered into NELSS. These are rectified by the ARI team who manage the NELSS database.
Laboratory microbiology testing results are linked to case information using key ID fields and will only link when these fields match.
Geographical data is linked to case information using postcode information for the case’s residence and will only link if the postcode is correctly entered.
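The effect of exact-match linkage can be shown with a small Python sketch. The key name `case_id` and the `serogroup` field are hypothetical. Records that do not match are kept, but without the linked information.

```python
def link_lab_results(cases, lab_results):
    """Attach laboratory results to cases where the key ID field matches exactly."""
    lab_by_id = {result["case_id"]: result for result in lab_results}
    linked = []
    for case in cases:
        match = lab_by_id.get(case["case_id"])
        # A case with no exact match is retained with an empty laboratory field.
        linked.append({**case, "serogroup": match["serogroup"] if match else None})
    return linked
```

This illustrates why a mistyped ID or postcode results in missing linked data rather than an incorrect link.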
Timeliness
Timeliness describes the degree to which the data is an accurate reflection of the period that it represents, and that the data and its values are up to date.
Some data, such as date of birth, may stay the same whereas some, such as income, may not.
Data is timely if the time lag between collection and availability is appropriate for the intended use.
NELSS is a live database that is managed by the ARI team. Data lag is unlikely to affect this report. There is minimal delay between a case being reported and it being entered into the NELSS database. In 2024, the median delay between onset of symptoms and entry into the database was 13 days. Over 80% of cases are entered into the database within 3 weeks of symptom onset. There is a publishing lag of 11 months, since this report is published in November and the data goes up to 31 December of the previous calendar year.
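Timeliness measures such as the median delay and the proportion of cases entered within 3 weeks can be calculated along the following lines. This is a minimal Python sketch. The dates are invented for illustration and are not NELSS data.

```python
from datetime import date
from statistics import median


def entry_delays(cases):
    """Days between symptom onset and database entry for each (onset, entered) pair."""
    return [(entered - onset).days for onset, entered in cases]


def share_within(delays, max_days=21):
    """Proportion of delays that are no longer than max_days."""
    return sum(delay <= max_days for delay in delays) / len(delays)


# Invented example dates: (date of symptom onset, date entered into the database).
example_cases = [
    (date(2024, 6, 1), date(2024, 6, 11)),
    (date(2024, 6, 1), date(2024, 6, 14)),
    (date(2024, 6, 1), date(2024, 6, 30)),
]
example_delays = entry_delays(example_cases)  # [10, 13, 29]
example_median = median(example_delays)  # 13
```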
Validity
Validity describes the degree to which the data is in the range and format expected. For example, date of birth does not exceed the present day and is within a reasonable range.
Valid data is stored in a data set in the appropriate format for that type of data. For example, a date of birth is stored in a date format rather than in plain text.
NELSS restricts and prespecifies the format of user entries limiting the probability of entering invalid data. For example, date of birth must be entered in a date format. These rules ensure that the data is entered in the correct format. This is further mitigated by regular checks and cleaning steps required in the production of these statistics.
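A validity rule for date of birth could look like the following Python sketch. The accepted format and the upper age limit of 120 years are assumptions made for this example and are not taken from NELSS.

```python
from datetime import date, datetime


def parse_valid_dob(text, today, max_age_years=120):
    """Return a date of birth if the text is a plausible ISO-format date, otherwise None."""
    try:
        dob = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        # Not in the expected date format.
        return None
    if dob > today:
        # A date of birth cannot be in the future.
        return None
    if today.year - dob.year > max_age_years:
        # Outside the range assumed to be reasonable.
        return None
    return dob
```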
Sound methods
Statistical outputs should be made using the best available methods and recognised standards.
This section describes how the statistics were produced and quality assured.
Data set production
Case data used in this report comes from the NELSS database stored in SQL. Using R, case data was linked to microbiology reference laboratory testing data and geographical information based on a patient’s residential postcode. The linked data set is cleaned by:
- standardising and validating date fields
- using report date where date of symptom onset is missing
- creating indicator variables for each risk factor based on a free text field recording underlying illness and clinical information
- categorising grouping variables: age, serogroup, species, causative organism, clusters and outbreaks
- calculating age (in years) from date of birth and date of symptom onset, where age is missing
- converting missing data to ‘unknown’ where relevant
- filtering data to only include residents of England and Wales, cases with symptom onset between 1 January 2020 and 31 December 2024, and confirmed cases based on case definitions
- checking for and removing any duplicates
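Some of the cleaning steps above are sketched below in Python for illustration. The production code is written in R and is not reproduced here. The record layout and field names are hypothetical, and only one categorical field (`sex`) is recoded in this example.

```python
from datetime import date


def age_in_years(date_of_birth, reference_date):
    """Completed years between date of birth and a reference date."""
    had_birthday = (reference_date.month, reference_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return reference_date.year - date_of_birth.year - (0 if had_birthday else 1)


def clean_case(record):
    """Apply the onset date fallback, age derivation and 'unknown' recoding to one record."""
    cleaned = dict(record)
    # Use report date where date of symptom onset is missing.
    if cleaned.get("onset_date") is None:
        cleaned["onset_date"] = cleaned["report_date"]
    # Calculate age only where it is missing (assumption: the filled onset date is used).
    if cleaned.get("age") is None and cleaned.get("date_of_birth") is not None:
        cleaned["age"] = age_in_years(cleaned["date_of_birth"], cleaned["onset_date"])
    # Convert missing data to 'unknown'.
    if cleaned.get("sex") is None:
        cleaned["sex"] = "unknown"
    return cleaned


def in_scope(record, start=date(2020, 1, 1), end=date(2024, 12, 31)):
    """Keep confirmed cases resident in England or Wales with onset in the reporting window."""
    return (
        record["country"] in {"England", "Wales"}
        and record["confirmed"]
        and start <= record["onset_date"] <= end
    )
```

In this sketch, the filter is applied after cleaning so that cases with a missing onset date are assessed against the reporting window using their report date.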
Quality assurance
The NELSS data is stored in SQL and is read directly from there into R. The cleaning of the data, and production of the figures and supplementary data tables have been automated in R. This reduces the risk of human error as users do not have to manually update figures or copy and paste between documents. Each year, a member of the ARI team who did not develop the code quality assures both the code and the report it produces.
The figures and tables are sense-checked and compared with figures from previous reports for irregularities. All the automated outputs are manually checked in this way. If concerns are raised regarding one figure, further checks are conducted to assess possible errors in the data.
Confidentiality and disclosure control
Personal and confidential data is collected, processed, and used in accordance with the UKHSA Privacy Notice. All UKHSA staff with access to personal or confidential information must complete mandatory information governance training, which must be refreshed every year. Information is stored on computer systems that are kept up-to-date and regularly tested to make sure they are secure and protected from viruses and hacking. UKHSA staff do not store data on their own laptops or computers. Instead, data is stored centrally on UKHSA servers.
No personally identifiable information is included in published data. No specific disclosure control methods were used, as aggregation of the published figures to national and regional level protects people’s personal data and tables presented cannot be cross tabulated to reveal sufficient information about individuals to pose a meaningful risk of secondary disclosure.
The benefits of reporting small numbers in aggregated data are compared with the risk of secondary disclosure on a case-by-case basis. For example, there are relatively few L. bozemanii cases, but the risk of identification is low because these counts are not presented in conjunction with other potentially identifying information (for example, age category).
Geography
The statistics in this report are published at 2 geographical levels: country (England and Wales), and UKHSA region. UKHSA region is based on an individual’s residential postcode. If the postcode is missing, the UKHSA region of the health protection team is used. All postcodes were complete.
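The region assignment rule can be sketched as follows. This Python example is illustrative only. The lookup table and region names are placeholders, and falling back to the health protection team’s region when a postcode is not found in the lookup is an assumption that goes beyond the rule described above.

```python
def assign_region(postcode, postcode_to_region, hpt_region):
    """Return the region for a residential postcode, falling back to the HPT's region."""
    if postcode:
        # Normalise spacing and case before looking up the postcode.
        normalised = postcode.replace(" ", "").upper()
        region = postcode_to_region.get(normalised)
        if region is not None:
            return region
    # Postcode missing (or, by assumption, not found in the lookup).
    return hpt_region


# Placeholder lookup with a fictional postcode and region name.
example_lookup = {"ZZ11ZZ": "Region A"}
```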
Quality summary
The Code of Practice for Statistics states that quality means that statistics fit their intended uses, are based on appropriate data and methods, and are not materially misleading.
Quality requires skilled professional judgement about collecting, preparing, analysing, and publishing statistics and data in ways that meet the needs of people who want to use the statistics.
This section assesses the statistics against the European Statistical System dimensions of quality.
Relevance
Relevance is the degree to which the statistics meet user needs in both coverage and content.
There is a clear need for timely Legionnaires’ disease statistics. This data provides critical insights into the prevention and control of Legionella outbreaks in England and Wales. UKHSA monitors the incidence of Legionnaires’ disease through routine surveillance, contributing to national and international efforts to reduce cases by identifying risk factors and supporting outbreak investigations.
Legionnaires’ disease is a relatively rare but serious illness. Most cases are sporadic, with some occurring in clusters or as part of wider outbreaks. Given the potential public health risk, annual reporting provides a timely overview of trends and helps detect any emerging patterns. The data is essential for public health officials and healthcare professionals to assess progress in controlling the disease.
The Legionnaires’ disease statistics are used by a variety of stakeholders, including public health professionals, policymakers, and environmental health officers.
We have published a survey to get user feedback on the report for further improvements.
We have expanded our reporting formats to better serve users’ needs and we publish the following:
- main statistics report
- supplementary data tables, providing more granular insights
- QMI report
This variety of outputs ensures that the data is accessible and useful to a broad range of users, helping to inform prevention strategies and improve public health outcomes. By providing this range of different outputs, we can better cater to the needs of different users from a range of backgrounds, in line with the Office for National Statistics user personas.
Accuracy and reliability
Accuracy is the proximity between an estimate and the unknown true value. Reliability is the closeness of early estimates to subsequent estimated values.
The accuracy of the statistics is largely dependent on the accuracy of the source data. We have assessed the source data to be accurate (see the Data quality section) as the design of NELSS helps prevent data entry errors, and guidance given to users helps ensure the right information is collected in the proper format. The statistics therefore represent the entire known population of all Legionella cases in England and Wales reported to UKHSA.
Where outputs are a result of a calculation, such as the average of a rolling period or an incidence rate per 100,000 population, a 95% confidence interval is presented.
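As an illustration, an incidence rate per 100,000 population and a 95% confidence interval can be calculated as shown below. This document does not state which interval method is used in the report. The Python sketch assumes Byar’s approximation for a Poisson count, which is one commonly used method, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rate_per_100k_with_ci(cases, population, confidence=0.95):
    """Incidence rate per 100,000 with a confidence interval (Byar's approximation, assumed)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Lower limit for the count; defined as 0 when no cases are observed.
    if cases == 0:
        lower_count = 0.0
    else:
        lower_count = cases * (1 - 1 / (9 * cases) - z / (3 * sqrt(cases))) ** 3
    # Upper limit for the count uses cases + 1.
    upper_base = cases + 1
    upper_count = upper_base * (1 - 1 / (9 * upper_base) + z / (3 * sqrt(upper_base))) ** 3
    scale = 100_000 / population
    return cases * scale, lower_count * scale, upper_count * scale
```

For example, `rate_per_100k_with_ci(100, 1_000_000)` gives a rate of 10.0 per 100,000 with an interval of approximately 8.1 to 12.2.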
The statistics present provisional data. All data in NELSS can be revised and updated as additional verification, data cleaning, and recoding are completed.
Timeliness and punctuality
Timeliness refers to the time gap between publication and the reference period. Punctuality refers to the gap between planned and actual publication dates.
This is the second annual official statistics report on Legionellosis published by UKHSA (the previous report was Legionellosis in residents of England and Wales: 2017 to 2023 report). The reports are published in November.
Prior to this series, annual and monthly surveillance reports on Legionnaires’ disease were published by Public Health England (PHE). However, the annual series was paused due to capacity issues, with the last annual report in that series published in 2018 (covering 2016 data). Subsequently, the monthly series was put on hold to redirect resource towards the COVID-19 pandemic, with the last monthly report published in February 2020.
The current series provides timely and up-to-date figures on Legionella epidemiological surveillance in England and Wales.
The annual reports are official statistics and are pre-announced at least 28 days in advance, in line with the Code of Practice for Statistics. Provisional publication dates for the year ahead are pre-announced online in December and can be found on the UKHSA release calendar.
Accessibility and clarity
Accessibility is the ease with which users can access the data, also reflecting the format in which the data is available and the availability of supporting information. Clarity refers to the quality and sufficiency of the metadata, illustrations and accompanying advice.
We currently publish 3 statistical products as part of this statistical release: the main statistics report, supplementary data tables, and this QMI report.
The main statistics report is published as an HTML web page making the report accessible across different devices and inherits the accessibility features mentioned in the GOV.UK accessibility statement.
The publication includes visualisations that help explain the data. These are designed to be colour-blind friendly. Each element in a visualisation has a different luminance value. This means that there is always enough contrast between elements for them to be distinguished.
We have simplified commentary in the publication, focusing on plain English. We also now include ‘Main messages’ in publications to help users understand the key findings from the statistics.
The supplementary data tables are published in ODS format and follow accessibility guidelines. Each sheet contains only one table. We also do not use nested tables as these do not always work well with screen readers. We avoid using empty cells for the same reason. Each sheet has a descriptive heading.
Coherence and comparability
Coherence is the degree to which data that are derived from different sources or methods, but refer to the same topic, are similar. Comparability is the degree to which data can be compared over time and domain.
Data included in these reports has been collected in a consistent manner over time primarily using national surveillance scheme reporting forms. We continue to modernise NELSS and the methods of data collection. Where there have been changes in specific variables over time, either through addition or changes in definition, these are detailed in the report.
These statistics for England and Wales are not directly comparable with those from Public Health Scotland or the European Legionnaires’ Disease Surveillance Network (ELDSNet). Differences in data collection methods, processes, reporting criteria, and timelines can make direct comparisons unreliable.
Uses and users
Users of statistics and data should be at the centre of statistical production, and statistics should meet user needs.
This section explains how the statistics are used, and how we understand user needs.
Appropriate use of the statistics
The statistics present Legionnaires’ disease cases. A case report is produced when someone has a suspected or confirmed diagnosis of Legionellosis. Some individuals will not receive a diagnosis or start treatment, so their case will never be notified. Users therefore should not use these statistics as a definitive measure of Legionellosis incidence.
There are seasonal trends in Legionellosis cases, with a peak around summer. Users should generally compare the same period year on year, rather than different periods in the same year.
Known uses
The Legionnaires’ disease statistics are used by a variety of stakeholders, including public health professionals and policymakers. These users use the data to understand the epidemiology of Legionellosis and to support research. The data is essential for public health officials and healthcare professionals to assess progress in controlling the disease.
We are conducting a user feedback survey with the release of these statistics to gain a better understanding of user needs and to make future improvements.
User engagement
The NELSS team is currently reviewing its outputs to align with the needs of stakeholders. The results of the user survey published alongside last year’s report were considered when developing this report. Users are asked to provide information about who they are and what they use the publication for. This provides new insights into our users, including how they use the publication, and what they would like to see in it. The survey includes some detail on the specific parts of the publication that users find most useful, as well as suggestions for improvements.
For feedback please contact legionella@ukhsa.gov.uk.
Related statistics
Most health protection functions in the UK are devolved to the other UK nations’ public health organisations. Public Health Scotland publishes the annual Legionnaires’ disease in Scotland report. Please note that differences in data collection methods, reporting processes, criteria, and timelines can make direct comparisons between reports unreliable.
The European Centre for Disease Prevention and Control publishes reports on Legionnaires’ disease surveillance and monitoring in Europe providing an overview of Legionnaires’ disease in Europe.
The World Health Organization publishes guidance on prevention, diagnosis, and treatment of the disease, at global, regional and country levels.