Research and analysis

Methodology and definitions

Updated 15 February 2024

National Tuberculosis Surveillance System data set

TB notifications

People who are diagnosed with tuberculosis (TB) in England, Wales and Northern Ireland must be notified through the National Tuberculosis Surveillance System (NTBS). This report only includes data for individuals with TB who are resident in England or are treated in England (including individuals who are homeless or visiting from abroad).

Only individuals with disease caused by Mycobacterium tuberculosis complex (MTBC) are reported. Individuals were denotified and removed from the data set if the infective agent was identified as non-MTBC or M. bovis Bacillus Calmette-Guerin (BCG) subspecies.

Data production

In 2021, NTBS was launched and replaced 2 historical surveillance systems:

  • the Enhanced Tuberculosis Surveillance system (ETS)
  • the London TB Register (LTBR)

Data sets from 2018 onwards were extracted from ETS and LTBR and were merged with NTBS following a series of data migrations between July and December 2021. Data reported here was obtained from the merged data sets (NTBS, ETS, LTBR) and the final extract was made on 7 November 2023.

Data cleaning to improve data quality

Denotifications

People with BCGosis, on chemoprophylaxis for latent TB infection or with a non-tuberculous mycobacterial infection who were notified in error were identified using comments fields, and denotified. People with culture-confirmed TB who had been denotified were queried with clinics, and lab contaminations were removed, or people were renotified if they were found to have been denotified in error.

In addition, a probabilistic matching process was carried out for notifications between January 2020 and December 2022 to identify people with more than one notification within a 12-month period. Identified duplicates were denotified, with any missing information transferred from the duplicate to the original notification.

Geography

The postcode field (used to map postcodes to geographic areas) was cleaned by identifying invalid postcodes based on matching to the May 2023 Postcode Directory from the Office for National Statistics (ONS). Where cleaning was necessary, the correct postcode was identified using the address fields.

For people who were homeless or who had a residence outside the UK but were notified in England, the postcode of the clinic or hospital where they were treated was assigned to the notification. For people with no postcode and no treatment clinic or hospital, the local authority and UK Health Security Agency (UKHSA) centre were assigned from the local authority field, which records the area in which the notifying case manager was located.

UKHSA region was derived from UKHSA region of residence based on the individual’s residential postcode. If missing, UKHSA region in which treatment occurred (most recently, as care may have been transferred) was used, for example if a person had no fixed abode.

Cleaned postcodes were assigned boundary layers and merged with boundaries for clinical commissioning groups, integrated care boards, upper tier local authorities and local authorities sourced from the Central Lookups Database within the UKHSA Data Lake, which is managed by the Public Health Data Science (PHDS) team. These are available in the UKHSA layers of the map software (GIS).

Site of disease

The site of disease was reclassified to pulmonary if a positive sputum smear (microscopy) sample was recorded or if a positive culture was grown from a pulmonary laboratory specimen. People with laryngeal TB were included in pulmonary breakdowns, and people with miliary TB were included in both pulmonary and extra-pulmonary breakdowns. Site of disease for people with culture confirmation was reclassified based on the site in the body from which the specimen was taken. Site of disease classifications were also updated using the free text field for site of disease.

Social risk factors including prison and asylum status

Social risk factors

Social risk factors (SRFs) include:

  • current or a history of drug misuse
  • current alcohol misuse
  • current or a history of homelessness
  • current or a history of imprisonment
  • current mental health needs
  • current asylum status (including if remanded in an immigration detention centre)

The presence or absence of each social risk factor was updated from missing or unknown if relevant information was found in the free text comments fields in NTBS.

Homelessness was updated to ‘yes’ if mentioned in the comments fields or if the address given was ‘no fixed abode’ or a shelter or hostel for homeless people.

Prison (current or in the past) was updated to ‘yes’ if mentioned in the comments fields, if His Majesty’s Prisons (HMP) or a prison name was recorded as the address, or if the residential postcode corresponded with a prison. Up until 2020, data on incident TB individuals reported to the Public Health in Prisons (PHiP) log were used to further identify people who had been imprisoned, but this was not conducted in 2022.

The immigration detainee variable was updated if the address given at notification, the comments fields or the occupation field showed the person to be an immigration detainee. The asylum seeker variable (newly introduced in NTBS) was updated to asylum seeker if this was recorded in the occupation field sub-category under ‘no occupation’.

For analysis, asylum seeker was then recoded as ‘yes’ if either the asylum seeker variable or the immigration detainee variable was ‘yes’. The asylum seeker variable was further updated so that all UK born individuals with a missing value for this variable were set to ‘no’.
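The recoding rule above can be sketched as follows. This is an illustrative sketch only, not the code used for the report: the function name, the argument names and the use of `None` for a missing value are assumptions made for the example.

```python
from typing import Optional


def recode_asylum_seeker(
    asylum_seeker: Optional[str],
    immigration_detainee: Optional[str],
    uk_born: Optional[bool],
) -> Optional[str]:
    """Return the analysis value: 'yes', 'no' or None (missing)."""
    # A 'yes' on either source variable gives 'yes' for analysis.
    if asylum_seeker == "yes" or immigration_detainee == "yes":
        return "yes"
    # UK born individuals with a missing value are set to 'no'.
    if asylum_seeker is None and uk_born is True:
        return "no"
    # Otherwise keep whatever was recorded ('no' or missing).
    return asylum_seeker
```

Under these assumptions, a non-UK born person with no recorded asylum information stays missing rather than being set to ‘no’.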

Demographic characteristics

Sex is reported as male or female. Where missing from the raw data, it was derived from the name of the individual where names were unambiguous.

Age and age groups were derived from the date of notification and date of birth. Notification demographics were used for tracing against the Personal Demographics Service (PDS). The Demographics Batch Service (DBS) enables a user to submit a file of patient demographics for tracing against the PDS, providing back the NHS number and most up-to-date demographics where an exact match is found. Those with conflicting values for age or inconsistent mortality information were cross-referenced against the matched PDS data to resolve and checked with case managers.

UK and non-UK born status occurs in the raw data. It was amended if missing and the country of birth indicated non-UK birth.

Entry to the UK is entered as year only by NTBS users. Time since entry is derived as year of notification minus entry year.
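The age and time since entry derivations described above can be sketched as follows. The function names are hypothetical, and counting age in completed years is an assumption, as the text does not state how part-years are handled.

```python
from datetime import date
from typing import Optional


def age_at_notification(date_of_birth: date, notification_date: date) -> int:
    """Age in completed years on the date of notification (assumed convention)."""
    had_birthday = (notification_date.month, notification_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return notification_date.year - date_of_birth.year - (0 if had_birthday else 1)


def years_since_entry(notification_date: date, entry_year: Optional[int]) -> Optional[int]:
    """Year of notification minus year of entry to the UK; None if entry year is missing."""
    if entry_year is None:
        return None
    return notification_date.year - entry_year
```

Because entry is recorded as a year only, time since entry is accurate to the nearest year at best: a person who entered in December 2021 and was notified in January 2022 has a derived value of 1.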

Report methodology

TB notifications

Individuals with TB are reported by area of residence and by calendar year of notification.

Social risk factors

People with TB are reported as having at least one SRF (‘yes’) if any of the 6 social risk factors (current alcohol misuse, current or a history of homelessness, drug misuse, imprisonment, current asylum seeker status, current mental health needs) had ‘yes’ recorded. As a result, the denominator is all notifications. This assumes that people for whom no data was recorded for individual SRFs were a ‘no’ and may result in under-estimation.

Data for individual social risk factors reported is limited to those with recorded data, for example a ‘yes’ or a ‘no’. As a result, the denominators for these are smaller than all notifications due to missing data. If there is significant under-reporting of SRFs in those with missing data, this should result in a better estimate of the true proportion of the people with each SRF. However, if data is more likely to be recorded if the response is a ‘yes’, this could result in an over-estimate. This may be the case for the asylum seeker SRF.
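The difference between the 2 denominators can be illustrated with a short sketch. The record layout (one dictionary per notification, with ‘yes’, ‘no’ or `None` for each SRF) and the function names are assumptions made for the example.

```python
from typing import Optional, Sequence


def proportion_any_srf(records: Sequence[dict], srf_names: Sequence[str]) -> float:
    """At least one SRF: the denominator is all notifications, so missing counts as 'no'."""
    with_any = sum(
        1 for r in records if any(r.get(name) == "yes" for name in srf_names)
    )
    return with_any / len(records)


def proportion_single_srf(records: Sequence[dict], name: str) -> Optional[float]:
    """Individual SRF: the denominator is only notifications with 'yes' or 'no' recorded."""
    recorded = [r[name] for r in records if r.get(name) in ("yes", "no")]
    if not recorded:
        return None
    return recorded.count("yes") / len(recorded)
```

For example, with 4 notifications of which 1 has homelessness recorded as ‘yes’, 1 as ‘no’ and 2 as missing, the individual proportion for homelessness is 1 out of 2, whereas the ‘at least one SRF’ measure always divides by all 4 notifications.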

Mental health is recorded by TB case managers and is based on their judgement of whether mental health concerns are likely to affect the person’s ability to adhere to treatment. This was added to surveillance in the London UKHSA centre in 2018 and is a simple ‘yes’ or ‘no’ response. It was introduced nationally in 2021 with the introduction of the NTBS. Here, a ‘yes’ response is reported as the person needing support for mental health and therefore having ‘mental health needs’.

Asylum seeker status and immigration removal centre were added to national surveillance as discrete variables in 2020. Prior to this, ‘asylum seeker’ status was extracted from free-text comment fields and user entered values within occupation (LTBR). As a result, more complete data on this exposure is assumed from 2020 to 2022 compared with previous years.

Alcohol misuse is recorded by case managers and is based on their judgement of whether current alcohol misuse is likely to affect adherence to treatment.

History of drug misuse, homelessness and prison are self-reported by individuals and are first asked as a ‘yes’ or ‘no’ response and then with additional information on duration: as current, within last 5 years or more than 5 years ago. Unless indicated otherwise, analyses here present these SRFs as ‘yes’ if either history of, or a duration value, was recorded.

Diagnostic and laboratory tests

Data for TB isolates from the National Mycobacterial Reference Service (NMRS) is matched to TB notifications. Isolates are deduplicated and summarised to only report one isolate per TB notification per notification period.

The NTBS also includes user-entered fields to record whether a culture sample and other diagnostic tests, such as polymerase chain reaction (PCR), were undertaken and the results of these tests. These data fields are combined to generate a final test status variable for the different tests for all the notified cases.

Culture and other diagnostic test results are then reported as follows:

Any test performed

Yes: any value recorded in the NTBS of any test type variables (culture, PCR, microscopy, histology, or chest X-ray), test result and date of test regardless of result.

No: no recorded value of variables of test type, test result and date of test.

Any test positive

Yes: positive test result recorded for any test type (culture, PCR, microscopy, histology or chest X-ray).

No: no results or negative test result recorded for all test types (as above).
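The ‘any test performed’ and ‘any test positive’ definitions above can be sketched as follows. The data layout (a dictionary of test type to a record with optional `result` and `date` entries) is hypothetical, and treating a test as performed when either a result or a date is recorded is one reading of the definition, not a confirmed rule.

```python
TEST_TYPES = ("culture", "pcr", "microscopy", "histology", "chest_xray")


def any_test_performed(tests: dict) -> str:
    """'yes' if any test type has a result or a date recorded, regardless of result."""
    for test_type in TEST_TYPES:
        record = tests.get(test_type)
        if record and (record.get("result") is not None or record.get("date") is not None):
            return "yes"
    return "no"


def any_test_positive(tests: dict) -> str:
    """'yes' if any test type has a positive result; 'no' if all are negative or missing."""
    for test_type in TEST_TYPES:
        record = tests.get(test_type)
        if record and record.get("result") == "positive":
            return "yes"
    return "no"
```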

Culture confirmed

Culture confirmed: supported by NMRS laboratory result of a positive culture for MTBC.

Culture unconfirmed: negative culture, or no NMRS results for culture, surveillance system states no culture undertaken, no other supporting information.

Speciation

Species defined as M. tuberculosis, M. bovis, M. microti, M. africanum or Mycobacterium tuberculosis complex.

MTBC is assigned to those not fully speciated by previous PCR based methods or WGS. The introduction of WGS has decreased the number of notifications in this category.

Drug resistance definitions

The resistance reported follows this classification. All resistance proportions use the number of culture positive cases as the denominator.

  • rifampicin-resistant (RR) or multidrug-resistant TB (MDR TB) are defined as resistance to rifampicin with or without isoniazid resistance
  • isoniazid mono-resistance is defined as resistant to isoniazid but not reported as resistant to rifampicin
  • pre-extensively drug-resistant TB (pre-XDR TB) are TB strains which fulfil the definition of multidrug-resistant or rifampicin-resistant TB and which are also resistant to any fluoroquinolone (levofloxacin and or moxifloxacin plus historically used levofloxacin and ofloxacin)
  • extensively drug-resistant TB (XDR TB) are strains that fulfil the definition of MDR or RR TB and which are also resistant to any fluoroquinolone and at least one additional group A drug in the WHO updated classification (group A drugs are the most potent group of drugs in the ranking of second-line medicines for the treatment of drug-resistant forms of TB using longer treatment regimens and comprise levofloxacin, moxifloxacin, bedaquiline and linezolid)

Isolates may be resistant to other antibiotics in addition to those described above.
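The classification above can be read as a hierarchy, which the following sketch makes explicit by returning the most specific category that applies. The function name, the drug sets and the category labels are assumptions for illustration. In particular, ‘at least one additional group A drug’ is taken here to mean bedaquiline or linezolid.

```python
FLUOROQUINOLONES = {"levofloxacin", "moxifloxacin", "ofloxacin"}
# Assumption: the 'additional' group A drugs are those that are not fluoroquinolones.
OTHER_GROUP_A = {"bedaquiline", "linezolid"}


def classify_resistance(resistant_to: set) -> str:
    """Return the most specific resistance category for a culture-positive isolate."""
    rifampicin = "rifampicin" in resistant_to
    fluoroquinolone = bool(resistant_to & FLUOROQUINOLONES)
    if rifampicin and fluoroquinolone and resistant_to & OTHER_GROUP_A:
        return "XDR"
    if rifampicin and fluoroquinolone:
        return "pre-XDR"
    if rifampicin:
        return "MDR/RR"
    if "isoniazid" in resistant_to:
        return "isoniazid mono-resistance"
    return "none of the listed categories"
```

Because the categories are nested, an isolate returned as ‘pre-XDR’ or ‘XDR’ by this sketch also meets the MDR or RR definition.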

Laboratory-confirmed resistance

Resistance is reported as either resistant or sensitive. Testing is by whole genome sequencing (WGS) alone or in combination with phenotypic testing. Discordances between the 2 testing methods were resolved by the reference laboratory and the reported value is used for this data analysis. The denominator for all resistance proportions is culture-positive notifications, for example, known resistance reported as a proportion of culture-positive cases (regardless of whether the others were sensitive or unknown resistance).

M. bovis is intrinsically resistant to pyrazinamide.

The designation of resistance using genomics relies on a database of known resistance. A change in the pncA gene (gene encoding pyrazinamidase) associated with lineage 1 confers resistance to pyrazinamide. Isolates that are lineage 1 with changes in above gene are checked using phenotypic methods and coded as resistant if resistant by phenotype and sensitive by WGS.

Where ethambutol and pyrazinamide results are missing or unknown (and not lineage 1 for pyrazinamide) but results are known to be sensitive for isoniazid and rifampicin, these are coded as sensitive for ethambutol and pyrazinamide.

Acquired resistance

This is resistance in a person with more than one sample over time, where the first sample shows sensitivity to a given drug and the second sample shows resistance to it.

Treated as resistant

This includes notifications that have no culture result but for which NTBS, the multidrug resistance database or comments in NTBS indicate that the individual has been treated as MDR with a second line drug regimen (for example, contacts of MDR individuals with active TB treated for MDR, or those diagnosed and or started treatment abroad).

Total MDR or RR cohort

This includes both those with culture confirmed MDR or RR TB and those who were treated as resistant with second line drug regimen.

Clustering isolates

WGS was implemented for all of England in 2018. Results are available only if the isolate was successfully cultured. An isolate is defined as being in a cluster if it has 12 or fewer genetic differences (known as single nucleotide polymorphisms or SNPs) between it and another isolate that has previously been sequenced.

More detail on UKHSA’s approach to WGS-based typing is found in the WGS handbook.

The current database includes samples from devolved nations and research samples. Therefore, we report positive clusters where there is more than one person in the cluster from England. The definition for clustered is:

Yes: 12 SNPs or fewer from another person’s sample and there is more than one person resident in England in the cluster.

No: the sample is more than 12 SNPs from any other person’s sample that has been sequenced in the UKHSA database.
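The clustered definition can be sketched as follows. This is a minimal illustration, not UKHSA’s typing pipeline: it assumes one isolate per person, pairwise SNP distances supplied as a dictionary, and clusters formed by linking any 2 isolates within the threshold (single linkage). All names are hypothetical.

```python
from collections import defaultdict

SNP_THRESHOLD = 12


def clustered_flags(snp_distances: dict, in_england: dict) -> dict:
    """Return {person_id: 'yes' or 'no'} for people resident in England.

    snp_distances maps (id_a, id_b) to the SNP distance between their isolates.
    in_england maps every person id to True if resident in England.
    """
    parent = {person: person for person in in_england}

    def find(person):
        # Follow parent links to the cluster representative (with path halving).
        while parent[person] != person:
            parent[person] = parent[parent[person]]
            person = parent[person]
        return person

    # Link isolates that are 12 SNPs or fewer apart.
    for (a, b), snps in snp_distances.items():
        if snps <= SNP_THRESHOLD:
            parent[find(a)] = find(b)

    # Count people resident in England in each cluster.
    england_count = defaultdict(int)
    for person, resident in in_england.items():
        if resident:
            england_count[find(person)] += 1

    return {
        person: ("yes" if england_count[find(person)] > 1 else "no")
        for person, resident in in_england.items()
        if resident
    }
```

In this sketch, a person in England whose only close match is a sample from a devolved nation or a research sample is flagged ‘no’, because the cluster contains only one person resident in England.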

However, the proportion of notifications clustered is reported as the number of clustered isolates (each corresponding to a single notification) as a percentage of all notifications. This is also used for the risk ratio analysis of the risk of a notification being in a cluster.

Note that contacts may be identified and assumed to be in clusters based on epidemiological information obtained through contact tracing. However, only those with active disease and WGS information are reported here.

TB treatment, diagnostic and treatment delays

Enhanced Case Management and directly observed treatment

Numbers and proportions of people with enhanced case management (ECM) per level, and those receiving directly observed treatment (DOT), were calculated for all of those with information on ECM and DOT available. People who had information on DOT but were missing ECM data were coded as ‘Yes’ for any ECM and coded into level 3 of ECM. Those who had missing information on any ECM but were recorded as being in level 0 of ECM (equivalent to standard treatment) were recoded as ‘No’ in the binary ‘ECM required’ variable, thereby considerably reducing the proportion of notifications with missing information.

The percentage of any ECM was calculated as the proportion of cases with ‘Yes’ recorded for (1) ECM, (2) DOT offered or (3) DOT received, out of all cases with information. The percentage of ECM per level was calculated as the proportion of cases with a known level of ECM out of all cases with information on ‘any ECM required’ (‘Yes’ or ‘No’). The percentage of missing data is calculated as the proportion of all TB notifications for each year with no information recorded in (1) ECM required, (2) ECM level required, (3) DOT offered or (4) DOT received.

Diagnostic delays

Delay to TB diagnosis is calculated as the difference in days between the self-reported date of TB symptom onset and the date of TB diagnosis as recorded in NTBS. Diagnostic delays are not calculated for those who were diagnosed with TB at post-mortem and those with missing data, so these are not included in the denominator for the proportion of people with delays to TB diagnosis. Diagnostic delays exceeding 2 years (730 days) are excluded from analysis as symptoms lasting for over 2 years are thought to relate to another episode of TB. Negative diagnostic delays, resulting from symptoms presenting post diagnosis, were also excluded from the analysis as these are likely to indicate data errors or treatment side effects as opposed to disease symptoms.
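The calculation and exclusions described above can be sketched as follows. The function and argument names are hypothetical, and returning `None` is simply a way of marking a notification as excluded from the denominator.

```python
from datetime import date
from typing import Optional

MAX_DELAY_DAYS = 730  # 2 years


def diagnostic_delay_days(
    symptom_onset: Optional[date],
    diagnosis_date: Optional[date],
    post_mortem: bool = False,
) -> Optional[int]:
    """Days from self-reported symptom onset to diagnosis, or None if excluded."""
    # Post-mortem diagnoses and notifications with missing dates are not calculated.
    if post_mortem or symptom_onset is None or diagnosis_date is None:
        return None
    delay = (diagnosis_date - symptom_onset).days
    # Negative delays and delays of more than 2 years are excluded from analysis.
    if delay < 0 or delay > MAX_DELAY_DAYS:
        return None
    return delay
```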

Reporting delays

Reporting delay is calculated as the days difference between TB diagnosis date and date of TB notification to NTBS/ETS. Reporting delays are not calculated for those who were diagnosed with TB at post-mortem and those with missing data, so these are not included in the denominator for the proportion of people with reporting delays. Reporting delays exceeding 3 months (90 days) are excluded from analysis as these delays typically reflect changes outside of healthcare control (such as a patient moving abroad, not attending treatment, patient having died) as opposed to true healthcare delays to notifying the case.

Treatment delays

Treatment delay is calculated as the days difference between self-reported date of TB symptom onset and the date treatment started, as recorded in UKHSA’s surveillance systems (NTBS/ETS). Treatment delays are not calculated for those who have not started treatment, those who were diagnosed with TB at post-mortem and those with missing data, so these are not included in the denominator for the proportion of people with treatment delays.

Treatment delays exceeding 2 years (730 days) are excluded from analysis as symptoms lasting for over 2 years are thought to relate to another episode of TB. Negative treatment delays, resulting from symptoms presenting post treatment start, were also excluded from analysis as these are likely to indicate data errors or treatment side effects as opposed to disease symptoms. Where treatment delays are categorised, categories comprise:

  • 0 to 2 months (0 to 60 days)
  • 2 to 4 months (61 to 120 days)
  • more than 4 months (121 to 730 days)
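The categories above can be applied as in the following sketch. It assumes the middle band ends at day 120, so that day 121 falls in the top band, and it returns `None` for delays that are missing, negative or longer than 730 days. The function name is hypothetical.

```python
from typing import Optional


def treatment_delay_category(delay_days: Optional[int]) -> Optional[str]:
    """Assign a treatment delay (in days) to a reporting category."""
    # Missing, negative and over 2 year delays are excluded from analysis.
    if delay_days is None or delay_days < 0 or delay_days > 730:
        return None
    if delay_days <= 60:
        return "0 to 2 months"
    if delay_days <= 120:  # assumed upper bound of the middle band
        return "2 to 4 months"
    return "more than 4 months"
```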

TB cohort definitions

For the purposes of reporting treatment outcomes for people with TB, 2 mutually exclusive cohorts are defined. They are:

  • MDR/RR-TB cohort: people with TB who were diagnosed with MDR or RR-TB and or were treated with a second line drug regimen for MDR or RR TB
  • non-MDR/non-RR-TB cohort: people who were not identified as MDR or RR-TB and were treated with a first line treatment regimen for non-MDR or non-RR TB

Under this definition, people with TB resistance to isoniazid, ethambutol and/or pyrazinamide but without resistance to rifampicin are included in the non-MDR/non-RR-TB cohort.

Outcomes are reported for the non-MDR/non-RR-TB cohort according to the year of notification up to, and including, 2021. This is to ensure that at least one year of data is available to report treatment outcome by the expected standard treatment duration of less than 12 months. In this cohort, outcomes are reported separately for persons with central nervous system (CNS) disease, or for those in whom CNS disease cannot be excluded, which includes those with spinal, cryptic disseminated or miliary disease. For this sub-group, the last recorded treatment outcome is reported as standard treatment is a minimum of 12 months.

Outcomes are reported for the MDR/RR-TB cohort according to the year of notification, up to, and including, 2020. This is to ensure availability of data for the expected standard treatment duration of up to 24 months.

TB treatment outcomes were extracted from NTBS (2020 to 2022) and ETS (2001 to 2020) and cleaned and validated using comment fields, post-mortem diagnoses, dates of key events and case manager follow-up. TB diagnoses that were recorded at post-mortem were excluded from TB treatment outcomes as these cases were not treated. These deaths are reported separately and added to TB treatment deaths to report total TB deaths. This is a change from the methodology in reports earlier than the 2021 data annual report. Therefore, treatment outcome results in this report are not directly comparable with those in earlier reports.

Disclosure control methods

Only aggregate data is reported. Aggregated data values less than 5 are suppressed unless the value is:

  • the aggregate number of notifications within a single year for England for children aged under 5 years for each sex, as the risk of disclosure is considered very low compared with the importance of monitoring changes in young children
  • the aggregated number across multiple years for large geographic areas (England or UKHSA centre)
  • the average notifications over multiple years for a geographical area, the smallest of which (by population) is lower tier local authority
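The suppression rule can be sketched as follows. The function name and the `exempt` flag are assumptions for illustration: the flag stands for any of the exceptions listed above, which the caller would need to determine. Zero is treated as a value less than 5, following the wording of the rule.

```python
from typing import Optional


def suppress_small_count(count: int, exempt: bool = False, threshold: int = 5) -> Optional[int]:
    """Return the count, or None where it must be suppressed."""
    # Values below the threshold are suppressed unless an exception applies.
    if count < threshold and not exempt:
        return None
    return count
```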

Data analysis

Incidence and Epidemiology

TB rates

TB rates per 100,000 population are calculated using the mid-year population estimates from ONS.

Average annual rates per 100,000 for the 3-year period are calculated by dividing the numerator (the number of TB notifications in the 3-year period) by the denominator (the sum of the mid-year population estimates for the same 3-year period) and multiplying by 100,000.
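As a worked illustration of the 3-year average annual rate, the following sketch divides total notifications by the summed mid-year population estimates. The function name and the figures in the usage note are invented for the example and are not taken from the report.

```python
def average_annual_rate(notifications_by_year: dict, population_by_year: dict) -> float:
    """Average annual rate per 100,000 over the years supplied."""
    if set(notifications_by_year) != set(population_by_year):
        raise ValueError("notifications and populations must cover the same years")
    numerator = sum(notifications_by_year.values())
    denominator = sum(population_by_year.values())
    return 100_000 * numerator / denominator
```

For example, 40, 50 and 60 notifications over 3 years in an area with a mid-year population estimate of 1,000,000 in each year gives 150 divided by 3,000,000, multiplied by 100,000, which is an average annual rate of 5.0 per 100,000.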

Confidence intervals

95% confidence intervals are model derived and were calculated using assumptions of the Poisson distribution for rates and the binomial distribution for proportions.

Risk ratios

Risk ratios are model derived, using the binomial distribution for proportions.

Data sources

This report uses data from NTBS which is a live user-entered database.

Software packages and code

Data cleaning and analyses were undertaken using R (version 4.3.1) and Stata 17 SE. The code is reviewed and held in the UKHSA internal GitHub repository.

Glossary

95% confidence interval

In this report, model-derived 95% confidence intervals (CI) are often presented alongside percentages and rates. For example, the percentage of TB notifications with pulmonary disease is 52.7% (95% CI 51.3% to 54.2%).

In layperson terms, this can be loosely interpreted as meaning that we have 95% confidence that the true but unknown value of this percentage in the population lies within the range of 51.3% to 54.2%.

Diagnostic delay

The diagnostic delay represents the time (in days) from a person’s self-reported date of TB symptom onset to the date they are diagnosed with TB.

Directly observed treatment (DOT)

DOT is a treatment strategy which refers to the patient taking treatment under direct in-person observation of a trained health care worker or designated individual to ensure treatment adherence for patients requiring ECM.

Enhanced case management

ECM is defined as the increased level of patient monitoring for people with (complex) clinical or social issues or both affecting treatment. There are 3 levels of ECM depending on the complexity of the clinical or social issues or both and the intensity of patient monitoring required, ranging from fortnightly or weekly visits to DOT or VOT.

ECM may be required for children with TB, those with HIV and taking antiretrovirals, people with complex side effects or single drug resistance and those with complex contact tracing or cases in which the involvement of social services is required. For more information see the Nurse guidance document.

International migrant

An international migrant is a person who moves across international borders to seek temporary or permanent residence in another country.

Isoniazid (INH) resistant

Isoniazid (INH) resistant TB is TB that is resistant to isoniazid, a first-line anti-TB drug, and not other drugs.

Monoresistant to a drug other than INH

Resistance to a first-line treatment drug other than INH, for example, ethambutol.

Multidrug-resistant TB

Multidrug-resistant TB (MDR TB) is defined as resistance to at least isoniazid and rifampicin, with or without resistance to other drugs.

Pansensitive

Fully sensitive to all first line drugs, for example, isoniazid.

Poly-drug resistant

Poly-drug resistance refers to resistance to 2 or more first-line drugs but not to both isoniazid and rifampicin.

Post-mortem diagnosis

A person diagnosed at post-mortem is defined as having TB which was not suspected before death, but a TB diagnosis was made at post-mortem, with pathological and/or microbiological findings consistent with active TB that would have warranted anti-TB treatment if discovered before death.

Pulmonary TB

A person with pulmonary TB is defined as having TB involving the lungs and/or tracheo-bronchial tree, with or without extra-pulmonary TB diagnosis. In this report, in line with the World Health Organization (WHO)’s recommendation and international reporting definitions, miliary TB is classified as pulmonary TB due to the presence of lesions in the lungs, and laryngeal TB is also classified as pulmonary TB.

Social risk factors for TB

These include current alcohol misuse, current or history of homelessness, current or history of imprisonment, current or history of drug misuse, current mental health needs, or current status as an asylum seeker or detainee in an immigration removal centre. See the ‘Social risk factors’ section under ‘Report methodology’ for further details of these variables.

Risk ratios

A risk ratio (RR) quantifies the relative risk of the outcome of interest between 2 different groups, for example the relative risk of pulmonary disease in males compared with females. This is calculated as the proportion of males with pulmonary disease divided by the proportion of females with pulmonary disease, which gives an RR of 1.18 (95% CI 1.11 to 1.25). This is interpreted as males having an 18% increased risk of pulmonary disease compared with females, and we have 95% confidence that the true increased risk lies within the range of 11% to 25%. If a 95% CI for an RR includes the value of 1.0, then we cannot infer that the true RR is different from 1.

As a result, we would say that these results do not provide any evidence that the observed magnitude of the RR is ‘statistically important’. If an RR of less than 1.0 is reported, such as RR 0.85, this is interpreted as the group of interest having a 15% reduced risk of the outcome.
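The arithmetic behind a risk ratio can be sketched as follows. The point estimate is the ratio of 2 proportions, as described above. The confidence interval in this sketch uses a normal approximation on the log scale, which is an assumption chosen for simplicity. It is not the model-derived method used in the report, so it will not reproduce the report’s intervals exactly. The function name and the counts in the usage note are invented.

```python
import math


def risk_ratio(events_a: int, total_a: int, events_b: int, total_b: int, z: float = 1.96):
    """Return (RR, lower, upper) for group A compared with group B."""
    rr = (events_a / total_a) / (events_b / total_b)
    # Approximate standard error of log(RR); an illustrative choice of method.
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper
```

For example, 60 out of 100 people with the outcome in one group and 50 out of 100 in the other gives an RR of 1.2, or a 20% increased risk. With these small invented counts the approximate interval spans 1.0, so no difference between the groups could be inferred.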

RR TB

Resistant to rifampicin, a first-line drug, and not other drugs.

Under-served populations

Under-served populations refer to people with TB who have a social risk factor as well as those who were remanded in an immigration removal centre, identified as asylum seekers or unemployed.