Appendix B: Quarterly progress report on improvements to health datasets

Updated 3 September 2021

Background

The Minister for Equalities’ second quarterly report on COVID-19 health disparities recommended that NHS England and NHS Improvement (NHSEI) – working with the Department of Health and Social Care (DHSC), Public Health England (PHE) and others – provide quarterly progress updates outlining improvements to health datasets.

This document represents the first of these updates, and focuses on a new method of assigning ethnicity using Hospital Episode Statistics (HES). The new method, developed by PHE, is based on the NHS Digital HES ethnicity index with a few modifications.

The next update on improvements to health datasets will appear with the fourth quarterly report.

Assigning ethnicity from HES

Analyses split by ethnic groups are needed to assess overall inequalities in the population, and more recently for measuring inequalities in health outcomes due to COVID-19. However, ethnicity is not recorded in many health-related datasets.

In order to generate breakdowns in data by ethnic group, analysts within PHE:

  1. link datasets of interest to HES
  2. attach ethnicity from HES to their dataset if it does not already have ethnicity recorded
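As an illustration only, the second step can be sketched as below. This assumes the records have already been linked to HES on a shared person identifier. The field names `person_id` and `ethnicity`, and the lookup `hes_ethnicity`, are hypothetical and are not taken from any PHE system; this report does not describe how the linkage itself is carried out.

```python
def attach_ethnicity(records, hes_ethnicity):
    """Fill in missing ethnicity from a HES-derived lookup.

    records       -- list of dicts with a 'person_id' key (hypothetical layout)
    hes_ethnicity -- dict mapping person_id to the ethnic code assigned from HES
    """
    out = []
    for rec in records:
        rec = dict(rec)  # copy, so the caller's data is not modified
        # Only attach HES ethnicity where the dataset has none of its own.
        if not rec.get("ethnicity"):
            rec["ethnicity"] = hes_ethnicity.get(rec["person_id"])
        out.append(rec)
    return out
```

A record that already carries an ethnicity keeps it; a record with no match in HES is left with no ethnicity.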

Patients may report different ethnicities in different episodes of care – for example, as an inpatient, as an outpatient, or during a visit to A&E – so a method of choosing which ethnicity to take is required. During the COVID-19 pandemic, it has become evident that the original method of assigning ethnicity has overestimated the number of people in the ‘Other’ ethnic group, so alternative methods of assigning ethnicity from HES were investigated.

The alternative methods were discussed with stakeholders in PHE, as well as external stakeholders from:

  • the Office for National Statistics
  • the Race Disparity Unit
  • NHS Digital
  • The King’s Fund
  • the Institute of Health Equity

An alternative method of assigning ethnicity was agreed. This document shows the original method for ethnicity, as well as the agreed alternative method.

Original method: using the most recent usable ethnic code

The original method used within PHE looked at the most recent usable ethnic code for an individual (see Appendix B.1 for further details about ethnic codes) available in these datasets in this order:

  • HES APC (admitted patient care) – from 1997 to 1998 onwards
  • HES OP (outpatients) – from 2003 to 2004 onwards
  • HES AE (accident and emergency) – from 2003 to 2004 onwards

If no usable ethnic code was found in any of these datasets, the person would not have a usable ethnic code recorded. Instead, their most recent unusable code (not known or not stated) would be recorded.

For recent analyses in PHE, during the COVID-19 pandemic, the Secondary Uses Service (SUS) dataset has also been used to assign ethnicity. This is given top priority, followed by the data sets listed above.
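One reading of the original method is sketched below. It assumes the datasets are checked in priority order, and that the most recent usable code is taken from the first dataset that has one. The `Episode` structure and the function name are hypothetical, introduced for illustration only. The default priority puts SUS first, as in the recent analyses.

```python
from datetime import date
from typing import NamedTuple, Optional

USABLE = set("ABCDEFGHJKLMNPRS")  # usable codes listed in Appendix B.1

class Episode(NamedTuple):
    source: str  # "SUS", "APC", "OP" or "AE"
    when: date   # date of the episode
    code: str    # ethnic code recorded in that episode

def most_recent_usable(episodes, priority=("SUS", "APC", "OP", "AE")) -> Optional[str]:
    # Work through the datasets in priority order and return the most
    # recent usable code from the first dataset that has one.
    for source in priority:
        usable = [e for e in episodes if e.source == source and e.code in USABLE]
        if usable:
            return max(usable, key=lambda e: e.when).code
    # No usable code anywhere: fall back to the most recent unusable code.
    if episodes:
        return max(episodes, key=lambda e: e.when).code
    return None
```

Under this reading, a single old record in a high-priority dataset outranks any number of newer records in the other datasets.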

Appendix B.2 shows the age-standardised mortality rates for deaths from all causes, and deaths mentioning COVID-19 between 21 March and 1 May 2020, compared with baseline mortality rates (2014 to 2018), by ethnicity and sex for England.

These charts show that, even in the baseline period, the age-standardised mortality rates for the ‘Other’ ethnic group were unrealistic. For the baseline period, the ‘Other’ ethnic group had an age-standardised mortality rate of 2,792 – all other ethnic groups had rates less than 1,000.

These results led PHE to consider alternative methods of assigning ethnicity. The method shown in this document has been agreed as the best option for PHE to take.

New method: using the most frequent ethnicity recorded

This method uses the most frequent ethnicity recorded across the 3 HES data sets used in the original method, excluding any unknown values. Outpatient (OP) data was not used from 2006/07 through to 2009/10 as no ethnic code entries were recorded in those years because of a technical issue. Admitted patient care (APC) data is restricted to 2003/04 onwards, as the quality and completeness of admitted patient care data was lower before then.
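The year restrictions above could be applied as a filter before any counting takes place. The sketch below assumes each record carries its dataset name and the calendar year in which its financial year starts, so that 2006/07 is represented as 2006. The function and parameter names are hypothetical.

```python
def in_scope(source: str, fyear_start: int) -> bool:
    """Return True if a record from `source`, in the financial year
    starting in `fyear_start`, is used by the new method."""
    if source == "APC":
        # APC is restricted to 2003/04 onwards.
        return fyear_start >= 2003
    if source == "OP":
        # OP is excluded from 2006/07 through to 2009/10.
        return not (2006 <= fyear_start <= 2009)
    # No restriction is stated for the remaining data set.
    return True
```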

If there are multiple ethnicities in the data sets with the same frequency, the most recent is chosen.

If there are multiple ethnicities with the same frequency and latest date, precedence is given to the most recent value from the APC data set, followed by the accident and emergency (AE) data set, and then the OP data set. Checks completed by NHS Digital indicate that completeness in the AE data set was greater than in the OP data set.

If there are multiple ethnicities with the same frequency, latest date and source of data, the ethnicity that occurs more frequently in the general population of England and Wales, according to the 2011 Census, is selected (see Appendix B.3).

To put this into context, the 2011 Census indicated that 80.5% of the population were White British. If a person has multiple ethnicities recorded (such as White British and White Irish) with the same frequency, the same latest date and the same source, precedence would be given to the White British ethnicity, as more of the population are in that ethnic group than in the White Irish group. It should be noted that such cases are very rare. This step was introduced in order to automate the process and to give exactly the same result each time the analysis is run.
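The selection and tie-break rules described so far can be sketched as a single sort key. This is an illustration, not PHE's implementation. The `Episode` structure and the function names are hypothetical, and the code assumes the year restrictions on the data sets have already been applied. `CENSUS_ORDER` reproduces the order column of Appendix B.3.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple, Optional

USABLE = set("ABCDEFGHJKLMNPRS")              # usable codes, Appendix B.1
SOURCE_RANK = {"APC": 0, "AE": 1, "OP": 2}    # precedence between data sets
# Census 2011 order from Appendix B.3 (1 = largest group, 'Other' last).
CENSUS_ORDER = {c: i for i, c in enumerate("ACHJNLMBKDRFGPES", start=1)}

class Episode(NamedTuple):
    source: str  # "APC", "AE" or "OP"
    when: date
    code: str

def rank_codes(episodes) -> list:
    """Return a person's usable codes, most preferred first."""
    usable = [e for e in episodes if e.code in USABLE]
    counts = Counter(e.code for e in usable)

    def key(code):
        latest = max(e.when for e in usable if e.code == code)
        best_source = min(SOURCE_RANK[e.source] for e in usable
                          if e.code == code and e.when == latest)
        # Most frequent first, then most recent, then data set
        # precedence, then Census order.
        return (-counts[code], -latest.toordinal(), best_source, CENSUS_ORDER[code])

    return sorted(counts, key=key)

def most_frequent(episodes) -> Optional[str]:
    ranked = rank_codes(episodes)
    return ranked[0] if ranked else None  # None: no known ethnicity
```

Because every tie is eventually settled by the fixed Census order, the same input always gives the same result.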

A value of ethnicity unknown will only be present if there are no known ethnicities in any of the HES data sets.

To take into account the overrepresentation of the ‘Other’ ethnic group, if the most common ethnic group assigned by the method above is ‘Other’, the second most common usable ethnic group is assigned instead. If there are no other usable ethnic groups, the person is still assigned to the ‘Other’ ethnic group.
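The adjustment for the 'Other' group can be sketched as a final step. The sketch assumes a person's usable codes have already been ordered from most to least preferred by the rules above, and that 'Other' corresponds to code S, as in Appendix B.3. The function name is hypothetical.

```python
from typing import Optional, Sequence

OTHER = "S"  # assumed to be the 'Other' group (code S in Appendix B.3)

def apply_other_rule(ranked: Sequence[str]) -> Optional[str]:
    """`ranked` holds a person's usable codes, most preferred first."""
    if not ranked:
        return None  # no usable code at all
    # If 'Other' comes first and another usable group exists, take that one.
    if ranked[0] == OTHER and len(ranked) > 1:
        return ranked[1]
    return ranked[0]
```

For example, a person ranked as `["S", "A"]` is assigned `A`, while a person whose only usable code is `S` stays in the 'Other' group.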

It is perfectly valid for patients to decide not to state their ethnicity when this information is collected in hospital data. People may also decide to state their ethnicity on some occasions but not others. The original and alternative methods used for assigning ethnicity do not select ‘Not Stated’ records if there are alternative ethnic codes available. Only those who do not have a usable ethnic code and have repeatedly not stated their ethnicity will have the ethnicity ‘Not Stated’ recorded.

Appendix B.1: Ethnic codes

| Code (2001/02 onwards) | Description (2001/02 onwards) | Code (1995/96 to 2000/01) | Description (1995/96 to 2000/01) | Usable or not usable ethnic code |
|---|---|---|---|---|
| A | British (White) | 0 | White | Usable |
| B | Irish (White) | 0 | White | Usable |
| C | Any other White background | 0 | White | Usable |
| D | White and Black Caribbean (Mixed) | | | Usable |
| E | White and Black African (Mixed) | | | Usable |
| F | White and Asian (Mixed) | | | Usable |
| G | Any other Mixed background | | | Usable |
| H | Indian (Asian or Asian British) | 4 | Indian | Usable |
| J | Pakistani (Asian or Asian British) | 5 | Pakistani | Usable |
| K | Bangladeshi (Asian or Asian British) | 6 | Bangladeshi | Usable |
| L | Any other Asian background | | | Usable |
| M | Caribbean (Black or Black British) | 1 | Black – Caribbean | Usable |
| N | African (Black or Black British) | 2 | Black – African | Usable |
| P | Any other Black background | 3 | Black – Other | Usable |
| R | Chinese (Other ethnic group) | 7 | Chinese | Usable |
| S | Any other ethnic group | 8 | Any other ethnic group | Usable |
| Z | Not stated | 9 | Not given | Not usable |
| X | Not known (prior to 2013) | 99 | Not known | Not usable |
| 99 | Not known (2013 onwards) | 99 | Not known | Not usable |

Appendix B.2: Age-standardised mortality rates

Figure 1: Age-standardised mortality rates among men for all deaths, and deaths mentioning COVID-19 (21 March to 1 May 2020), compared with baseline mortality rates (2014 to 2018), by ethnicity (England)

Figure 2: Age-standardised mortality rates among women for all deaths, and deaths mentioning COVID-19 (21 March to 1 May 2020), compared with baseline mortality rates (2014 to 2018), by ethnicity (England)

Appendix B.3: Population of England and Wales, by ethnicity (Census 2011)

| Ethnicity | Ethnic code | Percentage | Order |
|---|---|---|---|
| White British | A | 80.5% | 1 |
| White Other (including Gypsy and Traveller) | C | 4.5% | 2 |
| Indian | H | 2.5% | 3 |
| Pakistani | J | 2.0% | 4 |
| Black African | N | 1.8% | 5 |
| Asian Other | L | 1.5% | 6 |
| Black Caribbean | M | 1.1% | 7 |
| White Irish | B | 0.9% | 8 |
| Bangladeshi | K | 0.8% | 9 |
| Mixed White and Black Caribbean | D | 0.8% | 10 |
| Chinese | R | 0.7% | 11 |
| Mixed White and Asian | F | 0.6% | 12 |
| Mixed Other | G | 0.5% | 13 |
| Black Other | P | 0.5% | 14 |
| Mixed White and Black African | E | 0.3% | 15 |
| Other | S | 1.0% | 16 |