Research and analysis

Appendix F: Prioritisation and progress of data quality recommendations

Published 3 December 2021

In August 2021, the Race Disparity Unit (RDU) and the Office for Statistics Regulation (OSR) held a joint roundtable discussion with owners, providers and users of English healthcare data. This annex summarises discussions about health ethnicity data priorities at the roundtable, provides further supporting information for each of the priority recommendations, and gives updates on work already in progress.

The roundtable

The following organisations were represented at the roundtable:

  • Department of Health and Social Care
  • King’s Fund
  • NHS Digital
  • NHS England and NHS Improvement
  • NHS Race and Health Observatory
  • NHSX
  • Nuffield Trust
  • Office for National Statistics
  • Office for Statistics Regulation
  • Public Health England
  • Race Disparity Unit

Background and aims

The third quarterly report had the following recommendation: “RDU should engage with the Office for Statistics Regulation about priorities for improving the quality of ethnicity data on health records, drawing on others’ expertise as appropriate, and report back in the final [fourth] quarterly report.”

This formed the basis for the aim of the roundtable, which was to discuss the recommendations from the 3 published quarterly reports plus draft recommendations that were being considered for the fourth report. Some of these were derived from the Nuffield Trust report on Ethnicity Coding in Health Datasets, and were presented by the Nuffield Trust at the meeting. The recommendations focussed on:

  • coding of ethnicity data
  • data collection
  • analysis
  • reporting
  • transparency and publication of health ethnicity data

While many of these have been concerned with improving health data, some had wider applicability to other datasets.

Aims of the roundtable

The recommendations were split into 3 priority types, based on an initial RDU assessment, and then discussed at the meeting. When setting priorities, the following factors were considered:

  • dependencies with other recommendations
  • impact on the overall quality
  • resources required
  • amount of data and analyses available
  • wider applicability
  • improved stakeholder and user views

Outcomes

The outcome of the discussion was broad agreement on the priorities. In order to progress these priorities, RDU recommends that the main organisations associated with improvements to health ethnicity data should create a Programme Board, involving representatives of the user community and other relevant stakeholders (including Devolved Administrations), to oversee the implementation of the priorities listed. This and associated governance arrangements are vital to ensure that individual projects and tasks, and interdependencies, are managed coherently.

The Board should also provide regular reports outlining progress towards implementing the priorities. This was outlined in the second quarterly report.

Higher priority next steps

2 next steps were agreed as higher priority in the roundtable discussions:

  • improving ethnicity coding in health datasets
  • reviewing data dissemination

Improving ethnicity coding in health datasets

There was agreement between departments and agencies at the roundtable that the highest priority next step was to improve the coding and recording of ethnicity data. Previous quarterly reports and the Nuffield Trust report on ethnicity data in health records highlighted that data coding issues will have a significant impact on the “understanding of ethnic inequalities and the ability to identify effective responses”.

The coding issues affect records for patients from ethnic minority groups disproportionately. Some of them have been identified in previous quarterly reports and include:

  • incomplete recording of ethnicity, inconsistent use of ethnicity codes, and use of 2001 Census codes
  • the issue of patients’ records showing multiple, different, ethnicities
  • a large and growing proportion of patients whose ethnicity has been recorded as “not known”, “not stated” or “other” which impedes reliable analyses of ethnic differences
  • quality issues that affect some groups more than others – for example, data quality is worse for people in London, for adults of working age, for patients with short hospital stays, and for independent providers

Analyses to further understand COVID-19 and other health disparities between ethnic groups will significantly improve if the coding of ethnicity for patients improves. This will also help identify ethnicity on death certificates.

Improving ethnicity data could come from a 2-part approach:

  • improve existing data quality at source (ethnicity coding)
  • understand and report on the limitations of existing data, and use new analytical methods to improve it

DHSC is considering interdependent pieces of work proposed by NHS England to improve the coding of ethnicity, on receipt of which responsibility will be outlined to relevant leads. The basis of this programme of work would include developing an ethnicity information standard for the NHS and a plan for implementing the new standard.

As noted in the third quarterly report, and by Nuffield Trust, this standard should include new, up-to-date guidance on ethnicity coding for health service providers and GPs.

The guidance needs to cover all NHS-funded care and cover how patients are asked for their ethnicity and how it is recorded in health records. It should include which ethnicity categories are used. The categories might be derived from recommendations in the Unified Information Standard for Protected Characteristics (UISPC) project, described in the first quarterly report. Where ethnicity for a patient needs to be recorded by someone else (reporting “by proxy”), guidelines should cover how this should happen. Proxy reporting of ethnicity is usually of poorer quality than that reported by the individual. Some issues with the proxy reporting of ethnicity data are covered in a recent RDU Methods and Quality Report.

RDU has previously noted the importance of harmonising ethnicity classifications (and those of other personal characteristics). ONS is also planning the work needed to develop a new harmonised standard for ethnicity, which will involve discussions across the GSS and with users. There is a risk that the UISPC and GSS classifications for ethnicity cannot be reconciled. There should be dialogue between DHSC, NHS England and the ONS Harmonisation team to ensure that the 2 classifications can be reconciled. This does not necessarily mean that the classifications need to be identical.

Updating the ethnicity classification in health datasets would increase consistency with other health and non-health datasets and in practical terms possibly allow information for Gypsy, Irish Traveller and Arab groups to be presented. No final decision has been made on the categories from the UISPC project and RDU reiterates the importance of progressing this work as part of the wider work programme.

Subsequently, and to ensure that the information standard is implemented correctly and consistently, and on a continuous basis, integrated care system leaders should ensure that the updated guidance on ethnicity coding is implemented. Boards and leaders of NHS providers and commissioners, and GP practices, should take ownership of the quality of ethnicity coding for their patients, ensure that the updated guidance is implemented, routinely monitor the quality of coding, identify how it can be improved, and put in place actions to achieve this. Once guidance on ethnicity coding is available, all health care providers should endeavour to record, update and correct ethnicity coding in all patient records. The guidance should be reviewed and updated on a regular basis.

RDU appreciates that this is a long-term initiative that needs to consider, for example, how to reconcile the numerous standards that already exist across NHS and other healthcare providers’ systems. It would involve liaison with the system suppliers to develop data capture interfaces. Decisions would also need to be made on how to migrate existing data.

Finally, data linkage is a powerful tool that can be used to better understand the quality of ethnicity recording in different datasets and improve estimates. The ONS has demonstrated that it is possible to use an anonymised process to link Hospital Episode Statistics and GP data for England to the 2011 Census ethnicity data in the ONS’s internal secure analysis environment. The quality of linkage rates and issues such as immigration since the last Census remain limitations, but self-reported ethnicity data from the Census is widely regarded as the most robust information available for analysing ethnic health disparities.

The ONS has used this linked data to produce key analyses for understanding the pandemic and the health of ethnic minorities more widely. ONS is also making the linked data available in de-identified form to accredited researchers through its outward-facing secure research environment, the Secure Research Service (SRS), as quickly as possible within technical constraints and the governance framework. The ONS is also planning research to investigate the quality of ethnicity data recording in different health datasets in the UK and to propose methods to account for bias in these sources.

A similar approach could be taken using the 2021 Census records when they become available and RDU recommends as a next step that ONS, collaborating with the other relevant health departments, considers how this work linking health and Census data could be improved and extended to facilitate more reliable, timely and detailed estimates of ethnic health disparities on a regular basis.

Any work of this kind should respect the legal and ethical constraints around Census and patient data, while seeking every opportunity to achieve the overarching objective of improved data quality.

Reviewing data dissemination

RDU has recommended 2 next steps on data dissemination. First, that health departments including DHSC, NHSEI and NHS Digital should review and action existing requests for data from RDU and others for the purposes of analyses of the pandemic. There have been protracted difficulties in securing access to NHS Electronic Staff Record data by ethnicity for the UK-REACH team to be able to link with regulator data and healthcare outcomes. The delays appear to be the result of a lack of clarity about the ownership (controller/processor) of the ESR data. Obtaining this would be hugely beneficial for improving understanding of COVID-19 impacts.

NHS Digital has taken steps to make more information available on vaccinations recently for different areas and characteristics through the NHS COVID-19 vaccinations website. But some data remains unpublished that would add huge value to the evidence base. This includes, for example, information on the number of COVID-19 deaths of healthcare workers, by ethnicity. This data is crucial for the UK-REACH study into ethnicity and COVID-19 outcomes in healthcare workers. The study abstract notes that “current evidence of the association between ethnicity and COVID-19 outcomes in people working in healthcare settings is insufficient to inform plans to address health inequalities.” RDU has been told that NHS England has concerns about the quality of these data, but in general RDU considers that the appropriate approach in such circumstances is to make the data available, with a description of the quality limitations and their impact on interpretation.

Data on the number of hospital-acquired COVID-19 infections and deaths was collected through FOI requests to NHS Trusts. NHS England has published data on hospital-acquired COVID-19 infections but it has not published statistics on people who died as a result. Furthermore, 45 acute hospital trusts did not respond to the FOI request with data.

Data on the uptake and use of the NHS COVID-19 app by different ethnic groups is currently collected in the PIRU tracker survey, as the app is anonymous and does not collect ethnicity[footnote 1]. The third quarterly report recommended that this data be published to inform activity to increase the uptake and continued use of the NHS COVID-19 app. As well as ethnicity, data from the research on app usage for other groups (such as disabled people) would also be useful.

Second, RDU recommends that there is an independent strategic review of the dissemination of healthcare data and the publication of statistics and analysis. This review should consider 2 aspects in particular:

  • changes to processes that might facilitate and streamline data sharing and access in the future, while respecting legal and ethical constraints of the data
  • that all useful and relevant microdata and aggregate statistics pertaining to the pandemic should be released in the future

The review might usefully consider the importance of leadership in developing a culture in which data are shared and statistics published unless there are compelling reasons not to do so.

The review should be underpinned by a complete commitment to transparency in all instances unless patient confidentiality is threatened.

The general principles of the Code of Practice for Statistics, and particularly the principles of ‘honesty and integrity’, are relevant here: “The collection, access, use and sharing of statistics and data should be ethical and for the public good”. There are also significant benefits to implementing these 2 recommendations around data access and sharing. These include more and better quality research being possible, more options for data linkage, and increased transparency and trustworthiness in outputs, and thus the creation of better policy interventions to improve outcomes for different groups now and in the future.

Lower priority next steps

The following next steps were discussed and agreed as lower priority in the roundtable discussion. It should be stressed that they are important actions for improving the quality of ethnicity data collection, analysis and reporting, and should not be interpreted as being of “low” priority.

Reporting unknown ethnicity

The proportion of records that have been coded with an ethnic group (that is, records where ethnicity is not unknown or missing) can vary between different areas and providers. This is demonstrated in management information on ethnicity coverage rates by Clinical Commissioning Group (CCG), published by NHS Digital. The coverage of ethnicity data for providers is also part of the NHS Data Quality Maturity Index (DQMI), a monthly publication about data quality in the NHS.

NHS data combined from Hospital Episode Statistics and GPES Data for Pandemic Planning and Research (GDPPR) shows that across the 106 CCGs, ethnicity was reported for 93.6% of patients by 4 October 2021.

Out of the 10 CCGs with the highest proportion of patients with a known ethnic category, 6 were located in the North West of England. NHS Knowsley had the highest coverage, at 98.4%, followed by NHS South Sefton and NHS Southport and Formby, both at 97.8%.

The lowest rate of ethnicity reporting was in NHS Bury, at 82.6%. NHS North West London was among the 10 CCGs with the lowest level of ethnicity coverage, at 90.1%, and had the highest number of patients with a known ethnicity (2,721,625) out of all CCGs.

Higher rates of recording of ethnicity do not necessarily mean better quality data. Records might have valid codes but they might not be coded correctly and the high priority work we have described will help to improve this coding.

The Nuffield Trust report recommended that the DQMI should include the proportion of records coded as not known, not stated, an ‘other’ group and ‘any other ethnic group’ in order to better understand the data quality of NHS datasets and for monitoring how data quality changes over time. RDU believes this is a good approach and the work should be progressed by NHS Digital.

More widely, though, all other datasets and analyses (for example, of vaccine uptake) should also include levels of unknown ethnicity and an assessment of how this might affect the interpretation for different ethnic groups.
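As a rough illustration of the kind of reporting described above, the sketch below computes the percentage of records in each category that a publication might flag. The category labels, the function name coding_breakdown and the sample records are all hypothetical and chosen for illustration only; they are not the codes or methods used by NHS Digital or in the DQMI.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical labels for the categories a publication might report on.
# Real datasets use their own coding schemes.
FLAGGED = ("not known", "not stated", "any other ethnic group")


def coding_breakdown(codes: Sequence[str],
                     flagged: Sequence[str] = FLAGGED) -> Dict[str, float]:
    """Return the percentage of records recorded under each flagged label."""
    total = len(codes)
    if total == 0:
        raise ValueError("no records supplied")
    counts = Counter(codes)
    return {label: 100 * counts[label] / total for label in flagged}


# Made-up example: 10 records, 2 of which carry no usable ethnicity.
records = ["not known", "not stated"] + ["Indian"] * 8
breakdown = coding_breakdown(records)
for label, pct in breakdown.items():
    print(f"{label}: {pct:.1f}%")
```

Publishing a breakdown of this kind alongside each analysis would let users see how much of the data rests on unknown or residual categories before interpreting differences between ethnic groups.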

This is important because it allows users to gain a better understanding of data quality across different datasets, which aids interpretation of data and analysis. It also means the quality of health datasets can be monitored more effectively and action taken to improve quality over time.

Increasing representation of ethnic minority groups in clinical trials

Research by the National Institute for Health Research (NIHR) has shown that ethnic minority groups are also under-represented in clinical trials with participation of ethnic minority groups falling below the UK population average of 13.8%.

From a total of 622,978 participants taking part in COVID-19 studies across the UK, for example:

  • the percentage of ethnic minority participants involved in COVID-19 studies is 9.3% (57,661 participants)
  • the percentage of ethnic minority participants involved in interventional studies is 9.6% (4,743 participants)
  • ethnic minority participants taking part in observational studies make up 9.2% (52,918 participants)

Participation in COVID-19 vaccine studies is lower – with just 5.7% of the total (1,509 participants) from an ethnic minority.
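The figures quoted above can be checked with simple arithmetic. The short sketch below uses only the counts given in this section for which a denominator is stated; the helper name share is a hypothetical one introduced for illustration.

```python
def share(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to 1 decimal place."""
    return round(100 * part / total, 1)


# Counts quoted in the text above.
total_participants = 622_978
minority_all_studies = 57_661
minority_interventional = 4_743
minority_observational = 52_918

# Interventional and observational participants add up to the overall count.
assert minority_interventional + minority_observational == minority_all_studies

# Overall share of ethnic minority participants in COVID-19 studies.
print(share(minority_all_studies, total_participants))  # 9.3
```

The totals for interventional, observational and vaccine studies are not given in this section, so the 9.6%, 9.2% and 5.7% figures cannot be recomputed from the text alone.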

It is essential that the ethnicity of participants reflects the wider population. A University of London research paper noted that the underrepresentation of ethnic minority groups in COVID-19 studies is likely due “to a combination of personal and structural factors. Socio-political factors may include social deprivation limiting access to health services, and subsequently, participation in – and awareness of – health research. Participant-related factors may include language and cultural barriers, and mistrust towards researchers and research institutions and may vary between different ethnic groups.”

The importance of this was also noted in the Commission on Race and Ethnic Disparities’ final report. Some of the first priorities for the proposed new Office for Health Disparities would be:

  • investigating barriers to increasing diversity of participants into clinical research studies including clinical trials and genetic studies and identifying solutions
  • campaigns to improve the participation in clinical trials and cohort studies of underrepresented groups, including ethnic minority groups and more deprived populations

Samples should be representative of ethnic minorities, so that new treatments and vaccines being trialled are effective for everybody. There is also a strong argument for targeted over-representation to ensure significant differences between groups can be identified.

Quality of health ethnicity data and statistics

The ONS plans to study the quality of ethnicity data recording across different health datasets in England and to propose methods to account for bias in these sources. The Office for Statistics Regulation will also continue to hold statistics producers to account to ensure the quality of ethnicity data and that statistics meet users’ needs.

The Office for Statistics Regulation would encourage DHSC to keep users informed on the progress with the priority next steps.

Next steps already being progressed

The following ‘next steps’ are already being progressed. The roundtable discussion reflected on the progress that has been made on data quality improvements in some areas, to ensure that this work continues.

Collecting ethnicity as part of death certification process

Work is progressing to make ethnicity a mandatory question for healthcare professionals to ask of patients, and to transfer that ethnicity data to a new, digitised Medical Certificate of Cause of Death which can then inform ONS mortality statistics.

As part of this process, it is important to confirm that the ethnicity of the person who has died will come from patient records. As described in the quarterly reports, these and other potential sources of an ethnicity record will have strengths and limitations.

Consideration should be given to reviewing and learning from the experience of recording ethnicity at death certification in Scotland.

Harmonising datasets across government and the agencies

Work is progressing to encourage departments and agencies to commit to aligning their ethnicity data collections to the harmonised ethnicity standard as defined by the Government Statistical Service (GSS), and publish their commitment to doing so, including timescales. This was a recommendation from the second quarterly report[footnote 2].

RDU is currently working with departmental representatives from the Harmonisation Champions Network to action this, pending a new harmonised standard being produced by the GSS, led by the Harmonisation team in ONS.

Describing analysis methods

The Nuffield Trust recommended that methods to address data quality issues in the analysis of ethnic differences should be clearly described. Progress in this area has been made in the analysis and descriptive metadata of linked Census and Hospital records by ONS and the methodology developed by Public Health England that focused on a new method of determining ethnicity using Hospital Episode Statistics (HES), described in the first quarterly progress report.

Since patients may report different ethnicities in different episodes of care – for example, as an inpatient, as an outpatient, or during a visit to A&E – a method of choosing which ethnicity to take is required. During the COVID-19 pandemic, it has become evident that the original method of determining ethnicity has overestimated the number of people in the ‘other’ ethnic group, so alternative methods of determining ethnicity from HES were investigated.

The original method used the most recent ethnicity recorded through linkage to Hospital Episode Statistics. This was supplemented by self-reported ethnicity recorded on test request forms using the Census 2011-based harmonised standard for ethnicity. The new method uses self-reported ethnicity from test request forms and supplements this with the most frequent ethnicity recorded through linkage to Hospital Episode Statistics, unless the most frequent was ‘other’ when the second most frequent was chosen. The new method has resulted in a reduction in the number of cases allocated to the ‘other’ ethnic group and a slight increase in the percentage allocated to all other ethnic groups.
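The two approaches can be sketched in code. This is a minimal illustration under stated assumptions, not PHE's implementation: the function names, the label "other", the ordering of the HES history (oldest first), the reading that the original method falls back to the test request form only when no HES record exists, and the handling of ties and of patients whose only recorded ethnicity is ‘other’ are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional, Sequence

OTHER = "other"  # hypothetical label for the 'other' ethnic group


def original_method(hes_history: Sequence[str],
                    self_reported: Optional[str]) -> Optional[str]:
    """Most recent HES ethnicity, supplemented by the test request form.

    Assumes hes_history is ordered oldest to most recent.
    """
    if hes_history:
        return hes_history[-1]
    return self_reported


def new_method(hes_history: Sequence[str],
               self_reported: Optional[str]) -> Optional[str]:
    """Self-reported ethnicity first, then the most frequent HES ethnicity.

    If the most frequent HES value is 'other', the second most frequent
    is used instead when one exists (assumption: otherwise keep 'other').
    """
    if self_reported:
        return self_reported
    # most_common() ranks by count; ties keep first-seen order (assumption).
    ranked = [label for label, _ in Counter(hes_history).most_common()]
    if not ranked:
        return None
    if ranked[0] == OTHER and len(ranked) > 1:
        return ranked[1]
    return ranked[0]


# Made-up patient: three episodes of care, no test request form.
history = ["Indian", "other", "other"]
print(original_method(history, None))  # other
print(new_method(history, None))       # Indian
```

In this made-up example the original method assigns the patient to the ‘other’ group because that was the most recent value, while the new method assigns the second most frequent value, which is the kind of reallocation away from ‘other’ described above.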

As noted in the data analysis section, ethnicity data in the PHE report on confirmed deaths has been updated based on this new method.

Similarly, any new analytical methods to address data quality issues in the analysis of ethnic differences like the new PHE methodology should continue to be clearly described, and published in the interests of transparency.

Increasing representation of ethnic minority groups in surveys

Progress has been made in increasing the sample sizes in some of the surveys used to measure COVID-19 outcomes, such as the Opinions and Lifestyle Survey, the COVID-19 Infection Survey and the REACT-2 survey. However, often the representation of ethnic minority groups is lower in sample surveys than in the corresponding population (meaning that findings may not be representative), and sample sizes in those surveys might not be large enough to detect statistically significant differences between groups or over time for some ethnic minority groups. This is especially true for individual (rather than aggregated) ethnic groups – for example, analysing the Black African and Black Caribbean groups rather than the black group as a whole.

The first quarterly report noted how ONS has recently initiated a wider project to improve how they engage with under-represented groups. The project will develop evidence-based recommendations to ensure that future mixed-mode social survey designs are more representative.

As part of this project ONS is also going to consider its approach to sample design to investigate whether the samples drawn could be more inclusive and representative of minority groups than at present.

Increasing and improving the use of long COVID codes

The main report notes the significant work ongoing between NHSX and GP suppliers to improve the capture of data about long COVID. The GP Enhanced Service for long COVID has supported GP training and education, and activity around reducing inequalities. It has also supported GPs in recording long COVID codes in databases when the condition is diagnosed.

Developing the database for health and care statistics in England

There are many data sources that are currently being used to analyse the impact of COVID-19, rates of vaccination uptake and vaccine sentiment, and more recently long COVID. In order to bring some of these datasets together for the benefit of users, the ONS has been developing a website tool that compiles official statistics relating to health and care in England into one location.

This is a great start in helping users understand the complex health data landscape. This complexity is highlighted by the fact there are 760 separate data records in the ONS tool already. The tool has enormous potential to lead to higher quality research and analysis and, through greater transparency, to increase users’ trust in the statistics.

Relevant publications and datasets covering England are identified in the tool through the GOV.UK statistics release calendar in collaboration with the English Health Statistics Steering Group. The tool is updated each month with new publications from the previous month. In addition to the main update once a month, ONS also does a weekly update of COVID-19 publications, due to the volume of publications and the need for timely statistics on COVID-19.

Datasets published for customers on an ad-hoc basis are not included in the tool. These can be found on the ONS User Requested Data and NHS Digital Supplementary Information websites. Other data and analysis have been made available in different ways, including through data repositories, data dashboards such as the ONS COVID-19 latest insights interactive tool, analysis of new or existing datasets, and published FOI requests.

In addition, numerous organisations external to the public sector have contributed research and data collection during the pandemic. Data and analysis from some of these organisations have been referenced in previous quarterly reports, such as OpenSAFELY’s analysis of mortality, hospitalisations, infections and vaccine take-up, and Virus Watch’s analysis of vaccine sentiment for different ethnic minority groups.

When considering future developments to this tool, RDU recommends for this report that ONS should continue to develop the tool and might investigate:

There might also be benefits in providing more guidance and signposting for health statistics to help a layperson navigate their way through the health data landscape (with a wider focus than COVID-19 data about ethnicity).

RDU will consult with the English Health Statistics Steering Group about this.

Reporting on the quality of coding

Analyses of healthcare activity should routinely include the ethnic dimension, and should include reporting on the quality of coding. This action is incumbent on all organisations reporting on health data.

This would mean more data is available for ethnic groups to inform policy-making and monitoring, and users will gain a better understanding of data quality across different health datasets that will aid interpretation of the data.

  1. A nationally representative sample of smartphone users aged 18 to 79 years in England and Wales, recruited through YouGov’s online panel. 

  2. https://www.gov.uk/government/publications/second-quarterly-report-on-progress-to-address-covid-19-health-inequalities/second-quarterly-report-on-progress-to-address-covid-19-health-inequalities, “Departments and other agencies should publish a statement on GOV.UK outlining their plans to move their data collections to the Government Statistical Service’s (GSS) harmonised ethnicity data standard.”