Research and analysis

Differences in the quality of ethnicity data reported by individuals and third parties

Published 3 December 2021

1. Introduction

The Race Disparity Unit’s (RDU) Ethnicity facts and figures website presents data about the experiences and outcomes of people in different ethnic groups in areas including crime and policing, justice, education and employment.

Our work aims to identify the biggest disparities and how these have been changing over time. It also explores issues around the quality of ethnicity data, making recommendations for ways to improve it.

This report compares the quality issues associated with ethnicity data that has been:

  • self-reported – reported by the individual
  • proxy reported – reported about an individual by someone else

2. How ethnicity is reported

Ethnicity is a complex and multifaceted term which can be personal to each individual.

It can be informed by someone’s:

  • shared traditions and practices
  • place of birth or where they live
  • religion
  • language
  • race

This means that the most practical way of collecting data about people’s ethnicity is to ask them, although how they respond can depend on the context and who is collecting the data. For example, when a local authority collects data it might include an ethnic group as a category that wouldn’t be reflected in the harmonised categories based on the census.

The self-reporting approach is followed in most major government surveys, such as the Annual Population Survey.

However, sometimes it is not possible or appropriate for people to report their own ethnicity. For example:

  • when it is being collected for young children
  • when it is recorded for someone who has died
  • due to circumstances, such as during a trial for a summary offence at a magistrates’ court when the defendant is not present

In cases like these, ethnicity data needs to be reported by a third party. This is sometimes called proxy reporting.

Ethnicity data that has been reported by a third party is generally considered to be of lower quality than data that has been self-reported. This is because of the complex nature of ethnicity.

3. Summary and recommendations

In this report, we start by exploring the quality of self-reported ethnicity data. We then detail some of the different types of proxy reporting and comment on their relative quality.

We suggest that the quality of proxy-reported data is higher when the third party is closer to the individual, such as a relative, rather than someone who does not know the individual well.

We then make recommendations for how the quality of ethnicity data might be improved in general.

These recommendations can be broadly summarised as:

  • collect self-reported data instead of proxy-reported data where possible and maximise the quality of self-reported data
  • adequately explain the relevant quality issues associated with ethnicity data, regardless of how it has been reported
  • report the proportions of the data that has been self-reported and proxy reported if both methods have been used
  • give people the maximum opportunity to report the ethnicity they identify with, for example, by letting them write out their answers or by using extended or tailored lists of codes
  • improve guidance to help people answer ethnicity questions in surveys, and advise organisations on how ethnicity questions should be asked

4. Self-reported ethnicity data

An individual’s ethnicity is considered by surveys and administrative datasets to be a characteristic that does not change over time, like their name or date of birth.

However, analysis carried out by the RDU into changes in children’s ethnic identities over time provides evidence that:

  • the ethnic identity reported by a child can change over time
  • self-reported ethnicity data has quality issues

The analysis used data from the Understanding Society youth survey. Every other year, it asks children aged between 10 and 15 years old about their ethnicity. We counted the number of children who responded with a different ethnicity in the survey at least once.
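The counting step described above can be illustrated with a minimal sketch. It assumes one row per child per survey wave; the field layout, the child identifiers and the example values are hypothetical and are not taken from the Understanding Society data.

```python
from collections import defaultdict

# Hypothetical survey rows: (child_id, wave, reported_ethnicity).
# Illustrative values only; not taken from Understanding Society.
responses = [
    ("c1", 1, "White British"),
    ("c1", 3, "White British"),
    ("c2", 1, "Mixed Other"),
    ("c2", 3, "White British"),
    ("c3", 1, "Bangladeshi"),
]


def children_who_changed(rows):
    """Return the ids of children with more than one distinct reported ethnicity."""
    seen = defaultdict(set)
    for child_id, _wave, ethnicity in rows:
        seen[child_id].add(ethnicity)
    return {child_id for child_id, groups in seen.items() if len(groups) > 1}


changed = children_who_changed(responses)
total_children = len({child_id for child_id, _, _ in responses})
share_changed = 100 * len(changed) / total_children
print(sorted(changed), round(share_changed, 1))
```

A child is counted once however many times their response changes, which matches the 'at least once' wording used in this report.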

Figure 1 shows that, on average, about 1 in 10 children reported a different ethnicity at least once before they became an adult. However, this figure was mostly influenced by the large number of White British children in the survey. Seven percent of White British children reported a different ethnicity at least once, by far the lowest percentage out of all ethnic groups.

Those in ethnic minority groups were often more likely than average to change how they reported their ethnic group at least once. For example, 94% of the children who identified as Black Other at least once also reported a different ethnicity at least once.

Figures were especially high for children from the mixed and other ethnic groups. With the exception of Mixed White and Black Caribbean children, children from mixed and other ethnic groups were more likely to report different ethnicities more than once.

On the other hand, fewer than 1 in 5 children from Bangladeshi, Pakistani and Indian ethnic backgrounds in the study changed how they reported their ethnicity. For example, the figure for Bangladeshi children was 13%.

Figure 1: Percentage of children who changed their ethnicity at least once
Chart showing the percentage of children who changed their ethnic identity at least once

Note: Unweighted counts are used instead of weighted counts because the findings are not meant to be representative of the general population. These findings should not be used to make generalisations about the general population. This is because the unweighted counts do not correct for different non-response rates between ethnic groups and could therefore be biased. Figures based on fewer than 30 children are not shown.

The analysis also looked at the self-reported ethnicities of the parents of these children to see if this had an impact on the likelihood of children changing their ethnicity over the different waves of the survey.

Where the ethnicities of both parents were known[footnote 1], if both parents reported the same ethnicity then children were less likely to change their ethnicity (6%) compared to those whose parents had different ethnicities to each other (31%).

For those whose parents were both White British, 2% of children changed their ethnicities at least once.

Table 1 shows the combinations of parent ethnicities associated with the highest level of ethnicity change among children in the study. The combination with the highest percentage was children where both parents were White Irish (42%). The percentage was also high for children with one White Irish parent.

A closer look at these cases shows that, of the 70 children who had one or two White Irish parents and who changed their ethnicity at least once, 68 moved between the White British and White Irish ethnic groups, and 64 of the 70 lived in Northern Ireland.

The next highest percentage was when both parents were from the Asian Other group (41%). However, there are multiple Asian ethnicities within the Asian Other group so the parents may have different backgrounds.

Table 1: The combinations of parent ethnicities and the ethnicity of their children, where the number of children was 30 or more

| Father’s ethnicity | Mother’s ethnicity | Number of children (unweighted counts) | Number of children who changed their ethnicity at least once (unweighted counts) | % of children who changed their ethnicity at least once |
| --- | --- | --- | --- | --- |
| White: Irish | White: Irish | 60 | 25 | 41.7 |
| Asian: Other | Asian: Other | 39 | 16 | 41.0 |
| Unknown | White: Irish | 51 | 18 | 35.3 |
| White: Irish | White: British | 50 | 17 | 34.0 |
| Unknown | White: Other | 49 | 15 | 30.6 |
| Unknown | Asian: Indian | 43 | 13 | 30.2 |
| Unknown | Black: Caribbean | 76 | 21 | 27.6 |
| White: Other | White: Other | 50 | 13 | 26.0 |
| White: British | White: Other | 72 | 18 | 25.0 |
| Unknown | Asian: Pakistani | 57 | 14 | 24.6 |
| White: British | White: British | 2623 | 54 | 2.1 |

Note: Counts are unweighted
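As a worked illustration, the percentages in Table 1 can be reproduced from the unweighted counts. The sketch below also applies the threshold of 30 children used in this report. The function name and the decision to return `None` for suppressed figures are assumptions made for the example, not a description of how the RDU produced the table.

```python
MIN_CHILDREN = 30  # figures based on fewer than 30 children are not shown


def percent_changed(n_children, n_changed, min_children=MIN_CHILDREN):
    """Percentage who changed, or None when the base is below the threshold."""
    if n_children < min_children:
        return None
    return round(100 * n_changed / n_children, 1)


# Counts copied from Table 1 (unweighted): (children, children who changed).
rows = {
    ("White: Irish", "White: Irish"): (60, 25),
    ("Asian: Other", "Asian: Other"): (39, 16),
    ("White: British", "White: British"): (2623, 54),
}

for (father, mother), (n_children, n_changed) in rows.items():
    print(father, "/", mother, percent_changed(n_children, n_changed))
```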

This analysis might demonstrate that when children respond to ethnicity questions, they are confused by the number of available options, the complexity of the question, and the guidance.

The Understanding Society youth survey does not have an option for a person to write their identity if they select one of the Other ethnic groups. In these cases, it could reflect a child’s changing opinions about these groups and the extent to which they adequately represent them. This might be a result of the process of self-learning, or the way that younger children might define themselves in relation to others.

For these reasons, a possible conclusion is that it is either inappropriate to ask children to self-report their ethnicity or that there need to be improvements in data collection to reduce the amount of change over time.

On the other hand, the children who change the reporting of their ethnicity over time could be acting deliberately. The ethnicity question might give a child an opportunity to work through questions about their personal identity. Therefore, surveys which do not give children the chance to alter their ethnicity data after it has been collected might not be keeping up with changes to the way a child might respond to an ethnicity question. This will also have important implications for data collection.

What is clear from the analysis is that the continuity of a child’s ethnicity shouldn’t always be assumed, and this will have an impact on data quality.

There is also some research that looks at changing ethnicity among the whole population. It compared UK census records from 2001 and 2011 and found that 4% of the people who provided ethnicity information reported their ethnicity differently between the two censuses.

Potential reasons identified by the research include:

  • imputation, where census officials have to estimate data because the question in the census was not answered
  • changes to the wording of the ethnicity question
  • people reporting the ethnicity of other people in their household

The research concludes that genuine changes in how individuals identify are less likely than these factors to explain a change in recorded ethnicity.

4.1 Recommendations

It should not be assumed that ethnicity does not change, particularly among children. Government departments and research bodies should be aware of how this could impact their work.

More studies which gather information on people over time will give respondents the opportunity to reconsider their ethnicity. Studies can do this by continuing to ask questions about ethnicity during each wave of information gathering.

Improved guidance can improve data accuracy when collecting ethnicity data by helping people:

  • understand that they can change their ethnicity if they want to
  • avoid misunderstanding questions about ethnicity
  • choose the ethnicity that they want to identify as
  • understand what it means if they select the option ‘prefer not to say’[footnote 2]

The 2021 Census for England and Wales used 19 categories for ethnicity, with some questions giving respondents the chance to write in their ethnicity. Organisations collecting data should allow people who choose Other ethnic groups to write their specific ethnicity, or use extended or tailored code lists if possible. This will likely increase the administrative and analytical burden for data collectors. However, it also gives people the maximum opportunity to report the ethnicity that they most identify with. Ultimately, this will help to improve the data collected.

5. Proxy-reported ethnicity data

We have identified 4 ways that ethnicity data is proxy reported. These are when someone:

  • is related to the person or knows them well
  • ‘stands in’ for the person in the analysis of data – for example, if ethnicity data is collected for a household as a whole, not for the people living in it
  • doesn’t know the person well but has interacted with them
  • doesn’t know the person and has never interacted with them

We have put these methods in order according to how close the third party is to the person. We work on the assumption that the closer they are to that person, the more likely it is that they will provide the same answer as the person would have self-reported. However, the quality of each method may differ when applied to different situations.

For each method, we have provided examples of where they are used across government and public services and discuss any quality issues.

5.1 Proxy reporting by someone who is related to the person or knows them well

Schools have to collect information about their pupils and send it to the Department for Education (DfE). This is then recorded in the national pupil database. Schools collect this by carrying out a school census in each of the 3 terms in the school year. Ethnicity data is collected annually through the Spring census, although it is our understanding that this data is not requested again if it has been gathered through the first census that took place after the pupil joined the school.

DfE’s guidance to schools on what ethnicity data needs to be collected about pupils says:

“We require data on ethnicity for all pupils. The school must not ascribe any ethnicity to the pupil. This information must come from the parent / guardian or pupil.”

While we know that ethnicity data in the database is a mixture of self-reported and proxy-reported data, we don’t know the percentage of either. It is DfE’s belief that only pupils aged 14 and over are likely to self-report.

When the information is proxy reported, the third party is the legal guardian of the pupil so is well positioned to report the child’s ethnicity for them. It is likely they will select the same ethnicity for the child that the child themselves would identify with.

Ethnicity data which is proxy reported by someone who is related to the person or knows them well is the best alternative when self-reporting is not possible.

This can still introduce some issues with data quality. In Scotland, when an individual registers a person’s death, they are asked to select the ethnicity of the deceased from those listed under the 2011 Scottish census classification. Analysis by the National Records of Scotland (NRS) looked at the consistency between ethnicity data recorded in the 2011 Census and data recorded for people who then died between 2012 and 2014. It found that where there was data available from the census and the death registration record, consistency was high (93.8%).

But this was due to high consistency among White Scottish people, who make up the vast majority of the records. Consistency among other ethnic groups was much lower. Sixty-four percent of those with ‘White - Other British’ and 78% of those with ‘Indian, Indian Scottish or Indian British’ had matching ethnicity in their death and census records. The analysis estimated that, in a significant number of cases, ethnicity was being wrongly inferred from the country of birth.

As a result, the NRS recommended that the ethnicity question be asked before place of birth in the death registration form, and that guidance around the differences between ethnicity and place of birth be improved.

Recommendations

DfE could improve its school census guidance by saying whether or not ethnicity data for pupils should be collected again each year, even if it has already been recorded in a previous year. The extent to which schools currently reuse ethnicity data and the extent to which they ask families again for a child’s ethnicity is unknown.

We also recommend that schools give families the chance to change any of the child’s information in every school census. When possible, children should also have the chance to change their ethnic identities if they want to.

Schools should also ask if ethnicity data has been self-reported or proxy reported, while DfE could also say by which age children should be self-reporting. Asking families to state how the questions have been answered would be better as the understanding of ethnicity in children varies depending on age.

The Minister for Equalities said in the quarterly report on addressing COVID-19 health inequalities that recording ethnicity should become mandatory to understand the impact of the pandemic on ethnic minorities. The process would involve making ethnicity a mandatory question for healthcare professionals to ask patients. This data would then be added to a new, digitised death certificate which could contribute to the Office for National Statistics’ mortality data.

5.2 Proxy reporting by someone who ‘stands in’ for the person

Surveys and administrative data collections usually gather ethnicity data for an individual. But a number of measures on Ethnicity facts and figures, such as Home ownership and Persistent low income, analyse data by household.

In order to meaningfully compare ethnicity by household, the ethnicity of the head of the household (usually the person with the highest income) needs to be used as a ‘stand-in’ for all members of that household.

This kind of proxy reporting is another example of a third party who is related to the person or knows them providing the required ethnicity data. But it differs from the previous method because the ethnicity of the head of the household will be used even if it is different to some or all of the other household members.

We explored the quality of stand-in ethnicity data by looking at individual and household data from the 10th wave of Understanding Society’s main study. We first identified how many households had more than one person – for those households, we analysed the ethnicities of the head of the household and all the other people living there. It’s worth noting that in Understanding Society, most data is self-reported by adults, though some is reported by others in the household[footnote 3].

We then calculated the percentage of households which contain people who have:

  • the same ethnicity (single-ethnicity households)
  • a different ethnicity to the head of the household (multi-ethnicity households)
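The classification above can be sketched as follows. This is an illustration only: the household records are invented, the first listed person is assumed to be the head of the household, and ethnicities are compared at the detailed level rather than the aggregated 5-group level.

```python
# Hypothetical households: the first listed person is treated as the head.
# Ethnic groups use detailed categories; values are illustrative only.
households = {
    "h1": ["White British", "White British"],
    "h2": ["Black African", "Black Caribbean", "Black African"],
    "h3": ["Indian"],
    "h4": ["Mixed Other", "Mixed Other"],
}


def classify(members):
    """Classify a household by comparing each member with the head (members[0])."""
    if len(members) < 2:
        return "single-person"
    head = members[0]
    if all(ethnicity == head for ethnicity in members[1:]):
        return "single-ethnicity"
    return "multi-ethnicity"


labels = {hid: classify(members) for hid, members in households.items()}
multi_person = [hid for hid, label in labels.items() if label != "single-person"]
multi_ethnicity = [hid for hid in multi_person if labels[hid] == "multi-ethnicity"]
pct_multi_ethnicity = 100 * len(multi_ethnicity) / len(multi_person)
print(labels, round(pct_multi_ethnicity, 1))
```

In this sketch, household h2 counts as multi-ethnicity because the comparison uses detailed groups, while h4 counts as single-ethnicity even though two people in the same Other group may have different backgrounds.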

Figure 2 shows the results of this analysis.

The figures (based on unweighted counts) show that, on average, 72% of households in the study were multi-person households. Of these households, 11% were multi-ethnicity households. However, this figure was influenced by the large number of multi-person households analysed where the head of the household was white, of which 9% were multi-ethnicity households.

The figures for the Mixed and Other groups are much higher: 68% of multi-person households where the head of the household was from a mixed background were multi-ethnicity households. The figure was 47% for multi-person households where the head of the household was from the Other ethnic group. Note that the figures for both groups have wide confidence intervals.

Multi-person households made up 65% and 76% of all households headed by people from these ethnicities respectively. Twenty-eight percent of multi-person households where the head of the household was black were multi-ethnicity households.

It’s worth noting that these findings should not be used to make generalisations about the ethnic make-up of UK households. This is because the unweighted counts do not correct for different non-response rates between ethnic groups and could therefore be biased.

Figure 2: The percentage of multi-person households which are multi-ethnicity households, by ethnicity of the head of the household, based on unweighted counts
Chart showing how many households are multi-person and how many of those are multi-ethnicity households

The analysis shows that stand-in proxy reporting is most representative of the ethnicities of all people in a household where the head of the household is white.

For households where the head of the household is from another ethnic group, the quality of stand-in proxy reporting is not as good. This is because in a larger percentage of cases at least one member of the household is being assigned an ethnicity which they do not self-identify with.

To note:

  • analysis was calculated on the 18 ethnic groups before they were aggregated into the 5 groups in Figure 2, which means that a black household would be considered a multi-ethnicity household if a Black African person and a Black Caribbean person live there
  • households were removed from the analysis if the ethnicity of the head of the household was not known
  • 2.3% of all multi-person households had members from the same ‘other’ group (Asian Other, Black Other, Mixed Other, White Other or Any other) so have been counted as single-ethnicity households for this analysis – they could be multi-ethnicity households (for example, if someone is white Western European and another is white Eastern European) but we don’t know this from the data

Recommendations

When stand-in ethnicity data is used to analyse household data, care needs to be taken when the head of the household is from an ethnic minority, particularly those of mixed or other ethnicity. The analysis should be accompanied with an explanation of the issues with stand-in ethnicity data.

Analysis should use people as the unit of analysis rather than households and use self-reported ethnicity data rather than stand-in ethnicity data. Where this is not possible and households need to be taken as the unit of analysis, findings should be reported for both single-ethnicity and multi-ethnicity households.

5.3 Proxy reporting by someone who doesn’t know the person well but has interacted with them

Sometimes ethnicity data is not reported by a third party who knows the person well, but by somebody who has only briefly interacted with them and who decides their ethnicity from that interaction.

For example, some ethnicity data collected by the police is determined by a police officer. This may happen after force is used by the police and the officer completes a use of force report and states the ethnicity of the person involved ‘as perceived by the reporting officer’.

This type of data also informs the Youth cautions and Proven reoffending measures on Ethnicity facts and figures.

When ethnicity data is identified by an officer, it is entered into the court proceedings database using the following ethnic groups:

  • White – North European
  • White – South European
  • Black
  • Asian
  • Chinese, Japanese, or South East Asian
  • Middle Eastern
  • Unknown

Once in the database, the data is recategorised into a 4-point classification. The 4 categories follow the 2001 Census ‘5+1’ ethnicity classification, but there is no category for people from a mixed background. The categories are only Asian, Black, White, and Other.

Sometimes, officer-identified data has to be used instead of self-reported data in producing statistics, because it is the only available source for the information. This is the case for statistics which use data from the Police National Computer, which contains only officer-identified information.

Another issue is that because this data is only based on visual appearance, it assumes ethnicity can be determined only by someone’s race.

The court proceedings database also contains self-reported ethnicity data for the same people collected from other sources. This allows the Ministry of Justice (MoJ) to analyse the accuracy of officer-identified ethnicity data by comparing the number of times it matches the ethnicity reported by the person.

Table 2 shows the results of this analysis. It shows that in 98% of cases, when a person self-reported as being in the White groups, this matched the officer-identified ethnicity. For people reporting in the Black and Asian groups, the figures were 96% and 90% respectively. Officer-identified ethnicity data is therefore of reasonable quality for these 3 ethnic groups.

However, the figures also show that almost two-thirds of people who self-reported as having mixed ethnicity were wrongly identified as Black by an officer or administrator. They also show that 36% of people who self-reported as Chinese or Other were erroneously identified in the White classification by officers.

Table 2: Consistency between officer-identified and self-reported ethnicity data in the Court Proceedings Database, 2010 to 2014

The row across the top of the table represents the officer-identified ethnicity using the 4-point classification. The left side column represents self-reported ethnicity using the 5-point classification.

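| Self-reported ethnicity | White | Black | Asian | Other | Not stated | Total (all) |
| --- | --- | --- | --- | --- | --- | --- |
| White | 98% | 0% | 0% | 0% | 1% | 100% |
| Black | 1% | 96% | 1% | 1% | 1% | 100% |
| Asian | 2% | 1% | 90% | 6% | 1% | 100% |
| Mixed | 17% | 64% | 10% | 4% | 5% | 100% |
| Chinese or Other | 36% | 8% | 12% | 38% | 6% | 100% |
| Not stated | 17% | 4% | 1% | 1% | 76% | 100% |
| Total (all) | 72% | 10% | 5% | 1% | 12% | 100% |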

Source: Statistics on Race and the Criminal Justice System 2016, Ministry of Justice, November 2017
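A comparison of this kind can be sketched as a cross-tabulation with row percentages. The linked records below are invented for illustration and the function name is an assumption. This is not the MoJ's method or data.

```python
from collections import Counter, defaultdict

# Hypothetical linked records: (self_reported, officer_identified).
records = [
    ("White", "White"), ("White", "White"), ("Black", "Black"),
    ("Mixed", "Black"), ("Mixed", "White"), ("Asian", "Asian"),
    ("Asian", "Other"), ("Mixed", "Black"),
]


def row_percentages(pairs):
    """For each self-reported group, the % assigned to each officer-identified group."""
    counts = defaultdict(Counter)
    for self_reported, officer in pairs:
        counts[self_reported][officer] += 1
    table = {}
    for self_reported, row in counts.items():
        total = sum(row.values())
        table[self_reported] = {
            officer: round(100 * n / total) for officer, n in row.items()
        }
    return table


table = row_percentages(records)
for self_reported, row in table.items():
    print(self_reported, row)
```

Each row is a percentage of the self-reported group, so the figure where the two classifications agree shows how often the officer-identified group matched the person's own answer.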

Recommendations

When a database has both proxy-reported and self-reported ethnicity data for the same person, the self-reported data should be prioritised when calculating official statistics. The way in which the MoJ links these types of data should continue so the most accurate ethnicity data is always available in these databases.

Where possible, ethnicity data should be self-reported and should be as detailed as possible by using the current Government Statistical Service’s harmonised classification for ethnicity. Analysis conducted separately by the RDU shows how differences in outcomes between detailed ethnic groups such as Indian or Pakistani can be masked when only the figures for the aggregated group are reported. When data has to be proxy reported, data might be most accurate when reported at the 5+1 level due to difficulties in accurately assigning an individual to one of the detailed ethnic groups.

The quality of self-reported ethnicity data which is collected in the criminal justice system could be improved. The Criminal Justice Board has commissioned work to investigate data gaps in the criminal justice system, which will look at ethnicity data. Initial findings include:

  • low and inconsistent rates of collection through the system
  • a lack of trust or understanding from system users around why this data was being collected and how it was being used
  • a need to consider ethnicity data collection alongside other protected characteristics

When the MoJ publishes data which is compiled from multiple sources with a mix of self-reported and officer-identified ethnicity data, it should include tables that show the percentage of each.

5.4 Proxy reporting by someone who doesn’t know the person and has never interacted with them

This is possible when a third party estimates a person’s ethnicity based on other information known about them.

This may be done by matching people’s names against a reference list of known name-ethnicity pairs. Easily accessed large data sources such as the electoral register, which makes people’s names and geographic information readily available, make this kind of proxy reporting relatively simple.
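At its simplest, name matching is a lookup against a reference list. The sketch below uses placeholder surnames and group labels, all of which are hypothetical. It does not represent any real reference list or any specific tool.

```python
# Hypothetical reference list of surname -> ethnic group pairs.
# Placeholder values only; a real list would need to be built for the
# specific time, place and population being studied.
reference = {
    "surname_a": "Group 1",
    "surname_b": "Group 2",
}


def assign_by_name(surname, lookup):
    """Return the group paired with a surname, or 'Unknown' if it is not listed."""
    key = surname.strip().lower()
    return lookup.get(key, "Unknown")


people = ["Surname_A", " surname_b ", "Surname_C"]
assigned = [assign_by_name(surname, reference) for surname in people]
print(assigned)
```

Returning 'Unknown' for unmatched names, rather than a default group, keeps the gaps in the reference list visible in the resulting data.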

Another way is to use people’s countries of birth or their nationality by using a list of countries and their main ethnic groups. However, country of birth and nationality are regarded as poorer proxies for ethnicity than name matching. This is because of the increasing population of second-generation migrants, the existence of dual nationality, and the fact that a person can change their nationality.

Research into the effectiveness of name-based proxy reporting has concluded that the method has significant potential to help fill gaps in areas where ethnicity data is in short supply.

If there is a clear benefit to the population, such as in improving the effectiveness of a health service, this information offers a useful alternative to the other methods of collecting ethnicity data if they can’t be used.

But there are important limitations to the approach. Specific groups of people are often misrepresented, such as:

  • people from a mixed ethnic background, because people are often given their father’s last name
  • women who married someone of a different ethnicity and who took their husband’s last name when they married
  • Black Caribbean people, because of similarities between Caribbean and British surnames
  • people from Muslim countries such as Pakistan and Somalia, because of the ubiquity of Muslim surnames in different countries in Asia and Africa

The relationship between ethnicity and names is specific to particular times, places, and groups of people. To make this type of proxy reporting more accurate, the reference list of names and ethnicities needs to be bespoke.

Recommendations

Name matching provides a useful alternative to other, more direct methods of collecting ethnicity data. However, the limitations should be understood and explained in analyses.

6. Acknowledgements

The following people provided assistance and guidance in the production of this report:

  • Ann Claytor, Department for Education
  • Jenny Bradley, Home Office
  • Jodie Hargreaves, Home Office
  • Robert Reeve, Ministry of Justice
  • Samuel Smith, Ministry of Justice

7. Further information

If you would like further information, or to discuss this report in more detail, please contact darren.stillwell@cabinetoffice.gov.uk.

  1. An ‘unknown’ parent ethnicity could either indicate that that parent does not live in the same house as the child and so no ethnicity data was recorded for them in the survey, or that they do live in the same house but their ethnicity was not determined as part of the survey. 

  2. If someone chooses ‘prefer not to say’, their ethnicity data will usually be categorised separately to someone who ignores the question and whose ethnicity would be recorded as ‘unknown’.

    However, this is likely to vary between both surveys and analysts working with the data, who might aggregate all of these different response types under ‘unknown’. 

  3. If people had answered the ethnicity question in an earlier wave of the study, then their ethnicity data would have been carried forward to wave 10 without them being asked the question again.

    Of those submitting ethnicity data for the first time in wave 10, the information was proxy reported for 2.3% of people.