Research and analysis

Ethnicity data: how similar or different are aggregated ethnic groups?

Published 22 December 2020

1. Introduction

The Ethnicity facts and figures website shows data about the experiences of different ethnic groups. It contains over 180 pages on topics including crime and policing, health, education and employment.

The Race Disparity Unit (RDU) also publishes ethnic group summaries. So far we have published summaries for the Black Caribbean, Indian and Chinese ethnic groups.

These 3 ethnic groups are relatively self-contained and have large populations. They are detailed ethnicity categories, rather than aggregated ones. A good range of statistics were available for the summaries.

An aggregated ethnic group is one that combines more detailed categories. For example, data for the aggregated Asian ethnic group might combine data for the Bangladeshi, Chinese, Indian, Pakistani and Asian Other ethnic groups.

This report looks at the individual and aggregated ethnicity categories in more detail, including:

  • how similar or different aggregated ethnic groups are, both in their outcomes and the characteristics of people in them
  • issues associated with aggregating ethnicity data, or presenting detailed data

We also consider what this means for presenting and publishing data at aggregated levels, and what ethnic group summaries we might produce in the future. For example, is there value in collecting and analysing data for the aggregated Asian group if the Bangladeshi, Chinese, Indian, Pakistani, and Asian Other groups are very different to each other?

Based on some of these findings, we make recommendations about the summary reports we might produce in the future.

2. Groups considered in this report

The 2011 Census of England and Wales gave people a list of 18 ethnic groups to choose from. These can be directly mapped to the 5 aggregated groups.

This report considers those 18 groups compared with the 5 aggregated groups (Asian, Black, Mixed, White and Other).

We have more pages on the Ethnicity facts and figures website that use the 5 aggregated groups than the 18 standardised groups.

3. Summary conclusions

This report summarises some of the quality issues with the use and presentation of data for aggregated and detailed ethnicity classifications:

  • aggregated ethnic groups mask substantial differences in outcomes between their constituent detailed groups
  • there is a large variety of self-reported ethnicities and different countries of birth in the 5 aggregate groups and the Other groups – for example, Asian Other, Black Other and White Other
  • there can be statistical issues in aggregating groups – for example, anomalies such as Simpson’s Paradox, where a trend appears in several different groups of data but disappears or reverses when these groups are combined
  • there might be cultural sensitivities in how and why some groups are aggregated
  • some data might only be available for aggregated groups, either because a more detailed classification is not practical (for example, when ethnicity is assigned using visual appearance), or when only aggregated information is collected and used to maintain a comparable time series
  • while more datasets are available for aggregated ethnic groups, there is a good selection of data for the detailed groups

4. How similar or different are the ethnic groups?

This section looks at how the detailed 18 groups differ within their aggregated ethnic groups. We can use this data to help understand how similar or different these groups are.

Table 1: Aggregated data for a selection of different measures

Measure | % of population living in West Midlands | % pupils getting a strong pass (grade 5 or above) in English and maths GCSE | % of households owning their own home | % never worked or unemployed | % 4 to 5 year olds overweight
Year | 2011 | Academic year ending July 2019 | 2 years to March 2018 (combined) | 2011 | Year ending March 2018
Asian
Bangladeshi | 11.7 | 50.3 | 46 | 25.3 | 20.7
Chinese | 8.0 | 76.3 | 45 | 6.7 | 18.3
Indian | 15.5 | 64.1 | 74 | 9.3 | 13.8
Pakistani | 20.2 | 41.3 | 58 | 24.4 | 20.4
Asian Other | 9.0 | 60.1 | 39 | 12.4 | 19.7
Black
Black African | 6.5 | 42.9 | 20 | 13.5 | 30.8
Black Caribbean | 14.6 | 26.5 | 40 | 9.4 | 25.2
Black Other | 11.1 | 33.7 | 37 | 15.0 | 28.1
Mixed
Mixed White and Asian | 9.5 | 55.5 | 70 | 8.0 | 15.5
Mixed White and Black African | 5.6 | 41.5 | 34 | 10.4 | 27.7
Mixed White and Black Caribbean | 16.1 | 31.0 | 32 | 12.5 | 25.4
Mixed Other | 7.4 | 47.0 | 42 | 8.3 | 20.5
White
White British | 9.8 | 42.5 | 68 | 4.7 | 22.7
White Irish | 10.4 | 54.9 | 56 | 4.9 | 23.7
White Gypsy and Traveller | 8.2 | - | - | 31.2 | -
Gypsy and Roma | - | 6.0 | - | - | -
Irish Traveller | - | 13.9 | - | - | -
White Other | 5.6 | 41.5 | 30 | 5.3 | 20.7
Other
Arab | 9.6 | - | 17 | 19.7 | -
Any Other | 7.8 | - | 29 | 14.5 | 23.5

Table 1 shows some large differences between the highest values and lowest values for 5 measures. A high or low value could be a ‘better’ or ‘worse’ outcome (or neutral) depending on the measure – the main point of interest here is the difference between the highest and lowest values for each measure.

Firstly, there are differences in where people live. For example, 20.2% of the Pakistani population live in the West Midlands, compared with 8.0% of Chinese people. 5.6% of people in the Mixed White and Black African group live in this region compared with 16.1% of people in the Mixed White and Black Caribbean group.

Also, in the Asian group, the percentage of pupils getting a strong pass in English and maths GCSE ranges from 41.3% for the Pakistani group to 76.3% for the Chinese group. The percentage of households owning their own home ranges from 39% for the Asian Other group to 74% for the Indian group.

Likewise, we see some large differences between the Black ethnic groups for home ownership – around 20% of Black African households owned their own home, compared with 40% of Black Caribbean households.

There can be large differences in outcomes in the Mixed ethnic groups, particularly between the Mixed White and Asian group and the other groups.
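The size of these gaps can be checked directly from Table 1. The sketch below is illustrative only: it copies the GCSE strong pass and home ownership figures for the Asian, Black and Mixed groups from Table 1, and the function name and measure labels are assumptions made for this example.

```python
# Values copied from Table 1: (% strong pass in English and maths GCSE,
# % of households owning their own home) for each detailed group.
TABLE_1 = {
    "Asian": {
        "Bangladeshi": (50.3, 46),
        "Chinese": (76.3, 45),
        "Indian": (64.1, 74),
        "Pakistani": (41.3, 58),
        "Asian Other": (60.1, 39),
    },
    "Black": {
        "Black African": (42.9, 20),
        "Black Caribbean": (26.5, 40),
        "Black Other": (33.7, 37),
    },
    "Mixed": {
        "Mixed White and Asian": (55.5, 70),
        "Mixed White and Black African": (41.5, 34),
        "Mixed White and Black Caribbean": (31.0, 32),
        "Mixed Other": (47.0, 42),
    },
}

# Hypothetical labels for the two measures held in each tuple above.
MEASURES = ("gcse_strong_pass", "home_ownership")


def within_group_range(aggregated_group, measure_index):
    """Return (lowest group, highest group, gap in percentage points)."""
    values = {
        name: figures[measure_index]
        for name, figures in TABLE_1[aggregated_group].items()
    }
    lowest = min(values, key=values.get)
    highest = max(values, key=values.get)
    gap = round(values[highest] - values[lowest], 1)
    return lowest, highest, gap


for group in TABLE_1:
    for index, measure in enumerate(MEASURES):
        low, high, gap = within_group_range(group, index)
        print(f"{group} {measure}: {low} to {high}, gap {gap} points")
```

A single figure for an aggregated group cannot show gaps of this size, because it reports only one value for all the detailed groups combined.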

Results for the White aggregated group are interesting. Firstly, they demonstrate that for some measures there is less data for the Gypsy, Roma and Irish Traveller groups, as noted later in this report. Where data does exist, it shows poorer outcomes for these groups. We have committed to improving the evidence base for these groups in our Quality Improvement Plan.

5. Issues with comparing 2 ethnic groups

Aggregated classifications limit the extent to which data can be compared. Indeed, for many purposes and many users, the aggregated classification is not sufficiently granular. Table 1 shows that the experiences and outcomes of people from Pakistani, Bangladeshi, Indian and Chinese backgrounds are markedly different, even in a relatively small selection of measures. Those differences are lost in an aggregated Asian group.

There are also more than 30 pages on Ethnicity facts and figures that compare only 2 ethnic groups: White and Other than White (or White British and Other than White British). These binary classifications have even less analytical value than the 5 aggregated groups. An example from the Annual Population Survey is given in Figure 1 and Figure 2.

Figure 1: Percentage of 16 to 64 year olds who were employed, by ethnicity (White and Other than White) over time

Line chart showing that White 16 to 64 year olds were more likely to be employed than people from all other ethnic groups combined every year between 2004 and 2018.

Figure 2: Percentage of 16 to 64 year olds who were employed, by ethnicity, 2019

Bar chart showing employment rates for 10 different ethnic groups in 2018, which range from 82% for people in the White Other ethnic group to 57% for people in the combined Pakistani and Bangladeshi group.

Figure 1 shows that aggregating in this binary way does smooth the time series and prevent large fluctuations.

However, Figure 2 shows that the estimate of 77.5% for the White group in 2019 hides variation within the White group (where the employment rate for the White Other group is 83.0%). Within the Other than White rate of 66.2%, rates vary between 56.1% for the Pakistani and Bangladeshi group, and 75.7% for the Indian group.

6. Statistical issues with using aggregated or detailed groups

The differences between ethnic groups shown above mean that ideally we would want to use the most detailed classifications as much as possible. This next section highlights some of the issues with using detailed and aggregated data.

6.1 Data quality and sample sizes

Annex A shows figures from the 2011 Census. The White British group makes up 80.5% of the population in England and Wales. While most ethnic minority groups make up a small percentage of the population, the Mixed and Other groups have some of the smallest populations. While the White Other group makes up 4.4% of the population, most of the Mixed and Other groups make up less than 1%. In a way, this is a good feature of the classification, as in general we probably wouldn’t want a classification with large numbers of people in the Other groups.

Of course, the actual populations themselves are large. But when put into the context of a sample survey, it explains why it might be difficult to get a large enough sample for analysis of some topics for the groups with smaller populations. Even large surveys like the Annual Population Survey and the Labour Force Survey need to combine ethnic groups in a given year to give sample sizes big enough for robust analyses, or combine more than one year of data, although this has implications for timeliness.

The populations, and therefore the sample sizes, of the 5 aggregated groups are larger. A wider range of data can be analysed, and results are more accurate. Having a larger sample size for a group, and in particular for the 5 aggregated groups, means that analysis of other characteristics such as age, gender and geography (as well as ethnicity) might also be possible.

An example of the reliability of different sized ethnic groups can be seen in the percentage of people who have been victims of crime. This is one of the most viewed datasets on the Ethnicity facts and figures website, which includes confidence intervals up to 2019. In the year ending March 2019, the sample size of the Mixed White and Black African group was below the threshold of 50 for publication for this dataset. There were 62 people in the sample for the Black Other group. This means a full time series is unavailable for some groups.

There are larger variations in the estimates over time for the smaller groups, in part because of smaller sample sizes (see Figure 3). In general, small sample sizes lead to higher variance in estimates, reflected in wide confidence intervals. This then leads to large variations over time in the estimates and can make generalisations very difficult.

In the year ending March 2019, the estimate for ‘victim of crime in the last year’ for the Mixed White and Asian group was 12.2%, with a confidence interval between 4.1% and 20.4%. The small sample size also means that the numbers can change by a large amount year-on-year, although not necessarily in a statistically significant way. In the year ending March 2018, the estimate for this group was 28.6%, with a confidence interval between 17.2% and 40.0%.

The confidence intervals are narrower for ethnic groups that have larger sample sizes (the Pakistani and aggregated Black groups are given here as examples). The effect on the width of the confidence intervals around the estimate can be seen in Figure 3. As the sample sizes get bigger, the confidence interval ranges narrow. This might mean we are more able to detect significant changes over time in these groups, or between them. Wider confidence intervals have less analytical value.
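The link between sample size and interval width can be illustrated with a short calculation. The sketch below is a simplified illustration, not the method used to produce the published estimates: it assumes simple random sampling and a normal approximation at the 95% level, whereas survey estimates normally allow for the sample design. The function names and the sample sizes of 50, 500 and 5,000 are hypothetical. The overlap check uses the 2 published intervals for the Mixed White and Asian group quoted in this section. Overlapping intervals are only an informal indication that a change may not be statistically significant.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence interval


def half_width(proportion, sample_size):
    """Half-width of a normal-approximation interval for a proportion.

    Assumes simple random sampling, which is a simplification.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    return Z_95 * standard_error


def intervals_overlap(first, second):
    """True if two (lower, upper) intervals share at least one value."""
    return first[0] <= second[1] and second[0] <= first[1]


# Hypothetical sample sizes: each tenfold increase narrows the interval
# by a factor equal to the square root of 10 (about 3.2).
for n in (50, 500, 5000):
    print(n, round(100 * half_width(0.122, n), 1))

# Published intervals for the Mixed White and Asian group (percentages).
year_ending_march_2018 = (17.2, 40.0)
year_ending_march_2019 = (4.1, 20.4)
print(intervals_overlap(year_ending_march_2018, year_ending_march_2019))
```

Because the half-width depends on the square root of the sample size, a group needs roughly 4 times the sample to halve the width of its interval under these assumptions.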

Figure 3: Percentage of people aged 16 years and over who said they were victims of crime, by ethnicity over time

3 line charts that illustrate how confidence intervals (which show how accurate estimates are) are affected by sample sizes

Note: estimates with confidence intervals are available up to the year ending March 2019 on Ethnicity facts and figures.

6.2 Quality vs quantity

The previous section described some issues relating to sample sizes, and how these affect the quality of data. However, we also want people to have the guidance and understanding to be able to find and respond with the ethnicity they most relate to. Given a choice of the 5 aggregated groups, we would want someone who identified as Mixed White and Asian to choose the Mixed option, and not the Asian option, for example. If people choose a different category from the one the design intended, we risk drawing incorrect conclusions from the statistics. Discrepancies like this may occur because of the ordering of categories (for example, the harmonised Census order versus an alphabetical order), or because categories are worded differently from what respondents expect.

Also, how data is recorded is important. Data on detentions under the Mental Health Act may use ‘Other’ categories for people whose specific ethnicity was unknown. This means that the rates of detention for people in the Black, Asian, Mixed and White Other ethnic groups, and the Any Other group, are considered to be overestimates. It also means the actual rates of detention for people in the ethnic groups not labelled as ‘Other’ may be underestimated, particularly those within the Black ethnic groups.

A final related point here is that many of the most-viewed pages use population estimates based on data from the 2011 Census – this gives the most accurate estimate for the population and some other topics, but the data is now 9 years out of date.

6.3 Cultural impacts of aggregating data

While we understand that data will generally become more robust when we use aggregated data instead of the more detailed groups, there are some issues when aggregating data.

We have already seen how aggregating detailed ethnic groups can mask differences in statistics between different groups. However, aside from the statistical issues, there may be cultural sensitivities in how and when to group ethnicity classifications together.

6.4 Only aggregate data available

Some data involves the ethnicity of a person being assigned by someone else (for example, a police officer). There tends to be a level of consistency between self-identified and, in this case, officer-identified ethnicity but this does mean that the assignment of a detailed ethnicity category is not usually possible in that situation. So in this case, only Asian, Black, White and Other can be assigned.

Some of the quality issues associated with third-party reporting of ethnicity will be reported in a future methods and quality report.

6.5 Counterintuitive results

Sometimes aggregating data can produce counterintuitive results. These are usually the result of apparent statistical anomalies called ‘ecological fallacies’. A famous example of an ecological fallacy is Simpson’s Paradox. This is where a common trend appears in several different groups but disappears or reverses when these groups are combined. We will show an example of Simpson’s Paradox in a future report.
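In the meantime, a small numerical sketch shows how a reversal of this kind can arise. The counts below are invented purely for illustration and do not come from any dataset on Ethnicity facts and figures. The rate rises in each of 2 hypothetical detailed groups between 2 periods, but the combined rate falls because most of the combined population shifts to the group with the lower rate.

```python
# Invented counts for illustration only: (number with outcome, group size).
counts = {
    "group_a": {"period_1": (800, 1000), "period_2": (170, 200)},
    "group_b": {"period_1": (60, 200), "period_2": (350, 1000)},
}


def rate(with_outcome, size):
    """Percentage of a group with the outcome."""
    return 100 * with_outcome / size


def combined_rate(period):
    """Rate for both groups pooled together in one period."""
    total_with_outcome = sum(group[period][0] for group in counts.values())
    total_size = sum(group[period][1] for group in counts.values())
    return rate(total_with_outcome, total_size)


# Each detailed group improves between the two periods...
for name, periods in counts.items():
    print(name, rate(*periods["period_1"]), "->", rate(*periods["period_2"]))

# ...but the pooled rate moves in the opposite direction.
print(
    "combined",
    round(combined_rate("period_1"), 1),
    "->",
    round(combined_rate("period_2"), 1),
)
```

The reversal comes entirely from the change in the relative sizes of the 2 groups, which is why the make-up of an aggregated group matters when interpreting its trend.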

6.6 Changes in classifications

Sometimes, due to changes in the overall ethnicity classification used, it may not be possible to get a consistent time series for both types of group. For example, a dataset might change from collecting aggregated data to detailed data, or vice versa. Even changing an aggregated data collection can cause discontinuities. For example, the Chinese ethnic group was part of the Other ethnic group in the 2001 Census, and the Asian group in 2011.

This is of course a wider issue of harmonisation. Some data owners may have a reason to place a population in a different category than another data owner would. For example, one data owner might put the Gypsy, Roma and Irish Traveller groups into Other because the topic they are exploring leads them to analyse these groups as an ethnic minority other than White, while another owner might put them in the White group.

Data owners may also use different classifications in order to maintain comparability over time.

7. The people within different ethnic groups

7.1 Differences between people within different ethnic groups

Next, we look at the ethnicities and countries of birth of people within the groups. This is to see whether people within the aggregated and more detailed groups are broadly similar or more diverse.

People completing the 2011 Census could choose an ethnic group from a list of 18. Each of the 5 aggregated ethnic groups also had an ‘Any other’ option where people could write in their ethnicity using their own words.

Table 2: Population in detailed ethnic groups, from the 2011 Census write-in answers

Aggregated group | Ethnic group | Population | % of aggregate group
White | English, Welsh, Scottish, Northern Irish or British | 45,134,686 | 93.6
White | Irish | 531,087 | 1.1
White | Polish | 510,561 | 1.1
White | Other Western European | 396,571 | 0.8
White | Any other ethnic group | 318,604 | 0.7
White | 54 other response categories | 1,317,886 | 2.7
Mixed | White and Black Caribbean | 426,715 | 34.9
Mixed | White and Asian | 341,727 | 27.9
Mixed | White and Black African | 165,974 | 13.6
Mixed | Any other ethnic group | 143,138 | 11.7
Mixed | Black and White | 13,695 | 1.1
Mixed | 66 other response categories | 133,151 | 10.9
Asian | Indian or British Indian | 1,412,958 | 33.5
Asian | Pakistani or British Pakistani | 1,124,511 | 26.7
Asian | Bangladeshi, British Bangladeshi | 447,201 | 10.6
Asian | Chinese | 393,141 | 9.3
Asian | Sri Lankan | 146,627 | 3.5
Asian | 36 other response categories | 689,093 | 16.4
Black | African | 989,628 | 53.1
Black | Caribbean | 594,825 | 31.9
Black | Black British | 134,524 | 7.2
Black | Any other ethnic group | 73,898 | 4.0
Black | Somali | 37,708 | 2.0
Black | 21 other response categories | 34,307 | 1.8
Other | Arab | 230,600 | 40.9
Other | Any other ethnic group | 88,240 | 15.7
Other | Iranian | 32,577 | 5.8
Other | Latin, South or Central American | 32,107 | 5.7
Other | Kurdish | 30,928 | 5.5
Other | 48 other response categories | 149,244 | 26.5

Table 2 shows the diversity of ethnic groups recorded in the 2011 Census, based on ethnic groups that were chosen from the list categories, or what people had written into the ‘Any other’ field. Over 200 individual response categories were recorded in this process.

In the White group, 94% identified as White British. In the Other aggregated group, 42% chose a category other than Arab, Iranian, Latin, South or Central American, or Kurdish.

In the Asian group, 16% of respondents wrote in a category other than Indian, Pakistani, Bangladeshi, Chinese or Sri Lankan.
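The percentages in Table 2 are each category's share of its aggregated group. As a minimal sketch, the calculation for the Asian group can be reproduced from the populations in Table 2. The function and variable names are assumptions made for this example.

```python
# Populations for the Asian aggregated group, copied from Table 2.
asian_group = {
    "Indian or British Indian": 1_412_958,
    "Pakistani or British Pakistani": 1_124_511,
    "Bangladeshi, British Bangladeshi": 447_201,
    "Chinese": 393_141,
    "Sri Lankan": 146_627,
    "36 other response categories": 689_093,
}


def shares(populations):
    """Each category's percentage of the group total, to 1 decimal place."""
    total = sum(populations.values())
    return {
        name: round(100 * count / total, 1)
        for name, count in populations.items()
    }


for name, share in shares(asian_group).items():
    print(f"{name}: {share}%")
```

The same function can be applied to the populations of any other aggregated group in Table 2.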

7.2 How do the ‘Other’ groups vary?

As well as the 5 aggregated ethnic groups, this analysis also looks at the variation of the Other categories. Looking at these differences is important if we are considering writing reports on these groups as part of the 18 reports covering the detailed groups. Some main points of this analysis are:

  • 76% of the White Other population were from Europe, including Eastern and Western Europe, the Baltic States, the Commonwealth of Independent (Russian) States and Turkey
  • there were 68 different ethnicities categorised in the Mixed Other group, including an ‘Any other Mixed ethnic group’ – the most ethnicities out of the 5 Other groups
  • 18% of the Asian Other group identified as Sri Lankan, 15% Filipino, 8% Afghan and 7% Nepalese
  • in the Asian Other category, a large percentage of people identified as Sikh – the ONS has looked at this further
  • 48% of people in the Black Other group identified themselves as Black British, and 13% as Somali
  • 10% of people in the Any Other group identified themselves as Latin, South or Central American, 10% as Iranian, and 9% as Kurdish

7.3 Country of birth of people in the Other groups

We can also look at the country of birth of Census 2011 respondents who report being in one of the 5 aggregated groups, and the 18 detailed groups (Figure 4).

ONS analysis of Census data shows that the population has become more diverse over the last 60 years, with an increase in the number of residents born outside the UK. There is variation within groups in where people are born, and differences between groups.

The data shows that:

  • 93% of people in the White ethnic group were born in the UK
  • 50% of Asian people were born in the Middle East or Asia
  • 80% of people in the Mixed group were born in the UK
  • nearly half of people in the Black group were born in Africa, the Americas or the Caribbean
  • there was significant variation in the countries of birth of people in the Other group

Figure 4: Percentage of Census 2011 population born in different areas of the world, by ethnic group

Bar chart showing that White people were the most likely out of all ethnic groups to have been born in the UK, while people from the Asian and Other ethnic groups were the least likely to have been.

We can conclude that within the 5 aggregated groups and the 5 Other groups, there is a wide range of self-identified detailed ethnicities, and the people in them come from a variety of places in the world. This diversity can make it hard to draw conclusions from analyses of these groups.

8. Data availability

8.1 How many datasets are available for detailed groups?

The data on Ethnicity facts and figures currently uses 20 different ethnicity classifications.

3 of those classifications include all of the detailed ethnic groups, either because they are the 2011 Census classification or because the RDU considers them close to it. They are:

  • the 2001 Census classification of 16 ethnic groups
  • the 2011 Census classification of 18 ethnic groups
  • the 2001 Census classification plus Gypsy/Roma and Irish Traveller (18 ethnic groups in total)

While there are differences between the 2001 and 2011 classifications, they are similar enough to be used together for ethnic group summary reports.

Together, these 3 classifications are used on 62 pages of statistics and analysis on the website. Datasets use the 5 aggregated ethnic groups on 104 pages.

Table 3: Number of pages on Ethnicity facts and figures website, by ethnicity classification (as at 12 August 2020)

Classifications on the website Number of pages with the classification
18 ethnic group classification (or close variant) 62
5 ethnic group aggregated classification (or close variant) 104
Other 59

Note: some pages have measures that use more than one classification. See Annex B for the number of pages (out of 62) on which data is available for each of the 18 detailed ethnic groups.

8.2 Data available for individual ethnicities

Here we look at the availability and completeness of data used in the pages featuring the most detailed classifications – those that would allow reports for each of the most detailed groups.

A broadly comparable set of reports could be produced for most of the 18 ethnic groups using the 62 measure pages shown in Annex B. Some reports could also be supplemented by data where a classification uses a subset of the 18 detailed ethnic groups, but not all 18.

There are fewer datasets available that give information for the Gypsy, Roma and Irish Traveller groups. The 40 datasets that are available are almost all demographic information from the 2011 Census, or data on educational attainment.

This analysis looks only at the data for the most recent year and for data disaggregated by ethnicity only. These classifications may not be supported:

  • for each year within a time series
  • for analysis by an additional factor such as geography, age or gender

9. How this informs our plans

9.1 Moving away from aggregated classifications

Earlier sections of this report have shown how diverse people within some of the aggregated categories are. More datasets are available for analysis for these groups, with the data having larger sample sizes and generally being more robust. However, aggregation can mask considerable variation between the detailed ethnic groups. This problem is even more significant when using binary classifications (for example, White and Other than White).

This is why we are actively trying to move data publishers away from providing aggregated groups and binary classifications – which really do have limited analytical value – and towards supplying data using detailed Census classifications (currently the 2011 Census 18+1 classification, and the 2021 Census classification once it is finalised).

However, we are mindful of the challenge of balancing the need for comparability over time, the data needs of users, and trying to make as much data available as possible.

9.2 User needs

In general, there is a user need for any significant data or insights about our work, and this extends to ethnicity summaries for the 18+1 ethnic groups, especially the larger groups, such as Pakistani and Bangladeshi. The needs of different users can vary. Some people, such as academics and policymakers, would prefer the most granular 18+1 groups. For other, non-analytical audiences, the aggregated groups are sufficient for understanding that there are disparities between ethnic groups.

Theoretically, a few categories should be easier for the general public to understand, and as the high-level labels (White, Black, Mixed, Asian and Other) are short words, they are likely to have a better readability score and therefore may be better for conveying information to non-analytical audiences.

In user research, there have not been questions or requests for data on the Other and Mixed groups. When Other groups have been commented on in research sessions, it is often the participant asking for clarity – for example, ‘What does Black Other mean?’.

The user need would appear to be to better understand definitions, and the statistical and data quality issues around some of the groups, and this report covers some of these points.

A report giving insights about the individual groups that make up the Mixed ethnic group (highlighting differences between the groups using data from Table 1) would be valuable for users.

9.3 Reporting plans

This final section brings together information from the previous sections to conclude the following about producing ethnicity summaries.

There is more data available for the 5+1 groups than the 18+1 groups. However, we conclude that we won’t produce the reports for the 5 aggregated groups, or the 5 groups labelled ‘Any other’ because:

  • there is a large variety of self-reported ethnicities and countries of birth in the aggregated and Other groups
  • the aggregated groups mask substantial differences in outcomes between their constituent detailed groups
  • there is a wide enough set of data to produce ethnicity summaries for the detailed groups

10. Next steps

We will:

  • continue to engage with users about their interests in different ethnic groups, and ensure we are providing relevant analyses to meet their needs
  • work with government departments to obtain more data using more detailed ethnicity classifications, and data that uses harmonised principles (some of which is outlined in our Quality Improvement Plan), and continue to support publishing data at the most detailed level where possible
  • fill in gaps where we have ethnicity data for the detailed groups but not the aggregated groups – this will allow users to compare with data only available at the aggregated level
  • produce ethnicity summaries for each of the detailed groups, excluding the 5 Other groups and the aggregated groups
  • produce a combined rather than aggregated report for the Mixed group that highlights some of the top-level differences between the sub-groups (building on, for example, Table 1 in this report), because of the small populations in these detailed groups
  • produce a Gypsy, Roma and Irish Traveller report with the aim of describing some of the statistics, and use it to demonstrate ways of improving the quality of data for these smaller groups as outlined in the Quality Improvement Plan

If you would like to be part of this work, please contact Darren Stillwell at darren.stillwell@cabinetoffice.gov.uk or Richard Laux at richard.laux@cabinetoffice.gov.uk.

11. Acknowledgements

The RDU is grateful for advice provided by Charles Lound, Helen Ross and Sofi Nickson (Office for National Statistics).

12. Annex A

Table A gives the approximate size of the populations in the ethnic groups considered in this report, along with the percentage of the England and Wales population.

Table A: Population in ethnic groups, from the 2011 Census

Ethnic group Population % of England and Wales population
White British 45,135,000 80.5
Irish 531,000 0.9
Gypsy or Irish Traveller 58,000 0.1
White Other 2,486,000 4.4
Mixed White and Black Caribbean 427,000 0.8
Mixed White and Black African 166,000 0.3
Mixed White and Asian 342,000 0.6
Mixed Other 290,000 0.5
Indian 1,413,000 2.5
Pakistani 1,125,000 2.0
Bangladeshi 447,000 0.8
Chinese 393,000 0.7
Asian Other 836,000 1.5
Black African 990,000 1.8
Black Caribbean 595,000 1.1
Black Other 280,000 0.5
Arab 231,000 0.4
Any other ethnic group 333,000 0.6

13. Annex B

Table B: Availability of data for detailed 18 ethnic groups on the Ethnicity Facts and Figures website (as at 12 August 2020)

Ethnic group Pages for which data is available (out of 62)
English, Welsh, Scottish, Northern Irish or British 62
Irish 60
Gypsy or Irish Traveller 40
White Other 62
Mixed White and Black Caribbean 60
Mixed White and Black African 57
Mixed White and Asian 58
Mixed Other 59
Indian 62
Pakistani 62
Bangladeshi 62
Chinese 61
Asian Other 62
Black African 62
Black Caribbean 62
Black Other 57
Arab 51
Any Other 61

Data might be missing for a group because it is suppressed to protect confidentiality of the respondents, or for reasons of accuracy (or both).
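As an illustration of how Table B can be used, the sketch below converts the page counts into coverage percentages and lists the groups with incomplete coverage. The figures are copied from Table B. The names used in the code are assumptions made for this example.

```python
PAGES_TOTAL = 62  # number of pages using the detailed classifications

# Pages for which data is available, copied from Table B.
pages_available = {
    "English, Welsh, Scottish, Northern Irish or British": 62,
    "Irish": 60,
    "Gypsy or Irish Traveller": 40,
    "White Other": 62,
    "Mixed White and Black Caribbean": 60,
    "Mixed White and Black African": 57,
    "Mixed White and Asian": 58,
    "Mixed Other": 59,
    "Indian": 62,
    "Pakistani": 62,
    "Bangladeshi": 62,
    "Chinese": 61,
    "Asian Other": 62,
    "Black African": 62,
    "Black Caribbean": 62,
    "Black Other": 57,
    "Arab": 51,
    "Any Other": 61,
}


def coverage(pages):
    """Percentage of the 62 pages that have data for a group."""
    return round(100 * pages / PAGES_TOTAL, 1)


# Groups with data on fewer than all 62 pages, lowest coverage first.
incomplete = {
    name: coverage(pages)
    for name, pages in pages_available.items()
    if pages < PAGES_TOTAL
}
for name in sorted(incomplete, key=incomplete.get):
    print(f"{name}: {incomplete[name]}%")
```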