Quality report: Personal Income Statistics release from tax year 2021 to 2022

Question 1

1.  Contact

Accepted Answer

Organisation unit - Knowledge, Analysis and Intelligence (KAI)
Name – M Whent
Function - Statistician, Personal Taxes
Mail address - Room 3c/04-6, 100 Parliament Street, London, SW1A 2BQ
Email - spi.enquiries@hmrc.gov.uk

Question 2

2.  Statistical presentation

Accepted Answer

2.1 Data description

Statistics about personal incomes are assessed using the annual Survey of Personal Incomes (SPI). This publication provides detailed statistics about individuals liable to UK Income Tax and their incomes for the tax year with the most recent outturn data.

2.2 Classification system

The SPI is carried out annually and is based on information held by HMRC on the income assessable for Income Tax for individuals who could be liable to UK Income Tax in a given tax year. Published breakdowns of the number of taxpayers, income, tax liabilities, allowances and deductions are determined based on data submitted by the individual in their Self Assessment Return or by their employer in the PAYE data.

A unique Income Tax payer reference assigned to each individual is used to aggregate the data.

2.3 Sector coverage

Income Tax is an annual tax paid on most sources of income including pay from employment, profits from self-employment, private and occupational pensions, retirement annuities, state retirement pensions, foreign income, income from property, taxable social security income, savings income, income from shares (dividends) and income from trusts. Employees who receive non-cash benefits from their employers such as company cars, fuel, medical insurance, living accommodation or loans also pay Income Tax on these benefits.

Adding all these sources together will give an individual’s total income assessable for tax, an aggregate that appears in several tables in this publication. Some sources of income are not liable for Income Tax including certain social security benefits, Child Tax Credit and Working Tax Credit, and income from tax exempt savings accounts (such as Individual Savings Accounts (ISAs) and some National Savings & Investment products). Most people in the UK get a Personal Allowance which is the amount of income on which no tax will be charged. Some people are also eligible for tax reliefs.

All individuals who could be liable to UK Income Tax are covered by this publication.

2.4 Statistical concepts and definitions

Income Tax

Once tax-free allowances have been taken into account, Income Tax due is calculated using different Income Tax rates for specific types of income across a series of Income Tax bands. There are 3 different sources of income for Income Tax purposes:

earnings, or income other than savings and dividends, also known as non-savings non-dividends (NSND) income (for more information please see the Supporting Documentation for the relevant year)
savings income (such as bank and building society interest)
dividends (such as income from shares in UK companies)

Income tax powers over earned income have been devolved to Scotland and Wales. In the rest of the UK (rUK), earned income is taxed within 3 main bands of Income Tax rates: the basic rate, the higher rate and the additional rate. In Wales, earned income is taxed within the same bands as the rest of the UK but the rates can be different. In Scotland, earned income is taxed within 5 main bands of rates: the starting rate, basic rate, intermediate rate, higher rate and additional rate. Savings and dividend income are taxed on the same basis across the whole of the UK, with the bands aligned to the rUK bands for earned income but the rates for dividends can be different. Some basic rate taxpayers are also eligible for a starting rate for savings. For tax years up to and including 2019 to 2020, the income tax liability for an individual in the SPI was calculated with reference to their residential postcode. For the tax year 2020 to 2021 and future years, the income tax liability calculation has been updated to reflect the tax regime that would be applicable to an individual.

Income Tax typically works on a ‘stack’ basis. This means that earnings are generally taxed first, then savings income and finally dividend income. This means that if an individual has earnings after allowances sufficient to completely fill the basic rate Income Tax band, all savings or dividend income would be charged at the higher or additional rates of tax.
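
The stack ordering can be illustrated with a minimal sketch. The band width and rates below are hypothetical placeholders, not the parameters for any tax year, and the sketch assumes only two bands and ignores the Personal Savings Allowance, the Dividend Allowance and the starting rate for savings.

```python
# Illustrative only: the band width and rates are hypothetical,
# not the actual Income Tax parameters for any tax year.
BASIC_BAND_WIDTH = 30_000  # hypothetical width of the basic rate band

# Hypothetical (basic rate, higher rate) pairs for each income type.
RATES = {
    "earnings": (0.20, 0.40),
    "savings": (0.20, 0.40),
    "dividends": (0.10, 0.30),
}


def stacked_tax(earnings, savings, dividends):
    """Tax each income type in stack order: earnings, savings, then dividends.

    Inputs are taxable amounts after allowances. Each income type uses up
    whatever is left of the basic rate band before the next type is taxed.
    """
    band_left = BASIC_BAND_WIDTH
    tax = {}
    stack = (("earnings", earnings), ("savings", savings), ("dividends", dividends))
    for name, amount in stack:
        basic_rate, higher_rate = RATES[name]
        in_basic = min(amount, band_left)   # portion that still fits in the basic band
        in_higher = amount - in_basic       # remainder is taxed at the higher rate
        band_left -= in_basic
        tax[name] = in_basic * basic_rate + in_higher * higher_rate
    return tax
```

Under these placeholder parameters, calling `stacked_tax(30_000, 1_000, 1_000)` taxes all of the earnings at the basic rate, and because the earnings fill the band, all of the savings and dividend income is taxed at the higher rates.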

Further detail on how Income Tax liabilities are calculated is provided in Annex A of the Supporting Documentation, and a full list of definitions of terms used in this publication can be found in the Glossary of Terms section of the Supporting Documentation.

Tax year

The statistics are aggregated into tax years. A tax year stretches from 6 April until 5 April the following calendar year.
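
As an illustration of this definition, the sketch below maps a calendar date to the tax year that contains it. The function name and the label format are assumptions made for the example.

```python
from datetime import date


def tax_year_label(d: date) -> str:
    """Return the tax year containing d, in the form '2021 to 2022'.

    A tax year runs from 6 April to 5 April of the following calendar year.
    """
    # Dates on or after 6 April belong to the tax year starting that calendar year.
    start_year = d.year if (d.month, d.day) >= (4, 6) else d.year - 1
    return f"{start_year} to {start_year + 1}"
```

For example, both 6 April 2021 and 5 April 2022 fall in the tax year 2021 to 2022, while 5 April 2021 falls in 2020 to 2021.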

Taxpayer

An individual calculated to have a positive Income Tax liability for the tax year, based on the income, allowances, reliefs and deductions for the year.

Total income

The sum of an individual’s components of income taken into account in calculating Income Tax. This includes earnings from employment, profits from self-employment, pension income, some social security benefits, savings income, income from shares (dividends), rental income, and income paid from trusts. It excludes:

gains from the disposal of assets that are classified as capital gains
interest, dividends or bonuses from tax exempt investments (for example, ISAs and National Savings & Investments Savings Certificates)
interest and terminal bonuses from Save As You Earn Schemes
Premium Bond, National Lottery and gambling prize winnings

Total income is calculated before relief for contributions to occupational and personal pensions, other deductions and reliefs or personal allowances.

In the tax system, income is streamed into three main categories: dividends; savings income (not dividends); and non-savings income as different rules apply.

Taxable income

Income assessable to Income Tax after allowances.

Income Tax liabilities

The amount of Income Tax due on taxable income after applying tax rates to the tax base. The Income Tax liability for each sample case in the SPI is calculated by reference to the amounts of income by type, deductions and reliefs and the tax regime parameters that apply for the year. The calculated liability for a tax year will differ from the amount of Income Tax receipts collected in that tax year.

Personal Allowance

The amount of income you can receive for the tax year without having to pay tax on it.

Personal Savings Allowance

The amount of savings income you can receive for the tax year without having to pay tax on it.

Dividend Allowance

The amount of dividend income you can receive for the tax year without having to pay tax on it.

Pay As You Earn (PAYE)

PAYE is the system used by HMRC to collect and account for Income Tax on earnings from employment and pensions. Income Tax and National Insurance Contributions are deducted by the employer and paid over to HMRC on behalf of the individual for each pay period.

Self Assessment (SA)

Self Assessment is a system where an individual declares their income and can calculate their own Income Tax due after the end of the tax year. Taxpayers included in Self Assessment can be higher earners, self-employed and taxpayers with complex tax affairs.

Industry

Industry categories are based on UK Standard Industrial Classification of Economic Activities 2007 (SIC2007). Income from self-employment (sole trade and partner) is assigned an industry using the business text descriptions supplied on Self Assessment returns.

Geographical Areas

Some tables present information for sub-UK areas described as Government Office Region, County, District and Parliamentary Constituency. Administrative and Political geographical areas are not held on taxpayers’ records. For the SPI, the areas are attached by matching the individual’s postcode to the Office for National Statistics Postcode Directory. National Statistics are accredited official statistics.

2.5 Statistical unit

The units in the statistics release are Income Tax payers in the UK, with the exception of tables 3.9 and 3.10 which cover all individuals with income from self-employment.

2.6 Statistical population

All individuals liable to pay Income Tax in the UK. Individuals who may have income but are not liable for Income Tax are excluded; this may occur if the individual has no Income Tax liability due to their deductions, reliefs and Personal Allowances exceeding their total income or if their income is below their Personal Allowance.

2.7 Reference area

The geographic region covered by the data is the United Kingdom (UK).

2.8 Time coverage

The statistics cover the time period from the tax year 1999 to 2000 until the latest tax year available. Note that data for 2008 to 2009 is not available.

Question 3

3.  Statistical processing

Accepted Answer

3.1 Source data

The SPI is carried out annually and is based on information held by HMRC on the income assessable for Income Tax for individuals who could be liable to UK Income Tax in a given tax year.

The SPI data is currently sampled from two HMRC operational computer systems:

the National Insurance and PAYE Service (NPS) system covers all employees and occupational pension recipients with a Pay As You Earn (PAYE) record
The Computerised Environment for Self Assessment (CESA) system covers people with self-employment, rental or untaxed investment income. It also covers those with higher incomes and other people with complex tax affairs. Where people have both NPS and CESA records, their CESA record is selected because it provides a more complete picture of their taxable income.

The SPI previously also sampled from another HMRC operational computer system (but this is no longer the case from the tax year 2019 to 2020):

the Claims system includes individuals without NPS or CESA records that have had too much Income Tax deducted at source and claim a repayment. For operational efficiencies, R40 Claim forms were migrated onto NPS. The last survey for which samples were drawn from the Claims system was the 2015 to 2016 SPI.

Additional data about the sample cases are also collected from other HMRC administrative systems, as follows:

The PAYE Real Time Information (RTI) system covers submissions from employers and pension providers to report Income Tax and National Insurance contributions before they pay wages or pensions to employees and pensioners. This has been used as the source of ‘net pay’ pension contributions in the SPI. From the tax year 2021 to 2022, information submitted via RTI has been used as the source of ‘relief at source’ contributions, together with information submitted via the RPSCOM100(Z) and Self Assessment returns. You can find out more information about the RPSCOM100(Z) in the section of this report titled ‘Data compilation: Imputation of pension income’.

3.2 Frequency of data collection

The SPI dataset is compiled on an annual basis and is usually available around 23 months after the end of the tax year. The raw data is drawn from HMRC’s systems approximately one year after the tax year end and it takes about a year to process, analyse and produce the SPI publication.

3.3 Data collection

Separate samples are drawn from each of the administrative systems and different sampling strategies are used for each, which reflect the skewed distribution of Income Tax liabilities. The samples are structured as follows:

Some individuals with a PAYE record are also in the Self Assessment system. These individuals are excluded from the PAYE population prior to sampling, as their Self Assessment record provides a more complete picture of their taxable income.

Once data are collected for the constituent parts of the sample, the datasets are joined together.
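
The exclusion of overlapping cases before sampling, and the later joining of the samples, can be sketched as follows. This is a simplified illustration only: the reference values are hypothetical identifiers, and the function names are not taken from any HMRC system.

```python
def build_frames(paye_refs, sa_refs):
    """Build the two sampling frames, dropping overlapping PAYE records.

    Individuals with both a PAYE and a Self Assessment record are kept
    only in the Self Assessment frame.
    """
    sa = set(sa_refs)
    paye_only = [ref for ref in paye_refs if ref not in sa]
    return paye_only, list(sa_refs)


def join_samples(paye_sample, sa_sample):
    """Combine the sampled cases, tagging each with its source system."""
    tagged_paye = [(ref, "PAYE") for ref in paye_sample]
    tagged_sa = [(ref, "SA") for ref in sa_sample]
    return tagged_paye + tagged_sa
```

Removing the overlap before sampling means no individual can be drawn twice, so the joined dataset needs no later de-duplication.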

The sampling strategies described above intentionally yield large sub-samples of SPI cases with very high incomes, which account for a large proportion of total Income Tax liabilities. This increases the precision of estimates of liabilities and taxable incomes drawn from the SPI. After allowing for non-response and for records that failed data validation tests, the SPI contains 800,000 to 900,000 records, representing approximately 1.8% of individuals in contact with HMRC. For exact sample counts, please refer to the Supporting Documentation for the relevant year.

3.4 Data validation

Checks carried out on the SPI include:

Automated checks take place when loading the data into the analysis database.
Analysts check that the number of records loaded into the analysis database is as expected.
Data validation checks are performed on date of birth, sex, income and postcode. Data are checked against other internal systems and in some cases are validated manually.
Analysts check any outliers in the data which are then examined on a case-by-case basis. Outlier checks sometimes result in adjustments to the dataset as required to improve accuracy or prevent skewing.

Any large changes in the number of taxpayers, income or Income Tax liability figures from one statistical release to the next are investigated.

3.5 Data compilation

Imputation of characteristics

HMRC does not have complete information on age and sex of Income Tax payers. Where no information is held, estimated values are imputed.

Imputation of savings income

The coverage of savings income for the sample drawn from NPS prior to 2018 to 2019 was incomplete. This is because most Income Tax payers with savings income do not report it to HMRC.

Prior to April 2016, banks and building societies deducted tax at a basic rate on interest and paid this to HMRC. Only individuals below the Personal Allowance or above the higher rate threshold needed to report interest to HMRC to ensure that the correct tax was paid.
Post April 2016, most interest income is covered by a combination of the Personal Savings Allowance, the Personal Allowance, and the starting rate for savings and therefore is not liable to Income Tax. Those that do need to pay Income Tax on their savings income do so by contacting HMRC to report their savings income, where this information has not already been provided through Self Assessment.

HMRC collects data on savings income directly from banks and building societies. From 2018 to 2019 onwards, this income data feeds into the NPS system so that HMRC can collect the appropriate tax without taxpayers needing to contact HMRC directly. From the 2018 to 2019 SPI onwards, this data has replaced the previous method, which estimated savings interest by imputation.

Previous savings imputations (prior to the 2018 to 2019 SPI) followed a similar method to the dividend imputation outlined below, however the targets for the total number of individuals and the total interest received were taken from data provided by the UK banks.

Between the 2016 to 2017 and 2018 to 2019 releases, the claims component was imputed by selecting the claims cases that had no employments from the tax year 2015 to 2016 survey (the last year data was available). The income for these cases was projected using Office for Budget Responsibility determinants in order to estimate the level of income and the tax due in the respective tax year.

Imputation of dividend income

In order to create a full picture of total income for the SPI, it is necessary to impute values of dividend income to some sample cases. For the dividends imputation, the amount for each SPI case:

is known for cases in Self Assessment from the amount declared on the Self Assessment Return
can be inferred or estimated reasonably for NPS cases where there is an adjustment to the tax code for taxpayers
is unknown for NPS cases where there is no coding adjustment

Where no information at case level is available from HMRC administrative systems, estimated values are imputed to cases so that the population as a whole has amounts consistent with evidence from other sources.

Starting from control totals at UK level for the number of cases and total amount of dividends, the Self Assessment and NPS cases with coding adjustments are deducted to leave targets for the remainder of the taxpayer population. These targets are at UK level – no attempt is made to control the targets to sub-UK geographical units. For dividend income, the number of non-Self Assessment cases with dividend income and the distribution of imputed amounts were inferred from Family Resources Survey data for the relevant tax year. The cases to which amounts are attached by the imputation process and the amounts attached are determined by probabilistic methods with just the UK targets and distributions in mind.
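
The general shape of this kind of target-based imputation can be sketched as below. This is not the SPI's actual procedure: the selection rule, the final rescaling step and the `draw_amount` function (a stand-in for a distribution inferred from survey data) are all assumptions made for illustration.

```python
import random


def remaining_targets(control_cases, control_amount, known_cases, known_amount):
    """Deduct cases with known dividends from the UK-level control totals."""
    return control_cases - known_cases, control_amount - known_amount


def impute_dividends(weights, target_cases, target_amount, draw_amount, rng):
    """Attach imputed amounts to sample cases with unknown dividend income.

    weights      -- grossing factor of each case with no dividend information
    target_cases -- grossed number of cases that should receive an amount
    draw_amount  -- hypothetical function drawing one amount from a distribution
    """
    # Select each case with the probability needed to hit the case target
    # on average once the sample is grossed up.
    p = min(1.0, target_cases / sum(weights))
    raw = [draw_amount(rng) if rng.random() < p else 0.0 for _ in weights]
    # Assumption: rescale the drawn amounts so the grossed total equals the target.
    grossed = sum(w * a for w, a in zip(weights, raw))
    scale = target_amount / grossed if grossed > 0 else 0.0
    return [a * scale for a in raw]
```

In this sketch the amount target is met exactly after rescaling, while the number of recipients matches its target only in expectation. Because the targets are set at UK level only, nothing in the process constrains the imputed totals for any sub-UK area.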

Imputation of pension income

As with dividend income, HMRC does not have complete information about superannuation or personal pension contributions. Pension contributions can be made under 3 types of arrangement: a net pay scheme, a relief at source scheme or a salary sacrifice scheme.

HMRC holds information on the value of employee pension contributions paid under “net pay arrangements” in Real Time Information (RTI) submissions by their employer. This data has been used to match SPI cases to “net pay” pension contributions. Pension schemes operating a net pay scheme are occupational pension schemes. However, as some employers operate relief at source or salary sacrifice schemes, contributions to those schemes are not included in the “net pay” figures and thus the “net pay” figures do not include all “occupational” individual contributions.

HMRC receives information from relief at source schemes on individual contributions via the APSS106 and RPSCOM100(Z). These contributions are made post-tax. Relevant rate relief (equivalent to the rUK basic rate of income tax) is claimed on all individual contributions by scheme providers from HMRC. Individuals with higher marginal tax rates than the relevant rate can claim the additional relief from HMRC via Self Assessment.

The RPSCOM100(Z) is an annual return completed by pension scheme administrators which provides HMRC with information about individual contributions made to relief at source pension accounts in the tax year. It is used to match relief at source pension contributions to cases in the SPI, including basic rate tax relief. Where applicable, information submitted via RTI and Self Assessment returns has been used. Individual contributions recorded in the different data sources have been adjusted, where necessary, to include basic rate tax relief. Some adjustments are made to align total contribution values with the APSS106. The APSS106 is a form used by pension scheme administrators to make annual claims for the recovery of tax deducted from individuals.
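
The adjustment to include basic rate tax relief amounts to grossing up a post-tax contribution. A minimal sketch, assuming a relevant rate of 20% (the rUK basic rate) and a hypothetical function name:

```python
BASIC_RATE = 0.20  # assumed relevant rate, equal to the rUK basic rate of Income Tax


def gross_contribution(net_contribution, relief_rate=BASIC_RATE):
    """Convert a post-tax relief at source contribution to its gross value.

    The scheme claims relief so that net = gross * (1 - relief_rate),
    hence gross = net / (1 - relief_rate).
    """
    return net_contribution / (1 - relief_rate)
```

For example, an individual contribution of £80 paid from post-tax income corresponds to a gross contribution of about £100, with the £20 difference claimed from HMRC by the scheme provider.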

A change was made to the classification of pension contributions in the tax year 2017 to 2018, to better reflect their treatment in the tax system. As a result of the methodological improvement, the new pension contributions statistics (net pay and relief at source) aren’t comparable with statistics under the previous classifications (occupational and personal).

The methodology for estimating contributions to relief at source pensions was revised in the 2021 to 2022 tax year so it better aligns with the methodology used in HMRC’s Private Pension statistics. In previous tax years, only information from Self Assessment returns was used to match Self Assessment cases.

In addition, in the 2020 to 2021 tax year only, relief at source pension contributions matched to PAYE cases in the SPI were net of basic rate tax relief, whereas in previous tax years and in the 2021 to 2022 tax year contributions were gross of basic rate tax relief. It is estimated that including basic rate tax relief would increase total individual relief at source contributions in the 2020 to 2021 SPI by around £1.5 billion, to £12.9 billion.

Additionally, the SPI includes contributions made to retirement annuity contracts and contributions made to employer’s schemes not deducted at source.

Employers, individuals and scheme providers are not required to report individual contributions made using salary sacrifice to HMRC. These contributions are deducted from an individual’s gross earnings and added to the contributions made by their employer. Individual contributions made using salary sacrifice arrangements are not included in this publication.

The estimated value for “relief at source” and “net pay” contributions has been combined with other pension reliefs and included in these statistics. For more information on these pensions data sources please refer to the latest methodology document for the Private pension statistics release.

Imputation of Marriage Allowance

HMRC collects data regarding claimants (receivers and transferers) of Marriage Allowance through coding adjustments for those in NPS or via Self Assessment returns. The latest available administrative data is matched to the SPI sample data allowing for the calculation of tax liabilities adjusting for Marriage Allowance.

The SPI sample is not stratified around any subsets of populations including Marriage Allowance claimants, and therefore when grossed up and subset for just Marriage Allowance claimants it does not exactly match the population of claimants separately estimated and published using the collected administrative data. To calibrate to published Marriage Allowance claimants, estimated values are imputed to cases so that the population as a whole has amounts consistent with the evidence from these other sources.

Starting from published estimates at UK level for the number of cases, the Self Assessment and NPS cases with coding adjustments are deducted to leave targets for the remainder of the claimant population. These targets are at UK level – no attempt is made to control the targets to sub-UK geographical units. For Marriage Allowance, the number of eligible claimant cases was inferred from Family Resources Survey data for tax year ending 2021. Claims received or transferred are attached to cases by the imputation process to align with the published estimates of take-up.

Sampling Framework and Grossing Factors

The sample is drawn from records held on HMRC transactional systems and the available information reflects what is known about the cases approximately one year after the tax year to which the survey relates. Allowance is made for Self Assessment cases yet to file a return and the overlap between the Self Assessment and PAYE systems when estimating the likely final grossed population for the tax year. The SPI data reflects the information held on HMRC systems at the time the sample was drawn, therefore values associated with some cases, particularly in Self Assessment could continue to evolve after the sample is drawn.

Each SPI sample case is drawn from the Self Assessment and PAYE systems according to a stratified sampling framework and has a grossing factor associated with it. These grossing factors are used to create estimates of the overall number of Income Tax payers, total income and total Income Tax liabilities for the entire UK population. Grossing factors vary depending on several factors, for example where the sample case data was sourced from (PAYE or Self Assessment), income type, and where in the income distribution the sample individual sits.
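
The way grossing factors turn sample cases into population estimates can be sketched as a weighted sum. The field names below are hypothetical, and restricting all three estimates to cases with a positive liability is an assumption based on the definition of a taxpayer used in this report.

```python
def grossed_estimates(cases):
    """Estimate population totals for taxpayers from weighted sample cases.

    Each case is a dict with hypothetical keys 'grossing_factor',
    'total_income' and 'liability'. A case with grossing factor g
    stands for g individuals in the population.
    """
    # Taxpayers are cases with a positive Income Tax liability.
    taxpayers = [c for c in cases if c["liability"] > 0]
    return {
        "taxpayers": sum(c["grossing_factor"] for c in taxpayers),
        "total_income": sum(c["grossing_factor"] * c["total_income"] for c in taxpayers),
        "total_liability": sum(c["grossing_factor"] * c["liability"] for c in taxpayers),
    }
```

A case drawn at a rate of 1 in 1 would carry a grossing factor close to 1, while a case from a lightly sampled part of the distribution carries a much larger factor and so has more influence on the estimates.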

Changes to Self Assessment grossing factors in the tax year 2018 to 2019

When the sample of Self Assessment returns is drawn, not all of the returns that have been issued have been received. In addition, some returns may be issued after the sample is drawn. The Self Assessment grossing factors include an uplift to account for these returns.

Prior to the tax year 2018 to 2019, the grossing factors for Self Assessment cases included an assumption that the returns received to date for that tax year were representative of the returns still to be received, and so there was a constant component to the grossing factors that was applied across all Self Assessment cases.

For the tax year 2018 to 2019, this assumption was reviewed. This was in response to feedback from key users of the Personal Incomes Statistics, such as the Scottish Fiscal Commission, who found differences in the Scottish Outturn and SPI statistics on additional rate taxpayers. This is discussed in their 2018 Forecast Evaluation Report (pages 9 to 11). It was found that the population of returns received was not entirely representative of the returns still to be received.

Therefore, the grossing factors for Self Assessment cases were revised in the tax year 2018 to 2019, to better reflect the population of individuals expected to submit their Self Assessment returns after the sample was drawn. These factors were refined based on the latest information available at the time and considered a wider set of characteristics than the previous grossing factors.

The main effects of the grossing factor revisions were a reduction in the estimated number of individuals who submit returns through Self Assessment in the tax year and a small redistribution of individuals by income and source of income. The main groups affected by the change were high-income individuals (those with income over £100,000 are required to submit a Self Assessment tax return) and individuals with income from self-employment, property or dividends. The effects of these changes are discussed in more detail in the Personal Incomes statistical commentary and supporting documentation for the tax year 2018 to 2019.

The Self Assessment grossing factors are regularly reviewed to ensure that the assumed distribution of returns received after the sample is drawn is representative of the distribution of returns actually received after the sample was drawn in recent years.

Changes to the Self Assessment sampling framework in the tax year 2019 to 2020

The sampling framework draws cases from the Self Assessment and PAYE systems stratified by income, with cases drawn at a lower rate at lower incomes and a higher rate at higher incomes. Above a high-income threshold, all Self Assessment cases are drawn into the sample.
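
A stratified design of this kind can be sketched as follows. The strata bounds and sampling rates are hypothetical, and setting the grossing factor to the inverse of the sampling rate is a simplification: the SPI's grossing factors also reflect other adjustments, such as the uplift for returns not yet received.

```python
import random

# Hypothetical strata as (lower income bound, sampling rate), in ascending order.
# These are not the SPI's actual strata or rates.
STRATA = [(0, 0.01), (50_000, 0.10), (200_000, 1.0)]


def sampling_rate(income):
    """Return the sampling rate of the highest stratum the income reaches."""
    rate = STRATA[0][1]
    for lower_bound, stratum_rate in STRATA:
        if income >= lower_bound:
            rate = stratum_rate
    return rate


def draw_sample(incomes, rng):
    """Draw each case with its stratum's rate and attach a design weight.

    Returns (income, grossing_factor) pairs, where the grossing factor
    is assumed here to be simply 1 / sampling rate.
    """
    sample = []
    for income in incomes:
        rate = sampling_rate(income)
        if rng.random() < rate:  # always true when rate is 1.0
            sample.append((income, 1 / rate))
    return sample
```

Above the top threshold every case is drawn and carries a grossing factor of 1, so those cases contribute no sampling error to the estimates.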

The Self Assessment sampling framework was reviewed and refined from the tax year 2019 to 2020 to improve precision at the higher end of the income distribution. The high-income threshold for which cases are sampled with a rate of 1 in 1 was lowered, resulting in an increase of around 50,000 extra sample cases. The change was made in response to feedback from key users of the Personal Incomes Statistics, such as the Scottish Fiscal Commission who have found differences in the Scottish Outturn and SPI statistics on additional rate taxpayers; as detailed in the changes to grossing factors section above.

This change improved precision in the estimates at the higher end of the income distributions. The residual differences between the Outturn and SPI statistics will be due to a combination of methodological differences (as outlined in the supporting documentation for the tax year 2019 to 2020) and the precision of the cases not sampled with a rate of 1 in 1.

Modelling Income Tax liabilities with the Personal Tax Model

Total Income Tax liabilities in the SPI are modelled using the Personal Tax Model (PTM) which uses all income sources in the SPI together to give an individual’s total income assessable for tax.

The PTM is a micro simulation model of the UK Income Tax system. ‘Micro simulation’ refers to modelling with individual level data, in this case using the SPI dataset. For each SPI sample case, the PTM models Income Tax liabilities in a given tax year based on incomes assessable for Income Tax and the main features and parameters of the Income Tax system for that year.

An overview of the PTM modelling process applied to each SPI sample case is provided in Annex B of the Supporting documentation for the relevant year for the Income Tax Liabilities statistics.

Aggregating data

Data are aggregated using the statistical reference number and grossing factor assigned to each sample case.

Changes to the criteria for identifying self-employed individuals in the tax year 2020 to 2021

The criteria for identifying individuals with self-employment income and unknown sources of self-employment income were refined from the tax year 2020 to 2021, to ensure greater consistency across the publication. In Tables 3.9 and 3.10, individuals are identified as having self-employment income if there was evidence of business activity in the tax return. This evidence included more than just profits. If information about profits was missing, both the individuals and the sources are counted in the £0-1 range of self-employment income. The main effect of the changes was an increase in the number of sources of self-employment income classified as “Unknown Industries” and a reduction in the number of individuals with self-employment income. The total value of self-employment profits was unchanged but estimates of the composition of total income for individuals with self-employment income did change.

Further information can be found in the supporting documentation for the tax year 2020 to 2021.

Question 4

4.  Quality Management

Accepted Answer

4.1 Quality assurance

All official statistics produced by KAI must meet the standards in the Code of Practice for Statistics produced by the UK Statistics Authority and all analysts adhere to best practice as set out in the ‘Quality’ pillar.

Analytical quality assurance (QA) describes the arrangements and procedures put in place to ensure analytical outputs are error free and fit-for-purpose. It is an essential part of KAI’s way of working as the complexity of our work and the speed at which we are asked to provide advice means there is a high risk of error, which can have serious consequences for KAI’s and HMRC’s reputation, decisions, and in turn for people’s lives.

Every piece of analysis is unique, and as a result there is no single QA checklist that contains all the QA tasks needed for every project. Nonetheless, analysts in KAI use a checklist that summarises the key QA tasks and is used as a starting point for teams when they are considering what QA actions to undertake.

Teams amend and adapt it as they see fit to take account of the level of risk associated with their analysis and the different QA tasks that are relevant to the work.

At the start of a project, during the planning stage, analysts and managers make a risk-based decision on what level of QA is required.

Analysts and managers construct a plan for all the QA tasks that will need to be completed, along with documentation on how each of those tasks is to be carried out, and turn this list into a QA checklist specific to the project.

Analysts carry out the QA tasks, update the checklist, and pass it on to the Senior Responsible Officer for review and eventual sign off.

4.2 Quality assessment

The QA for this project adhered to the framework described in ‘4.1 Quality assurance’ and the specific procedures undertaken were as follows:

Stage 1 – Specifying the question

Up to date documentation was agreed with stakeholders setting out outputs needed and by when; how the outputs will be used; and all the parameters required for the analysis.

Stage 2 – Developing the methodology

Methodology was agreed and developed in collaboration with stakeholders and others with relevant expertise, ensuring it was fit for purpose and would deliver the required outputs.

Stage 3 – Building and populating a model/piece of code

Analysis was produced using the most appropriate software and in line with good practice guidance.
Data inputs were checked to ensure they were fit-for-purpose by reviewing available documentation and, where possible, through direct contact with data suppliers.
QA of the input data was carried out.
The analysis was audited by someone other than the lead analyst – checking code and methodology.

Stage 4 – Running and testing the model/code

Results were compared with those produced in previous years and differences understood and determined to be genuine.
Results were compared with comparable independent estimates, and differences understood.
Results were determined to be explainable and in line with expectations.

Stage 5 – Drafting the final output

Checks were completed to ensure internal consistency (e.g. totals equal the sum of the components).
The final outputs were independently proof read and checked.
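The internal consistency check mentioned under Stage 5 can be illustrated with a minimal Python sketch. The function name `check_total`, the tolerance and the figures are hypothetical and are not taken from the KAI checklist.

```python
import math

def check_total(components, reported_total, tolerance=0.5):
    """Return True when the reported total matches the sum of its components.

    `tolerance` allows for small rounding differences (an assumed value).
    """
    return math.isclose(sum(components), reported_total, abs_tol=tolerance)

# Made-up figures for illustration only.
components = [120.0, 45.5, 30.0]
consistent = check_total(components, 195.5)
inconsistent = check_total(components, 200.0)
print(consistent, inconsistent)
```

A check of this kind would typically be run on every published table before the independent proof read.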

Question 5

5.  Relevance

Accepted Answer

5.1 User needs

This analysis is likely to be of interest to users under the following broad headings:

national government – policy makers and MPs
regional and local governments
academia and research bodies
media
business community
general public

5.2 User satisfaction

Formal investigations into user satisfaction have not been undertaken. However, feedback from users following the release has been received, and KAI is always open to ideas for new analysis to meet changing user requirements.

5.3 Completeness

It is a legal requirement that all individuals who are liable to Income Tax either pay the tax due through PAYE or through Self Assessment. Penalties exist for non-compliance.

It is likely that some Self Assessment cases will not yet have filed a return when the data are drawn from transactional systems. However, allowances are made for late-filed returns when estimating the likely final grossed population for the tax year. The statistics contained in this report can therefore be considered complete. More information on the approach taken can be found in the Supporting Documentation for the Personal Incomes Statistics.

Question 6

6.  Accuracy and reliability

Accepted Answer

6.1 Overall accuracy

These statistics and analyses are based on administrative data and use a sample database that is designed to represent the UK Income Tax paying population. Accuracy is addressed by eliminating non-sampling errors as much as possible through adherence to the quality assurance framework. Moreover, the SPI sampling methodology is constantly reviewed and refined to improve the accuracy and reliability of the sample, and to reduce sampling error.

The key potential sources of error are:

Individuals entering incorrect information on their Self Assessment return, or organisations entering incorrect information when submitting PAYE information.
Individuals not completing their Self Assessment return by the required date.
The stratified sampling process used for the SPI, which reflects the skewed distribution of Income Tax liabilities across the UK population. This is described in the Supporting Documentation for the relevant year.
The imputation process for missing age, sex and dividend data, described in detail in Annex B: Coverage of the SPI and missing data.
The grossing factors which are used to scale the SPI data to the UK Income Tax paying population (see Annex B: Grossing factors).
Mistakes in the programming code used to analyse the data and produce the statistics.

6.2 Sampling error

These statistics are produced from the annual SPI, the purpose of which is to create a dataset that is representative of the UK Income Tax paying population that can be used to infer the size of that population and the estimated liabilities of all Income Tax payers. As the SPI is a sample and does not include the whole population of Income Tax payers, estimates drawn from the SPI are subject to sampling variation and will differ from the actual figures purely by chance. A stratified random sample is drawn across the NPS and CESA transactional systems. Cases are categorised by income band and other characteristics. Categories involving higher incomes tend to be sampled more intensively to improve the precision in estimates of total income. Confidence Intervals are published for sub-UK estimates.
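The idea of sampling higher-income categories more intensively and then scaling the sample back up can be sketched in Python. This is a simplified illustration, not the SPI methodology: the strata, the sampling fractions, the function names and the rule that a grossing factor equals the stratum size divided by the number of cases sampled are all assumptions made for the example.

```python
import random

def stratified_sample(population, sampling_fractions, seed=0):
    """Draw a stratified random sample and attach a grossing factor to each case.

    population: dict mapping stratum name -> list of incomes (all cases).
    sampling_fractions: dict mapping stratum name -> fraction to sample.
    Returns a list of (income, grossing_factor) pairs.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, cases in population.items():
        n = max(1, round(len(cases) * sampling_fractions[stratum]))
        chosen = rng.sample(cases, n)
        grossing_factor = len(cases) / n  # cases represented by each sampled case
        sample.extend((income, grossing_factor) for income in chosen)
    return sample

def grossed_count(sample):
    """Estimate the population size from the grossing factors."""
    return sum(weight for _, weight in sample)

def grossed_total(sample):
    """Estimate total income by weighting each sampled income."""
    return sum(income * weight for income, weight in sample)

# Made-up population: many lower incomes, few higher incomes.
population = {
    "lower": [20000 + 10 * i for i in range(1000)],
    "higher": [200000 + 1000 * i for i in range(50)],
}
fractions = {"lower": 0.01, "higher": 0.5}  # higher incomes sampled more intensively

sample = stratified_sample(population, fractions)
print(len(sample), grossed_count(sample), grossed_total(sample))
```

In this sketch the grossed count reproduces the population size exactly, while the grossed income total varies with the random draw, which is the sampling variation described above.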

To quantify the sampling error associated with the statistics presented in this publication, 95% confidence intervals were calculated. A confidence interval is a range of values within which there is reasonable certainty that the true value lies. A 95% confidence interval means that if the population were sampled repeatedly and an interval calculated each time, about 95% of those intervals would be expected to contain the true value. The 95% confidence intervals are based on standard error calculations; the standard error is the standard deviation of an estimate across repeated samples, and so measures the precision of that estimate.

Published 95% confidence intervals are available in Tables 3.13a to 3.15a for all estimates of the number of UK Income Tax payers and total liabilities. Please refer to these tables for the relevant year from the Personal Incomes Statistics.
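As an illustration of how a 95% confidence interval follows from a standard error, the sketch below computes one for a sample mean. It assumes a simple random sample and a normal approximation (multiplier 1.96); the published intervals come from a stratified design, so the actual calculation will differ. The function name and the figures are hypothetical.

```python
import math
import statistics

def confidence_interval_95(values):
    """Return (lower, upper) 95% interval for the mean of `values`.

    Assumes a simple random sample and a normal approximation (z = 1.96).
    """
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = 1.96 * standard_error
    return mean - margin, mean + margin

# Made-up incomes for illustration only.
incomes = [21000, 25000, 30000, 28000, 26000]
lower, upper = confidence_interval_95(incomes)
print(round(lower), round(upper))
```

A larger sample reduces the standard error and therefore narrows the interval.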

6.3 Non-sampling error

Coverage error

The coverage of investment income for the sample drawn from NPS is incomplete. In order to create a full picture of total income for this survey, it is necessary to impute values of dividends to some sample cases. Where no information at case level is available from HMRC administrative systems, estimated values are imputed to cases so that the population as a whole has amounts consistent with evidence from other sources.

HMRC does not have complete information about pension contributions. To compile complete estimates for relief at source pensions and total income for the SPI, a significant proportion of the amount of relief at source pension contributions has been estimated using data from external data sources. The estimated value for this and for net pay contributions has been combined with other pension reliefs and included in these statistics.

Model errors

Income Tax liabilities in this publication are estimated at case level with the base SPI data using the PTM. The Income Tax modelling process attempts to capture all significant features of the UK Income Tax system, but inevitably this involves certain simplifications and omissions.

The modelling outputs are regularly benchmarked at case level against the Income Tax liabilities that are recorded as due in HMRC’s Self Assessment system. Differences between the outputs and the SPI sub-population Self Assessment data arise for known and specific reasons and only in a small minority of sample cases. The impact of these simplifications is judged to be small for key aggregates at UK level, and for most UK Income Tax payer sub-populations.

Measurement error

The SPI data are based on information from the HMRC systems used to administer the Income Tax system, which ensures that the most accurate picture of declared personal incomes is used for the production of these statistics.

Incorrectly entered data may include abnormally small or large incomes, or other factors that may skew the distribution of the data and the overall statistics. To mitigate this, checks are conducted on the SPI database before the statistics are produced, and any incorrectly small or large values detected are altered.
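This report does not state which checks are applied to the SPI database. As one possible illustration, the sketch below flags abnormally small or large values using the interquartile range rule. The rule, the multiplier of 1.5, the function name and the figures are assumptions for the example, and how flagged values are then altered is not shown.

```python
import statistics

def flag_outliers(values, multiplier=1.5):
    """Return the values lying outside the interquartile range fences.

    Fences are Q1 - multiplier * IQR and Q3 + multiplier * IQR (an assumed rule).
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - multiplier * iqr
    high_fence = q3 + multiplier * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Made-up incomes in thousands of pounds; 500 stands in for a keying error.
incomes = [18, 20, 21, 22, 23, 24, 25, 26, 500]
flagged = flag_outliers(incomes)
print(flagged)
```

Flagged cases would still need review by an analyst, because genuinely very high incomes are expected in a skewed income distribution.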

Non-response error

Case level non-response arises primarily because of late filing of tax returns. This is dealt with by estimating the likely size of the final population.

Item level non-response refers to when a sample record has incomplete information for some characteristics. Age, sex and postcode are key variables for published statistics and if these items are missing from sample records, other sources are examined.

For most cases in the PAYE system, it is not necessary to know about dividends or certain pension contributions etc. To create a complete estimate of total income for such cases, an imputation process allocates amounts randomly to cases so that population estimates follow pre-determined distributions.
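Random allocation of amounts so that the population follows a pre-determined distribution can be sketched as below. The target distribution, the function name and the use of `None` to mark missing values are hypothetical; the actual SPI imputation process is described in the supporting documentation.

```python
import random

def impute_missing(cases, target_distribution, seed=0):
    """Fill missing amounts (None) by drawing at random from a target distribution.

    target_distribution: list of (amount, probability) pairs (assumed values).
    Observed amounts are left unchanged.
    """
    rng = random.Random(seed)
    amounts = [amount for amount, _ in target_distribution]
    weights = [probability for _, probability in target_distribution]
    return [
        value if value is not None else rng.choices(amounts, weights=weights)[0]
        for value in cases
    ]

# Made-up dividend amounts; None marks a case with no information held.
dividends = [0, None, 250, None, None, 1000]
target = [(0, 0.7), (100, 0.2), (2000, 0.1)]
completed = impute_missing(dividends, target)
print(completed)
```

Individual imputed values are not meaningful on their own; only the aggregate distribution is intended to be realistic.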

Processing error

It is possible that errors exist in the programming code used to analyse the data and produce the statistics. This risk is reduced through developing a good understanding of the complexities of Income Tax, and regularly reviewing and testing the programs that are used.

6.4 Data revision

Data revision – policy

In line with the United Kingdom Statistics Authority Code of Practice for Official Statistics, HMRC has published a policy on revisions.

A summary of HMRC’s policy for different types of revisions is outlined below:

Planned revisions, which usually take place after receipt of expected information or data. Outputs that are subject to scheduled revisions will include an explanation of how these are dealt with.
Unplanned revisions, which can occur when data suppliers missed the original deadline, data was submitted incorrectly, or when errors were made during analysis or processing. In these cases, a judgement will normally be made by the Head of Profession for Statistics as to whether the change is significant enough to publish a “revised” statistical release.
Revisions that occur as a result of changes to data source systems or methodology would be planned and where possible would be conducted in consultation with users.

Data revision – practice

These statistics are published annually and include an estimate of tax liabilities for the latest available complete tax year. We do not regularly revise statistics for previous years.

6.5 Seasonal adjustment

Seasonal adjustment is not applicable for this analysis.

Question 7

7.  Timeliness and punctuality

Accepted Answer

7.1 Timeliness

The reference period for the Personal Incomes Statistics is the Income Tax year ending on 5 April. Statistics for the tax year will normally be published around 23 months after the end of that tax year. The information is drawn from the transactional systems approximately a year after the reference period, to allow time for individuals to file their Self Assessment returns and for PAYE reconciliation. It then takes approximately a year to turn the raw dataset into information and commentary ready for publication, due in part to the time required to complete the data validation, complex analysis and quality assurance.

7.2 Punctuality

In accordance with the Code of Practice for official statistics, the exact date of publication will be given no less than one calendar month before publication on both the Schedule of updates for HMRC’s statistics and the Research and statistics calendar of GOV.UK.

The full publication calendar or any delays to publication dates can be found on both the Schedule of updates for HMRC’s statistics and the Research and statistics calendar of GOV.UK.

Question 8

8.  Coherence and comparability

Accepted Answer

8.1 Geographical comparability

Breakdowns of Income Tax payers are available for the UK, country, region, county, local authority district and parliamentary constituency. The statistics also detail non-savings / non-dividend Income Tax for Scotland, Wales and rest of UK; figures are comparable between the geographic areas.

8.2 Comparability over time

Comparability is to some extent determined by the scope of Income Tax and allowable reliefs, which drive the information available from HMRC administrative systems. The supporting documentation highlights any changes in methodology, such as changes to how published figures are calculated, and any data issues.

The population is not stratified by geographical area before the SPI sample is selected. Year on year changes in published estimates of taxpayer numbers within small geographical areas (e.g. districts and constituencies) should be viewed with caution. The confidence interval for the difference could be large relative to the measured difference, so any observed change may be due to sampling fluctuation alone. Confidence intervals are published for sub-regional breakdowns.
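The caution about year on year changes can be illustrated by computing a confidence interval for the difference between two estimates. The sketch assumes the two years' samples are independent, so that the standard errors combine as the square root of the sum of their squares, and uses a normal approximation. The function names and figures are hypothetical.

```python
import math

def difference_interval_95(estimate_a, se_a, estimate_b, se_b):
    """Return (lower, upper) 95% interval for estimate_b - estimate_a.

    Assumes independent samples and a normal approximation (z = 1.96).
    """
    difference = estimate_b - estimate_a
    se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    margin = 1.96 * se_difference
    return difference - margin, difference + margin

def change_is_distinguishable(lower, upper):
    """True when the interval for the difference excludes zero."""
    return lower > 0 or upper < 0

# Made-up taxpayer counts for one district in two consecutive years.
lower, upper = difference_interval_95(10000, 300, 10400, 400)
print(round(lower), round(upper), change_is_distinguishable(lower, upper))
```

Here the measured increase of 400 has an interval that spans zero, so the change could be due to sampling fluctuation alone.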

Comparisons over time may be affected by changes in methodology. Notably, there was a revision to the grossing factors in the 2018 to 2019 publication, which is discussed in the commentary and supporting documentation for that tax year, and the section titled ‘Data compilation’ in this document. Other significant changes include:

From the tax year 2010 to 2011, dividends are no longer imputed to non-Self Assessment cases to represent individuals incorporating as businesses. For more information and an estimate of the impact of the change, please refer to the Personal Incomes Statistics for the tax year 2010 to 2011.
A change was made to the classification of pension contributions in the tax year 2017 to 2018, to better reflect their treatment in the tax system. As a result, the new pension contributions statistics (net pay and relief at source) aren’t comparable with statistics under the previous classifications (occupational and personal). For more information, please refer to the Personal Incomes statistics for the tax year 2017 to 2018 and the ‘Imputation of pension income’ section of this document.
The methodology for estimating contributions to relief at source pensions was revised in the 2021 to 2022 tax year so it better aligns with the methodology used in HMRC’s Private Pension statistics. For more information and an estimate of the impact of the change, please refer to the supporting documentation for the tax year 2021 to 2022 and the ‘Imputation of pension income’ section of this document.
From the tax year 2016 to 2017, interest paid net of tax has been combined with interest paid gross when presented in Table 3.7. Up until the tax year 2015 to 2016, interest paid gross was included in ‘other investment income’ and interest paid net was included in ‘interest from building societies and banks’. From the tax year 2016 to 2017 all interest paid net or gross is included in ‘interest from building societies and banks’. This change was made following the introduction of the Personal Savings Allowance. Further information can be found in the Personal Incomes Statistics for the tax year 2016 to 2017.
From the tax year 2018 to 2019, data on interest from banks and building societies was received through NPS data (as well as Self Assessment data) and no longer estimated through imputation. For more information and an estimate of the impact of the change, please refer to the supporting documentation for the tax year 2018 to 2019.

8.3 Coherence – cross domain

Estimates from the Personal Incomes statistics tables may be compared with some higher level figures from the Income Tax Liability Statistics publication.

Coherence – sub-annual and annual statistics

All statistics are presented as annual outputs. No coherence issues exist.

Coherence – national accounts

This publication shows income and Income Tax liabilities for each tax year. Income Tax liabilities are amounts of Income Tax due on incomes arising in a given tax year, whereas receipts are amounts of Income Tax paid and collected in a given year. The breakdowns of Income Tax liabilities provided in this publication are not available on a receipts basis.

8.4 Coherence – internal

Rounding of numbers may cause some minor internal coherence issues as the figures within a table may not sum to the total displayed. Effort has been made to ensure totals between tables remain constant where appropriate.
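A short Python example shows how independent rounding can make components fail to sum to the displayed total. The figures are made up.

```python
# Three components that each round down, while their sum rounds to a higher value.
components = [1.4, 1.4, 1.4]

rounded_components = [round(value) for value in components]
rounded_total = round(sum(components))

print(sum(rounded_components))  # 3
print(rounded_total)            # 4
```

Both displayed figures are correct roundings of the underlying data, so a discrepancy of this size does not indicate an error.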

For regional and sub-regional breakdowns, individuals are allocated to sub-regions, regions and countries according to their residence based on postcode. Some members of the Forces and Merchant Navy, people serving overseas and people with overseas addresses have not been allocated to a location within the UK but have been included in the UK figures. There are also a small number of individuals in the sample whose postcode could not be identified, so the correct location within the UK could not be determined. These have also been included in the UK figures. Therefore, the regional amounts may not add up to the UK total.

Question 9

9.  Accessibility and clarity

Accepted Answer

9.1 News release

There haven’t been any press releases linked to this data over the past year.

9.2 Publication

The tables and associated commentary are published on the Statistics about Personal Incomes webpage of GOV.UK.

Tables are published in the OpenDocument format, and the associated commentary as an accessible HTML webpage.

Both documents comply with the accessibility regulations set out in the Public Sector Bodies (Websites and Mobile Applications) (No. 2) Accessibility Regulations 2018.

Further information can be found in HMRC’s accessible documents policy.

9.3 Online databases

This analysis is not used in any online databases.

9.4 Micro-data access

SPI Public Use Tape microdata are available to approved researchers on the UK Data Service website.

9.5 Other

There aren’t any other dissemination formats available for this analysis.

9.6 Documentation on methodology

Supporting documentation for each annual statistics release is publicly available to users.

9.7 Quality documentation

All official statistics produced by KAI must meet the standards in the Code of Practice for Statistics produced by the UK Statistics Authority, and all analysts adhere to best practice as set out in the ‘Quality’ pillar.

Information about quality procedures for this analysis can be found in section 4 of this document.

Question 10

10.  Cost and burden

Accepted Answer

Because all necessary data for these statistics is obtained from administrative data sources (NPS and CESA) there is no additional burden on individuals or HMRC tax inspectors to provide information.

It is estimated to take about a year to produce the annual analysis and publication, with input from a small number of analysts across different teams.

Question 11

11.  Confidentiality

Accepted Answer

11.1 Confidentiality – policy

HMRC has a legal duty to maintain the confidentiality of taxpayer information.

Section 18(1) of the Commissioners for Revenue and Customs Act 2005 (CRCA) sets out our duty of confidentiality.

This analysis complies with this requirement.

11.2 Confidentiality – data treatment

The statistics in these tables are presented at an aggregate level so identification of individuals is not possible.

To make sure no individual taxpayers can be identified, statistical disclosure control (SDC) is applied to cells within tables. SDC is the application of methods to ensure confidential data is not disclosed to parties who don’t have authority to access it.

SDC modifies data so that the risk of data subjects being identified is within acceptable limits while making the data as useful as possible.

Disclosure in this analysis is avoided by applying rules that prevent categories of data containing:

small numbers of contributors, and
small numbers of contributors that are very dominant

If a cell within a table is determined to be disclosive, its contents are suppressed either by removing the data or combining categories. Further information on anonymisation and data confidentiality best practice can be found on the Government Statistical Service’s website.
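The two rules above can be sketched as a simple cell-level check. The thresholds (`min_count`, `max_share`), the suppression marker and the function names are hypothetical; the thresholds actually applied to these statistics are not stated in this report.

```python
def is_disclosive(contributions, min_count=3, max_share=0.8):
    """Return True when a cell breaks either illustrative disclosure rule.

    Rule 1: fewer than `min_count` contributors.
    Rule 2: the largest contributor exceeds `max_share` of the cell total.
    Both thresholds are assumed values for the example.
    """
    if len(contributions) < min_count:
        return True
    total = sum(contributions)
    return total > 0 and max(contributions) / total > max_share

def suppress(table, marker="[c]"):
    """Replace disclosive cells with a marker; otherwise publish the cell total."""
    return {
        cell: marker if is_disclosive(values) else sum(values)
        for cell, values in table.items()
    }

# Made-up contributions to three table cells.
table = {
    "A": [10, 12, 9, 11],    # enough contributors, none dominant
    "B": [50, 60],           # too few contributors
    "C": [1000, 5, 5, 5],    # one dominant contributor
}
published = suppress(table)
print(published)
```

Combining categories, the alternative mentioned above, would instead merge a disclosive cell with a neighbouring one and re-run the same check on the merged cell.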

Our statistical practice is regulated by the Office for Statistics Regulation (OSR). OSR sets the standards of trustworthiness, quality and value in the Code of Practice for Statistics that all producers of official statistics should adhere to. You are welcome to contact us directly with any comments about how we meet these standards by emailing spi.enquiries@hmrc.gov.uk. Alternatively, you can contact OSR by emailing regulation@statistics.gov.uk or via the OSR website.

The Personal Incomes Statistics were independently reviewed by the Office for Statistics Regulation in October 2020. They comply with the standards of trustworthiness, quality and value in the Code of Practice for Statistics and should be labelled ‘accredited official statistics’. Accredited official statistics are called National Statistics in the Statistics and Registration Service Act 2007.
