Quality report: Personal Income Statistics release from tax year 2020 to 2021

Question 1

1. Contact

Accepted Answer

Organisation unit - Knowledge, Analysis and Intelligence (KAI)
Name – N Anderson
Function - Statistician, Personal Taxes
Mail address - Three New Bailey, New Bailey Square, Salford, Manchester, M3 5FS
Email - spi.enquiries@hmrc.gov.uk

Question 2

2. Statistical presentation

Accepted Answer

2.1 Data description

Statistics about personal incomes are assessed using the annual Survey of Personal Incomes (SPI). This publication provides detailed statistics about individuals liable to UK Income Tax and their incomes for the tax year with the most recent outturn data.

2.2 Classification system

The SPI is carried out annually and is based on information held by HMRC on the income assessable for Income Tax for individuals who could be liable to UK Income Tax in a given tax year. Published breakdowns of the number of taxpayers, income, tax liabilities, allowances and deductions are determined based on data submitted by the individual in their Self Assessment Return or by their employer in the PAYE data.

A unique Income Tax payer reference assigned to each individual is used to aggregate the data.

2.3 Sector coverage

Income Tax is an annual tax paid on most sources of income including pay from employment, profits from self-employment, private and occupational pensions, retirement annuities, state retirement pensions, foreign income, income from property, taxable social security income, savings income, income from shares (dividends) and income from trusts. Employees who receive non-cash benefits from their employers such as company cars, fuel, medical insurance, living accommodation or loans also pay Income Tax on these benefits.

Adding all these sources together will give an individual’s total income assessable for tax, an aggregate that appears in several tables in this publication. Some sources of income are not liable for Income Tax including certain social security benefits, Child Tax Credit and Working Tax Credit, and income from tax exempt savings accounts (such as Individual Savings Accounts (ISAs) and some National Savings & Investment products). Most people in the UK get a Personal Allowance which is the amount of income on which no tax will be charged. Some people are also eligible for tax reliefs.

All individuals who could be liable to UK Income Tax are covered by this publication.

2.4 Statistical concepts and definitions

Income Tax

Once tax-free allowances have been taken into account, Income Tax due is calculated using different Income Tax rates for specific types of income across a series of Income Tax bands. There are 3 different sources of income for Income Tax purposes:

earnings, or income other than savings and dividends, also known as non-savings non-dividends (NSND) income (for more information please see section 14 of the Supporting Documentation for the relevant year)
savings income (such as bank and building society interest)
dividends (such as income from shares in UK companies)

Income Tax powers over earned income have been devolved to Scotland and Wales. In the rest of the UK (rUK), earned income is taxed within 3 main bands of Income Tax rates: the basic rate, the higher rate and the additional rate. In Wales, earned income is taxed within the same bands as the rest of the UK but the rates can be different. In Scotland, earned income is taxed within 5 main bands of rates: the starting rate, basic rate, intermediate rate, higher rate and additional rate. Savings and dividend income are taxed on the same basis across the whole of the UK, with the bands aligned to the rUK bands for earned income but the rates for dividends can be different. Some basic rate taxpayers are also eligible for a starting rate for savings.

Income Tax typically works on a ‘stack’ basis: earnings are generally taxed first, then savings income and finally dividend income. As a result, if an individual has earnings after allowances sufficient to completely fill the basic rate Income Tax band, all savings or dividend income would be charged at the higher or additional rates of tax.
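The stacking described above can be illustrated with a minimal sketch. The thresholds and rates are assumed here to be the rUK parameters for the tax year 2020 to 2021, and the function names are hypothetical. The sketch sets the Personal Allowance against earnings first (an assumption) and ignores the Personal Savings Allowance, the Dividend Allowance, the starting rate for savings, the tapering of the Personal Allowance and the Scottish and Welsh rates. It is not the Personal Tax Model.

```python
# Assumed rUK parameters for the tax year 2020 to 2021 (illustration only).
PERSONAL_ALLOWANCE = 12_500
BASIC_LIMIT = 37_500      # top of the basic rate band, in taxable income
HIGHER_LIMIT = 150_000    # top of the higher rate band, in taxable income

# (basic, higher, additional) rates for each stream of income.
RATES = {
    "nsnd": (0.20, 0.40, 0.45),
    "savings": (0.20, 0.40, 0.45),
    "dividends": (0.075, 0.325, 0.381),
}


def tax_on_slice(start, amount, rates):
    """Tax on `amount` of taxable income sitting on top of `start` in the stack."""
    bands = [(0, BASIC_LIMIT), (BASIC_LIMIT, HIGHER_LIMIT), (HIGHER_LIMIT, float("inf"))]
    end = start + amount
    tax = 0.0
    for (low, high), rate in zip(bands, rates):
        overlap = max(0.0, min(end, high) - max(start, low))
        tax += overlap * rate
    return tax


def stacked_liability(nsnd, savings, dividends):
    """Stack earnings, then savings, then dividends across the bands."""
    remaining_allowance = PERSONAL_ALLOWANCE
    position = 0.0   # taxable income already placed in the stack
    total = 0.0
    for kind, income in (("nsnd", nsnd), ("savings", savings), ("dividends", dividends)):
        used = min(remaining_allowance, income)
        remaining_allowance -= used
        taxable = income - used
        total += tax_on_slice(position, taxable, RATES[kind])
        position += taxable
    return round(total, 2)
```

With earnings of 50,000 the basic rate band is full, so `stacked_liability(50_000, 0, 1_000)` charges the whole 1,000 of dividends at the assumed higher dividend rate, whereas with earnings of 20,000 the same dividends fall in the basic rate band.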

Further detail on how Income Tax liabilities are calculated is provided in Annex A of the Supporting Documentation, and a full list of definitions of terms used in this publication can be found in the Glossary of Terms section of the Supporting Documentation.

Tax year

The statistics are aggregated into tax years. A tax year stretches from 6 April until 5 April the following calendar year.
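As a small illustration of the definition above, the following sketch maps a calendar date to its tax year. The function name and the label format are hypothetical.

```python
from datetime import date


def tax_year_label(d: date) -> str:
    """Return the tax year containing `d`, where a tax year runs 6 April to 5 April."""
    start_year = d.year if (d.month, d.day) >= (4, 6) else d.year - 1
    return f"{start_year} to {start_year + 1}"
```

For example, 5 April 2021 falls in the tax year 2020 to 2021, while 6 April 2021 is the first day of the tax year 2021 to 2022.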

Taxpayer

An individual calculated to have a positive Income Tax liability for the tax year, based on the income, allowances, reliefs and deductions for the year.

Total income

The sum of an individual’s components of income taken into account in calculating Income Tax. This includes earnings from employment, profits from self-employment, pension income, some social security benefits, savings income, income from shares (dividends), rental income, and income paid from trusts. It excludes:

gains from the disposal of assets that are classified as capital gains
interest, dividends or bonuses from tax exempt investments (for example, ISAs and National Savings & Investments Savings Certificates)
interest and terminal bonuses from Save As You Earn Schemes
Premium Bond, National Lottery and gambling prize winnings

Total income is calculated before relief for contributions to occupational and personal pensions, other deductions and reliefs or personal allowances.

In the tax system, income is streamed into three main categories (dividends, savings income other than dividends, and non-savings income) because different rules apply to each.

Taxable income

Income assessable to Income Tax after allowances.

Income Tax liabilities

The amount of Income Tax due on taxable income after applying tax rates to the tax base. The Income Tax liability for each sample case in the SPI is calculated by reference to the amounts of income by type, deductions and reliefs and the tax regime parameters that apply for the year. The calculated liability for a tax year will differ from the amount of Income Tax receipts collected in that tax year.

Personal Allowance

The amount of income you can receive for the tax year without having to pay tax on it.

Personal Savings Allowance

The amount of savings income you can receive for the tax year without having to pay tax on it.

Dividend Allowance

The amount of dividend income you can receive for the tax year without having to pay tax on it.

Pay As You Earn (PAYE)

PAYE is the system used by HMRC to collect and account for Income Tax on earnings from employment and pensions. Income Tax and National Insurance Contributions are deducted by the employer and paid over to HMRC on behalf of the individual for each pay period.

Self Assessment (SA)

SA is a system where an individual declares their income and can calculate their own Income Tax due after the end of the tax year. Taxpayers included in SA can be higher earners, self-employed and taxpayers with complex tax affairs.

Industry

Industry categories are based on UK Standard Industrial Classification of Economic Activities 2007 (SIC2007). Income from self-employment (sole trade and partner) is assigned an industry using the business text descriptions supplied on Self Assessment returns.

Geographical Areas

Some tables present information for sub-UK areas described as Government Office Region, County, District and Parliamentary Constituency. Administrative and Political geographical areas are not held on taxpayers’ records. For the SPI, the areas are attached by matching the individual’s postcode to the Office for National Statistics Postcode Directory.
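The postcode matching described above might look like the following minimal sketch. The in-memory `directory` mapping, the field names and the area labels are all hypothetical stand-ins; the real Postcode Directory is a much larger file with its own layout.

```python
def normalise_postcode(raw):
    """Strip whitespace and upper-case a postcode so that variants compare equal."""
    return "".join((raw or "").split()).upper()


def attach_area(cases, directory):
    """Return copies of `cases` with an `area` field looked up from `directory`.

    `directory` maps postcode -> area label (hypothetical structure).
    Cases whose postcode is missing or unmatched get `area` set to None.
    """
    lookup = {normalise_postcode(pc): area for pc, area in directory.items()}
    return [
        {**case, "area": lookup.get(normalise_postcode(case.get("postcode")))}
        for case in cases
    ]
```

Unmatched cases are kept with an empty area rather than dropped, so that they can still contribute to UK-level totals.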

2.5 Statistical unit

The statistical units in this release are Income Tax payers in the UK, with the exception of tables 3.9 and 3.10, which cover all individuals with income from self-employment.

2.6 Statistical population

All individuals liable to pay Income Tax in the UK. Individuals who may have income but are not liable for Income Tax are excluded; this may occur if the individual has no Income Tax liability due to their deductions, reliefs and Personal Allowances exceeding their total income or if their income is below their Personal Allowance.

2.7 Reference area

The geographic region covered by the data is the United Kingdom (UK).

2.8 Time coverage

The statistics cover the time period from the tax year 1999 to 2000 until the latest tax year available. Note that data for 2008 to 2009 is not available.

Question 3

3. Statistical processing

Accepted Answer

3.1 Source data

The SPI is carried out annually and is based on information held by HMRC on the income assessable for Income Tax for individuals who could be liable to UK Income Tax in a given tax year.

The SPI data is currently sampled from two HMRC operational computer systems:

the National Insurance and PAYE Service (NPS) system covers all employees and occupational pension recipients with a Pay As You Earn (PAYE) record
The Computerised Environment for Self Assessment (CESA) system covers people with self-employment, rental or untaxed investment income. It also covers those with higher incomes and other people with complex tax affairs. Where people have both NPS and CESA records, their CESA record is selected because it provides a more complete picture of their taxable income.

The SPI previously also sampled from another HMRC operational computer system (but this is no longer the case from the tax year 2019 to 2020):

the Claims system included individuals without NPS or CESA records who had too much Income Tax deducted at source and claimed a repayment. For operational efficiencies, R40 Claim forms were migrated onto NPS. The last survey for which samples were drawn from the Claims system was the 2015 to 2016 SPI.

Additional data about the sample cases are also collected from other HMRC administrative systems, as follows:

The PAYE Real Time Information (RTI) system covers submissions from employers and pension providers to report Income Tax and National Insurance contributions before they pay wages or pensions to employees and pensioners. This has been used as the source of ‘net pay’ pension contributions in the SPI.

3.2 Frequency of data collection

The SPI dataset is compiled on an annual basis and is usually available around 23 months after the end of the tax year. The raw data is drawn from HMRC’s systems approximately one year after the tax year end and it takes about a year to process, analyse and produce the SPI publication.

3.3 Data collection

Separate samples are drawn from each of the administrative systems and different sampling strategies are used for each, which reflect the skewed distribution of Income Tax liabilities. The samples are structured as follows:

the PAYE population from NPS is stratified by age, sex and the sum of pay plus occupational pension income for the previous tax year. Where the previous year’s income is not available, cases are stratified by sex and by whether they are a higher rate or additional rate taxpayer for the current tax year based on information available at the time the sample was drawn. Approximately 400,000 individuals in the SPI sample are selected from NPS. See the Supporting Documentation for the relevant year for further information on the sample counts and the associated sampling rates.
for the Self Assessment population from CESA, the main source of income (self-employment or employment/occupational pension) and ranges of income and tax are used to stratify the sample. Approximately 450,000 individuals in the SPI sample are selected from CESA. See the Supporting Documentation for the relevant year for further information on the sample counts and the associated sampling rates.
from the tax year 2019 to 2020 onwards, a separate sample for the claims population is no longer included. The majority of claims cases were non-taxpayers and were therefore excluded from the statistical tables in this publication.

Some individuals with a PAYE record are also in the SA system. These individuals are excluded from the PAYE population prior to sampling, as their SA record provides a more complete picture of their taxable income.
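The two steps above, removing PAYE cases that also appear in Self Assessment and then sampling each stratum at its own rate, can be sketched as follows. The record layout, the `stratum_of` function and the sampling rates are hypothetical; the actual strata and rates are set out in the Supporting Documentation.

```python
import random
from collections import defaultdict


def exclude_sa_overlap(paye_records, sa_ids):
    """Drop PAYE records for individuals who also have a Self Assessment record."""
    return [rec for rec in paye_records if rec["id"] not in sa_ids]


def stratified_sample(records, stratum_of, sampling_rates, seed=0):
    """Draw a simple random sample within each stratum at that stratum's rate.

    Returns a dict mapping stratum -> list of sampled records.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    sample = {}
    for stratum, members in strata.items():
        # At least one case per stratum, never more than the stratum holds.
        n = min(len(members), max(1, round(len(members) * sampling_rates[stratum])))
        sample[stratum] = rng.sample(members, n)
    return sample
```

Giving high-income strata a rate close to 1 and low-income strata a much smaller rate reproduces, in miniature, the intensive sampling of high incomes that the text describes.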

Once data are collected for the constituent parts of the sample, the datasets are joined together.

The sampling strategies described above intentionally yield large sub-samples of SPI cases with very high incomes, which account for a large proportion of total Income Tax liabilities. This increases the precision of estimates of liabilities and taxable incomes drawn from the SPI. After allowing for non-response and for records that failed data validation tests, the SPI contains 800,000 to 900,000 records, representing approximately 1.5% of individuals in contact with HMRC. For exact population counts, please refer to the Supporting Documentation for the relevant year.

3.4 Data validation

Checks carried out on the SPI include:

Automated checks take place when loading the data into the analysis database.
Analysts check that the number of records loaded into the analysis database is as expected.
Data validation checks are performed on date of birth, sex, income and postcode. Data are checked against other internal systems and in some cases are validated manually.
Analysts check any outliers in the data which are then examined on a case-by-case basis. Outlier checks sometimes result in adjustments to the dataset as required to improve accuracy or prevent skewing.

Any large changes in the number of taxpayers, income or Income Tax liability figures from one statistical release to the next are investigated.
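A check of this kind could be sketched as below. The 10% threshold and the function name are assumptions made for illustration; the source does not state what size of change triggers an investigation.

```python
def flag_large_changes(previous, current, threshold=0.10):
    """Return headline figures whose relative change exceeds `threshold`.

    `previous` and `current` map a figure name to its value in consecutive
    releases. Figures with no (or zero) previous value are skipped.
    """
    flagged = {}
    for name, new_value in current.items():
        old_value = previous.get(name)
        if old_value:
            change = (new_value - old_value) / old_value
            if abs(change) > threshold:
                flagged[name] = round(change, 4)
    return flagged
```

A flagged figure is a prompt for investigation, not evidence of an error: a genuine policy or economic change can produce a large movement.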

3.5 Data compilation

Imputation of characteristics

HMRC does not have complete information on age and sex of Income Tax payers. Where no information is held, estimated values are imputed.

Imputation of savings income

The coverage of savings income for the sample drawn from NPS prior to 2018 to 2019 was incomplete. This is because most Income Tax payers with savings income do not report it to HMRC.

Prior to April 2016, banks and building societies deducted tax at the basic rate on interest and paid this to HMRC. Only individuals below the Personal Allowance or above the higher rate threshold needed to report interest to HMRC to ensure that the correct tax was paid.
Since April 2016, most interest income has been covered by a combination of the Personal Savings Allowance, the Personal Allowance and the starting rate for savings, and is therefore not liable to Income Tax. Those who do need to pay Income Tax on their savings income do so by contacting HMRC to report it, where this information has not already been provided through Self Assessment.

HMRC collects data on savings income directly from banks and building societies. From 2018 to 2019 onwards, this income data feeds into the NPS system so that HMRC can collect the appropriate tax without taxpayers needing to contact HMRC directly. From the 2018 to 2019 SPI onwards, this data has replaced the previous method, which estimated savings interest by imputation.

Previous savings imputations (prior to the 2018 to 2019 SPI) followed a similar method to the dividend imputation outlined below, except that the targets for the total number of individuals and the total interest received were taken from data provided by the UK banks.

Between the 2016 to 2017 and 2018 to 2019 releases, the claims component was imputed by selecting the claims cases that had no employments from the tax year 2015 to 2016 survey (the last year data was available). The income for these cases was projected using Office for Budget Responsibility determinants in order to estimate the level of income and the tax due in the respective tax year.

Imputation of dividend income

In order to create a full picture of total income for the SPI, it is necessary to impute values of dividend income to some sample cases. For the dividends imputation, the amount for each SPI case:

is known for cases in Self Assessment from the amount declared on the Self Assessment Return
can be inferred or estimated reasonably for NPS cases where there is an adjustment to the tax code for taxpayers
is unknown for NPS cases where there is no coding adjustment

Where no information at case level is available from HMRC administrative systems, estimated values are imputed to cases so that the population as a whole has amounts consistent with evidence from other sources.

Starting from control totals at UK level for the number of cases and total amount of dividends, the Self Assessment and NPS cases with coding adjustments are deducted to leave targets for the remainder of the taxpayer population. These targets are at UK level – no attempt is made to control the targets to sub-UK geographical units. The cases to which amounts are attached by the imputation process and the amounts attached are determined by probabilistic methods with just the UK targets and distributions in mind. For dividend income, the number of non-Self Assessment cases with dividend income and distribution of imputed amounts were inferred from Family Resources Survey data for the relevant tax year.
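The target-setting and allocation steps above can be sketched as follows. This is a simplified illustration, not HMRC's method: the field names, the `draw_amount` callable (standing in for a distribution inferred from survey data) and the final rescaling to hit the amount target are all assumptions.

```python
import random


def impute_dividends(cases, control_count, control_amount, draw_amount, seed=0):
    """Impute dividends to cases with unknown amounts so UK totals meet the targets.

    Each case is a dict with `id`, `weight` (grossing factor) and `dividends`
    (None when unknown). Returns a dict mapping case id -> imputed amount for
    the unknown cases only.
    """
    known = [c for c in cases if c["dividends"] is not None]
    unknown = [c for c in cases if c["dividends"] is None]

    # Deduct what is already known from the UK control totals.
    known_count = sum(c["weight"] for c in known if c["dividends"] > 0)
    known_amount = sum(c["weight"] * c["dividends"] for c in known)
    target_count = max(0.0, control_count - known_count)
    target_amount = max(0.0, control_amount - known_amount)

    # Choose recipients at random until the grossed-up count target is met.
    rng = random.Random(seed)
    rng.shuffle(unknown)
    chosen, grossed = [], 0.0
    for case in unknown:
        if grossed >= target_count:
            break
        chosen.append(case)
        grossed += case["weight"]

    # Draw amounts, then rescale so the grossed-up total matches the target.
    draws = [draw_amount(rng) for _ in chosen]
    raw_total = sum(c["weight"] * d for c, d in zip(chosen, draws))
    scale = target_amount / raw_total if raw_total > 0 else 0.0

    imputed = {c["id"]: 0.0 for c in unknown}
    for case, d in zip(chosen, draws):
        imputed[case["id"]] = d * scale
    return imputed
```

Because recipients are chosen with only the UK targets in mind, the imputed amounts carry no information about sub-UK areas, which matches the caveat in the text.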

Imputation of pension income

As with dividend income, HMRC does not have complete information about superannuation or personal pension contributions. Pension contributions can be made under 3 types of arrangement: a net pay scheme, a relief at source scheme or a salary sacrifice scheme.

HMRC holds information on the value of employee pension contributions paid under “net pay arrangements” in Real Time Information (RTI) submissions by their employer. This data has been used to match SPI cases to “net pay” pension contributions. Pension schemes operating a net pay scheme are occupational pension schemes. However, as some employers operate relief at source or salary sacrifice schemes, contributions to those schemes are not included in the “net pay” figures and thus the “net pay” figures do not include all “occupational” individual contributions.

HMRC receives information from relief at source (RAS) schemes on individual contributions via the APSS106 and RPSCOM100(Z). These contributions are made post-tax, and scheme providers claim relevant rate relief (equivalent to the rUK basic rate of Income Tax) from HMRC on all individual contributions. Individuals with higher marginal tax rates than the relevant rate can claim the additional relief from HMRC via Self Assessment.

The APSS106 and RPSCOM100(Z) have been used to match PAYE cases in the SPI to “RAS” pension contributions – net of any relief claimed. For SA cases, this has been taken from the information submitted via Self Assessment returns, which is gross of basic rate tax. Additionally, the SPI includes contributions made to retirement annuity contracts and contributions made to employer’s schemes not deducted at source.
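The relationship between net and gross relief at source contributions can be shown with a short worked sketch. It assumes a relevant rate of 20% (taken here as the rUK basic rate) and uses a hypothetical function name; it is only the arithmetic implied by the description above.

```python
def ras_relief(net_contribution, relevant_rate=0.20, marginal_rate=0.20):
    """Gross up a post-tax (net) RAS contribution and split out the relief.

    relevant_rate: rate of relief claimed by the scheme provider (assumed 20%).
    marginal_rate: the individual's marginal Income Tax rate.
    """
    gross = net_contribution / (1 - relevant_rate)
    relief_at_source = gross - net_contribution
    # Relief above the relevant rate is claimed by the individual.
    additional_relief = max(0.0, marginal_rate - relevant_rate) * gross
    return {
        "gross": round(gross, 2),
        "relief_at_source": round(relief_at_source, 2),
        "additional_relief": round(additional_relief, 2),
    }
```

Under these assumptions, a net contribution of 80 corresponds to a gross contribution of 100: the scheme claims 20, and an individual with a 40% marginal rate could claim a further 20.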

Employers, individuals and scheme providers are not required to report individual contributions made using salary sacrifice to HMRC. These contributions are deducted from an individual’s gross earnings and added to the contributions made by their employer. Individual contributions made using salary sacrifice arrangements are not included in this publication.

The estimated value for “RAS” and for “net pay” contributions has been combined with other pension reliefs and included in these statistics. For more information on these pensions data sources, please refer to the latest methodology document for the Private pension statistics release.

Imputation of Marriage Allowance

HMRC collects data regarding claimants (receivers and transferers) of Marriage Allowance through coding adjustments for those in NPS or via Self Assessment returns. The latest available administrative data is matched to the SPI sample data allowing for the calculation of tax liabilities adjusting for Marriage Allowance.

The SPI sample is not stratified around any subsets of populations, including Marriage Allowance claimants. When it is grossed up and restricted to Marriage Allowance claimants, it therefore does not exactly match the population of claimants separately estimated and published using the collected administrative data. To calibrate to published Marriage Allowance claimants, estimated values are imputed to cases so that the population as a whole has amounts consistent with the evidence from these other sources. Starting from published estimates at UK level for the number of cases, the Self Assessment and NPS cases with coding adjustments are deducted to leave targets for the remainder of the claimant population. These targets are at UK level; no attempt is made to control the targets to sub-UK geographical units. For Marriage Allowance, the number of eligible claimant cases was inferred from Family Resources Survey data for tax year ending 2021. The imputation process then assigns received or transferred claims to cases so that the totals align with the published estimates of take-up.

Grossing

The sample is drawn from records held on HMRC transactional systems and the available information reflects what is known about the cases approximately one year after the tax year to which the survey relates. Allowance is made for Self Assessment cases yet to file a return and the overlap between the Self Assessment and PAYE systems when estimating the likely final grossed population for the tax year. The SPI data reflects the information held on HMRC systems at the time the sample was drawn; values associated with some cases, particularly in Self Assessment, could therefore continue to evolve after the survey is completed.

Each SPI sample case has a grossing factor associated with it and these are used to create estimates of overall numbers of Income Tax payers, total income and total Income Tax liabilities for the entire UK population. Grossing factors vary according to, for example, where the sample case data was sourced from (PAYE or Self Assessment), income type, and where in the income distribution the sample individual sits.
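In code, applying grossing factors amounts to a weighted sum over sample cases. The sketch below assumes hypothetical field names (`grossing_factor`, `total_income`, `liability`) and counts a case as a taxpayer when its liability is positive, following the definition of a taxpayer given earlier.

```python
def grossed_totals(sample):
    """Gross sample cases up to population estimates for Income Tax payers."""
    taxpayers = [case for case in sample if case["liability"] > 0]
    return {
        "taxpayers": sum(c["grossing_factor"] for c in taxpayers),
        "total_income": sum(c["grossing_factor"] * c["total_income"] for c in taxpayers),
        "liabilities": sum(c["grossing_factor"] * c["liability"] for c in taxpayers),
    }
```

A case from an intensively sampled high-income stratum has a small grossing factor, because it represents few individuals, while a case from a lightly sampled stratum represents many.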

Modelling Income Tax liabilities with the Personal Tax Model

Total Income Tax liabilities in the SPI are modelled using the Personal Tax Model (PTM) which uses all income sources in the SPI together to give an individual’s total income assessable for tax.

The PTM is a micro simulation model of the UK Income Tax system. ‘Micro simulation’ refers to modelling with individual level data, in this case using the SPI dataset. For each SPI sample case, the PTM models Income Tax liabilities in a given tax year based on incomes assessable for Income Tax and the main features and parameters of the Income Tax system for that year.

An overview of the PTM modelling process applied to each SPI sample case is provided in Annex B of the Supporting documentation for the relevant year for the Income Tax Liabilities statistics.

Aggregating data

Data are aggregated using the statistical reference number and grossing factor assigned to each sample case.

Question 4

4. Quality Management

Accepted Answer

4.1 Quality assurance

All official statistics produced by KAI must meet the standards in the Code of Practice for Statistics produced by the UK Statistics Authority and all analysts adhere to best practice as set out in the ‘Quality’ pillar.

Analytical quality assurance (QA) describes the arrangements and procedures put in place to ensure analytical outputs are error free and fit-for-purpose. It is an essential part of KAI’s way of working as the complexity of our work and the speed at which we are asked to provide advice means there is a high risk of error, which can have serious consequences for KAI’s and HMRC’s reputation, decisions, and in turn people’s lives.

Every piece of analysis is unique, and as a result there is no single QA checklist that contains all the QA tasks needed for every project. Nonetheless, analysts in KAI use a checklist that summarises the key QA tasks and is used as a starting point for teams when they are considering what QA actions to undertake.

Teams amend and adapt it as they see fit to take account of the level of risk associated with their analysis and the different QA tasks that are relevant to the work.

At the start of a project, during the planning stage, analysts and managers make a risk-based decision on what level of QA is required.

Analysts and managers construct a plan for all the QA tasks that will need to be completed, along with documentation on how each of those tasks is to be carried out, and turn this list into a QA checklist specific to the project.

Analysts carry out the QA tasks, update the checklist, and pass it on to the Senior Responsible Officer for review and eventual sign off.

4.2 Quality assessment

The QA for this project adhered to the framework described in ‘4.1 Quality assurance’ and the specific procedures undertaken were as follows:

Stage 1 – Specifying the question

Up to date documentation was agreed with stakeholders setting out outputs needed and by when; how the outputs will be used; and all the parameters required for the analysis.

Stage 2 – Developing the methodology

Methodology was agreed and developed in collaboration with stakeholders and others with relevant expertise, ensuring it was fit for purpose and would deliver the required outputs.

Stage 3 – Building and populating a model/piece of code

Analysis was produced using the most appropriate software and in line with good practice guidance.
Data inputs were checked to ensure they were fit-for-purpose by reviewing available documentation and, where possible, through direct contact with data suppliers.
QA of the input data was carried out.
The analysis was audited by someone other than the lead analyst – checking code and methodology.

Stage 4 – Running and testing the model/code

Results were compared with those produced in previous years and differences understood and determined to be genuine.
Results were compared with comparable independent estimates, and differences understood.
Results were determined to be explainable and in line with expectations.

Stage 5 – Drafting the final output

Checks were completed to ensure internal consistency (e.g. totals equal the sum of the components).
The final outputs were independently proofread and checked.

Question 5

5. Relevance

Accepted Answer

5.1 User needs

This analysis is likely to be of interest to users under the following broad headings:

national government – policy makers and MPs
regional and local governments
academia and research bodies
media
business community
general public

5.2 User satisfaction

Formal investigations into user satisfaction have not been undertaken. However, feedback from users following the release has been received and KAI are always open to ideas for new analysis to meet changing user requirements.

5.3 Completeness

It is a legal requirement that all individuals who are liable to Income Tax either pay the tax due through PAYE or through Self Assessment. Penalties exist for non-compliance.

It is likely that there will be Self Assessment cases yet to file a return after the data are drawn from transactional systems. However, allowances are made to account for late filed returns when estimating the likely final grossed population for the tax year. The statistics contained in this report can therefore be considered as complete. More information on the approach taken can be found in the Supporting Documentation for the Personal Income statistics.

Question 6

6. Accuracy and reliability

Accepted Answer

6.1 Overall accuracy

These statistics and analyses are based on administrative data and use a sample database that is designed to represent the UK Income Tax paying population. Accuracy is addressed by eliminating non-sampling errors as much as possible through adherence to the quality assurance framework. Moreover, the SPI sampling methodology is constantly reviewed and refined to improve the accuracy and reliability of the sample, and to reduce sampling error.

The key potential sources of error are:

Individuals entering incorrect information on their Self Assessment return or organisations entering incorrect information when submitting PAYE information
Individuals not completing their Self Assessment return by the required date
The stratified sampling process used for the SPI, which reflects the skewed distribution of Income Tax liabilities across the UK population. This is described in sections 4 and 10 of the Supporting Documentation for the relevant year
The imputation process for missing age, sex and dividend data, described in detail in Annex B: Coverage of the SPI and missing data.
The grossing factors which are used to scale the SPI data to the UK Income Tax paying population (see Annex B: Grossing factors).
Mistakes in the programming code used to analyse the data and produce the statistics.

6.2 Sampling error

These statistics are produced from the annual SPI, the purpose of which is to create a dataset that is representative of the UK Income Tax paying population that can be used to infer the size of that population and the estimated liabilities of all Income Tax payers. As the SPI is a sample and does not include the whole population of Income Tax payers, estimates drawn from the SPI are subject to sampling variation and will differ from the actual figures purely by chance. A stratified random sample is drawn across the NPS and CESA transactional systems. Cases are categorised by income band and other characteristics. Categories involving higher incomes tend to be sampled more intensively to improve the precision in estimates of total income. Confidence Intervals are published for sub-UK estimates.

To quantify the sampling error associated with the statistics presented in this publication, 95% confidence intervals were calculated. A confidence interval is a range of values within which there is reasonable certainty that the true value lies. A 95% confidence interval means that if the population were sampled repeatedly and an interval calculated from each sample, around 95% of those intervals would be expected to contain the true value. The 95% confidence intervals are based on standard error calculations; the standard error is the standard deviation of an estimate across repeated samples and so measures the precision of that estimate.
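As an illustration of how a confidence interval can be derived from a stratified sample, the sketch below uses the standard textbook estimator for a population total under stratified simple random sampling, with a finite population correction and a normal approximation (z = 1.96). The source does not specify the exact formula used for the published intervals, so this is an assumption, and the function name is hypothetical.

```python
import math
from statistics import variance


def stratified_total_ci(strata, z=1.96):
    """Estimate a population total and its confidence interval.

    `strata` is a list of (population_size, sampled_values) pairs.
    Returns (estimate, standard_error, lower, upper).
    """
    estimate = 0.0
    var_total = 0.0
    for population_size, values in strata:
        n = len(values)
        mean = sum(values) / n
        estimate += population_size * mean
        # A fully enumerated stratum (n == N) contributes no sampling variance.
        if 1 < n < population_size:
            fpc = 1 - n / population_size
            var_total += population_size ** 2 * fpc * variance(values) / n
    se = math.sqrt(var_total)
    return estimate, se, estimate - z * se, estimate + z * se
```

Sampling high-income strata at or near 100% removes their contribution to the variance, which is why intensive sampling of those strata narrows the interval for total liabilities.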

95% confidence intervals are published in Tables 3.13a to 3.15a for all estimates of the number of UK Income Tax payers and total liabilities. Please refer to these tables for the relevant year from the Personal Income Statistics.

6.3 Non-sampling error

Coverage error

The coverage of investment income for the sample drawn from NPS is incomplete. In order to create a full picture of total income for this survey, it is necessary to impute values of dividends to some sample cases. Where no information at case level is available from HMRC administrative systems, estimated values are imputed to cases so that the population as a whole has amounts consistent with evidence from other sources.

HMRC does not have complete information about pension contributions. To compile complete estimates for relief at source pensions and total income for the SPI, a significant proportion of the amount of relief at source pension contributions has been estimated using data from external data sources. The estimated value for this and for net pay contributions has been combined with other pension reliefs and included in these statistics.

Model errors

Income Tax liabilities in this publication are estimated at case level with the base SPI data using the PTM. The Income Tax modelling process attempts to capture all significant features of the UK Income Tax system, but inevitably this involves certain simplifications and omissions.

The modelling outputs are regularly benchmarked at case level against the Income Tax liabilities that are recorded as due in HMRC’s Self Assessment system. Differences between the outputs and the SPI sub-population Self Assessment data arise for known and specific reasons and only in a small minority of sample cases. The impact of these simplifications is judged to be small for key aggregates at UK level, and for most UK Income Tax payer sub-populations.

Measurement error

The SPI data are based on information from the HMRC systems used to administer the Income Tax system, which ensures that the most accurate picture of declared personal incomes is used for the production of these statistics.

Incorrectly entered data may include abnormally small or large incomes, or other values that skew the distribution of the data and the overall statistics. To mitigate this, checks are conducted on the SPI database before the statistics are produced, and any incorrectly small or large values detected are amended.
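
The checks themselves are not described in this report. As a minimal sketch of one possible kind of check, the example below flags values outside a plausibility range so they can be reviewed. The field name, bounds and sample cases are all hypothetical.

```python
def flag_implausible(records, field, lower, upper):
    """Return records whose value for `field` falls outside [lower, upper]."""
    return [r for r in records if not (lower <= r[field] <= upper)]

# Hypothetical sample cases and plausibility bounds (GBP)
cases = [
    {"case_id": 1, "employment_income": 28_000},
    {"case_id": 2, "employment_income": -5},
    {"case_id": 3, "employment_income": 9_000_000_000},
]
flagged = flag_implausible(cases, "employment_income", 0, 100_000_000)
print([r["case_id"] for r in flagged])  # cases 2 and 3 are flagged for review
```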

Non-response error

Case level non-response arises primarily because of late filing of tax returns. This is dealt with by estimating the likely size of the final population.

Item level non-response refers to when a sample record has incomplete information for some characteristics. Age, sex and postcode are key variables for published statistics and if these items are missing from sample records, other sources are examined.

For most cases in the PAYE system, it is not necessary to know about dividends or certain pension contributions etc. To create a complete estimate of total income for such cases, an imputation process allocates amounts randomly to cases so that population estimates follow pre-determined distributions.
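
The idea of allocating amounts randomly so that the population follows a pre-determined distribution can be sketched as follows. This is a simplified illustration, not the SPI imputation process: the amounts, probabilities and function name are hypothetical, and a real process would typically condition on case characteristics.

```python
import random

def impute_amounts(case_ids, amounts, probabilities, seed=0):
    """Randomly assign one amount per case, drawn from a target distribution."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    draws = rng.choices(amounts, weights=probabilities, k=len(case_ids))
    return dict(zip(case_ids, draws))

# Hypothetical target distribution of dividend income (GBP) for PAYE-only cases
amounts = [0, 500, 2_000]
probabilities = [0.7, 0.2, 0.1]
imputed = impute_amounts(range(10_000), amounts, probabilities)

share_zero = sum(1 for v in imputed.values() if v == 0) / len(imputed)
print(round(share_zero, 2))  # close to the target share of 0.7
```

Individual imputed values carry no information about the case they are assigned to; only the aggregate distribution is meaningful.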

Processing error

It is possible that errors exist in the programming code used to analyse the data and produce the statistics. This risk is reduced through developing a good understanding of the complexities of Income Tax, and regularly reviewing and testing the programs that are used.

6.4 Data revision

Data revision – policy

In line with the United Kingdom Statistics Authority Code of Practice for Official Statistics, HMRC has published a policy on revisions.

A summary of HMRC’s policy for different types of revisions is outlined below:

Planned revisions, which usually take place after receipt of expected information or data. Outputs that are subject to scheduled revisions will include an explanation of how these are dealt with.
Unplanned revisions, which can occur when data suppliers missed the original deadline, data was submitted incorrectly, or when errors were made during analysis or processing. In these cases a judgement will normally be made by the Head of Profession for Statistics as to whether the change is significant enough to publish a “revised” statistical release.
Revisions that occur as a result of changes to data source systems or methodology would be planned and where possible would be conducted in consultation with users.

Data revision – practice

These statistics are published annually and include an estimate of tax liabilities for the latest available complete tax year. We do not regularly revise statistics for previous years.

6.5 Seasonal adjustment

Seasonal adjustment is not applicable for this analysis.

Question 7

7. Timeliness and punctuality

Accepted Answer

7.1 Timeliness

The reference period for the Personal Income statistics is the Income Tax year ending on 5 April. Statistics for a given tax year will normally be published around 23 months after the end of that tax year. The information is drawn from the transactional systems approximately a year after the reference period, to allow time for individuals to file their Self Assessment returns and for PAYE reconciliation. It then takes approximately a year to turn the raw dataset into information and commentary ready for publication, due in part to the time required to complete the data validation, complex analysis and quality assurance.

7.2 Punctuality

In accordance with the Code of Practice for official statistics, the exact date of publication will be given no less than one calendar month before publication on both the Schedule of updates for HMRC’s statistics and the Research and statistics calendar of GOV.UK.

The full publication calendar or any delays to publication dates can be found on both the Schedule of updates for HMRC’s statistics and the Research and statistics calendar of GOV.UK.

Question 8

8. Coherence and comparability

Accepted Answer

8.1 Geographical comparability

Breakdowns of Income Tax payers are available for the UK, country, region, county, local authority district and parliamentary constituency. The statistics also detail non-savings / non-dividend Income Tax for Scotland, Wales and rest of UK; figures are comparable between the geographic areas.

8.2 Comparability over time

Comparability is to some extent determined by the scope of income tax and allowable reliefs which drives the information available from HMRC administrative systems. The supporting documentation highlights any changes made in methodologies such as the calculation of figures that are presented in the data or data issues (if any).

The population is not stratified by geographical area before the SPI sample is selected. Year on year changes in published estimates of taxpayer numbers within small geographical areas (e.g. districts and constituencies) should be viewed with caution. The confidence interval for the difference could be large relative to the measured difference, so any observed change may be due to sampling fluctuation alone. Confidence intervals are published for sub-regional breakdowns.
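
To illustrate why a year on year change can fall within sampling fluctuation, the sketch below computes a 95% interval for the difference between two estimates. It assumes the two estimates are independent and approximately normal, which may not hold for the SPI, and all figures are hypothetical.

```python
from math import sqrt

def difference_interval(est_a, se_a, est_b, se_b, z=1.96):
    """95% interval for est_b - est_a, assuming independent estimates."""
    diff = est_b - est_a
    se_diff = sqrt(se_a ** 2 + se_b ** 2)
    return diff - z * se_diff, diff + z * se_diff

# Hypothetical taxpayer counts for one district in two consecutive years
lower, upper = difference_interval(40_000, 1_500, 41_000, 1_500)
change_is_clear = lower > 0 or upper < 0  # does the interval exclude zero?
print(change_is_clear)
```

Here the measured change of 1,000 is small relative to the interval around it, so it could not be distinguished from no change at all.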

8.3 Coherence – cross domain

Estimates from the Personal Incomes statistics tables may be compared with some higher level figures from the Income Tax Liability Statistics publication.

Coherence – sub-annual and annual statistics

All statistics are presented as annual outputs. No coherence issues exist.

Coherence – national accounts

This publication shows income and Income Tax liabilities for each tax year. Income Tax liabilities are amounts of Income Tax due on incomes arising in a given tax year, whereas receipts are amounts of Income Tax paid and collected in a given year. The breakdowns of Income Tax liabilities provided in this publication are not available on a receipts basis.

8.4 Coherence – internal

Rounding of numbers may cause some minor internal coherence issues as the figures within a table may not sum to the total displayed. Effort has been made to ensure totals between tables remain constant where appropriate.
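
A small worked example, using made-up figures, shows how independently rounded components can differ from a rounded total:

```python
# Hypothetical components, in thousands of taxpayers
components = [1_234.4, 2_345.4, 3_456.4]

rounded_parts = [round(c) for c in components]
rounded_total = round(sum(components))

# The sum of the rounded parts differs from the rounded total by 1
print(sum(rounded_parts), rounded_total)
```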

Question 9

9. Accessibility and clarity

Accepted Answer

9.1 News release

There haven’t been any press releases linked to this data over the past year.

9.2 Publication

The tables and associated commentary are published on the Statistics about Personal Incomes webpage of GOV.UK.

Tables are published in the OpenDocument format, and the associated commentary as an accessible HTML webpage.

Both documents comply with the accessibility regulations set out in the Public Sector Bodies (Websites and Mobile Applications) (No. 2) Accessibility Regulations 2018.

Further information can be found in HMRC’s accessible documents policy.

9.3 Online databases

This analysis is not used in any online databases.

9.4 Micro-data access

SPI Public Use Tape microdata are available to approved researchers on the UK Data Service website.

9.5 Other

There aren’t any other dissemination formats available for this analysis.

9.6 Documentation on methodology

Supporting documentation for each annual statistics release is publicly available to users.

9.7 Quality documentation

All official statistics produced by KAI must meet the standards in the Code of Practice for Statistics produced by the UK Statistics Authority, and all analysts adhere to best practice as set out in the ‘Quality’ pillar.

Information about quality procedures for this analysis can be found in section 4 of this document.

Question 10

10. Cost and burden

Accepted Answer

Because all necessary data for these statistics is obtained from administrative data sources (NPS and CESA) there is no additional burden on individuals or HMRC tax inspectors to provide information.

It is estimated to take about a year to produce the annual analysis and publication, with input from a small number of analysts across different teams.

Question 11

11. Confidentiality

Accepted Answer

11.1 Confidentiality – policy

HMRC has a legal duty to maintain the confidentiality of taxpayer information.

Section 18(1) of the Commissioners for Revenue and Customs Act 2005 (CRCA) sets out our duty of confidentiality.

This analysis complies with this requirement.

11.2 Confidentiality – data treatment

The statistics in these tables are presented at an aggregate level so identification of individuals is not possible.

To make sure no individual taxpayers can be identified, statistical disclosure control (SDC) is applied to cells within tables. SDC is the application of methods to ensure confidential data is not disclosed to parties who don’t have authority to access it.

SDC modifies data so that the risk of data subjects being identified is within acceptable limits while making the data as useful as possible.

Disclosure in this analysis is avoided by applying rules that prevent categories of data containing:

small numbers of contributors, and
small numbers of contributors that are very dominant

If a cell within a table is determined to be disclosive, its contents are suppressed either by removing the data or combining categories.
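
The two kinds of rule described above can be sketched as a simple cell check. The thresholds below (a minimum of 10 contributors and a maximum share of 75% for the largest contributor) are hypothetical values chosen for illustration, not HMRC's actual disclosure rules, and the dominance test is simplified to the single largest contributor.

```python
def is_disclosive(contributions, min_count=10, max_share=0.75):
    """Flag a cell with too few contributors or one dominant contributor."""
    total = sum(contributions)
    if len(contributions) < min_count:
        return True
    if total > 0 and max(contributions) / total > max_share:
        return True
    return False

# Hypothetical cells: each list holds the individual amounts behind one cell
small_cell = [20_000, 25_000, 30_000]
dominated_cell = [900_000] + [10_000] * 11
safe_cell = [30_000] * 12

print(is_disclosive(small_cell), is_disclosive(dominated_cell), is_disclosive(safe_cell))
```

A cell flagged by such a check would then be suppressed or merged with a neighbouring category.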

Further information on anonymisation and data confidentiality best practice can be found on the Government Statistical Service’s website.
