Guidance

Methodology statement: DWP benefits statistical summary

Updated 13 February 2024

Introduction

The main purpose of this document is to provide users with information about the methods used to produce the DWP Benefits statistics release and its main products, in accordance with practices set out in the Code of Practice for Statistics. The DWP benefits statistics collection brings together summary National and Official statistics available through Stat-Xplore on the following benefits:

  • Attendance Allowance (AA)
  • Bereavement benefits (BB)
  • Bereavement Support Payment (BSP)
  • Carer’s Allowance (CA)
  • Disability Living Allowance (DLA)
  • Employment and Support Allowance (ESA)
  • Housing Benefit (HB)
  • HB flows
  • Incapacity Benefit (IB)
  • Income Support (IS)
  • Industrial Injuries Disablement Benefit (IIDB)
  • Jobseeker’s Allowance (JSA)
  • Pension Credit (PC)
  • Severe Disablement Allowance (SDA)
  • State Pension (SP)
  • Widow’s Benefit (WB)

A statistical summary document is published on a 6-monthly basis in February and August each year. It contains a high-level summary of the latest National Statistics for the benefits listed above and, where relevant, messages regarding:

  • Personal Independence Payment (PIP)
  • Universal Credit (UC)
  • DWP benefit combinations

We have previously published National and Official statistics on the following benefit breakdowns twice a year in May and November via data tables within this collection:

  • State Pension 5% sample

These data tables were suspended from August 2021 due to the source data not being representative following the introduction of a new DWP computer system. For more information on this issue see the background information note.

We also publish Official statistics on a quarterly basis via data tables as part of this release:

  • Maternity Allowance (MA)

The benefits covered by this document are JSA, IS, IB, IIDB, SDA, ESA, CA, WB/BB, DLA, PC, SP and AA. Statistics for all of these benefits follow a common production process detailed below.

Annex A details the methodology for MA and SP tables produced using 5% sample data.

Separate methodology documents are available for UC, PIP, BSP, HB and HB flows (see Annex B). This is because they use different types of data and follow different production processes.

Overview

Responsibility for producing the data we publish sits with two teams who work closely together within DWP Digital Data and Analytics. The Data Delivery Team produce frozen snapshot data known as the 100% Frozen data source. The datasets are based on administrative data derived from DWP computer systems. The data are sometimes referred to as the Work and Pensions Longitudinal Study (WPLS). The snapshots represent the situation as at the last day of each quarter. In some instances, 5% samples of data are produced where a broader range of variables are needed for a publication.

The Client Statistics Team are responsible for conducting quality assurance checks on the data, for the hosting of the data on the Stat-Xplore dissemination tool and for the production of the statistical summary release.

Data production

Production begins with the creation of the Frozen Datasets. The Frozen Datasets are a snapshot of the benefit system at a certain point in time: the last day of the quarter. They are created by a program written in SAS and are derived from the National Benefits Database (NBD).

Additional data are then provided from a point-in-time dataset called GMSONE. The NBD holds claim-level information on the benefits a person has claimed, past and present, whilst GMSONE holds claim-level information on the benefits a person is receiving at that point in time, together with key characteristics of each claim. The data for GMSONE come from the different benefit computer systems, collected by DWP for administrative purposes.

The SAS program used to create the datasets is run as soon as the latest NBD is available. The Frozen Datasets contain all customers on benefit at that point in time, across all benefits, together with key characteristics of each benefit claim. Additionally, a dataset called “Pers” holds the personal characteristics of each claimant, such as age, gender and location. A key feature of the Pers dataset is the identification of key groups of people by the combination of benefits they receive.

Information is brought in from the DWP Customer Information System (CIS) to provide geographic information for each claimant. An additional dataset called the CIS Address History File is used to assign geographies to the Pers file, in line with DWP geography information guidance.

From February 2020, CIS data have also been used to verify the date of birth and gender information held on the file, ensuring that age and gender values are as accurate as possible.
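
To illustrate the shape of this step, the sketch below filters claims that are live on the snapshot date and joins on the point-in-time and claimant characteristics. It is a minimal illustration in Python rather than the SAS code actually used, and every file and column name (nbd_claims.csv, claim_id, person_id and so on) is hypothetical.

  # Illustrative sketch only: the production process is written in SAS and runs
  # against the NBD, GMSONE, Pers and CIS Address History datasets. Every file
  # name and column name below is hypothetical.
  import pandas as pd

  QUARTER_END = pd.Timestamp("2023-12-31")  # snapshot date: last day of the quarter

  nbd = pd.read_csv("nbd_claims.csv", parse_dates=["start_date", "end_date"])
  gmsone = pd.read_csv("gmsone.csv")   # point-in-time claim characteristics
  pers = pd.read_csv("pers.csv")       # claimant characteristics (age, gender, geography)

  # Keep only claims that are live on the snapshot date
  live = nbd[(nbd["start_date"] <= QUARTER_END) &
             (nbd["end_date"].isna() | (nbd["end_date"] >= QUARTER_END))]

  # Attach point-in-time claim characteristics and claimant details
  frozen = (live.merge(gmsone, on="claim_id", how="left")
                .merge(pers, on="person_id", how="left"))

  frozen.to_parquet("frozen_2023Q4.parquet")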

Quality assurance of frozen data

Quality assurance of the Frozen Datasets is a largely automated process carried out in SAS, with a manual analysis of the output. Before the automated phase can begin, analysts check the SAS production logs to ensure the Frozen Datasets were created free of any known errors and that the code has referenced all of the data sources correctly.

The automated program checks various features of the datasets against the previous version, such as missing variables, new decodes, missing values, duplicated cases, and changes to the distribution of variables. The results are stored in a lookup file for use in future quarters and are presented in the form of an HTML report. Members of the team assess these reports and look for any potential issues within the datasets. All potential issues and suspect movements in the datasets are raised and investigated. Movements can often be attributed to changes in benefit policy and are fully expected.

Types of tests include:

  • caseload numbers which are checked for trends
  • number of cases leaving the data since last quarter
  • number of cases joining the datasets since last quarter
  • check that the number of duplicated values is zero
  • distributions over different variables (including missing values), many of which should have remained similar to the previous quarter
  • Chi-Squared tests
  • Kolmogorov-Smirnov two sample tests
  • reports of which variables have changed the most since the previous quarter

Where necessary, additional longitudinal or time-series checks are performed on relevant variables to check for consistency across a wider timeframe.
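
As an illustration only, checks of this kind could be expressed along the following lines. This is a Python sketch rather than the SAS programs actually used, and the dataset and variable names (payment_type, weekly_award and the file names) are hypothetical.

  # Illustrative sketch only: the production checks run in SAS and feed an HTML
  # report. File names and variable names here are hypothetical.
  import pandas as pd
  from scipy import stats

  current = pd.read_parquet("frozen_2023Q4.parquet")
  previous = pd.read_parquet("frozen_2023Q3.parquet")

  # Duplicate check: the number of duplicated claim records should be zero
  assert current.duplicated(subset="claim_id").sum() == 0

  # Chi-squared comparison of a categorical variable against last quarter
  cur = current["payment_type"].value_counts()
  prev = previous["payment_type"].value_counts().reindex(cur.index, fill_value=0)
  chi2, chi2_p = stats.chisquare(cur, f_exp=prev / prev.sum() * cur.sum())
  print(f"payment_type chi-squared p-value: {chi2_p:.3f}")

  # Kolmogorov-Smirnov two sample test on a continuous variable
  ks_stat, ks_p = stats.ks_2samp(current["weekly_award"].dropna(),
                                 previous["weekly_award"].dropna())
  print(f"weekly_award KS p-value: {ks_p:.3f}")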

Once the checks are complete, the Lead Statistician ensures all checking has been done and signs off the Frozen Datasets. All actions are recorded on a Quality Assurance log and checked off once complete, in line with Aqua Book principles, to ensure that no steps are missed.

Secondary data production

After the initial quality checks are complete, the next stage of the process is to create datasets for our Stat-Xplore dissemination tool, along with tabular products that help assess quality. This is a two-stage process that retains some of the older processing work we used to perform for the old WPLS Tabtool and NOMIS data production. We realised that the Quality Assurance process for NOMIS could be adopted for quality checking the Stat-Xplore data prior to its release.

Cube creation and ethnicity data

To help with Quality Assurance, the Frozen Datasets are transformed into analytical data cubes, in which the data are summarised to one row for every unique combination of variables for each benefit. This reduces the number of rows needed to store the data and speeds up the production of the time-series tables that we use for Quality Assurance. The SAS code that creates the analytical cubes is run by the Data Delivery Team; it also adds ethnicity data to the Pers dataset using Labour Market System (LMS) data.
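
In outline, building a cube amounts to collapsing the claim-level file to one row per unique combination of breakdown variables, carrying a count and any totals. The following is a minimal Python sketch of that idea, not the production SAS code, and the file and column names are hypothetical.

  # Illustrative sketch only: collapse claim-level records into an analytical
  # cube with one row per unique combination of breakdown variables.
  # File and column names are hypothetical.
  import pandas as pd

  frozen = pd.read_parquet("frozen_2023Q4.parquet")

  breakdowns = ["benefit", "region", "age_band", "gender", "payment_status"]
  cube = (frozen.groupby(breakdowns, dropna=False)
                .agg(caseload=("claim_id", "size"),
                     total_weekly_award=("weekly_award", "sum"))
                .reset_index())

  # The cube has far fewer rows than the claim-level file, which speeds up the
  # time-series tables used for Quality Assurance
  cube.to_parquet("cube_2023Q4.parquet")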

Creation of IIDB data

The creation of IIDB data sits outside of the process outlined above. The data come from the UK Industrial Injuries Computer Scan (IICS) system. The datasets are structured differently, and cover different time periods, from the rest of the 100% Frozen data sources. The data are a snapshot of the IICS system on the last day of the quarter. IIDB data are published for March, June, September and December with a seven-month time lag.

However, the general principles of data production, such as geography assignment and the creation of Stat-Xplore datasets are very similar, and the steps taken to quality assure data are almost identical.

Updating Stat-Xplore

Stat-Xplore is an online tool that allows the creation and download of customised statistical tables, and data visualisations through interactive charts. Stat-Xplore has a user guide. Stat-Xplore is maintained by a central team within Client Statistics who co-ordinate the monthly and quarterly releases across a broad range of benefits and measures.

For this release, every quarter statisticians run additional SAS code for each benefit that merges characteristics from the Pers data with the corresponding benefit details to create all of the information required for the range of breakdowns that we publish on Stat-Xplore. Some reclassification of variables takes place to help summarise the data. For AA, DLA and CA, the data are split into two datasets covering all entitled cases and cases in payment. The data are exported into a format that Stat-Xplore can recognise (CSV).
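
The sketch below illustrates the general shape of this step: merging claimant characteristics onto benefit details, reclassifying a variable, splitting entitled and in-payment cases, and exporting to CSV. It is illustrative Python rather than the SAS code actually used, and all file names, column names and categories are assumptions.

  # Illustrative sketch only: file names, column names and categories are
  # hypothetical, and the production code is written in SAS.
  import pandas as pd

  benefit = pd.read_parquet("dla_2023Q4.parquet")   # benefit-level claim details
  pers = pd.read_parquet("pers_2023Q4.parquet")     # claimant characteristics

  merged = benefit.merge(pers[["person_id", "age_band", "gender", "region"]],
                         on="person_id", how="left")

  # Reclassify a detailed variable into the summary bands that are published
  merged["award_band"] = pd.cut(merged["weekly_award"],
                                bins=[0, 30, 60, 90, float("inf")],
                                labels=["0-30", "30-60", "60-90", "90+"])

  # For AA, DLA and CA: split into all entitled cases and cases in payment
  entitled = merged
  in_payment = merged[merged["payment_status"] == "in payment"]

  # Export in a format Stat-Xplore can load
  entitled.to_csv("dla_entitled_2023Q4.csv", index=False)
  in_payment.to_csv("dla_in_payment_2023Q4.csv", index=False)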

Once all of the relevant datasets have been uploaded onto Stat-Xplore’s development server, statisticians perform quality checks on the breakdowns, notation and presentation to satisfy themselves that the process of hosting the data has worked without errors or any other unintended consequences.

Stat-Xplore data are “perturbed” to mask the true values, as a form of disclosure control. However, provided the differences between the raw frozen data and the Stat-Xplore output fall within a specific range, the data tables can be signed off by the Lead Statistician. This type of checking is performed in Microsoft Excel. Additional checks of the titles, footnotes and background guidance held on Stat-Xplore are also performed each quarter.
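
A check of this kind could look like the following sketch, which flags any published figure that differs from the raw frozen total by more than a tolerance. The real check is performed in Microsoft Excel, the acceptable range is not stated here, and the file names, column names and the tolerance value are purely illustrative.

  # Illustrative sketch only: the real check is performed in Microsoft Excel and
  # the acceptable range is not published. File names, column names and the
  # tolerance value are assumptions.
  import pandas as pd

  raw = pd.read_csv("raw_frozen_totals.csv")         # totals from the frozen data
  published = pd.read_csv("statxplore_totals.csv")   # perturbed Stat-Xplore output

  TOLERANCE = 5  # assumed for illustration only

  merged = raw.merge(published, on=["benefit", "geography"],
                     suffixes=("_raw", "_sx"))
  merged["diff"] = (merged["caseload_sx"] - merged["caseload_raw"]).abs()

  out_of_range = merged[merged["diff"] > TOLERANCE]
  if out_of_range.empty:
      print("All differences within tolerance - ready for sign-off")
  else:
      print(out_of_range[["benefit", "geography", "diff"]])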

Production of statistical summary and ministerial submission

As mentioned above, the DWP benefits statistics collection consists of quarterly data released mainly through Stat-Xplore, but also in spreadsheet-based tables where necessary. Twice a year, we also release a statistical summary and State Pension 5% data. This section details the production of the statistical summary and the ministerial submission.

What is a ministerial submission?

As the data release and summary contain National Statistics, there is an internal departmental requirement to accompany this release with a Ministerial Submission. The Submission is circulated to the Minister’s office and contains input from policy areas. It is sent one day prior to release day and is fully compliant with the UKSA rules surrounding pre-release access to statistics. It contains a blend of the most notable issues within the latest release, key features, and policy guidance.

Analysis

An analysis of any changes and trends in the data will be conducted across the data production and quality processes listed above.

We also maintain a watching brief on any policy developments that might change the meaning or interpretation of any of our published data.

Where appropriate, additional input from analytical leads will be sought to help explain the underlying reasons for any significant changes noted. The key themes and stories for the next release can then be teased out.

Sources annex

This is a means of presenting as much as possible of the data used for the statistical summary and the ministerial submission in a single Excel workbook. For most of the benefits, the data for the Sources Annex come from the Stat-Xplore development server, where data have already been checked and had perturbation applied. Additional data are brought in for Benefit Combinations, HB, UC and PIP via Stat-Xplore, and BSP statistics are added manually. The Sources Annex uses formulas to apply the current rounding policy to our data so that narratives can be drafted quickly and consistently using the appropriate level of rounding. After production, a separate analyst conducts a full quality review of the sources document.
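
As a rough illustration of what the rounding formulas do, the sketch below rounds a figure to a chosen base before it is quoted in a narrative. The rounding base shown is an assumption for illustration only; the actual rounding policy applied in the workbook is not reproduced here.

  # Illustrative sketch only: the rounding base of 10 is an assumption; the
  # actual policy applied in the workbook may differ by benefit and measure.
  def round_to_base(value: float, base: int = 10) -> int:
      return int(round(value / base) * base)

  print(round_to_base(12_345))        # 12,340 with a base of 10
  print(round_to_base(12_345, 100))   # 12,300 with a base of 100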

Statistical summary

From August 2019 the statistical summary has been produced in HTML format. This has enabled us to comply with gov.uk accessibility standards and ensure that the summaries can be accessed on a broader range of digital devices, such as smartphones and tablets. The process uses Markdown, a lightweight markup language that converts to HTML. The Client Statistics Team write all of the content for the summary and work with Digital Publication strands to ensure that the Markdown is consistent and accessible.

Once the analysis has been completed and the sources annex updated, the summary is a relatively simple document to produce; initial drafting is done in Microsoft Word and the previous reports are used as a template.

Tables and charts

Tables and charts presented in the statistical summary are created using Stat-Xplore data, which are moved into Microsoft Excel. This ensures that chart data employ the same disclosure controls as Stat-Xplore tables. In the statistical summary, some tables are presented with additional functionality so that users can toggle between a table and a simple bar chart. However, where a graph or chart requires greater detail or explanation, it is necessary to create images with alt-text descriptions behind them.

Pre-publication and release

We pre-announce the release date of this statistical publication at least 28 calendar days in advance, in accordance with release practices set out in the Code of Practice for Statistics. Find dates of future DWP publications.

In addition to DWP staff who are responsible for the production and quality assurance of the statistics, a limited number of individuals are granted 24 hours pre-release access.

Under the Pre-release Access to Official Statistics Order 2008, government departments are required to maintain an up-to-date list of the job titles and organisations of everyone who has pre-release access to statistical releases. For transparency, we publish the pre-release access list.

The statistical summary is added to the DWP website at 9:30am on release day. The publication has its own landing page, which contains the publication itself as well as key information about the contents of the release. Older versions of the statistical summary are archived and remain available. See the DWP benefits statistics landing page. Also at 9:30am on release day, Stat-Xplore “goes live” so that data for the new quarter are available.

Further information

Read our background information note for more details about changes and revisions to the release.

Annex A: 5% sample data tables

In addition to the 100% Frozen data methodology detailed above, we also use 5% sample data to produce tables for Maternity Allowance and some breakdowns for State Pension where we are unable to use the 100% sources. Data from the samples are summarised using SAS so that the desired tabulations can be produced for publication. Quality checking of both the underlying data and the summarised tables is again performed using Aqua Book principles and a checklist to ensure that no checks are omitted.

Limitations

The data contained in those publications are based on a 5% sample of the total ‘live’ GB cases held on the Pensions Strategy Computer System (PSCS). The 5% sample contains all the characteristics of the complete set of records held on the PSCS. Therefore, if the number of cases in the sample with characteristic A is 500, we estimate that the number of cases with characteristic A on the PSCS is 500 multiplied by 20, that is, 10,000 cases.

The sample taken is just one of many different samples which could have been taken, so the calculated total number of cases with characteristic A is only an estimate of the actual number of cases. If another sample had been taken, the estimate for the same characteristic may have been slightly different. Table 1 below gives the amount of variation that can be expected in the estimated number of cases with a certain characteristic. For example, if from a 5% sample there are 500 cases with characteristic A, then the estimated number of cases on the PSCS with the same characteristic is 10,000, and we could reasonably expect the actual figure to lie between 9,150 and 10,850.

Table 1: expected variation in estimates based on the 5% sample

  Estimated value    95% confidence interval    Confidence interval as a % of the estimate
  0                  0 to 60                    .
  100                34 to 230                  .
  300                171 to 490                 .
  500                328 to 732                 .
  1,000              +/- 270                    +/- 27.0%
  2,500              +/- 427                    +/- 17.1%
  5,000              +/- 604                    +/- 12.1%
  10,000             +/- 850                    +/- 8.5%
  25,000             +/- 1,350                  +/- 5.4%
  50,000             +/- 1,910                  +/- 3.8%
  100,000            +/- 2,700                  +/- 2.7%
  1,000,000          +/- 9,000                  +/- 0.9%
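
For readers who want to reproduce the scale of these intervals, the sketch below grosses a 5% sample count up by a factor of 20 and attaches an approximate 95% confidence interval, assuming simple random sampling with a finite population correction. This is a reconstruction under those assumptions rather than the published calculation, although it broadly matches the larger entries in Table 1.

  # Illustrative reconstruction only: grosses a 5% sample count up by 20 and
  # attaches an approximate 95% confidence interval, assuming simple random
  # sampling with a finite population correction. Not the published calculation.
  import math

  SAMPLING_FRACTION = 0.05
  GROSSING_FACTOR = 1 / SAMPLING_FRACTION   # 20

  def estimate_with_ci(sample_count: int) -> tuple[float, float]:
      estimate = sample_count * GROSSING_FACTOR
      # Normal approximation to the sampling error of the grossed-up count
      std_error = GROSSING_FACTOR * math.sqrt(sample_count * (1 - SAMPLING_FRACTION))
      return estimate, 1.96 * std_error

  est, half_width = estimate_with_ci(500)
  print(f"estimate {est:,.0f} +/- {half_width:,.0f}")   # roughly 10,000 +/- 850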

Rating factors applied to the data mean that all data are necessarily rounded to multiples of 20. The 5% sample data cannot easily be combined with the 100% sources, which limits the range and type of some of the breakdowns available.

Annex B: other benefits

Separate methodology documents are available for UC, PIP, BSP, HB and HB Flows. This is because they use different types of data and follow different production processes.