Research and analysis

Family Resources Survey Transformation: integrating administrative data into the FRS

Published 21 March 2024

1. Main points

  • Changes in the lawful basis for linking in 2018, together with improvements in our linking methodology, mean that we can now link at least 95% of FRS respondents to their administrative records – up from just over 50% previously.

  • This has opened the opportunity to achieve substantial data quality, timeliness and cost efficiency gains through the integration of administrative data into the FRS.

  • The use of administrative data resolves a substantial proportion but not all of the long-standing undercount of benefit receipt on the FRS.

  • It has become apparent that the remaining undercount is due to an under-representation of benefit recipients in the FRS sample. We have developed an experimental revised grossing regime, making use of administrative data, to correct for this sample bias.

  • We have developed imputation routines to estimate benefit payment amounts for the approximately 5% of respondents we cannot link. This demonstrates that it will be possible, with the use of administrative data and imputation, to remove most benefit questions from the survey.

  • Illustrative results show the combined positive effects of replacing survey responses with administrative records, imputation for unlinked respondents, and revised grossing on FRS estimates.

2. About our transformation research

2.1 Background

Each year DWP spends substantial resources in-house editing the raw Family Resources Survey (FRS) data. Particular attention is given to ensure that benefit and income information is as accurate as possible (see the FRS Background Information and Methodology for further details).

This is because the FRS provides the critical source data for DWP’s policy simulation tools, supporting DWP decisions on policy direction and providing the detailed costs information required by HM Treasury and the Office for Budget Responsibility.

The FRS also provides the source data for two other DWP Accredited Official Statistics publications that rely on high quality benefit and income information, Households Below Average Income (HBAI) and Pensioners’ Incomes Statistics (PI), as well as for the Official Statistics on Income-related Benefits: Estimates of Take-up and Separated families statistics.

With current processes it takes a full year from the end of field work to get to publication.

2.2 Linking FRS respondents to their administrative records

FRS respondents were first asked for consent to link their survey responses to administrative records in 2007. The approach to obtaining consent was designed to meet the requirements of the Data Protection Act 1998. On average, each year, around two thirds of respondents consented.

Initially, respondents were asked directly for their National Insurance Numbers (NINOs), but the response rate was very poor, and the question was quickly dropped.

As a consequence, consenting respondents were linked by name, postcode and date of birth to DWP’s Customer Information System (CIS) to obtain their NINOs. CIS stores the names, dates of birth, and latest address information, for everyone who has been issued with a NINO. This is a form of deterministic matching – a rules-based process to determine an “exact match” between two records.

Each year we matched around 80% of consenting respondents, giving us an average overall annual match rate of just over 50%.
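As a rough check on these figures, the overall match rate is approximately the consent rate multiplied by the match rate among consenting respondents. The sketch below uses only the rounded values quoted above (two thirds and 80%), so the result is indicative rather than exact.

```python
# Rounded figures quoted in the text; the exact annual values varied.
consent_rate = 2 / 3               # share of respondents who consented to linking
match_rate_given_consent = 0.80    # share of consenting respondents matched to a NINO

overall_match_rate = consent_rate * match_rate_given_consent
print(f"Overall match rate: {overall_match_rate:.0%}")
```

This gives roughly 53%, in line with the "just over 50%" quoted.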

A lookup file was created for each survey year consisting of anonymised identifiers for FRS respondents (household, benefit unit and person), together with their encrypted NINOs.

The lookup file provides the link between FRS respondents’ survey data and their administrative records.

The implementation of the General Data Protection Regulation (GDPR) in June 2018 provided ‘public task’ (GDPR Article 6(1)(e)) as an alternative to consent as the lawful basis for linking. Since June 2018, all FRS processing in Great Britain, including data linking, has been carried out on the basis that it is necessary for the department to carry out its functions as a public body.

It means we can attempt to link all respondents (who are fully informed about how their information will be used), as opposed to just the two thirds who consented previously.

This change, together with recent improvements in our linking methodology, means that we can now find NINOs for more than 95% of respondents. We have a 5-year time series of lookup files on this basis for Great Britain and three years for the UK (Northern Ireland adopted the new approach from April 2020).

2.3 Benefits of integrating administrative data

Within DWP we have access to a range of high quality, timely, administrative datasets covering the core topic areas of the FRS: DWP benefits, HMRC tax credits, HMRC Real Time Information (RTI) on Pay As You Earn (PAYE), Self Assessment and savings.

Figure 2.1 Administrative data sources available for linking

DWP | HMRC
Universal Credit and legacy benefits | Real Time Information (RTI) on PAYE earnings
Pensioner benefits | Self Assessment data
Disability benefits | Tax Credits
Other benefits | Savings
Child Maintenance | Child benefit

We can link FRS respondents to all these sources via the lookup file. By integrating these data sources into the FRS, we have the potential to realise a range of quality, timeliness, and cost benefits:

1. Eliminating the long-standing undercount of benefits and tax credits. The initial assumption was that this undercount is largely due to respondent misreporting.

2. Improving accuracy of employment and self-employment income. We will do this by getting actual earnings information directly from PAYE and Self Assessment records.

3. Adding more analytical power. We will add more information which is not currently available on the survey, from administrative sources.

4. Reducing costs and respondent burden. With the use of administrative data, we can shorten the questionnaire by dropping many benefit and income questions.

5. Improving timeliness. Use of administrative data can reduce the time required to produce the final survey data. This is because a) most administrative extracts are available shortly after the end of survey year and b) use of these administrative datasets will reduce the need for survey data and editing.

2.4 FRS Transformation project: overview of work to date   

The FRS Transformation project was set up to research and develop the integration of administrative data into the FRS.

Workstreams fall into the following categories:

a) Lookup file development

b) DWP benefits, HMRC tax credits and child benefit

c) HMRC RTI PAYE data for employment and occupational pensions

d) HMRC Self Assessment

e) Other administrative data sources

f) Grossing review

g) Non-response research

h) End to end process review, including questionnaire redesign

Overall approach and assumptions at project outset

The new lookup file will enable us to link at least 95% of respondents to a wide range of administrative data sources.

We will replace survey variables with administrative equivalents, where these have been demonstrated to be of similar or better quality, and impute values for the 5% of respondents who are unlinked.

Our approach is to create administrative variables which match the existing FRS variables as far as possible in terms of definitions and focus on respondents’ circumstances at the time of the interview.

This will enable us to reduce the questionnaire length because most survey questions covered by administrative data will not need to be asked at interview. We will retain the minimum questions needed for accurate imputation e.g. basic yes/no questions on benefit receipt.

Linking respondents to administrative data will allow us to eliminate the long-standing undercount of benefit receipt on the FRS. The assumption is that this undercount is mainly due to misreporting by respondents.

Timely administrative data, and reduced editing requirements, will enable us to make a substantial step change in timeliness of FRS production.

a) Lookup file development

The movement away from consent and the adoption of GDPR ‘public task’ as the lawful basis for linking in 2018 has been transformative, as it enables us to attempt to link all respondents (as opposed to two thirds previously) and provided the impetus for our transformation work.

Existing deterministic techniques matched around 80% of FRS respondents to their NINOs on CIS.

Our development work then focused on improving the match rate. We have done this by developing automated SAS-based routines which use a large number of match keys made up of every possible combination of full and part elements of first name, middle initial, surname, date of birth and postcode. The decision on whether to accept the match given by a particular key combination depends on the uniqueness of that combination on CIS where:

Key combination uniqueness = (Number of unique combinations / Total number of combinations)

For internal DWP use, the uniqueness threshold was initially set at 90%. That means if we find a unique match on CIS for a particular key combination and that key combination overall is at least 90% unique, we accept the match.
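The acceptance rule can be sketched in a few lines. This is an illustration only, not the production SAS routines: it assumes one possible reading of the formula, in which "unique combinations" are the distinct key values on the register that identify exactly one record and "total combinations" are all distinct key values. The function names and the toy register are hypothetical.

```python
from collections import Counter

def key_uniqueness(register_keys):
    """Share of distinct key values on the register that identify exactly one record.

    Assumed reading of: unique combinations / total combinations.
    """
    counts = Counter(register_keys)
    unique = sum(1 for n in counts.values() if n == 1)
    return unique / len(counts)

def accept_match(respondent_key, register_keys, threshold=0.90):
    """Accept only a unique hit on a key combination that meets the uniqueness threshold."""
    counts = Counter(register_keys)
    is_unique_hit = counts.get(respondent_key, 0) == 1
    return is_unique_hit and key_uniqueness(register_keys) >= threshold

# Toy register keyed on (surname, date of birth, postcode); all values are made up.
register = [("SMITH", "1950-01-01", "AB1 2CD")] * 2 + [
    ("JONES", f"196{i}-05-05", "EF3 4GH") for i in range(9)
]

print(key_uniqueness(register))                                        # 9 of 10 distinct keys are unique
print(accept_match(("JONES", "1960-05-05", "EF3 4GH"), register))      # unique hit, threshold met
print(accept_match(("SMITH", "1950-01-01", "AB1 2CD"), register))      # two records share this key
```

In this reading, a respondent whose key finds two register records is never accepted on that key, however unique the key combination is overall.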

For the five survey years to 2022-2023 this automated process gives a match rate of between 96% and 98%.

Work then turned to the quality assurance of those matches. If we are to replace survey data with administrative records, we need to have a high degree of confidence that we are linking respondents to the right administrative records.

We have developed a manual quality assurance process whereby:

  • Every match between 90%-99% uniqueness is examined.

  • Selected matches with more than 99% uniqueness are also checked.

The quality assurance process involves outputting the full names, dates of birth, postcodes and addresses for respondents and the corresponding details they have been matched to on CIS.

Full details are output for every individual in the household, so that matches can be assessed in a family/household context. This can help by, for example, picking up cases where there has been a name change on marriage. 

The manual checking process identifies particular kinds of errors in the automated matching and picks up some matches that were missed. We also use email addresses and telephone numbers (available for a minority of respondents), which are not used in the automated matching, to confirm matches.

The pre and post quality assurance match rates for the past five years are in the table below. We are rejecting between 1% and 2% of initial matches as part of the quality assurance process, demonstrating the focus we have on achieving high quality matches. The general policy with manual quality assurance is ‘if in doubt, do not accept a match.’

Figure 2.2 Lookup file match rates pre and post quality assurance, by FRS survey year

Survey year | Stage | Matched | Unmatched | Match rate | All adults
2018/2019 | 90% threshold | 29,890 | 1,227 | 96.1% | 31,117
2018/2019 | Post-QA | 29,292 | 1,825 | 94.1% | 31,117
2019/2020 | 90% threshold | 29,928 | 1,346 | 95.7% | 31,274
2019/2020 | Post-QA | 29,434 | 1,840 | 94.1% | 31,274
2020/2021 | 90% threshold | 16,916 | 345 | 98.0% | 17,261
2020/2021 | Post-QA | 16,722 | 539 | 96.9% | 17,261
2021/2022 | 90% threshold | 26,900 | 626 | 97.7% | 27,526
2021/2022 | Post-QA | 26,542 | 984 | 96.4% | 27,526
2022/2023 | 90% threshold | 41,137 | 1,356 | 96.8% | 42,493
2022/2023 | Post-QA | 40,629 | 1,864 | 95.6% | 42,493
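The match rates in Figure 2.2 can be reproduced from the counts of adults. The short sketch below recomputes the pre and post quality assurance rates, and the share of initial matches rejected at quality assurance, using only the figures in the table; the variable and function names are illustrative.

```python
# Counts of adults from Figure 2.2:
# (matched at 90% threshold, matched post-QA, all adults)
counts = {
    "2018/2019": (29_890, 29_292, 31_117),
    "2019/2020": (29_928, 29_434, 31_274),
    "2020/2021": (16_916, 16_722, 17_261),
    "2021/2022": (26_900, 26_542, 27_526),
    "2022/2023": (41_137, 40_629, 42_493),
}

def match_rates(pre_qa, post_qa, total):
    """Return (pre-QA rate, post-QA rate, share of initial matches rejected), in percent."""
    pre_rate = 100 * pre_qa / total
    post_rate = 100 * post_qa / total
    rejected = 100 * (pre_qa - post_qa) / pre_qa
    return pre_rate, post_rate, rejected

results = {year: match_rates(*c) for year, c in counts.items()}
for year, (pre_rate, post_rate, rejected) in results.items():
    print(f"{year}: pre-QA {pre_rate:.1f}%, post-QA {post_rate:.1f}%, rejected {rejected:.1f}%")
```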

The automated matching process together with manual quality assurance has been refined with the production of the five-year time series. We now have a business-as-usual, tried and tested, lookup file production and quality assurance process, reliably giving us a high-quality match rate of 95% of FRS respondents or more each year.

Work is continuing to further refine and improve the process. For example, we are investigating the potential to use Unique Property Reference Numbers (UPRNs) instead of postcodes. Use of UPRNs has the potential to reduce automated matching errors and increase the overall uniqueness of matches, reducing the volume requiring manual checking.

b) DWP benefits, HMRC tax credits and child benefit

This area has been the main focus of our research and development work so far. Integration of DWP benefits, HMRC tax credits and child benefit is covered in section 3.

c) HMRC RTI PAYE data for employment and occupational pensions

HMRC have provided us with an initial share of RTI PAYE records matched to FRS respondents covering the survey years up to 2020.

Our analysis shows a close match between RTI and FRS employment income values for most respondents but with a minority of mismatches:

  • People reporting employment on the FRS but no RTI records

  • People not reporting employment on the FRS but with RTI records

Through consultation with HMRC colleagues we have gained an insight into some of the reasons for these differences, but we have further work to do with assistance from HMRC.

We are also in the process of agreeing a further RTI PAYE data share with HMRC to help us continue with this work.

d) HMRC Self Assessment

DWP receives an annual Self Assessment (SA) extract covering sole trader/partnership taxable profit. We have linked FRS respondents to this dataset.

Our analysis shows that FRS estimates of self-employment income are significantly higher than SA taxable profit – even for linked respondents who say they have consulted their tax return. This finding is consistent with other research in this area, such as the Office for National Statistics article ‘Survey and administrative data comparisons of self-employment income, UK’.

We suspect the reasons behind this may include misreporting of taxable profits by FRS respondents. Also, some respondents may be reporting income that is paid via dividends or PAYE as self-employment income in the cases where they are actually company directors.

A comprehensive HMRC SA data extract, covering other elements of SA (e.g. dividend income), is required to enable us to investigate and reconcile these differences.

We are in the process of agreeing a data share with HMRC alongside a further RTI PAYE data share.

e) Other administrative data sources

We are planning future work with other administrative sources including HMRC savings and DWP Child Maintenance data.

f) Grossing review

The need for a review of FRS grossing methodology became apparent from our work linking DWP benefits and HMRC tax credits and child benefit. The details are covered in section 4.

g) Non-response research

FRS response rates have been falling in recent years: from around 60% in 2016 to 44% in 2020, and lower again, to 25%, post-pandemic. This has increased the importance of understanding the characteristics of non-respondents for maintaining the overall quality of the FRS. One of the recommendations in the Office for Statistics Regulation 2021 review of income-based poverty statistics is that DWP should look to better understand the non-response bias of their surveys.

As an extension of our new approach to linking FRS respondents, we have experimented with linking both responding and non-responding household addresses to CIS.

The idea then is to link respondents and non-respondents, identified as living at those addresses according to CIS, to the same administrative sources (using the NINOs obtained from CIS) and analyse the differences in characteristics between the two groups based on all the available administrative information. These include key socio-economic and demographic characteristics: age, sex, region, benefit receipt, employment status, income, number of children etc.

Results of this work, if successful, could feed into improvements in FRS sample stratification and grossing methodology.

We have been able to match almost 99% of FRS sample addresses to CIS using deterministic and manual checking techniques.

However, confidently identifying the individuals living at these addresses at the time of the interview is challenging. This is particularly because CIS is not always updated with latest address information immediately after a change of address occurs e.g. an employee moving home but failing to inform their employer.

We have compared: 

  • the individuals identified on CIS as living at responding addresses at the time of interview; with

  • the individuals actually living at those addresses, according to the FRS

On average we identify 2.2 adults as potentially resident per responding address at the time of interview, whereas FRS responding households in fact average 1.74 adults per household.

There are significant differences between the two groups.

The CIS identified group is younger, more male, has slightly lower income, and is much less likely to be in receipt of state benefits.

Consequently, this calls into question the validity of conclusions that can be drawn about actual respondents and non-respondents from CIS-based comparisons.

The comparisons we have made between CIS identified respondents and non-respondents do not show the kinds of differences we would expect to see based on known or expected differences between respondents and non-respondents.

Work is continuing to see if the linking process can be refined and if adjustments can be applied to help produce more robust respondent – non-respondent comparisons.

h) End-to-end process review, including questionnaire redesign

With the integration of administrative data, there will be a reduced and changed need for survey data collection. Many benefit and income questions will likely be dropped but there will be a need to retain some questions, or design new ones, to meet the requirements for accurate imputation. 

There will also be a reduced and changed survey data editing requirement. The current resource intensive editing processes for benefits and earned income will not be needed. And most of the key administrative data sources (e.g. benefits, PAYE income) will be available shortly after the end of the survey year.

This means a significant end-to-end process redesign will be required to optimise the efficiency and timeliness gains from the use of administrative data. We will be working in collaboration with ONS on the review and redesign process.

We have begun work in this area with a test redesign of the benefits sections of the questionnaire. The approach here is to retain basic questions on benefit receipt, to facilitate imputation of benefit amounts for unlinked respondents, while dropping all other questions.

3. Benefits: integrating benefits and tax credits

3.1 Undercount of benefit receipt on the FRS

FRS estimates have consistently undercounted actual benefit and tax credit caseloads over time. Figure 3.1, from the FRS 2022 to 2023 publication, shows figures for the latest survey year.

Figure 3.1 Receipt of state support, FRS data and administrative data, 2022 to 2023, Great Britain (FRS Table M.6a)

Benefit/Tax credit received | FRS 2022 to 2023: Ungrossed percentage | FRS 2022 to 2023: Grossed number (1,000s) | FRS 2022 to 2023: Grossed percentage | Administrative data: Number (1,000s) | Administrative data: Percentage | Percentage difference
All Benefit units | 100 | 34,500 | 100 | 34,500 | 100 |
Income Support | [low] | 180 | 1 | 170 | [low] | 6
Pension Credit | 3 | 1,000 | 3 | 1,370 | 4 | -27
Housing Benefit | 6 | 2,000 | 6 | 2,510 | 7 | -20
Council Tax Reduction | 12 | 3,900 | 11 | 4,500 | 13 | -13
Universal Credit | 8 | 3,000 | 9 | 4,290 | 12 | -30
All in-work Benefit units | 100 | 21,800 | 100 | 21,800 | 100 |
Working Tax Credit | 2 | 400 | 2 | 600 | 3 | -33
Child Tax Credit | 3 | 600 | 3 | 760 | 3 | -21
All Adults | 100 | 51,000 | 100 | 51,000 | 100 |
State Pension | 30 | 11,300 | 22 | 11,470 | 22 | -1
Attendance Allowance | 2 | 800 | 2 | 1,420 | 3 | -44
Carer’s Allowance | 1 | 800 | 2 | 950 | 2 | -16
Employment and Support Allowance | 2 | 1,100 | 2 | 1,660 | 3 | -34
All individuals aged 16 or over | 100 | 52,500 | 100 | 52,500 | 100 |
Disability Living Allowance | 2 | 800 | 2 | 700 | 1 | 14
Personal Independence Payment | 5 | 2,600 | 5 | 3,140 | 6 | -17
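The percentage difference column is consistent with the FRS grossed number minus the administrative number, expressed as a share of the administrative number. That formula is inferred from the figures in the table rather than stated in the text, and the published values may have been calculated from unrounded numbers. A minimal sketch using three rows from Figure 3.1:

```python
def pct_difference(frs_grossed, admin):
    """Difference between the FRS grossed estimate and the administrative count,
    as a percentage of the administrative count (assumed formula)."""
    return 100 * (frs_grossed - admin) / admin

# (FRS grossed number, administrative number), both in thousands, from Figure 3.1
rows = {
    "Pension Credit": (1_000, 1_370),
    "Attendance Allowance": (800, 1_420),
    "Disability Living Allowance": (800, 700),
}

differences = {name: round(pct_difference(frs, admin)) for name, (frs, admin) in rows.items()}
for name, diff in differences.items():
    print(f"{name}: {diff}%")
```

A negative value indicates an FRS undercount relative to the administrative caseload; a positive value indicates an overcount.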

There has been a long-standing assumption that the FRS benefit undercount is likely to be due to under-reporting by FRS respondents. Therefore, replacing survey responses with administrative records was expected to resolve the undercount.

3.2 Integrating benefits administrative data – our overall approach

Our approach to integrating benefits administrative data is to:

  1. Link respondents to each of the administrative benefits datasets in turn.

  2. Identify the benefits respondents are actually in receipt of at the time of interview.

  3. Segment all respondents according to survey responses and actual benefit receipt according to administrative records.

  4. Identify payment amounts, either directly from the administrative benefits datasets, or via DWP’s Central Payment System (CPS).

  5. Impute benefit amounts for unlinked respondents.

  6. Create an FRS benefits table based fully on administrative data and populate FRS Adult, Household and Benunit tables with administrative data values.

The administrative sources we are using are outlined in figure 3.2 below.

Figure 3.2 Benefit and tax credit administrative sources used for linking to the FRS

Benefit | Administrative data source/system
Attendance Allowance, Disability Allowance, Carer’s Allowance, Industrial Injury Disability Benefit, Jobseeker’s Allowance, Employment and Support Allowance and Income Support, State Pension and Pension Credit, Maternity Allowance, Bereavement Support Payment | General Matching Service (GMS)
Universal Credit | Universal Credit Full Service
Personal Independence Payment | Personal Independence Payment (PIP)
Housing Benefit | Single Housing Benefit Extract (SHBE)
Child Benefit, Child Tax Credit & Working Tax Credit (GMS extracts) | HMRC
Benefit payment amounts and deductions | Central Payment System (CPS) is an integrated payment and accounting system for the Department
State Pension - some elements | Central Payment System (CPS)

There are a number of differences in structure and coverage between the various administrative sources, but an overall common approach is used when linking respondents and identifying benefit receipt.

A scan, or extract, is taken from each benefit IT system periodically through the year, depending on the frequency of payment for the particular benefit e.g. for Attendance Allowance it is 4-weekly, for Universal Credit it is monthly. From these scans/extracts, SAS tables are created, and these are the datasets we use for linking.

Firstly, we use the lookup file to scan each dataset, for each benefit, across the survey year to identify respondents in receipt at the time of interview.

We then conduct a segmentation analysis to assess the match/mismatch between survey self-reported benefit receipt and actual receipt based on the administrative records. This analysis formed a very important early part of our research.  

For those in receipt, we then identify the payment amount closest to but before the interview date. If no payment has been made before the interview date (e.g. a new claim, with first payment yet to be made), then we take the first payment amount after the interview date (with time limits applied).
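The payment selection rule can be sketched as follows. This is a simplified illustration, not the production routine: it assumes a payment made on the interview date itself counts as "before", and the 28-day limit on later payments is a hypothetical placeholder because the actual time limits are not stated. The later payment in the example is made up.

```python
from datetime import date, timedelta

def select_payment(payments, interview_date, max_days_after=28):
    """Pick the payment to record for a respondent in receipt.

    payments: list of (payment_date, amount) tuples.
    Returns the latest payment on or before the interview date; if there is none,
    the earliest payment after it within max_days_after days; otherwise None.
    """
    before = [p for p in payments if p[0] <= interview_date]
    if before:
        return max(before, key=lambda p: p[0])
    limit = interview_date + timedelta(days=max_days_after)
    after = [p for p in payments if interview_date < p[0] <= limit]
    return min(after, key=lambda p: p[0]) if after else None

interview = date(2022, 5, 5)

# Existing claim: one payment before the interview and one (made-up) payment after it.
existing_claim = [(date(2022, 4, 12), 90.30), (date(2022, 5, 10), 92.40)]
print(select_payment(existing_claim, interview))

# New claim: first payment not yet made at the interview date.
new_claim = [(date(2022, 5, 10), 92.40)]
print(select_payment(new_claim, interview))
```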

For some benefits, there can be a difference between the payment amount as recorded on the benefit dataset and the amount actually paid. This is often because of deductions, recoveries for overpayments, etc. Therefore, we link the respondents to CPS to identify the actual amount paid and the amounts of any deductions or recoveries.

We then impute payment amounts for unlinked respondents – using their self-reported benefit receipt. The detail of the development of imputation is covered in section 5.

The last stage of the process is to compile a complete FRS benefits table based on administrative data.

One should note, however, that not all state benefits are included. Specifically, a number of smaller non-DWP benefits such as Armed Forces Compensation Scheme and War Widow’s/Widower’s Pension are not integrated as we do not have access to their respective administrative data.

Routines are also run to populate Adult, Household and Benunit tables with admin-based benefit variables.

3.3 Linking respondents, identifying receipt and payments

The diagram below shows how the linking process works, using Attendance Allowance (AA) as an example.

We use the lookup file to link respondents to the General Matching Service (GMS) AA dataset. An AA extract is produced every 4 weeks. In this example, the claimant was interviewed on 05-May-2022 and we identify from GMS that they had a live AA claim in the extract which included the interview date: 02-May-2022 to 29-May-2022.
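The lookup step in this example can be sketched as a join on the encrypted NINO followed by a date-window check. The record layouts, identifiers and field names below are hypothetical; only the interview date and the extract period are taken from the example above.

```python
from datetime import date

# Hypothetical lookup file: (household, benefit unit, person) -> encrypted NINO
lookup = {("22222222", 1, 1): "ENC-NINO-A"}

# Hypothetical AA extract records; each 4-weekly extract covers a start and end date.
aa_extracts = [
    {"nino": "ENC-NINO-A", "start": date(2022, 5, 2), "end": date(2022, 5, 29)},
    {"nino": "ENC-NINO-B", "start": date(2022, 5, 2), "end": date(2022, 5, 29)},
]

def live_claim_at_interview(person_key, interview_date, lookup, extracts):
    """Return the extract record covering the interview date, or None."""
    nino = lookup.get(person_key)
    if nino is None:
        return None  # unlinked respondent: no administrative record can be found
    for record in extracts:
        if record["nino"] == nino and record["start"] <= interview_date <= record["end"]:
            return record
    return None

claim = live_claim_at_interview(("22222222", 1, 1), date(2022, 5, 5), lookup, aa_extracts)
print(claim is not None)
```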

The total weekly benefit amount according to GMS is £92.40. This represents the entitlement amount at the date of extract. However, the last amount actually received, as recorded in CPS, is different in this case. This is because the last payment covered a period which was partly before and partly after the annual uprating of the benefit amount.

From CPS then, we pick up the most recent payment before the interview date, which was made on 12-April-2022; the amount paid was £90.30.

Figure 3.4 below gives some examples of how CPS can give a more complete picture of benefit payments. In each of these cases, the entitlement amount recorded on GMS does not take into account the deductions for DWP third party payments and recoveries for loan repayments.

Figure 3.3 Linking FRS respondents to the GMS Attendance Allowance dataset

Figure 3.4 Using CPS to identify deductions from benefit amounts

For example, Person 1 in the household represented by the Sernum 11111111 was interviewed on 31-July-2022 and is found to have a live claim for Jobseeker’s Allowance (JSA) in the GMS extract 30-Jul-22 to 05-Aug-22, which includes the interview date.

The JSA entitlement amount according to the GMS extract is £77.00. However, by linking to CPS and identifying the actual payment details closest to the interview date we can see that this person had a deduction and a recovery payment which reduced their actual net JSA payment to £59.66.

In our admin-based benefits table, the net amount will be recorded as the JSA amount (benamt) and the deduction and recovery amounts will be recorded separately in their own categories (e.g. Third-Party payments, Social Fund loan repayments). In this way, complete and accurate payment circumstances can be recorded on the final table.

3.4 Segmentation analysis: survey responses vs. administrative records

Having linked respondents, we carried out a segmentation analysis across all benefits to assess the differences between survey responses and administrative records. Figure 3.5 shows the segmentation for AA for FRS 2022 to 2023 as an example.

Figure 3.5 Attendance Allowance: survey responses v administrative records

All FRS respondents thus fall into one of six segments: 

  1. Unlinked and do not report AA on the FRS

  2. Unlinked and report AA on the FRS

  3. Linked and report AA on the FRS but no AA administrative record exists

  4. Linked and report AA on the FRS and an AA administrative record exists

  5. Linked and do not report AA on the FRS but an AA administrative record exists 

  6. Linked and do not report AA on the FRS and no AA administrative record exists
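The six segments follow from three yes/no facts about each respondent: whether they are linked, whether they report the benefit on the FRS, and whether an administrative record exists. A minimal sketch of that classification (the function and argument names are illustrative):

```python
def segment(linked, reports_benefit, admin_record):
    """Return the segment number (1 to 6) for a respondent, following the list above.

    For unlinked respondents the administrative record is unknown, so
    admin_record is ignored.
    """
    if not linked:
        return 2 if reports_benefit else 1
    if reports_benefit:
        return 4 if admin_record else 3
    return 5 if admin_record else 6

# Linked, does not report AA, but an AA administrative record exists: Segment 5
print(segment(linked=True, reports_benefit=False, admin_record=True))
```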

Likely because of misunderstanding, some respondents report Attendance Allowance when they are not actually receiving it (Segment 3). Some on the other hand do not report Attendance Allowance on the FRS when they are in fact receiving the benefit as shown by DWP records (Segment 5).

For Attendance Allowance, the level of mistaken reporting of AA receipt is relatively small. The grossed estimate in Segment 3 is around 19,000 out of an average AA caseload of 1.4 million. However, the estimated level of under reporting is very large, around 356,000 in Segment 5, out of 1.4 million (26%).

For linked respondents then, replacing survey responses with administrative records has two positive effects on FRS accuracy:

  1. Removing mistaken/erroneous reporting of benefits by respondents; and
  2. Removing under reporting of benefits

The question then remains what to do with unlinked respondents. In theory, it might be possible to impute AA receipt from Segments 1 and 2. However, the experiments we have conducted have shown that it is not possible to impute receipt accurately enough to make this option viable. Therefore, the practical solution is to retain Segment 2 as our best practical estimate of benefit receipt for unlinked respondents.  

With the use of administrative data then, our final AA sample is made up of the following segments:  

Segment 2 + Segment 4 + Segment 5 = Integrated FRS

Which for AA has the volumes:

10k + 820k + 360k = 1,190k

This still gives us an administrative-based caseload which is considerably lower than the actual AA administrative caseload of 1.4 million. So, linking to administrative data and grossing does not fully close the undercount for AA.

It can also be seen that the proportionate balance between Segment 2 and Segments 4 and 5 is not as we would have expected, given that 5% of respondents are unlinked. The grossed count of AA caseload in Segment 2 represents less than 1% of the total for Segments 2, 4 and 5 combined.
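The arithmetic behind these two observations can be checked directly from the rounded figures quoted above. The sketch below uses the segment volumes in thousands and treats the administrative caseload as exactly 1.4 million, so the percentages are approximate.

```python
# Rounded grossed AA counts in thousands, from the text
segments_k = {2: 10, 4: 820, 5: 360}
admin_caseload_k = 1_400  # "1.4 million" as quoted; the unrounded caseload will differ

integrated_k = sum(segments_k.values())
remaining_undercount = 1 - integrated_k / admin_caseload_k
segment2_share = segments_k[2] / integrated_k

print(f"Integrated FRS caseload: {integrated_k:,}k")
print(f"Remaining undercount: {remaining_undercount:.0%}")
print(f"Segment 2 share of Segments 2, 4 and 5: {segment2_share:.1%}")
```

On these rounded figures the integrated caseload is still about 15% below the administrative caseload, and Segment 2 accounts for well under the 5% that the unlinked share of respondents might suggest.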

3.5 Generalising the segmentation approach across all benefits

The segmentation analysis carried out for AA was replicated across all the main benefits (and tax credits) and across the five survey years. The results for a selection of key benefits for the 2022 to 2023 survey year are presented in figure 3.6:

Figure 3.6 FRS 2022 to 2023 benefit receipt, grossed estimates Great Britain (‘000s)

We can see that the pattern of Segment 2 being smaller than expected as a proportion of Segments 2, 4 and 5 is repeated across benefits.

In addition, we can see the pattern of the admin-based estimates partly but not fully closing the undercount is also repeated. For example, the Universal Credit (UC) undercount is reduced from 36% to 23% and the Disability Living Allowance (DLA) undercount is reduced from 42% to 18%. 

These patterns led us to conduct a number of analyses, which led to two conclusions:

1. The lookup file is biased towards benefit recipients. This is because benefit recipients are more likely to have interactions which are recorded on CIS, their addresses are more likely to be up-to-date and they are therefore more likely to be linked.

2. The overall achieved FRS sample is biased. This is because benefit recipients are underrepresented in ways that are not being corrected for by the current grossing regime. This also explains why linking to administrative records alone does not resolve the benefit undercount.

The lookup file bias complicated our segmentation analysis but is not a problem for integration work in and of itself. The overall sample bias is more problematic for achieving a fully representative FRS and led us to conduct a review of the current grossing methodology.

4. Grossing: adding control totals for benefits, employment, and self-employment

4.1 Current grossing and ONS review

The current grossing regime for the FRS is outlined in detail in the FRS Background Information and Methodology.

The system used to calculate grossing factors for the FRS divides the sample into a series of control groups representing private households or those living in private households. The population estimates for these groups are obtained from a variety of official sources, many based on census data.

There are two stages of weighting: an initial design stage and then a calibration weighting. The design weighting takes account of the number of households at an address.

The calibration stage then adjusts the design weights to match the set of predetermined control totals. The calibration weights are calculated using a computer program called CALMAR, which was developed by the French National Statistical Institute.
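As an illustration of what the calibration stage does, the sketch below adjusts a set of design weights until their weighted totals match given control totals. It is a simplified raking-style adjustment written for this report, not the CALMAR program itself. The function name `rake_weights`, the marker layout and all of the figures are hypothetical.

```python
def rake_weights(design_weights, markers, control_totals, max_iter=200, tol=1e-9):
    """Scale weights until each weighted marker total matches its control total.

    design_weights: one starting weight per respondent.
    markers: control name -> list of marker values (0 to 1), one per respondent.
    control_totals: control name -> target population total.
    """
    weights = list(design_weights)
    for _ in range(max_iter):
        worst_gap = 0.0
        for name, target in control_totals.items():
            marker = markers[name]
            current = sum(w * m for w, m in zip(weights, marker))
            if current == 0:
                raise ValueError(f"no marked respondents for control {name!r}")
            factor = target / current
            worst_gap = max(worst_gap, abs(factor - 1.0))
            # Only respondents marked for this control are rescaled.
            weights = [w * factor if m > 0 else w for w, m in zip(weights, marker)]
        if worst_gap < tol:
            break
    return weights


# Hypothetical example: four respondents, one of whom receives a benefit.
design = [100.0, 100.0, 100.0, 100.0]
markers = {
    "population": [1, 1, 1, 1],
    "benefit": [1, 0, 0, 0],
}
targets = {"population": 500.0, "benefit": 150.0}

calibrated = rake_weights(design, markers, targets)
print([round(w, 2) for w in calibrated])
```

In this toy example the benefit recipient's weight rises to meet the benefit control, while the other weights settle so that the overall population total is still met.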

The grossing methodology was last reviewed by the ONS Methodology Service, to coordinate with revisions to the mid-year population estimates following the 2011 Census. The report recommended, in principle, that controls on the types of state support received should be introduced, to deal with the benefit undercount, using counts from DWP’s administrative systems as controls.

However, there was a concern at the time that people misreporting the types of benefit they received could introduce measurement error, as linking was not viable because of the low match rate.

The respondent lookup file match rate is now transformed compared to the circumstances when the ONS report was written. With the 95%+ match rate, using benefits caseload counts as control totals is now viable as a means of dealing with the residual undercount after integrating administrative data.

4.2 Adding new control totals

In consultation with ONS methodology colleagues, we have developed a new experimental grossing approach to resolving the benefit undercount – adding control totals for benefits, employment, and self-employment.

This covers the five years of interest for our research up to 2022 to 2023 and covers Great Britain only at this stage. This is because we only have 3 years of high match-rate lookup files on a UK basis, and we wanted to conduct the initial research with as long a consistent time series as possible.

Figure 4.1 Introducing additional control totals to FRS grossing

We used a three-stage process, as outlined in figure 4.1.

  1. Firstly, we added a set of benefit controls. We carried out several preliminary tests, such as for collinearity with existing controls or other benefits. Then, having produced a grossing factor, Gross 5 test 1, we carried out a set of performance tests on those controls and checked for effects on other key variables. Gross 5 test 1 worked very well in controlling to the benefit totals, fully removing the benefit undercount. However, we noticed a significant change in the estimate of employment from Empstat, the International Labour Organisation employment status variable.

  2. Second, we added RTI PAYE as a control. This brought the Empstat estimate of employment towards its previous value, more or less, depending on the survey year. But this had a negative effect on Empstat for self-employment.

  3. Third, we added a Self Employment control total, making use of the RAPID (Registration And Population Interaction Database) combined Self Assessment, Tax Credits and Universal Credit estimate of self-employment. This was successful in bringing the Empstat self-employment measure up to a point between the original FRS estimate and the higher Annual Population Survey measure.

For benefits: The control totals are the monthly averages of the administrative populations for the survey year of interest. Adjustments to the totals have been made where necessary to take account of claimants living in non-private households.  

On the sample data, the respondents marked for control are those linked to admin data or those self-reporting if unlinked i.e. Segments 2, 4 and 5. For details, see figure 4.2.

For RTI PAYE: We only have two years of RTI data linked to FRS respondents. RTI has been used for the years 2018 to 2019 and 2019 to 2020. The control totals are the average of published RTI PAYE monthly pay-rolled employees for the financial year of interest.

We are working with HMRC colleagues to fully reconcile RTI data and FRS survey responses on employment. Therefore, we have had to make a number of provisional assumptions when using RTI for grossing. Specifically, we assume that a proportion of Segment 1 and Segment 3 cases actually do have RTI PAYE returns.

For the 3 years to 2022 to 2023 where we do not have RTI information, we have used RAPID P14 employment income information instead. This provides a monthly employment marker rather than the exact date which is available from RTI. It also contains some occupational pension records which are not present in RTI.

Self-employed: The control totals are sourced from RAPID which uses three categories: Self Assessment, Tax Credits and Universal Credit (self-employed who claim Tax Credits or UC and do not make an SA return).

The sample data used for calibration is outlined in Figure 4.2.  

Figure 4.2 Calibration markers

Benefits, Rapid Linking Segment | Benefit Segment | RTI Segment
1) Not Linked and do not report | - | 7%
2) Not Linked and report benefit | 100% | 100%
3) Linked and report on FRS only | - | 50%
4) Linked and report on FRS and Admin | 100% | 100%
5) Linked and report on Admin only | 100% | 100%
6) Linked and not Benefit or RTI | - | -
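The calibration markers in figure 4.2 can be sketched as a lookup from segment to marker value. This is a minimal illustration only: it assumes the percentages are applied as fractional markers on each respondent's weight and that a dash means a marker of zero, neither of which is stated in the figure. The function name and the example weights are hypothetical.

```python
# Marker values transcribed from figure 4.2.
# Assumption: "-" in the figure is treated as a marker of 0.0.
BENEFIT_MARKER = {1: 0.0, 2: 1.0, 3: 0.0, 4: 1.0, 5: 1.0, 6: 0.0}
RTI_MARKER = {1: 0.07, 2: 1.0, 3: 0.50, 4: 1.0, 5: 1.0, 6: 0.0}


def weighted_marked_total(respondents, marker_by_segment):
    """Sum of weight x marker over (segment, weight) pairs."""
    return sum(weight * marker_by_segment[segment] for segment, weight in respondents)


# Hypothetical respondents: (segment, current grossing weight).
sample = [(1, 1000.0), (3, 1000.0), (4, 1000.0)]

benefit_total = weighted_marked_total(sample, BENEFIT_MARKER)
rti_total = weighted_marked_total(sample, RTI_MARKER)
print(benefit_total, round(rti_total, 1))
```

Totals of this kind are what the calibration stage would compare against the benefit and RTI PAYE control totals.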

The full results of the use of this revised grossing for benefits are available in the accompanying tables and are discussed in Section 6.

5. Imputation: imputing benefit amounts for unlinked respondents

With linking, we now have comprehensive coverage of benefit receipt and payments for 95% of respondents from administrative data. The issue then is what to do with the 5% we cannot link.

One of the aims of transformation work is to reduce the survey data collection – to reduce cost and respondent burden. We only want to continue to collect information via interview if the information is not reliably available from administrative sources.

Therefore, one option, going forward, is to drop all benefit questions from the survey and impute benefit receipt and payment amounts for the 5%.

As mentioned in Section 3.4, we conducted a number of experiments with imputation methods for benefit receipt. For example, we used logistic regression to identify recipients, based on their characteristics, and hot decking to allocate payment amounts. Our conclusion is that it is not possible to impute benefit receipt with sufficient accuracy to make the process viable. The probability of correctly identifying a benefit recipient is little more than 50%. In addition, the volumes involved are so low as to not make the effort worthwhile.

Therefore, we plan in the future to retain simple yes/no questions on benefit receipt and drop all other benefit questions from the interview. We will use credibility checks to minimise benefit misreporting by the 5% unlinked.

We have developed a combination of imputation, calculation, and hot-decking routines to impute monetary amounts, with the techniques used varying depending on the benefit.
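To illustrate the hot-decking element, the sketch below fills a missing award amount by copying the amount of a randomly chosen donor from the same imputation class. It is a generic hot-deck under stated assumptions, not the production routine: the choice of `age_band` as the imputation class, the field names and the award values are all hypothetical.

```python
import random


def hot_deck(records, class_key, value_key, rng):
    """Return copies of records with missing values filled from same-class donors."""
    donors = {}
    for rec in records:
        if rec[value_key] is not None:
            donors.setdefault(rec[class_key], []).append(rec[value_key])

    completed = []
    for rec in records:
        out = dict(rec)
        if out[value_key] is None:
            pool = donors.get(out[class_key])
            if not pool:
                raise ValueError(f"no donor available for class {out[class_key]!r}")
            out[value_key] = rng.choice(pool)  # copy a donor's observed amount
            out["imputed"] = True
        else:
            out["imputed"] = False
        completed.append(out)
    return completed


# Hypothetical records: the last respondent reports receipt but has no amount.
records = [
    {"id": "a", "age_band": "65-74", "weekly_award": 60.0},
    {"id": "b", "age_band": "75+", "weekly_award": 90.0},
    {"id": "c", "age_band": "75+", "weekly_award": 60.0},
    {"id": "d", "age_band": "75+", "weekly_award": None},
]

completed = hot_deck(records, "age_band", "weekly_award", random.Random(0))
print(completed[-1])
```

Because donated values are real observed amounts, a hot-deck of this kind can only produce values that already occur in the donor pool.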

Figure 5.1 shows our approaches to imputation for the main benefits. We have not developed imputation methods for the legacy benefits being replaced by UC, as they are not expected to exist by the time the planned FRS Transformation changes to the questionnaire are introduced. Up to that point there will be no change to the data collection, and benefit receipt and payment amounts will be available for each FRS survey year.

Figure 5.1 Imputation method by benefit, FRS 2022 to 2023 Segment 2 sample sizes

Benefit | Segment 2 | Segment 2, 4 & 5 | Segment 2 as a % of 2, 4 & 5 | Imputation technique
UC | 48 | 2,595 | 2% | Calculation + modelling (advances & deductions)
JSA (NS) | 11 | 61 | 18% | Apply standard benefit rates
ESA (NS) | 11 | 956 | 1% | Apply standard benefit rates
IS | 6 | 127 | 5% | UC legacy benefit
HB | 52 | 1,824 | 3% | UC legacy benefit
WTC | 6 | 322 | 2% | UC legacy benefit
CTC | 17 | 622 | 3% | UC legacy benefit
AA | 9 | 1,100 | 1% | Hot-decking
DLA | 6 | 849 | 1% | Hot-decking
PIP | 28 | 2,211 | 1% | Hot-decking
IIDB | 1 | 203 | 0.5% | Median when cases arise
CA | 8 | 677 | 1% | Apply standard benefit rates
SP | 223 | 11,822 | 2% | Calculation + modelling difference
PC | 12 | 1,030 | 1% | Calculation
WFP | 243 | 11,785 | 2% | Calculation
CB | 120 | 5,050 | 2% | Apply standard benefit rates
MA | 0 | 20 | 0% | Median when cases arise
BB | 0 | 48 | 0% | Median when cases arise
BB 0 48 0% Median when cases arise

All of the imputation techniques have been developed using Segment 4 respondents (single or multiple years) as the test sample. This gives us relatively large samples to work with, where survey responses correspond with the administrative records. We can develop our imputation routines and test their accuracy against the actual administrative values for Segment 4 cases.

Figures 5.2 and 5.3 compare imputed values to actual admin values for Segment 4 respondents for the five years to 2022 to 2023.

Figure 5.2 Mean imputed benefit awards as a proportion of mean administrative awards, selected benefits

Benefit 2018/2019 2019/2020 2020/2021 2021/2022 2022/2023 Average
UC 1.11 1.04 0.97 0.94 0.96 1.00
AA 0.99 1.03 1.02 1.01 1.00 1.01
DLA_C 0.97 1.00 0.96 0.99 0.98 0.98
DLA_M 1.00 1.00 1.00 1.00 1.00 1.00
PIP_DL 0.99 1.00 0.98 1.02 1.00 1.00
PIP_M 1.01 0.99 1.02 1.00 0.99 1.00
CA 1.01 1.01 1.01 1.00 1.00 1.01
SP 1.01 1.01 1.00 1.00 1.00 1.00
PC 1.03 1.01 0.93 0.91 1.02 0.98

Figure 5.3 Median imputed benefit awards as a proportion of median administrative awards, selected benefits

Benefit 2018/2019 2019/2020 2020/2021 2021/2022 2022/2023 Average
UC 1.17 1.06 1.06 0.94 0.96 1.04
AA 1.00 1.00 1.00 1.00 1.00 1.00
DLA_C 1.00 1.00 1.00 1.00 1.00 1.00
DLA_M 1.00 1.00 1.00 1.00 1.00 1.00
PIP_DL 1.00 1.00 1.00 1.00 1.00 1.00
PIP_M 1.00 1.01 1.00 1.00 1.00 1.00
CA 1.00 1.00 1.00 1.00 1.00 1.00
SP 1.02 1.02 1.02 1.00 1.00 1.01
PC 1.03 1.06 0.83 0.83 1.06 0.96

There is some variation between imputed and actual values year-on-year. This variation is more notable for Universal Credit and Pension Credit, while still being within +/- 6% of actual values in most years. Overall, we consider that the performance of the imputation routines is good.    
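The ratios reported in figures 5.2 and 5.3 are of the following form: the mean (or median) of the imputed awards divided by the mean (or median) of the administrative awards for the same respondents. The sketch below shows that calculation on made-up values. The function name and the data are hypothetical, and any weighting applied in the published figures is not reflected here.

```python
from statistics import mean, median


def imputation_ratios(imputed_awards, admin_awards):
    """Ratio of imputed to administrative awards, for the mean and the median."""
    return {
        "mean_ratio": mean(imputed_awards) / mean(admin_awards),
        "median_ratio": median(imputed_awards) / median(admin_awards),
    }


# Hypothetical weekly awards for four Segment 4 style test cases.
imputed = [60.0, 90.0, 90.0, 60.0]
actual = [60.0, 90.0, 90.0, 90.0]

ratios = imputation_ratios(imputed, actual)
print({name: round(value, 2) for name, value in ratios.items()})
```

A ratio of 1.00 means the imputed and administrative values agree on that summary measure, while values below 1.00 indicate that imputation is understating awards.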

The combination of accurate imputation and a small proportion of unlinked respondents in receipt of benefits (less than 2% in most cases) means that our proposed approach (keeping yes/no benefit questions and imputing benefit award amounts) is workable in practice.

6. Illustrative results

We have put the various elements of our research together to create a five-year test series of integrated benefits tables covering the FRS survey years up to 2022 to 2023. In doing so we have refined our approach, which is to:    

  1. Link the 95% of respondents with NINOs to each of the administrative benefit sources in turn, identifying receipt at the time of interview (Segment 4 and 5 respondents).

  2. Identify benefit awards, deductions, and repayments for linked respondents from CPS.

  3. Use self-reported benefit receipt for unlinked respondents and impute their award amounts (Segment 2 respondents).

  4. Discard self-reported benefit receipt for linked respondents if no benefit records exist (Segment 3 respondents).

  5. Apply an experimental grossing factor (incorporating benefits, RTI PAYE and SA as new control totals) to produce grossed estimates of benefit receipt which match the actual benefit populations.

  6. Combine the results to create administrative data-based FRS benefits tables, populating the relevant parts of additional FRS tables – Adult, Benunit, Household.
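The receipt rules in steps 1, 3 and 4 can be sketched as a small classification function using the segment definitions from figure 4.2. This is an illustration of the logic only. The function and argument names are hypothetical, and the real processing is carried out benefit by benefit against each administrative source.

```python
def assign_segment(linked, self_reported, admin_record):
    """Return the segment number (1 to 6) for one respondent and one benefit."""
    if not linked:
        # No administrative record is available for unlinked respondents.
        return 2 if self_reported else 1
    if self_reported and admin_record:
        return 4
    if self_reported:
        return 3
    if admin_record:
        return 5
    return 6


def integrated_receipt(linked, self_reported, admin_record):
    """Receipt after integration: keep Segments 2, 4 and 5, discard Segment 3."""
    return assign_segment(linked, self_reported, admin_record) in {2, 4, 5}


# A linked respondent who reports a benefit with no matching record (Segment 3).
print(assign_segment(True, True, False), integrated_receipt(True, True, False))
# A linked respondent with a record who did not report it (Segment 5).
print(assign_segment(True, False, True), integrated_receipt(True, False, True))
```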

Figure 6.1 shows the effects of the different stages of integration on the FRS AA caseload, mean and median awards and expenditure estimates:

  1. The AA administrative caseload of 1.38 million is calculated by averaging the live claim count across the thirteen 4-week AA administrative data extracts over 2022 to 2023. This figure has been adjusted to remove claimants living in non-private households.
  2. The FRS pure survey-based estimate is 850,000, 62% of the actual administrative caseload.
  3. Linking administrative data, retaining self-reported AA receipt for unlinked respondents (Segment 2), dropping linked mistaken/erroneous positive claims (Segment 3), and adding respondents in receipt who have not self-reported receipt (Segment 5) gives a caseload estimate of 1.19 million – 86% of the actual administrative caseload.
  4. We then drop self-reported benefit amounts for Segment 2 cases and replace them with imputed values – to test and demonstrate any effect imputation may have on the overall distribution of monetary values.
  5. The final stage is to apply our experimental grossing factor, Gross5, which has a specific control for AA (and the other main benefits). This gives us a final revised AA estimate of 1.38 million – matching the actual administrative population.

Figure 6.1 Integrating administrative data for Attendance Allowance FRS 2022 to 2023

Caseload (Thousands)
Administrative caseload 1,380
FRS caseload estimate 850
FRS caseload estimate, integrating admin data 1,190
   
FRS caseload estimate, integrating admin data, revised grossing 1,380
   
Coverage - FRS estimates as a percentage of administrative caseloads  
FRS caseload 62%
FRS caseload estimate, integrating administrative data 86%
FRS caseload estimate, integrating administrative data, revised grossing 100%
   
Mean award £ (weekly/monthly)  
Administrative caseload 82
FRS caseload estimate 81
FRS caseload estimate, integrating admin data 81
FRS caseload estimate, integrating admin data, imputing values for unlinked cases 81
FRS caseload estimate, integrating admin data, imputed values for unlinked cases, revised grossing 81
   
Median award £ (weekly/monthly)  
Administrative caseload 93
FRS caseload estimate 92
FRS caseload estimate, integrating admin data 92
FRS caseload estimate, integrating admin data, imputing values for unlinked cases 92
FRS caseload estimate, integrating admin data, imputed values for unlinked cases, revised grossing 92
   
Annual expenditure £m  
Administrative caseload 5,900
FRS caseload estimate 3,600
FRS caseload estimate, integrating admin data 5,000
FRS caseload estimate, integrating admin data, imputing values for unlinked cases 5,000
FRS caseload estimate, integrating admin data, imputed values for unlinked cases, revised grossing 5,800
Published (expenditure tables) 5,700

A similar pattern is seen across all the other benefits, with some minor variations. For example, Child Benefit, which is not specifically controlled for in the experimental grossing factor, is slightly over-estimated in some years.

Mean and median awards were also calculated for each of these steps. The results demonstrate that the FRS is generally accurate in estimating benefit award amounts. They also show that imputation is performing well – in that substituting imputed values for self-reported survey values does not have any significant effect on overall survey mean or median values.

Finally, we produced some expenditure estimates to see the effect of integration on the overall amount of benefit expenditure captured by the FRS.

For AA, the published expenditure estimate for 2022 to 2023 is £5,700 million. This is sourced from Benefit expenditure and caseload tables 2023. We approximated this figure directly, for comparison purposes, by multiplying our average caseload figure of 1.38 million by the average weekly payment and then by 52 – to get a direct estimate of annual expenditure of £5,900 million. We applied the same method to the other caseload figures. The FRS survey estimate of annual AA expenditure is £3,600 million while our final integrated, imputed, and re-grossed estimate is £5,800 million.
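The expenditure approximation can be reproduced from the rounded caseloads and mean weekly awards in figure 6.1. The sketch below is a worked check only: it uses the published rounded inputs, so it recovers the figures to the nearest £100 million rather than exactly, and the function name is hypothetical.

```python
def annual_expenditure_millions(caseload, mean_weekly_award, weeks=52):
    """Approximate annual expenditure in £ million: caseload x weekly award x weeks."""
    return caseload * mean_weekly_award * weeks / 1_000_000


# (caseload, mean weekly award in £) taken from figure 6.1.
stages = {
    "administrative": (1_380_000, 82),
    "frs_survey": (850_000, 81),
    "frs_integrated": (1_190_000, 81),
    "frs_integrated_regrossed": (1_380_000, 81),
}

for name, (caseload, award) in stages.items():
    estimate = annual_expenditure_millions(caseload, award)
    print(name, round(estimate, -2))
```

Rounded to the nearest £100 million, these give 5,900, 3,600, 5,000 and 5,800 respectively, in line with the expenditure rows of figure 6.1.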

Similar patterns are seen across the other benefits in the tables accompanying this report, demonstrating the broad value of integration on caseload and award estimates, particularly when combined with a revised grossing regime.

7. Next steps

The next phase of FRS transformation work will focus on integrating HMRC RTI PAYE and Self Assessment data. We are currently working with HMRC colleagues on an agreement to share these data sources on an on-going basis and anticipate the next data share will occur during Summer 2024.

We will also begin work on other data sources during the summer, particularly HMRC savings and DWP Child Maintenance data.   

We expect to complete work on an FRS integrated with administrative data on benefits and earnings by March 2025, with a view to releasing the data as part of an experimental release. Our intention is to include details on how the use of administrative data might affect HBAI low-income estimates.   

We will engage with a variety of users on the experimental data, future plans to reduce the amount of information collected via the survey, and how that might affect our end-to-end survey requirement.

8. Feedback

We welcome feedback.

If you have any comments or questions about any aspect of the FRS Transformation project, please contact: frs.transformation@dwp.gov.uk

The landing page for this document is here: Family Resources Survey

Further information on the FRS can be accessed from the Family Resources Survey home page, together with the Background Information and Methodology document.

Accompanying Excel and ODS tables can be accessed on the FRS Transformation home page.

10. Glossary

CIS

Customer Information System (CIS) is a DWP information system that stores the names and address history of everyone who has been issued with a NINO.

CPS

Central Payment System (CPS) is the single integrated payment and accounting system used by DWP.

DWP

Department for Work and Pensions.

FRS

The Family Resources Survey (FRS) is a continuous survey which collects information on the income and circumstances of individuals living in a representative sample of private households in the United Kingdom. The survey has been running in Great Britain since October 1992 and was extended to cover Northern Ireland in the survey year 2002 to 2003.

GDPR

General Data Protection Regulation (GDPR) is a European Union regulation which controls how personal information is used by organisations, businesses, or the government. The Data Protection Act 2018 is the UK’s implementation of GDPR.

GMS

General Matching Service (GMS) is primarily a tool used to identify potential fraud and error on DWP customer cases. It provides a way of ensuring our data are coherent and consistent by comparing data held by the DWP to our customers’ cases.

HBAI

Households Below Average Income (HBAI) is an annual publication that provides statistics and commentary on living standards in UK households, as determined by disposable income. It includes the number and percentage of people living in low-income households, and changes in income patterns over time.

HMRC

His Majesty’s Revenue and Customs.

NINO

National Insurance Number.

ONS

Office for National Statistics.

PAYE

Pay As You Earn (PAYE) is the system for deducting and collecting Income Tax and National Insurance contributions from employment income.

RAPID

Registration and Population Interaction Database (RAPID) is a database created by the DWP. It provides a single coherent view of interactions across the breadth of benefits and earnings datasets for anyone with a National Insurance Number (NINO).

RTI

Real Time Information (RTI) is the system used by employers to report to HMRC each time they pay their employees. Under RTI, information about PAYE, National Insurance contributions and other deductions is transmitted to HMRC by the employer every time an employee is paid.

Self Assessment

Self Assessment tax return is a system HM Revenue and Customs (HMRC) uses to collect Income Tax. Although tax is usually deducted automatically from wages and pensions using PAYE, people and businesses with other income must report it in a tax return.

UPRN

The Unique Property Reference Number (UPRN) is the unique identifier for every addressable location in the UK.

List of Benefit Abbreviations

Abbreviation Benefit/Tax Credit name
UC Universal Credit
JSA Jobseeker’s Allowance
ESA Employment and Support Allowance
IS Income Support
HB Housing Benefit
WTC Working Tax Credit
CTC Child Tax Credit
AA Attendance Allowance
DLA Disability Living Allowance
DLA_C Disability Living Allowance, Care component
DLA_M Disability Living Allowance, Mobility component
PIP Personal Independence Payment
PIP_DL Personal Independence Payment, Daily Living component
PIP_M Personal Independence Payment, Mobility component
IIDB Industrial Injuries Disablement Benefit
CA Carer’s Allowance
SP State Pension
PC Pension Credit
CB Child Benefit