Guidance

Family Resources Survey: quality assessment report

Published 27 March 2025

1. Introduction

This report contains information on the Family Resources Survey (FRS) data sources used by the Department for Work and Pensions (DWP), as well as quality assessments on each of them. The assessment considers the journey of the data from its collection, through processing and analysis and ultimately to publication.

The UK Statistics Authority has published a regulatory standard, which includes a Quality Assurance of Administrative Data (QAAD) toolkit for administrative data. A QAAD report is designed to set out how the producer has explored the administrative data source and assured themselves that the data are of sufficient quality to produce statistics.

The Code of Practice for Statistics emphasises that producers of statistics, regardless of whether they are administrative or survey based, must clearly communicate information on the quality of statistics to users.

“Quality requires skilled professional judgement about collecting, preparing, analysing and publishing statistics and data in ways that meet the needs of people who want to use the statistics.”

Statistics produced from administrative data can use the Quality Assurance of Administrative Data (QAAD) toolkit to provide assurance of data quality to users.  This report provides users of the FRS data with a similar framework of assurance for survey data. As such, FRS data has been assessed against the four specific areas for assurance included in the QAAD toolkit. These have been adapted to apply to survey data.

The report also recognises the development of the FRS to make use of administrative data to improve quality.

In summary, the main strengths and limitations of the FRS data are:

Current strengths

  • The FRS has a long history of review and development, having been run in Great Britain for over 30 years and in Northern Ireland for over 20 years.
  • A specific tailored, stratified sample design is used, constructed to reduce sampling error.
  • Face-to-face interviewing enables the collection of data on a wide range of topics, including many personal and family characteristics which are not available from administrative sources.
  • Effective modes of communication between data collectors, data suppliers, dataset producers and data security teams are well established.
  • The suppliers of survey data obtained from FRS interviews have established and agreed data processing procedures. Careful attention is paid to the accurate collection of survey information, followed by meticulous data processing, editing, and quality assurance.
  • DWP has access to a range of high-quality administrative datasets covering the main benefits reported in the FRS dataset, that are utilised for the purpose of editing benefit amounts. The accuracy of these adds to the quality of the final FRS dataset.
  • DWP has an ongoing dialogue with expert policy and academic research users of the FRS dataset and FRS-based statistics, including annual independent quality assurance of the datasets.

Current limitations

  • We are aware that response rates have fallen since 2020 to 2021. The lower the response rate to a survey, the greater the likelihood that those who responded are systematically different to those who did not, so we recognise the greater risk of systematic bias in our survey results
  • Robust analysis of the data is only applicable at the geographic level of Government Region
  • The FRS questionnaire is lengthy and demanding and a key concern is, where possible, to reduce (or at least not increase) its length, so as not to overburden respondents or interviewers
  • With over two thousand variables and their associated values collected per interview, not every item can be individually quality assured
  • The use of administrative data for editing benefit amounts is limited by access to the most appropriate administrative dataset; and for some of these datasets, DWP is not the data controller

1.1. Background

The primary objective of the FRS is to provide DWP with information to inform the development, monitoring and evaluation of social welfare policy. Detailed information is collected on respondents’ incomes from all sources including benefits, tax credits and pensions; housing tenure; expenditure on housing; caring needs and responsibilities; disability; education; childcare; family circumstances; child maintenance; material deprivation, household food security and food bank usage.

The FRS datasets and published information are accredited official statistics, which are called National Statistics in the Statistics and Registration Service Act 2007. They were independently reviewed by the Office for Statistics Regulation (OSR) in 2011 and were then confirmed as National Statistics by the OSR in November 2012. This means that they comply with the standards of trustworthiness, quality and value in the Code of Practice for Statistics and should now be labelled ‘accredited official statistics’.

The OSR sets the standards of trustworthiness, quality, and value in the Code of Practice for Statistics that all producers of official statistics should adhere to. Our statistical practice is regulated by the OSR. You are welcome to contact the team directly at team.frs@dwp.gov.uk with any comments about how we meet these standards.

1.2. List of Datasets

These datasets are used in the production of this FRS accredited official statistic. A description of them and their uses is below.

Survey data sources

The FRS is a continuous survey which collects information on the incomes and circumstances of individuals living in a representative sample of private households in the United Kingdom. The survey has been running in Great Britain since October 1992 and was extended to cover Northern Ireland in the survey year 2002 to 2003.

The Office for National Statistics (ONS), National Centre for Social Research (NatCen) and Northern Ireland Statistics and Research Agency (NISRA) conduct the operational aspects of the FRS which are not dealt with in-house within DWP. This includes the implementation of questionnaire changes that DWP requests, drawing the survey sample, day-to-day fieldwork management, the collation of data across the three organisations and delivery of the combined dataset to DWP. This also considers region-specific variations required for circumstances in Northern Ireland. Fieldwork in Northern Ireland is conducted by NISRA.

Processing of state-support data used to be a purely manual task, with the FRS analytical team using eligibility guidelines and information about the individual or benefit-unit (family) circumstances to determine a value for any missing or spurious benefit amounts.

Changes in the lawful basis for linking in 2018, together with a focused investment in improving linking methodology, have meant that at least 95% of FRS respondents can be linked to their administrative records. This provides the opportunity to achieve substantial data quality, timeliness and cost efficiency gains through incorporating administrative data into the FRS.

DWP has access to a range of high-quality administrative datasets covering the main benefits reported in the FRS dataset.

Figure 1: Administrative data sources used for linking benefit records

Benefit                             Benefit Code                                     Admin Data Source
Child Benefit                       3                                                RAPID
Attendance Allowance                12                                               WPLS
Disability Living Allowance         1 (care), 2 (mobility)                           WPLS
Personal Independence Payment       96 (daily living), 97 (mobility)                 PIP Publication Dataset
Carer’s Allowance                   13                                               WPLS
Employment and Support Allowance    16                                               WPLS
Income Support                      19                                               WPLS
Jobseeker’s Allowance               14                                               WPLS
Universal Credit                    95                                               UCFS
Housing Benefit                     94                                               SHBE
State Pension                       5                                                WPLS
Pension Credit                      4                                                WPLS
Tax Credits                         90 (working tax credit), 91 (child tax credit)   RAPID

2. Quality assurance of data assessment

2.1. UK Statistics Authority toolkit

The assessment of the FRS data sources has been carried out in a way which is parallel to the approach in the QAAD toolkit. The toolkit sets out four levels of quality assurance that may be required of a dataset:

  • A0 – no assurance
  • A1 – basic assurance
  • A2 – enhanced assurance
  • A3 – comprehensive assurance

Level A1 – basic assurance

The statistical producer has reviewed and published a summary of the survey data quality assurance (QA) arrangements.

Level A2 – enhanced assurance

The statistical producer has evaluated the survey data QA arrangements and published a fuller description of the assurance.

Level A3 – comprehensive assurance

The statistical producer has investigated the survey data QA arrangements, identified the results of independent audit and published detailed documentation about the assurance and audit.

To determine which assurance level is appropriate for a statistics publication it is necessary to take a view of the combination of the level of data quality risk and the public interest profile of the statistics. The UK Statistics Authority states that the A0 level is not compliant with the Code of Practice for Statistics.

Figure 2: UK Statistics Authority quality assurance of administrative data (QAAD) risk and profile matrix

Level of risk of quality concerns   Public interest: Lower   Public interest: Medium   Public interest: Higher
Low                                 A1                       A1/A2                     A1/A2
Medium                              A1/A2                    A2                        A2/A3
High                                A1/A2/A3                 A3                        A3

Source: Office for Statistics Regulation

2.2. Assessment and justification against the QAAD risk and profile matrix

The FRS survey data, as the main source of data for the FRS, together with supporting sources of administrative data on benefits, has been evaluated according to the QAAD toolkit’s risk and profile matrix (Figure 2), reflecting the level of risk to data quality and the public interest profile of the statistics.

FRS data is regarded as being a medium risk of data quality concern.

There is a clear formal agreement, between DWP and its data suppliers, of what data will be provided, when, how and by whom. It is accepted that the risks are increased when there are multiple data collection sources; these being ONS, NatCen and NISRA. However, the risk is somewhat mitigated by each organisation using the same questionnaire, interviewer instructions and editing software. The length of time that each organisation has been involved in the FRS and the experience they have built up adds considerable value and provides assurances that lower the risk. This has also added to the quality of regular and effective communication between all partners in the data supply process.

Whilst every effort is made to collect data to the highest quality, as with all survey data it is dependent on suitable data sources, sound methods and assured quality. Checks are made throughout the process from survey sample design, questionnaire consultation, collection of the data to producing the statistical dataset. However, it is acknowledged that some respondent errors and both sampling and non-sampling errors may occur.

FRS datasets are regarded as higher public interest.

FRS data informs income-based poverty measures, generates statistics that are reported widely in the media and impacts upon state support policy. FRS data underpins the DWP Policy Simulation Model (PSM) which is used for the development and costing of policy options. The survey-based dataset is also used extensively by academics and research institutes for economic and social research purposes.

The FRS contains information used by other government departments, particularly for tax and benefit policy analysis by His Majesty’s Treasury and His Majesty’s Revenue and Customs. Other users include the Ministry of Justice, the Department for Education and the Department for Environment, Food and Rural Affairs.

Therefore, as defined by the risk and profile matrix (Figure 2), the combination of medium level of risk of data quality concerns, and higher public interest profile indicate that an enhanced level of assurance [A2] is the minimum level required for FRS statistics.

2.3. Practice areas of quality assurance

The four practice areas under which FRS data are assessed are:

  • operational context and data collection
  • communication with data supply partners
  • quality assurance principles, standards and checks applied by data suppliers
  • producer’s quality assurance investigations and documentation

Each of the four practice areas are evaluated separately, and the respective level of assurance is stated. This approach provides:

  • an in-depth investigation of the areas of particular risk of interest to users
  • a demonstration of evidence of how identified risks are managed and mitigated
  • transparency in how communication with data suppliers ensures a common understanding of any quality issues
  • a clear explanation of the strengths and limitations of the data

Evidence for how the FRS data meets the requirements of an enhanced level of quality assurance is outlined in Sections 3 to 6 below.

3. Operational context and data collection (matrix score A2)

This demonstrates our understanding of the environment and processes in which the data are being collected and the factors that we have identified that might increase the risks to the quality of our survey data.

3.1. Population and sample selection

The FRS sample is designed to be representative of all private households in the UK. The sampling frame excludes people who are living in communal settings, e.g. nursing homes, halls of residence, barracks or prisons, and people living in temporary (bed and breakfast) accommodation. This is intentional in the design of the survey, as in addition to these properties being difficult to access by interviewers, the purpose of the FRS is to obtain information on “household” income and circumstances.

A household is defined as “one person living alone, or a group of people (not necessarily related) living at the same address, who share cooking facilities and share a living room or dining area”.

One of the strengths of the FRS is that it collects many personal and family characteristics which are not available from administrative sources. This means that the FRS can be used to analyse income and benefit receipt in ways which are not possible from administrative sources alone.

One of the known limitations of surveys is non-response. The lower the response rate to a survey, the greater the likelihood that those who responded are systematically different to those who did not, and so the greater the risk of systematic bias in the survey results. The FRS stratified sample structure is designed to minimise the impact of non-response being different for different types of households in the achieved sample.

The sampling frame in Great Britain

The Great Britain FRS sample is drawn from the Royal Mail’s small users Postcode Address File (PAF). One of the main advantages of using PAF is that it organises address information into a standard format and is updated daily. The small users PAF is limited to addresses which are not flagged with Royal Mail’s “organisation code”. For the purpose of drawing the FRS sample, an updated version of this list is obtained twice a year.

By only using the small-user delivery points most large institutions and businesses are excluded from the sample. Small-user delivery points which are flagged as small business addresses are also excluded. However, some small businesses and other ineligible addresses remain on the sampling frame. If sampled, they are recorded as “ineligible” once the interviewer verifies that no private household lives there.

Sample design in Great Britain

The Great Britain FRS uses a stratified two-stage probability sample of addresses.  The survey samples 3,407 postcode sectors, from around 9,200 in Great Britain, with a probability of selection that is proportional to size. Each postcode sector is known as a Primary Sampling Unit (PSU). The PSUs are stratified by 27 regions and three other variables, described below, derived from the Census of Population. Stratifying ensures that the proportions of the sample falling into each group reflect those of the population.

Within each region, the postcode sectors are ranked and grouped into eight equal bands using the proportion of households where the household reference person (HRP) is in National Statistics Socio-Economic Classification (NS-SEC) 1 to 3.  Within each of these eight bands, the PSUs are ranked by the proportion of economically active adults aged 16-74 and formed into two further bands, resulting in sixteen bands for each region.

The bands are then ranked according to the proportion of economically active men aged 16-74 who are unemployed. This set, known as “stratifiers”, is chosen to have maximum effectiveness on the accuracy of two key variables: household income and housing costs. Within each PSU a sample of addresses is selected; typically 28 households per PSU each year.

Figure 3: A representation of the FRS sampling frame

  • The PSUs (postcode sectors) are represented by the small squares. They are assigned to strata, represented by the shading (note that only 6 strata are depicted here).
  • A sample of PSUs is drawn at random from each stratum (non-shaded PSUs are not visited).
  • A single PSU is shown, containing many households. A random sample of those (shown brighter) are selected to be interviewed.

The FRS sample stratification variables for Great Britain are as follows:

Regions:

  • 19 in England (inc. Metropolitan vs non-Metropolitan split, 4 in London)
  • 2 in Wales
  • 6 in Scotland

The proportion of households where HRP is in NS-SEC 1 to 3:

  • 8 equal bands

The proportion of economically active adults aged 16-74:

  • 2 equal bands

The proportion of economically active men aged 16-74 who are unemployed:

  • Sorted within above bands

Each year, half of the PSUs are retained from the previous year’s sample, but with new addresses chosen. For the other half of the sample, a fresh selection of PSUs is made (which in turn will be retained for the following year). This is to improve comparability between years.
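
The two-stage design described above can be sketched in code. This is an illustrative simplification using made-up PSU data: the real design stratifies by region, NS-SEC and economic-activity bands and selects PSUs without replacement, whereas here Python’s `random.choices` (which samples with replacement) stands in for probability-proportional-to-size selection.

```python
import random

# Hypothetical PSUs: (psu_id, stratum, number_of_addresses). In the real
# design the strata come from region, NS-SEC and economic-activity bands.
psus = [(i, i % 4, random.randint(500, 3000)) for i in range(200)]

def draw_psus_pps(psus, n_per_stratum):
    """First stage: draw PSUs within each stratum with probability
    proportional to size (simplified: with replacement)."""
    by_stratum = {}
    for psu in psus:
        by_stratum.setdefault(psu[1], []).append(psu)
    selected = []
    for members in by_stratum.values():
        weights = [m[2] for m in members]  # size = number of addresses
        selected += random.choices(members, weights=weights, k=n_per_stratum)
    return selected

def draw_addresses(psu, n=28):
    """Second stage: a simple random sample of addresses within a PSU
    (typically 28 per PSU each year)."""
    return random.sample(range(psu[2]), min(n, psu[2]))

sample = [(psu[0], draw_addresses(psu)) for psu in draw_psus_pps(psus, 5)]
```

The stratified first stage ensures each band of the population is represented in proportion, while the fixed per-PSU address count keeps interviewer workloads even.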

Sampling in Northern Ireland

The sampling frame employed on the Northern Ireland FRS is derived from the NISRA Address Register (“NAR”). The NAR is developed within NISRA and is primarily based on the Land and Property Services (LPS) POINTER database, the most comprehensive and authoritative address database in Northern Ireland, with approximately 752,000 address records available for selection. A systematic random sample is selected for the Northern Ireland FRS from the NAR. Addresses are sorted by local council and ward, so the sample is effectively stratified geographically.

Owing to its sample design, the FRS cannot be used to provide robust estimates at Local Authority level, meaning that the lowest level of geography for which the FRS can be used to provide estimates is Government Office Region.

3.2. Sampling Error

All survey estimates have some degree of sampling error attached to them, which stems from the variability of the observations in the sample. From this, a margin of error (confidence interval) is estimated, which indicates the likely range of results that would appear if the same survey were to be conducted many times with a different sample whilst maintaining the same characteristics as the current sample.

It is this confidence interval, rather than the estimate itself, that is used to make statements about the likely ‘true’ value in the population; specifically, to state the probability that the true value will be found between the upper and lower limits of the confidence interval. In general, a 95% confidence interval is calculated within which there is a 95% chance that the true value of the population is found. A narrower confidence interval is generally indicative of a more precise estimate of where the true value lies.

Figure 4: How confidence intervals communicate the precision of an estimate

The FRS sample in Great Britain, as described earlier, is selected using a stratified design, based on addresses clustered within postcode sectors. As a result, FRS sampling error is not just dependent on the variability between sample units (households, benefit units, individuals), but also on the variability between postcode sectors.

For example, if a sample characteristic is distributed differently by postcode sector (i.e. is clustered) the representativeness of a given sample is reduced, and the variability of repeated samples would be greater overall than would occur in a simple random sample of the same size. Therefore, when clustering is properly accounted for, the (actual) sampling error is greater than a sampling error calculated under the assumption of simple random sampling.

Stratification attempts to account pre-emptively for some of the variation between clusters using information at a cluster level known prior to the survey, and so its effect is to reduce the sampling error, relative to what it would otherwise have been. Clustering is not used in Northern Ireland, but households are selected using systematic random sampling, which has a similar effect to stratification.

Following the survey fieldwork, far more is known about those sampled. For certain characteristics, this information can be compared to known population totals and weighted to ensure proportionate representation of those otherwise disproportionately sampled (see the Grossing Information in Section 6 of the FRS Background Information and Methodology). This can be thought of as increasing the representativeness of the sample or reducing the variability that would be observed if making repeated samples. The effect is therefore similar to stratification and reduces the sampling error. For this reason, this process is also called post-stratification.

Communicating uncertainty within FRS-based Estimates

Whilst the FRS sample is designed to minimise sampling error, naturally there remains a level of uncertainty to the estimates produced from the survey. To help quantify the level of uncertainty associated with a selection of FRS-based estimates, standard errors, design factors and confidence intervals are produced. A larger standard error or wider confidence interval would indicate a greater degree of uncertainty around the FRS-based estimate in question, whilst an increase in an estimate’s design factor would indicate a loss in precision through using a complex sample design with post-stratification when compared to a simple random sample estimate.

Standard errors, design factors and confidence intervals vary from survey estimate to survey estimate.  A new standard error methodology was introduced from the 2021 to 2022 publication, following a similar method to that used in the HBAI publication since the financial year ending 2016. Standard errors, design factors, and confidence intervals on estimates are now calculated using a bootstrap resampling method that accounts for the complex survey design and post-stratification weighting as fully as possible. An overview of this methodology, written by the Institute for Fiscal Studies, can be found in this methodological note from 2017.

A perfect method for calculating variability in survey estimates would be to have performed the survey many times, independently, and measure the range of estimates observed from so doing. As the survey fieldwork has been performed only once, the method of estimating uncertainty is as follows:

  • Uncertainty is approximated by treating the achieved sample as if it were the population and repeatedly drawing sub-samples at random from that achieved sample

  • This process is referred to as ‘resampling’ and the result is a series of ‘resamples’. Resamples are drawn to mimic the original sampling methodology and replicate its effects on the reliability of any results. This means that stratification and clustering information and systematic random sampling processes are used to replicate the original FRS household selection process

  • Households can be selected multiple times, since resamples are drawn with replacement, and the unequal probability of selection of households is accounted for by using a grossing factor. Each FRS resample is smaller than the original sample size by about two thirds. The magnitude of the estimates of uncertainty is driven by the ratio between the size of the original FRS sample and the size of the resamples. In this case, since the resamples are smaller than the original FRS sample, the estimates of uncertainty presented in the Methodology and Standard Error tables are more likely to be overestimates of the true uncertainty in the FRS sample than underestimates

  • Once households are selected, each FRS household is assigned a grossing factor in a process identical to the full sample. Altogether, this produces a series of alternative samples from which to calculate a series of alternative estimates

The variability in these alternative estimates is used to quantify the uncertainty in the original estimate in two ways:

  • The standard deviation of these alternative estimates is the approximate standard error of the original estimate.

  • The series of estimates produced are ranked by ascending size, and the 2.5th and 97.5th percentiles extracted. These are then used in calculating the approximate 95% confidence interval around the original estimate.
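
The resampling steps above can be sketched as follows. This is a simplified illustration with hypothetical data: it draws resamples with replacement, applies grossing weights, and extracts the standard deviation and the 2.5th and 97.5th percentiles, but it omits the replication of stratification and clustering that the real FRS procedure includes.

```python
import random
import statistics

def bootstrap_uncertainty(values, weights, n_resamples=1000, resample_frac=1/3):
    """Percentile bootstrap: treat the achieved sample as the population,
    repeatedly resample with replacement, and use the spread of the
    resulting estimates as the measure of uncertainty."""
    n = max(1, int(len(values) * resample_frac))  # resamples are smaller
    idx = range(len(values))
    estimates = []
    for _ in range(n_resamples):
        chosen = random.choices(idx, k=n)  # drawn with replacement
        num = sum(values[i] * weights[i] for i in chosen)
        den = sum(weights[i] for i in chosen)
        estimates.append(num / den)  # weighted (grossed) estimate
    estimates.sort()
    se = statistics.stdev(estimates)            # approximate standard error
    ci = (estimates[int(0.025 * n_resamples)],  # 2.5th percentile
          estimates[int(0.975 * n_resamples)])  # 97.5th percentile
    return se, ci

# Hypothetical indicator: 1 if the household has no children, else 0,
# with made-up grossing weights.
random.seed(1)
values = [1 if random.random() < 0.73 else 0 for _ in range(2000)]
weights = [random.uniform(0.5, 2.0) for _ in range(2000)]
se, (lower, upper) = bootstrap_uncertainty(values, weights)
```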

The size of the actual standard error relative to the standard error calculated under the assumption of simple random sampling is represented by the design factor, which is calculated as the ratio of the two. Where the standard errors are the same, the design factor equals one, implying that there is no loss of precision associated with the use of a complex sample design with post-stratification.

Conversely, a design factor of less than one implies the FRS estimate is more precise than would be obtained from a simple random sample. In many cases, the design factor will be greater than one, implying that FRS estimates are less precise than those of a simple random sample of the same size due to the clustered sampling used within the FRS sample design.

Published Methodology and Standard Error Tables provide standard errors, design factors and confidence limits for a selection of variables from the survey. An example of how to interpret figures in this table is as follows:

Example: Uncertainty measures for household composition

Suppose that published tables show that 73% of households did not contain any children, and the standard error is estimated as 0.1 percentage points. These estimates form the final point estimates for the proportion of households without children and this group’s associated standard error.

The design factor for this variable is 0.3. This means that the effect of using a complex survey design and post-stratification, rather than a simple random sample, has led to a reduction in uncertainty of 70%, when using standard error as the measure of uncertainty.

In contrast, a design factor of 1.5 would have denoted an increase in such uncertainty of 50%. Among smaller groups, such larger design factors are not uncommon.

The 95% confidence interval is given as 72.4% to 72.7%. This means that if sampling error is the sole source of error present, there is a 95% chance that the true percentage of households without children lies within this range. Whilst it may appear that the estimate is not within the confidence interval, this is generally due to the rounding applied to the published estimate. It is also important to note that the confidence limits are drawn from the sampling distribution of proportions that the bootstrapping process generates, whilst the point estimate is derived from the ‘original’ FRS sample.
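
The arithmetic in this worked example can be checked with a short sketch (the function names here are illustrative, not part of any FRS system):

```python
def design_factor(actual_se, srs_se):
    """Design factor: ratio of the actual standard error to the standard
    error calculated under the assumption of simple random sampling."""
    return actual_se / srs_se

def change_in_uncertainty(deft):
    """Percentage change in standard error relative to a simple random
    sample of the same size (negative values indicate a reduction)."""
    return (deft - 1) * 100

# The figures from the worked example above:
reduction = change_in_uncertainty(0.3)  # about -70: a 70% reduction
increase = change_in_uncertainty(1.5)   # about +50: a 50% increase
```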

A methodology paper is available for information on estimating variance and confidence intervals in special circumstances e.g. where the number of occurrences of a response in the sample are very small.

3.3. Non-sampling error

Non-sampling errors are systematic inaccuracies in the sample when compared with the population. Non-sampling errors arise from the introduction of some systematic bias in the sample compared with the population it is supposed to represent.

As well as response bias, such biases include inappropriate definition of the population; misleading questions; data input errors; data handling problems; or any other factor that might lead to the survey results systematically misrepresenting the population. There is no simple control or measurement for such non-sampling errors, although the risk can be minimised through careful application of the appropriate survey techniques from the questionnaire and sample design stages through to analysis of results.

It is not possible to eliminate non-sampling error completely, nor can it be easily quantified. However, non-sampling error is minimised in the FRS through:

  • effective sample design (as described in Section 3.1)
  • authoritative questionnaire design (as described below)
  • active fieldwork management (as described below)
  • strategies to improve response rates (as described below)
  • the use of skilled and experienced interviewers (as described below)
  • extensive quality assurance of data (as described later in Sections 5 and 6)

Data collection and fieldwork management

Data is collected by other organisations on behalf of DWP. In Great Britain, ONS and NatCen conduct fieldwork for the FRS. In Northern Ireland the sampling and fieldwork for the survey are carried out by the Central Survey Unit at NISRA. Between them these organisations have operational responsibilities for drawing the sample, programming the survey questionnaire, enacting annual changes which are specified by DWP, contacting the selected households, and some initial data processing.

With fieldwork, each month the Great Britain sample is randomly divided between the two sets of interviewers, with 35% assigned to ONS and 65% assigned to NatCen. The UK set of selected addresses is then assigned to the relevant interviewers. Before interviewers visit the selected addresses, a letter is sent to the occupier explaining that they have been chosen for the survey and that an interviewer will visit the address soon. The letter and accompanying leaflet emphasise that information given in the interview will be treated in the strictest confidence and used only for research and statistical analysis purposes. Further information is provided on the ONS Family Resources Survey webpage, which also explains that the survey relies on the voluntary co-operation of respondents.

The main face-to-face contact with respondents is via doorstep contact. If contact is not made on the first attempt, the interviewer is required to make additional visits to an address. These visits must be made at different times of the day and on different days of the week, including at least one weekend attempt. If more than one household receives mail at an address a single household is interviewed.

Addresses returned as non-contacts or partial refusals can sometimes be re-issued to another interviewer where appropriate, in the hope that an interview at the non-responding household can still be achieved. Interviewing at re-issued addresses can be carried out at any point in the remaining survey year.

Response

As a quality measure of how reliable an interview is, a household is defined as fully co-operating when:

  • an interviewer has been able to interview all adults aged 16 and over
    • except those aged 16 to 19 who are classed as dependent children
  • there are fewer than thirteen ‘don’t know’ or ‘refusal’ answers to monetary amount questions in the benefit unit schedule
    • excluding the assets section of the questionnaire.

Proxy interviews are accepted when a household member is unavailable for interview. All data in the FRS dataset and the published statistics in all FRS-based publications refer only to fully co-operating households.

Response rates are calculated as follows:

Response rate = (Number of fully co-operating households ÷ Number of all eligible households) × 100
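
This calculation can be expressed as a minimal sketch (the figures below are hypothetical; the definitive rates are published in the FRS methodology tables):

```python
def response_rate(fully_cooperating: int, eligible: int) -> float:
    """Response rate as a percentage of all eligible households."""
    if eligible <= 0:
        raise ValueError("eligible household count must be positive")
    return fully_cooperating / eligible * 100

# Illustrative figures: 10,000 fully co-operating of 20,000 eligible
print(round(response_rate(10_000, 20_000), 1))  # → 50.0
```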

For a UK survey of the size and complexity of the FRS, a response rate of around 50% was considered reasonable, prior to the COVID-19 pandemic. Response rates of around 30% have become more prevalent since the 2020 to 2021 survey year. However, to ensure that FRS survey data (as with many other social surveys) is still representative of the population, technical measures have been applied, such as changes to the mode of interview and how the sample is weighted.

The Background Information and Methodology document accompanies this and every recent FRS publication, providing details of such changes, together with a technical paper, as required, outlining further quality assurance of the processing of data and production of statistics particular to each survey year. Methodology tables are also published alongside the main publication, summarising the UK household response rate, regional response rates and the reasons given for refusal, where provided.

Households that are not fully co-operating are classified as partially co-operating, refusals, or unable to make contact. A partially co-operating household is one where a full interview has been obtained from the Household Reference Person’s (HRP’s) benefit unit, but others in the household have not co-operated at all or only to a very limited extent.

Refusals include those residents of an address that contact head office to refuse to participate and residents who refuse to participate in the survey when contacted by the interviewer, either by telephone or on the doorstep.

Those who are not available to proceed with the interview for other reasons, such as being away throughout the fieldwork period, are counted in “Total number of refusals” when numbers are reported in published methodology tables. However, for more detailed analysis of non-response rates, such as why respondents are refusing to participate, those who are not available are recorded as “non-contacts”, as the interviewer is unable to establish whether they would have chosen to participate if they had been available. The category of “non-contact” only includes those addresses in the issued sample where the interviewer has confirmed that the address is eligible for the survey, but they are unable to contact residents to ask them to participate.

Any information that can be obtained about non-respondents is useful both in terms of future attempts to improve the overall response rate and potentially in improving the weighting of the sample results. Direct information about the non-responding households is valuable although, by definition, difficult to obtain. Monitoring of the components of non-response including the rate of refusals and non-contacts is carried out by the FRS team, using information from monthly performance indicators provided by ONS (see later section). Further investigation into the breakdown of the numerous categories of both refusal and non-contact recorded by interviewers can assist data suppliers in designing and evaluating methods for increasing response.

Interviewer training

Interviewers are trained in the running of an FRS interview prior to commencing interviews. They also receive training in the collection of financial and other sensitive information. This gives face-to-face interviewing an advantage over other modes of survey collection: trust is built during the interview, and interviewers can help respondents understand complicated questions. The main emphasis is on collecting accurate information, with respondents asked to consult documentation wherever possible to verify figures. This aids the consistency of data capture across different survey years, which is essential.

Interviewers new to the FRS are briefed on the questionnaire and an annual re-briefing is given to all interviewers on changes to the questionnaire. All interviewers working on the survey have the opportunity to describe their experiences with specific parts of the questionnaire and comment on how changes were received in the field. This feedback is provided at the end of each interview, and it is collated into a written report.

Questionnaire design

As part of the process of agreeing annual questionnaire changes, suggestions from users are also considered, as well as those arising from an evaluation of feedback from interviewers. Any changes to the questionnaire are checked for consistency with the harmonised standards for social surveys across government.

Each year, DWP runs a questionnaire consultation and draws up a list of possible questionnaire changes. Users are asked to identify individual questions or sections which are no longer of interest. The FRS questionnaire is lengthy and demanding and a key concern is, where possible, to reduce (or at least not increase) its length, so as not to overburden respondents or interviewers.

New questions are added with the expectation that they will produce useful data that can be delivered to users through additional variables and used to support future policy analysis. Some changes are made to improve the interview experience or to support improvements to data processing.

Operationally, changes are tested on-screen by ONS, NatCen and NISRA once coded; and several test versions of the questionnaire are provided to DWP during the survey year, each with a successively greater number of the year’s changes encoded, culminating in a final version which includes all changes.

Change control systems record all changes to the questionnaire, with individual forms for each change documenting the request for a change, reasons for decisions taken and how it has been implemented. Changes to the resulting dataset are documented in an output spec which assists DWP with quality assuring the dataset once delivered.

Completion and development of the change control arrangements is a joint initiative between all parties with regular communication between DWP and data suppliers. The output spec, along with the updated dataset metadata, document in detail the conversion process from questionnaire variables to output variables and therefore form an integral part of the change control documentation.

The new variables are released in the published dataset subject to successful quality assurance, but not all are added to the main FRS publication on GOV.UK, or to the set of FRS tables available from the Department’s Stat-Xplore tool. This decision is made considering user interests and the need for disclosure control.

Strengths

  • Specific tailored stratified sample design. Stratification can pre-emptively account for some of the variation between clusters, so its effect is to reduce sampling error.
  • The questionnaire consultation process is integral to the annual development of the survey, involving collaboration between users, suppliers and producers.
  • Effective data collection processes, with trained interviewers.
  • Face-to-face interviewing enabling the collection of data on a wide range of topics, including many personal and family characteristics which are not available from administrative sources, together with sometimes sensitive information.

Limitations

  • Owing to its sample design, the FRS cannot be used to provide robust estimates at Local Authority level.
  • The lower the response rate to a survey, the greater the likelihood that those who responded are significantly unlike those who did not, and so the greater the risk of systematic bias in the survey results.
  • The FRS questionnaire is lengthy and demanding and a key concern is, where possible, to reduce (or at least not increase) its length, so as not to overburden respondents or interviewers.

4. Communication with data supply partners (matrix score A2)

This provides evidence of how the FRS maintains effective relationships with data suppliers through a memorandum of understanding (MOU). It includes the provision of regular performance reports and documentation of change management processes and the consideration of statistical needs when changes are being made, for example to either the sampling design or the questionnaire.

DWP are the Data Controller for the FRS. DWP determines the interview content (questionnaire), the funding, sample size and address selection policy, data quality assurance procedures and the timing and content of the FRS annual publication.

The collection, processing and transfer of data is governed by a Data Protection Impact Assessment (DPIA) and a Security Assurance for Research and Analysis (SARA). The accountability requirements of UK GDPR Principle 7 are met. Data are shared under section 45A of the Statistics and Registration Service Act 2007. The Lawful Basis for Processing is UK GDPR Article 6(1)(e) - Public Task. It is recognised that Special Category data is captured, such that the respective Lawful Basis for Processing is UK GDPR Article 9(2)(j) - Archiving, research and statistics.

The MOU states that all data processors, and where applicable sub-processors, must have in place procedures for storing and transferring FRS data using appropriately secure methods. Transfer is either via PGP encrypted email or GlobalScape, with oversight applied by DWP Data Security to all inbound receipts.

DWP analysts are invited to shadow interviewers in the field. This enables analysts to observe the practicalities of how the data are collected. Potential areas for confusion in questions, and how trained interviewers address these, are important in understanding how well the data values reflect a household’s actual circumstances.

4.1. Field Report

The key objective of this report is to provide DWP with qualitative feedback on the questionnaire, directly from the people who collect data from respondents. To assess how well any new questions or other changes are performing in the field, this feedback is collected from interviewers at the end of the survey. The report describes the feedback from interviewers and fieldwork operations staff.

Interviewers are advised that their feedback will be taken into consideration for future changes to the FRS questionnaire. While some interviewers provide feedback on questions, a much larger number provide none; this is taken as more indicative of satisfaction with how the questionnaire is operating than of indifference to the request for feedback.

Several other reports are used by ONS and the DWP FRS team to communicate information during the survey year:

  • Monthly Performance Indicator Reports
  • Issues Log
  • Annual Report

4.2. Monthly Performance Indicator Reports

The aim of these reports is to keep DWP informed on the key metrics which may affect quality, in terms of the number of productive interviews successfully conducted compared to the number issued to each organisation. Response rates are determined by the capacity of available interviewers, their ability to make contact with a sampled household, and the willingness of the members of that household to participate in the interview.

The first of these has a greater impact, because with a larger stock of trained interviewers there is a greater possibility of contacting further households, should early attempts at productive interviews fall short of the target numbers.

The main metrics in the monthly report are response rates, broken down by Full, Partial, Proxy and Follow-up responses.

Other metrics that are provided are:

  • Ineligible addresses
  • Regional response rates
  • Interview timings by organisation
  • Encashment rates of incentive vouchers
  • Cumulative Target vs Achieved cases
  • Quarterly interviewer numbers

These reports provide input to regular discussions between the DWP FRS Team Leader and the ONS Family Resources Survey Lead. Representatives from NatCen and NISRA are also involved in discussions when appropriate. During these meetings there is the opportunity to discuss any newly emerging quality concerns and/or updates on actions taken to mitigate previously identified risks. Wider bilateral meetings between senior leaders in ONS and DWP allow for the opportunity to address any issues as they emerge, discussing the potential for possible mitigation strategies.

On a monthly basis the regional response rates are analysed by the DWP FRS team, looking at changes over a longer time series of two to three years or more. The regional distribution of the achieved sample is also compared to that of the UK population, to examine how representative the sample is likely to be for the survey year. Identifying any region with substantially lower response rates than the UK average, either consistently or in a particular month, allows early investigation and possible actions to be taken to address the issue.

These Performance Indicator reports are communicated to senior DWP colleagues. Colleagues in the devolved administrations of Scotland, Wales and Northern Ireland receive quarterly reports on response rates for their geographic area, including how these compare to the UK average and to historic years.

Communicating the differences between regions and between survey months also assists the processing team with understanding possible risks to data quality.  

4.3. Issues Log

This is a detailed log of any issues uncovered within the data by either ONS or DWP. Each item is logged by description, date and identifier, and any supporting information is linked or embedded.

Regular updates to this are made by DWP analysts, with discussions being held at the 6-month and 12-month stage to enable resolution of issues as necessary. Items that remain live at the 6-month stage are flagged to be followed up before the wider 12-month Data QA meeting.

Some issues can be resolved within the existing development dataset, but others require further work by ONS that can only be resolved by a re-issue of the dataset to DWP. A later redelivery (resupply) of both the 6-month and the 12-month datasets is standard, recognising the adjustments required for a dataset with many inter-dependencies.

In circumstances where issues are discovered, which require fundamental changes to the underlying data structure, a further resupply would then be made by ONS.

4.4. Annual Report

The annual report focuses on four areas:

  • Fieldwork summary for the year just concluded
  • Quality assurance
  • Staffing levels
  • Overall successes and areas for improvement

The quality assurance section covers, amongst other things, a breakdown of the number of individual checks carried out at each validation stage.

Strengths

  • Effective lines of communication between data collectors (interviewers), data suppliers (ONS, NatCen and NISRA) and DWP have been established and developed over the lifetime of the survey and continue to evolve.
  • Weekly discussions between the DWP FRS Team Leader and the ONS FRS Lead provide an opportunity to discuss any newly emerging quality concerns and/or updates on actions taken to mitigate previously identified risks.
  • Wider bilateral meetings between senior leaders in ONS and DWP allow for the opportunity to address any issues as they emerge, discussing the potential for mitigation strategies and future developments.
  • Formal agreements, whether they be contracts or MOUs, are carefully scrutinised by Data Security, Data Protection, Legal and Commercial teams to ensure that they are fit-for-purpose. This provides a clear line of accountability.

Limitations

  • No material limitations have been identified in the communication with data supply partners.

5. Quality assurance principles, standards and checks by data supplier (matrix score A2)

This relates to the validation checks and procedures undertaken by the data supplier, any process of audit of the operational system and any steps taken to determine the accuracy of the data.

5.1. Survey interview and post-interview quality assurance

Microsimulation is central to DWP’s use of the data. Therefore, careful attention is paid to the accurate collection of survey information followed by meticulous data processing, editing, and quality assurance. ONS, NatCen and NISRA carry out a range of editing tasks on the captured survey response data, before its transmission to DWP. An overview of these stages and timeline is given below:

Figure 5: Data supplier processing

The stages in the validation, editing and conversion process are as follows:

5.2. The interview

One of the benefits of interviewing using Computer Assisted Personal Interviewing (CAPI) is that in-built checks can be made at the interview stage. These help to validate respondents’ answers and to catch interviewer keying errors. There are checks to ensure that amounts are within a valid range, and cross-checks which make sure that an answer does not contradict a previous response.

However, it is not possible to check all potential inconsistencies, as this would slow down the interview to an unacceptable degree, and there are also capacity constraints on interviewer notes. FRS interviewers can override most checks if the answers are confirmed as accurate with respondents.

A problem inherent in all large surveys is item non-response. This occurs when a household agrees to give an interview, but either does not know the answer to certain questions or refuses to answer them. This does not prevent them being classified as fully co-operating households because there is enough known data to be of good use to the analyst (although see the first paragraph of the Response section above for information about non-response to monetary questions).

Interviewers encourage respondents to consult documentation at all stages of the interview to ensure that the answers provided are as accurate as possible. For some items whether certain documents are consulted or not is recorded on the questionnaire. This assists FRS users in assessing the accuracy of the data.

5.3. Post-interview checks

Interview data is stored on interviewers’ encrypted, password protected laptops or tablets. Interviewers are instructed to transmit completed interviews as soon as possible so that data is not stored on the laptop or tablet unnecessarily.

Once an interview has taken place, data is returned to ONS, NatCen, or NISRA respectively. At this stage, editing takes place, based on any notes made by interviewers. Notes are made by the interviewer when a warning has been overridden, for example, where an amount is outside the expected range, but the respondent has documentation to prove it is correct. Office-based staff make editing decisions based on these notes.

Other edits taking place at this stage are checking amounts of fixed-rate benefits and, where possible, separating multiple benefit payments into their constituent parts, such as separating Disability Living Allowance into the Care and Mobility components.

Checks and enhancements to collated data

Data is collated and edited from all interviews on a monthly basis by teams at ONS, NatCen and NISRA. A limited number of edits are made at this stage, which include:

  • interviewer edits completed post-interview, such as occupation / industry coding etc
  • adding certain categorical variables, such as educational level and socioeconomic group
  • adding certain geographical variables, such as Broad Rental Market Area (BRMA), Lower Super Output Area (LSOA); and Council Tax and NI Rates information
  • either imputing or suggesting the imputation of various missing items such as net pay, tax, etc using algorithms supplied by DWP

Before further validation, FRS data is converted from CAPI format into SAS-readable tables. Using DWP specifications, SAS-readable tables are created by ONS, with each table displaying information from different parts of the questionnaire.

Both DWP and ONS then carry out checks on key input and output variables to ensure that the data have converted correctly to the new format. Checks include ensuring that the number of adults and children recorded is correct, and that records are internally consistent.

ONS conduct the first round of credibility checks on a monthly basis and these are sent with the initial data delivery. These flag potentially problematic cases and provide suggested edits.

If an error is identified in the data delivered to DWP, specifically one identified in the script that translates the questionnaire answers to SAS-readable files, it can take several attempts to revise the output script to ensure all issues have been addressed.

Strengths

  • ONS, NatCen and NISRA have established and agreed data assurance processes that evolve as needed to deliver quality data.
  • Careful attention is paid to the accurate collection of survey information, followed by meticulous data processing, editing, and quality assurance.

Limitations

  • If an error is identified in the data delivered to DWP, reviewing of outputs is often a manual checking task. It is necessary to ensure that variable categories output as final are as expected, given respondents’ answers to all other questions within the associated question block.
  • With over two thousand variables and their associated values collected per interview, not every item can be quality assured by the supplier. For a survey of this magnitude there will always be the risk of unidentified errors. If these are identified by the producer, a re-issue of the dataset from the supplier (ONS) to the producer (DWP) may be required.

6. Producers’ quality assurance investigations and documentation (matrix score A2)

This demonstrates the quality assurance conducted by the FRS Team, including corroboration against other data sources.

The FRS dataset is used for a wide range of analyses beyond the published tables. For many users, the dataset is more important than the statistical releases themselves. The use of the FRS dataset for policy modelling places a premium on accuracy, in that an inaccurate dataset could lead to policy costs or benefits being incorrectly assessed, and/or a suboptimal choice of policy option. As small groups of cases could affect the results of user analyses, a thorough examination of case-specific information is made.

6.1. The FRS Interface

The original interface for processing FRS data was developed in the 1990s, as a SAS AF/SCL based application. Once it was determined that this technology had passed end-of-life in support terms, it was replaced with a new, HTML-led data management solution, built to modern standards. This was an important investment for the future of the FRS project. This solution (FRESCO) recreates many of the old interface’s functions, but also offers improved code version control, improved data viewing, on-screen editing and a part-automated anonymisation setup.

6.2. Producer expertise

The FRS Team at DWP brings together people from a mix of professions and experience, working across all elements of the project. Team members have distinct but integrated roles, such that one person is responsible for the questionnaire consultation or for managing the publication process, but every member of the team is involved in an aspect of dataset processing and leads a topic of the publication.

This team structure means that there is expertise across the whole project, so that people can be assigned to development projects, alongside the routine processing, to investigate and suggest improvements.

Clear desk instructions for all aspects of data processing are easily accessible and routinely updated, so expertise is not lost when team members change. These not only include how to carry out a processing function, but also why these actions are taken. This allows the rationale to be questioned and where necessary be improved over time.

An outline of the producer processing checks is presented below:

Figure 6: Data producer processing

6.3. Pre-processing checks

Some validation checks are performed as part of the Data Load itself:

  • That the data contains the tables and variables that are expected
  • That the data content for every variable is valid, that it is not missing (.) and that the values fall within certain minimum and maximum limits
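
These data-load checks can be illustrated with a hypothetical sketch (the variable names and limits below are assumptions for illustration, not the FRS specification):

```python
# Illustrative minimum/maximum limits per variable (hypothetical values)
EXPECTED_LIMITS = {"AGE": (0, 120), "HHSIZE": (1, 20)}

def validate_record(record: dict) -> list[str]:
    """Return a list of validation failures for one record."""
    errors = []
    for var, (lo, hi) in EXPECTED_LIMITS.items():
        value = record.get(var)
        if value is None:            # missing value (the '.' in SAS terms)
            errors.append(f"{var}: missing")
        elif not lo <= value <= hi:  # outside the minimum/maximum limits
            errors.append(f"{var}: {value} outside [{lo}, {hi}]")
    return errors

print(validate_record({"AGE": 150, "HHSIZE": 3}))  # → ['AGE: 150 outside [0, 120]']
```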

The initial validation of survey data received from the suppliers involves:

  • Producing a Changes document, showing the changes in the data content since the last delivery of data
  • Manual data content validation for all variables and record types that have been changed since the previous data delivery, especially if this was for a previous survey year
  • Record Creation checks: checks between tables in the dataset to ensure (for example) high-level parent records can be linked to all expected lower-level child records, and vice versa
  • Set Type checks: investigates sets of variables that have a response pattern of “Yes”, “No”, “None” for invalid patterns
  • Skipped Important Variables check: a routing check for many of the most important variables on the FRS dataset, to identify where respondents may have been incorrectly routed to the wrong questions
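
As a hedged illustration of the Set Type check described above, one plausible reading is that a block of “Yes”/“No” items with a “None of these” option is invalid when “None” coexists with a “Yes”, or when nothing in the set is selected; the rule and names below are assumptions, not the FRS implementation:

```python
def set_type_invalid(items: dict[str, str], none_of_these: str) -> bool:
    """Flag an invalid response pattern in a Yes/No/None variable set."""
    any_yes = any(v == "Yes" for v in items.values())
    if none_of_these == "Yes" and any_yes:
        return True   # "None of these" contradicts a "Yes" in the set
    if none_of_these == "No" and not any_yes:
        return True   # nothing selected at all
    return False

print(set_type_invalid({"BEN1": "Yes", "BEN2": "No"}, none_of_these="Yes"))  # → True
```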

The Initial Validation process checks that:

  • Variables are skipped where expected to be
  • Variables do not have values where skipped values would be expected
  • New variables and changed variables have values that are sensible
  • Records in tables have been created in line with their parent-level ‘flags’

Initial validation like this may occur after any new delivery of data and will always consider both new changes and previously outstanding issues.

6.4. State support validation

DWP validates all state support records within the FRS dataset. Information on benefit receipt is one of the key areas of the FRS, and it is very important that this section is thoroughly validated and cleaned. It is not appropriate to use imputation methods, such as hot-decking, algorithms or bulk edits (see below) for benefits data so instead a separate procedure of validation and editing is used.

Missing benefit amount values

Since 2019 to 2020 the FRS has made use of administrative data to check on the accuracy of the monetary amounts reported during the interview. The data is also used to check the respondent’s eligibility for the various elements of state support. The information includes respondents’ (true) amounts of benefit received, allowing closer editing of benefit rates.

For cases where a respondent had answered ‘yes’ to whether they are in receipt of a particular benefit, but did not give the amount received, we impute using linked data where possible, depending on the benefit. For benefits such as Universal Credit, where the rate could vary greatly depending on the circumstances of the respondent, we replace all reported amounts with linked amounts, because of the difficulty of making individual benefit assessments.  

The process looks at instances where people have stated that they were receiving some form of state support; and where the pound amount reported was in some way nonstandard or otherwise questionable. FRS respondents are linked to all the administrative sources as listed in Table 1, Section 1.2, via a lookup file.

The strengths and limitations of the methods used for data linking to benefit records have been scrutinised by the FRS team and the associated FRS Transformation team, who are responsible for data linking. More detail on the data linkage process is available in the published technical report.

Use of the administrative data helps to resolve a substantial proportion, but not all, of the erroneous reporting of benefit amounts on the FRS. Using administrative data in this way both improves the accuracy of the monetary amounts and is more efficient than the previous manual benefit editing approach.

Use of administrative data reduces the time required to produce the final survey data, as it reduces the need for manually editing survey amounts. Most administrative data extracts are available shortly after the end of the survey year, so are readily available when FRS processing begins.

Process of data linking for benefit editing

The lookup file is created for each survey year, consisting of anonymised identifiers for FRS respondents (household, benefit unit and person), together with their encrypted NINOs. Respondents are linked by name, postcode and date of birth to DWP’s Customer Information System (CIS) to obtain their NINOs. CIS stores the names, dates of birth, and latest address information for everyone who has been issued with a NINO. This is a form of deterministic matching, that is, a rules-based process to determine an “exact match” between two records.
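
A minimal sketch of deterministic (rules-based) matching is given below; it assumes pre-cleaned fields and a simple case-insensitive comparison, whereas the actual CIS linkage process is more involved:

```python
def exact_match(survey: dict, admin: dict) -> bool:
    """Rules-based 'exact match' on name, postcode and date of birth."""
    keys = ("name", "postcode", "date_of_birth")
    return all(
        survey[k].strip().upper() == admin[k].strip().upper() for k in keys
    )

# Hypothetical records for illustration only
respondent = {"name": "A N Other", "postcode": "SW1A 1AA", "date_of_birth": "1980-01-01"}
cis_record = {"name": "a n other", "postcode": "sw1a 1aa", "date_of_birth": "1980-01-01"}
print(exact_match(respondent, cis_record))  # → True
```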

The lookup file for data linking, containing personal details for respondents, is supplied to DWP under strict agreed security arrangements. Data supplied to DWP with the FRS survey response dataset does not contain the names and addresses of respondents. FRS analysts are only permitted access to one of these files, so that the combined information is never disclosed. This ensures that we adhere to data confidentiality and anonymity principles.

The strengths and limitations of the administrative data sources we use are set out below.

Universal Credit Full Service (UCFS) dataset

This is used for the Universal Credit (UC) editing, because it allows the separation of advance payments from the UC payment. This is used rather than Universal Credit Official Statistics (UCOS), because it has monthly records for each claim alongside payment dates.

All cases reporting that they receive UC have been edited by replacing their reported pound amount with the UC administrative data amount since the 2019 to 2020 survey year. We have not yet identified any limitations of admin-linked UC editing.

The following administrative data sources are used as advisory sources where there is a doubt over what a survey respondent has said they receive.

Work and Pensions Longitudinal Study (WPLS)

This is a frozen quarterly cut of the National Benefits Database. Its strengths are that it has quarterly snapshots for benefit claims, and accurate weekly benefit amounts. The only limitation is that the admin editing code may fail to capture very short spells on these benefits. However, the impact of this will be minor for some benefits; for example, the majority of Pension Credit (PC) and Retirement Pension (RP) claims will be for a minimum of three months.

PIP Dataset

The main strength of this Personal Independence Payment dataset is that it has monthly records per claimant. This allows us to determine benefit receipt with a great deal of accuracy. The editing code looks at the monthly record just before the interview date to check the receipt. We have not identified any limitations for admin-linked benefit editing. However, checks will be carried out to ascertain whether discrepancies could occur between survey interviews and administrative data records for individuals.

Single Housing Benefit Extract (SHBE)

This data contains monthly snapshots of benefit records and expresses benefit amounts in weekly terms. This is its main strength. It has some limitations:

  • It does not contain any administrative records for Northern Ireland which means that receipt will be underestimated here to the extent that Housing Benefit (HB) is underreported in the raw survey data.
  • A lot of Housing Benefit claims have missing records for some but not all months. Where these months are missing around the time of interview, the editing code mitigates for this by using the payment closest to the interview date.
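The mitigation described in the second bullet can be sketched as follows. This is an illustrative sketch only, with invented field names and values; the real editing code operates on the SHBE extract itself.

```python
from datetime import date

def closest_payment(payments, interview_date):
    """Return the payment record nearest to the interview date.

    `payments` is a list of (payment_date, weekly_amount) tuples; some
    monthly snapshots may be missing, so the record with the smallest
    absolute gap from the interview date is used.
    """
    if not payments:
        return None  # no administrative record at all
    return min(payments, key=lambda p: abs((p[0] - interview_date).days))

# Example: the March snapshot is missing, so the nearest record
# (here, April) is used for an interview on 20 March.
payments = [(date(2024, 1, 15), 92.40), (date(2024, 2, 15), 92.40),
            (date(2024, 4, 15), 95.10)]
print(closest_payment(payments, date(2024, 3, 20)))
```
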

Registration and Population Interaction Database (RAPID)

This is primarily used for benefits that are administered by HMRC rather than DWP. It is therefore the only data source that DWP has ready access to for Child Benefit and Tax Credit data; this is its main strength. One limitation is that it contains just one record per person, per year and as such displays benefit amounts in annual terms.

We do not have access to the respective administrative data for several smaller non-DWP benefits such as the Armed Forces Compensation Scheme and War Widow’s/Widower’s Pension. For these benefits a more general method has been used, and an imputation decision made, based upon all the available evidence about that person’s circumstances.

The following types of validation are also carried out for FRS benefits data:

Near-zero amounts

Zero amounts cannot be entered where it is inappropriate to do so; for example, in response to a question on the amount of benefit received, a zero entry results in a warning message being displayed. Some interviewers try to avoid this message by recording near-zero amounts. As a result, all near-zero values are examined, and a decision taken as to whether the value is genuine or should be treated as missing.

Multiple benefits

Any combined benefit amounts (for example, where State Pension is paid together with Attendance Allowance) are assessed on an individual basis and amended where errors are found. However, the reported total is preserved where possible.

Validation reports

Computer programs are run to carry out a final check for benefit entitlement and to output any cases that look unreasonable. All cases detected because of this final exercise are individually checked and edited where necessary.

It is acknowledged that some part of the benefit undercount in the FRS dataset is due to an under-representation of benefit recipients in the achieved FRS sample.

Each year we publish Methodology Table M_6a. This compares the grossed number of benefit recipients in the FRS data with the total caseload on benefit from administrative data sources. For most benefits (the exceptions being Income Support and Disability Living Allowance), the FRS numbers in receipt fall below those seen in administrative data; the size of the difference varies by benefit.

6.5. Other pre-imputation cleaning

Apart from state benefits, DWP also validates the other records on the FRS dataset. This includes several edits and checks:

Weekly amounts

In the FRS, most monetary amounts are converted to a weekly equivalent. To calculate this, respondents are usually asked the amount, then the length of time this amount covered. The latter is known as a “period code”. Period codes are used in conjunction with amounts to derive weekly figures for all receipts and payments. Some variables, such as interest on savings accounts, refer to the amount paid in the whole of the past year. These are also converted to a weekly amount.

Sometimes the period code relates to a lump sum or a one-off payment. In these cases, the corresponding value does not automatically convert to a weekly amount. For the data to be consistent across the survey, edits are applied to convert most lump sums and one-off payments to weekly amounts. In the same way, where period codes are recorded as ‘don’t know’ or ‘refused’, these are imputed so that the corresponding amount can be converted to a weekly value in the final dataset.
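The conversion described above can be sketched in Python. The period codes and their values here are assumptions for illustration only; the real FRS period codes are defined in the questionnaire documentation.

```python
# Hypothetical period codes: invented for illustration, not the real
# FRS codes.
WEEKS_PER_PERIOD = {
    "week": 1,
    "fortnight": 2,
    "four_weeks": 4,
    "month": 52 / 12,   # a calendar month expressed in weeks
    "year": 52,
}

def to_weekly(amount, period_code):
    """Convert a reported amount to its weekly equivalent."""
    try:
        return amount / WEEKS_PER_PERIOD[period_code]
    except KeyError:
        # Lump sums, one-off payments, and 'don't know' or 'refused'
        # period codes need editing or imputation before conversion.
        return None

print(to_weekly(1200.0, "month"))  # roughly 276.92 per week
```
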

Near-zero amounts

In the same way as benefit amounts recorded as near-zero are treated, any cases of near-zero amounts in other variables are examined individually, and an edit decision is made.

Outliers

Statistical reports of the data are produced to show those cases where an amount is greater than four standard deviations from the mean. These are outliers: data beyond the expected value range of the variables being explored, based on the other data in the set. It is important that outliers are transformed so that they can validly contribute to the analysis (or be omitted), although omitting outliers could increase the risk that false conclusions are drawn.

For the largest values (up to seven) that lie more than four standard deviations from the mean, the individual record is examined and, where necessary (only if a value looks unrealistic), the case is edited. Outliers remaining in the dataset are verified by examining other relevant data for that household, to establish whether the amount aligns with values reported for other questions. Compared with earlier FRS years, relatively few of these edits are now carried out, because of the many range checks in the computerised questionnaire.
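The flagging step can be sketched as follows. This is a minimal sketch of the four-standard-deviation rule described above, not the production code; the subsequent review and editing are manual.

```python
import statistics

def outlier_review_list(values, n_sd=4, max_cases=7):
    """Flag up to `max_cases` of the largest values lying more than
    `n_sd` standard deviations from the mean, for manual review."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    flagged = [v for v in values if abs(v - mean) > n_sd * sd]
    return sorted(flagged, reverse=True)[:max_cases]

# A single extreme amount stands out from an otherwise tight cluster.
sample = [300.0] * 50 + [320.0] * 50 + [50_000.0]
print(outlier_review_list(sample))  # [50000.0]
```
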

Credibility checks

A wide spectrum of checks is carried out for the internal consistency of certain variables and for values that otherwise look unreasonable. Most tables in the FRS dataset have several checks applied to them each year; the overall number of checks exceeds 100. For example, one check on mortgage payments ensures that payments to the mortgage from outside the household are not greater than the mortgage payment itself. Such cases are examined and edited where necessary. The checks are reviewed annually: in the first instance to reflect changes to variables, and more widely to add new credibility checks for errors found during the previous year’s processing and quality assurance.
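The mortgage example above can be sketched as a simple consistency rule. The field names are invented for illustration and are not the real FRS variable names.

```python
def check_mortgage_payments(record):
    """One illustrative credibility check (of the 100+ applied each
    year): payments to the mortgage from outside the household must
    not exceed the mortgage payment itself."""
    errors = []
    if record["outside_contribution"] > record["mortgage_payment"]:
        errors.append("outside contribution exceeds mortgage payment")
    return errors

# This record fails the check and would be referred for manual review.
print(check_mortgage_payments(
    {"mortgage_payment": 650.0, "outside_contribution": 700.0}))
```
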

6.6. Imputation

The main objective of imputation is to maximise the information available to users; the imputation carried out simplifies analysis for users and helps to secure the uniformity of analyses created from the FRS data. If missing data were not imputed on the FRS, it would be impossible to calculate household income accurately for many households surveyed.

The responses to some questions are much more likely to have missing values than others. For example, it is very unlikely that a respondent will refuse to give or will not know their age or marital status; whereas it is much more likely that they will not be able to provide precise information on the amount of interest received from their investments.

Areas where missing values are a problem are typically income values, such as employee earnings, income from self-employment and income from investments. This is because these values are required in the calculation of derived variables, used for reporting total Individual Income [INDINC], Benefit Unit Income [BUINC] and ultimately Household Income [HHINC], used in the Households Below Average Income (HBAI) publication.

Results in the FRS published tables include imputed values. Elsewhere, however, values in some variables (such as hours of care) are left as missing. Methodology Table M.4 is published alongside the main report each year to illustrate the extent of missing values. The main imputation methods are summarised below, in the order in which they are applied:

Closing down routes

As with any questionnaire, a typical feature of the FRS is a gatekeeper question positioned at the top of a sequence of questions, at which a particular response will open the rest of the sequence. If the gatekeeper question is answered as ‘don’t know’ or ‘refused’ then the whole sequence (route) is skipped.

A missing gatekeeper variable could be imputed such that a further series of answers would be expected. However, these answers will not be present, because the onward sequence (route) was never asked at interview. For example, if the amount of rent was missing for a record and has since been imputed, the further questions about rent will not have been asked. From the post-imputation dataset it will nonetheless appear that these questions should have been asked, because a value is present for rent.

For this reason, where the gatekeeper question has been skipped, the onward routes should be closed. In most cases, gatekeeper variables are of the ‘yes or no’ type. If missing, these are imputed to ‘no’, on the basis that if a respondent does not know whether an item is received or paid, it is likely that it was not received or paid.
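The rule can be sketched as follows. The variable names and response codes here are hypothetical; they are not the real FRS questionnaire variables.

```python
MISSING = {"don't know", "refused", None}

def impute_gatekeeper(record, gatekeeper, route_vars):
    """If a yes/no gatekeeper answer is missing, impute 'no' and mark
    the onward route as skipped rather than missing."""
    if record[gatekeeper] in MISSING:
        record[gatekeeper] = "no"
        for var in route_vars:
            record[var] = "skipped"
    return record

# A refused gatekeeper closes the whole rent route.
example = {"pays_rent": "refused", "rent_amount": None, "rent_period": None}
print(impute_gatekeeper(example, "pays_rent", ["rent_amount", "rent_period"]))
```
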

Hot-decking

This process looks at characteristics within a record containing a missing value to be imputed and matches it up to another record with similar characteristics for which the variable is not missing. It then takes the known variable and copies it to the missing case. For example, when imputing the Council Tax Band of a household, the number of bedrooms, type of accommodation and region are used to search for a case with a similar record. This method ensures that imputed solutions are realistic and allows for a wide range of outcomes which maintain variability in the data.
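The matching step can be sketched in Python. This is a deliberately simplified hot-deck, using the Council Tax Band example above; the real process has more sophisticated donor selection.

```python
def hot_deck(recipient, donors, match_vars, target):
    """Copy `target` from the first donor record that matches the
    recipient on all `match_vars` and has a non-missing target value."""
    for donor in donors:
        if donor.get(target) is None:
            continue
        if all(donor[v] == recipient[v] for v in match_vars):
            return donor[target]
    return None  # no suitable donor: fall back to another method

# A household with a missing Council Tax Band borrows the band from a
# similar household (same bedrooms, accommodation type and region).
recipient = {"bedrooms": 3, "accom_type": "semi", "region": "Wales",
             "ctband": None}
donors = [
    {"bedrooms": 2, "accom_type": "flat", "region": "Wales", "ctband": "A"},
    {"bedrooms": 3, "accom_type": "semi", "region": "Wales", "ctband": "C"},
]
print(hot_deck(recipient, donors,
               ["bedrooms", "accom_type", "region"], "ctband"))  # C
```

Because the imputed value is copied from a real record, this method keeps imputed outcomes realistic and preserves variability in the data.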

Algorithms

These are used to impute missing values for certain variables, for example variables relating to mortgages. The algorithms range from very simple calculations to more sophisticated models, based on observed relationships within the data and individual characteristics, such as age and gender.

‘Mop-up’ imputation

This is achieved by running a general validation report of all variables and looking at those cases where missing values are still present. At this stage, variables are examined on a case-by-case basis to decide what to impute. Credibility checks are re-run to identify any inconsistencies in the data caused by imputation, and further edits are applied where necessary.

All imputations, by each of the methods above, are applied to the un-imputed dataset via a transaction database. This ensures auditability in that it is always possible to reproduce the original data.

Points to note with imputed data

  • Whilst several processes are used to impute missing values, it should be remembered that they represent only a very small proportion (typically two per cent) of the dataset
  • Imputation will have a greater effect on the distribution of original data for variables that have a higher proportion of non-response, as proportions of imputed data will be higher
  • As mentioned above, in certain situations, imputed values will be followed by ‘skipped’ values. It was decided in some cases that it was better to impute the top of a route only, and not large amounts of onward data. For a small proportion of imputed values it is not possible to close down a route. These cases are followed by ‘skipped’ responses (where a value might otherwise be expected)

6.7. Derived variables

Derived variables (DVs) are those which are not created by the original interview, but instead are made by combining information, both within the survey and from other sources.

They are created at FRS users’ request. Their main purpose is to make it easier for users to carry out analysis and to ensure consistent definitions are used in all FRS analyses. For example, INDINC is a DV which sums all components of income to give an individual’s total income; this is possible because the survey collects each of those income sources.

As new information is collected in the survey, the relevant DVs are updated as necessary, and a record of these updates is available for users of the End-User-Licence and Safe Room datasets held at the UK Data Service (UKDS) and the Secure Research Service at ONS.

6.8. Grossing

Grossing-up is the term given to the process of applying factors to sample data so that they yield estimates for the overall population. The system used to calculate grossing factors for the FRS divides the sample into different groups. Grossing factors attempt to correct for differential non-response, at the same time as they scale up sample estimates. The groups are designed to reflect differences in response rates among different types of household. The software used to make the final weighted sample distributions matches the population distributions through a process known as calibration weighting.
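The calibration step can be illustrated with a minimal iterative proportional fitting (raking) sketch. The control variables, categories and population totals below are invented for illustration; the FRS grossing regime uses dedicated calibration software and a much richer set of control totals.

```python
def rake(weights, sample, margins, iterations=50):
    """Adjust design weights so that weighted sample totals match known
    population control totals in each dimension.

    `sample` is a list of dicts giving each unit's category in every
    dimension; `margins` maps each dimension to its population totals.
    """
    w = list(weights)
    for _ in range(iterations):
        for dim, targets in margins.items():
            # Current weighted totals per category in this dimension.
            totals = {cat: 0.0 for cat in targets}
            for wi, unit in zip(w, sample):
                totals[unit[dim]] += wi
            # Scale weights so the totals match the control totals.
            w = [wi * targets[unit[dim]] / totals[unit[dim]]
                 for wi, unit in zip(w, sample)]
    return w

# Four sample units, calibrated to invented sex and age control totals.
sample = [{"sex": "F", "age": "16-64"}, {"sex": "F", "age": "65+"},
          {"sex": "M", "age": "16-64"}, {"sex": "M", "age": "65+"}]
margins = {"sex": {"F": 510, "M": 490},
           "age": {"16-64": 800, "65+": 200}}
grossed = rake([1.0, 1.0, 1.0, 1.0], sample, margins)
print([round(g) for g in grossed])  # [408, 102, 392, 98]
```

After calibration, the weighted sample reproduces both sets of control totals exactly, which is the defining property of calibration weighting.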

Details of the control variables used in the grossing regimes for Great Britain and Northern Ireland are published annually in the Background Information and Methodology accompanying the main report.

In developing the FRS grossing regime, careful consideration has been given to the combination of control totals, and the way age ranges, Council Tax bands and so on, are grouped together. The aim has been to strike a balance so that the grossing system will provide, where possible, accurate estimates in different dimensions without significantly increasing variances. The published Methodology Table M_3 shows the extent to which the FRS grossing regime controls for this bias in the achieved sample.

A review of the FRS Grossing Methodology was carried out by the ONS Methodological Advisory Service in 2013. Several relatively minor methodological improvements were made as a result, with the grossing calculations updated to use 2011 Census data at that point. Further details on the methodological changes were published as an explanatory paper of revisions made to the Family Resources Survey grossing methodology in 2014.

6.9. Methodological reviews

Material Deprivation

In December 2021, DWP commissioned a review of FRS material deprivation questions. The review was conducted by the Centre for Analysis of Social Exclusion (CASE) at the London School of Economics and Political Science (LSE). The review recommended a pilot of a short-list of 35 items and activities. Full details of the test questions and changes to methodology are given in Section 4 of the published review.

Following the pilot in 2022 to 2023, the FRS questionnaire introduced 29 updated questions in 2023 to 2024 for the whole of the survey year. Around 75% of the sample were asked the updated questions, with the remaining 25% asked the old questions; the allocation was made at random.

Integration of administrative data

As outlined in the DWP Statistical Work Programme – section 2.4, the department is committed to transforming its surveys through the integration of administrative data.

This is in the wider context of the UK Statistics Authority’s data-linking strategy (Joining Up Data for Better Statistics – Office for Statistics Regulation) and the OSR’s recommendation, in its 2021 Review of Income-based Poverty Statistics – Office for Statistics Regulation, that DWP should explore the feasibility and potential of integrating social survey and administrative data.

A technical report on FRS Transformation work to date, with illustrative results for DWP benefits is available at: Family Resources Survey Transformation: integrating administrative data into the FRS. As developments are implemented, they are communicated to users via the FRS Release Strategy.

Inflation

Since the 2014 to 2015 survey year, the Consumer Price Index (CPI) has been used to adjust for inflation. More information concerning this methodological change was published as a statistical notice in 2016.

6.10. Output validation

Internal quality assurance: QA group

Internal users of the FRS dataset who are authorised to have pre-publication access, whether for their own publication production or to assist with QA, are kept up to date with changes.

The FRS dataset is checked by a group of stakeholders from DWP, other government departments and the devolved administrations. This adds valuable insight from subject-matter experts which, given the breadth of topics covered in the FRS, is essential to support the knowledge and experience of the analytical team.

Checks are made at both the 6-month and 12-month stages so that any concerns can be fully investigated before delivery of the 12-month data. Initial investigations are made by the DWP FRS team, and further checks and/or changes can then be made by ONS. This means that corrections have already been made to the 12-month dataset, and processing can be optimised for release of the final dataset.

After internal validation checks and cleaning of the data have been completed, stakeholders are presented with a summary of changes to the data and any issues that the FRS team have identified. The test dataset is shared with these stakeholders, and any further issues are dealt with by direct discussion with the stakeholder. A revised test dataset may be issued before the data are declared final and ready for use in published analysis.

DWP has an ongoing dialogue with expert users of the FRS-based statistics in relation to several data issues. DWP expects this to continue in the future as part of its long-term work programme.

External assessors

Third-party quality assurance is provided under contract by the Institute for Fiscal Studies (IFS), focusing on HBAI and grossing aspects. The aim of this independent validation is to discover and correct inconsistencies in the Households Below Average Income (HBAI) estimates. IFS runs independent checks on the HBAI dataset and all key inputs (for example, population estimates) and replicates the DWP estimates for each publication year.

IFS has for many years used the FRS, alongside HBAI data, to build and maintain a tax and benefit micro-simulation model, which it uses to estimate the distributional impacts of tax and benefit policies. It is therefore ideally placed to act as an authoritative quality assurer of the FRS and HBAI data.

IFS works with the DWP team to ensure data issues are resolved. The process is iterative, involving investigation of the causes of the differences (if necessary) by both parties, agreement on revision of processing code, implementation of those changes, and a further round of cross-checking to confirm that differences have been eliminated.

DWP has also established an Expert Advisory Group on Survey-based Income Statistics to support its development work. This is in line with the User Engagement Strategy for Statistics released by the GSS. The purpose of the Group is to provide advice to the Chief Statistician on plans to implement the integration of administrative data into the FRS and related outputs and other technical issues as they arise. Members of the Group include regular users of the FRS and its related outputs, including academic experts, users from third-sector organisations and methodology input from ONS.

Strengths

  • Investment has been made into the development of the upgraded interface processing tool - FRESCO.
  • A wide spectrum of checks, totalling more than 100 types of check, is carried out for the internal consistency of certain variables and values.
  • The FRS Team has the strength of experience and perspective from different analytical professions.
  • The benefit amounts extracted from the administrative data, for the purpose of editing benefit amounts, are judged to be very accurate and therefore add to the quality of the final FRS dataset.
  • An Expert Advisory Group provides knowledge and views on developments.
  • DWP has an ongoing dialogue with expert policy and academic research users of FRS-based statistics.
  • Independent quality assurance has been provided by the Institute for Fiscal Studies (IFS) for several years.

Limitations

  • With over 250 derived variable codes to review each year for input changes, there is a risk of error. On the rare occasions that errors have been identified after release of the data, revisions have been made and users informed as soon as possible. We note lessons learned and continue to develop our processes; for example, systems may be adapted to minimise risk.
  • The use of administrative data for editing benefit amounts is limited by the availability and access to the most appropriate administrative dataset, some of which are not DWP owned. Data sharing agreements are required to obtain access to data from other government departments.
  • It is acknowledged that some part of the benefit undercount in the FRS dataset is due to an under-representation of benefit recipients in the achieved FRS sample.

7. Summary

FRS accredited official statistics have been assessed by the DWP FRS Team as being assured to level A2 [Enhanced Assurance], as aligned to the UK Statistics Authority QAAD toolkit.

In constantly seeking to improve FRS accredited official statistics, steps will be taken to mitigate the limitations identified in this report, and progress will be communicated to users via the Release Strategy or Background Information and Methodology accompanying the main published report.

If you are of the view that this report does not adequately provide this level of assurance, or you have any other feedback, please contact us via team.frs@dwp.gov.uk with your concerns.