Guidance

Quality Assurance (QA) and data publication

Updated 6 December 2023

Applies to England

We strive to ensure that any data provided to our users is of the highest quality. To achieve this, all survey data goes through several stages of quality assurance (QA), with automated scripts used where appropriate. The scripts are reviewed and updated quarterly.

1. Quality assurance workflow

Each quarter, a dataset specification document is updated to ensure that the dataset is clear and easy to use. This includes the naming and labelling conventions for questions and response codes.

Our contractor then produces the dataset using SPSS, and checks and cleans it using syntax developed by the Kantar Research Team. The dataset is then delivered through a secure transfer system to Natural England, who undertake their own detailed QA workflow. This is split into four stages.

1.1 High-level checks

First, the dataset is checked as a whole. This includes (but is not limited to) checks on the following:

  • Fields
      – All fields exist
      – All fields contain data where expected
      – Valid responses provided
  • IDs
      – All IDs are unique
  • Dates
      – Interview dates are within the survey lifespan
      – Visit dates are within 14 days of an interview
  • Coordinates
      – Visit coordinates plotted and checked for spurious values
      – Postcodes plotted and checked for spurious values
  • Dataset admin
      – Fields named and labelled correctly
      – No spelling mistakes
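
As an illustration only, some of the checks above could be sketched in Python as follows. This is not the script used in the QA workflow. The field names (`ID`, `Interview_Date`, `Visit_Date`), the survey start and end dates, and the reading of "within 14 days" as 14 days either side of the interview are all assumptions made for the example.

```python
from datetime import date

# Hypothetical field names and survey dates, chosen for illustration only.
REQUIRED_FIELDS = ["ID", "Interview_Date", "Visit_Date"]
SURVEY_START = date(2020, 4, 1)
SURVEY_END = date(2023, 12, 31)


def high_level_checks(records):
    """Return a list of (record index, problem) tuples for a list of dict records."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Fields: every required field exists and holds a value.
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) is None]
        if missing:
            problems.append((i, f"missing fields: {missing}"))
            continue
        # IDs: each ID appears only once.
        if rec["ID"] in seen_ids:
            problems.append((i, "duplicate ID"))
        seen_ids.add(rec["ID"])
        # Dates: the interview falls inside the survey lifespan.
        if not SURVEY_START <= rec["Interview_Date"] <= SURVEY_END:
            problems.append((i, "interview date outside survey lifespan"))
        # Dates: the visit is within 14 days of the interview (assumed: either side).
        if abs((rec["Interview_Date"] - rec["Visit_Date"]).days) > 14:
            problems.append((i, "visit date more than 14 days from interview"))
    return problems
```

An empty list means no problems were found by these particular checks. Coordinate plotting and label checks are not covered by this sketch.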

1.2 Module-level checks

Next, each module is checked in more detail than in the high-level checks. Routing and rule checks are undertaken for each module. These ensure that the appropriate questions have been asked based on which modules have been answered, and that the responses within each question are appropriate given previous responses. For example:

  • If a respondent has answered M2 then they must have been asked all questions within that module.
  • If a respondent answered M2_Q2 then they must have answered M2_Q3.
  • If a respondent selected option 4 in M2_Q2 then they cannot select option 4 in M2_Q3.
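
The three example rules above could be expressed along these lines. This is a minimal sketch, not the syntax used in practice. The list of questions in `M2_QUESTIONS` is hypothetical, and an unanswered question is assumed to be stored as `None`, so "asked" is approximated by "answered".

```python
# Hypothetical list of questions in module M2; the real list is set by the questionnaire.
M2_QUESTIONS = ["M2_Q1", "M2_Q2", "M2_Q3"]


def routing_errors(response):
    """Check one respondent's answers (a dict of question -> code, None if unanswered)."""
    errors = []
    answered = [q for q in M2_QUESTIONS if response.get(q) is not None]
    # Rule 1: a respondent who entered M2 must have every M2 question answered.
    if answered and len(answered) < len(M2_QUESTIONS):
        errors.append("M2 partially answered")
    # Rule 2: answering M2_Q2 requires an answer to M2_Q3.
    if response.get("M2_Q2") is not None and response.get("M2_Q3") is None:
        errors.append("M2_Q3 missing after M2_Q2")
    # Rule 3: option 4 in M2_Q2 excludes option 4 in M2_Q3.
    if response.get("M2_Q2") == 4 and response.get("M2_Q3") == 4:
        errors.append("option 4 selected in both M2_Q2 and M2_Q3")
    return errors
```

A respondent who did not take M2 at all returns no errors, because none of the rules apply to them.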

Once these have been satisfied, we run similar checks across all modules. These ensure that respondents have been asked the correct combination of modules, and that the quotas are met for various demographics and geographies. The survey questionnaire provides a breakdown of survey modules and sample sizes.

1.3 Sense checks

Once the data has passed the checks above, the results are checked in context (that is, do they make sense?). This includes, but is not limited to:

  • Checking weighted responses against monthly indicators.
  • Checking weighted responses against historical datasets.
  • Checking that weighted responses to a random subset of questions make sense.
  • Checking that linked questions within the survey give similar figures.
  • Comparing standardised questions with other surveys.
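
A comparison of weighted responses against an earlier dataset might look like the sketch below. The weight field name `Weight` and the 5 percentage point tolerance are assumptions chosen for the example, not values used in the survey.

```python
def weighted_share(records, question, code, weight_field="Weight"):
    """Weighted proportion of respondents giving `code` to `question`.

    Respondents with no answer (None) are left out of the base.
    Returns None if nobody answered the question.
    """
    total = sum(r[weight_field] for r in records if r.get(question) is not None)
    if total == 0:
        return None
    hits = sum(r[weight_field] for r in records if r.get(question) == code)
    return hits / total


def differs_from_history(current, historical, tolerance=0.05):
    """Flag a share that moved more than `tolerance` (absolute) from an earlier dataset."""
    return abs(current - historical) > tolerance
```

A flagged difference is a prompt for a closer look, not proof of an error, since real change between periods is also possible.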

Issues discovered during these checks are raised with Kantar, who then provide an updated dataset, and the three sets of checks start again.

Systematic bias checks are also undertaken regularly throughout the survey. These extend the sense checks and show whether systematic bias exists in the data with regard to non-completion of the survey, of individual modules or of individual questions, as well as lack of comprehension (indicated by latency) among survey participants.

1.4 Publication checks

Once Natural England have received final datasets from Kantar, pre-publication checks are run to double-check key aspects of the dataset. These include (but are not limited to) ensuring that no sensitive data is published where it should not be, that the correct fields are published for the correct datasets, and that all the appropriate weights have been supplied.
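
Two of these checks could be sketched as below. This is illustrative only. The contents of `SENSITIVE_FIELDS` and the weight names in `REQUIRED_WEIGHTS` are hypothetical and would differ for each published dataset.

```python
# Hypothetical field lists, for illustration only.
SENSITIVE_FIELDS = {"Postcode", "Age"}
REQUIRED_WEIGHTS = {"Weight_Adult", "Weight_Visit"}


def publication_problems(field_names):
    """Return problems found in the list of fields about to be published."""
    fields = set(field_names)
    problems = []
    # No sensitive field should appear in the published dataset.
    leaked = sorted(fields & SENSITIVE_FIELDS)
    if leaked:
        problems.append(f"sensitive fields present: {leaked}")
    # Every required weight should be supplied.
    missing = sorted(REQUIRED_WEIGHTS - fields)
    if missing:
        problems.append(f"weights missing: {missing}")
    return problems
```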

2. Data publication

Data is currently published on GOV.UK, but we are exploring moving it to the UK Data Service (UKDS) to make the way we manage disclosure of the survey data more robust. By using the UKDS, we can provide varying levels of potentially sensitive data in line with official advice from the Office for National Statistics (ONS), while adhering to the highest standards of data management required for National Statistic status. Additionally, the UKDS undertake their own disclosure checks, which provide another level to the QA procedure for the dataset. The UKDS also allows us to publish the data in different formats that are widely used by our core users.

Two datasets are planned to be available.

Safeguarded: This dataset is designed for the majority of users, and is the version currently published. Statistical disclosure control (SDC), undertaken as recommended by the ONS, leaves a very low risk of the data being disclosive without having a large effect on data quality. The aim of SDC is to make it harder for someone to identify a respondent from a combination of their responses to demographic and geographical questions. As such, edits have been made to the following fields:

  • Age – Removed
  • M2A_Q5 – Removed
  • Ethnicity_Detailed – Removed
  • Visit_Date – Banded into week number for the year
  • M2A_SUB_Q4B – Top coded to £100
  • No_Of_Children – Top coded to ‘6 and over’
  • M3_Q1 – Top coded to ‘6 and over’
  • Income – Top coded to £50,000+
  • Postcode data removed
  • Home geography data removed (LA, UTLA, LSOA, IMD_RANK)
  • Visit geography data removed (LA, UTLA, LSOA, IMD_RANK)
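
Some of the edits listed above could be applied to a single record as in the following sketch. It is an illustration under stated assumptions, not the procedure used to produce the published dataset. It assumes each record is a dictionary, that `Visit_Date` is a date, that `M2A_SUB_Q4B` is held as a numeric amount in pounds, and that week numbers follow the ISO scheme, none of which is specified above.

```python
from datetime import date


def apply_sdc(record):
    """Return a copy of one record (a dict) with some safeguarded-dataset edits applied."""
    out = dict(record)
    # Removed fields (postcode and geography fields would be dropped the same way).
    for field in ("Age", "M2A_Q5", "Ethnicity_Detailed"):
        out.pop(field, None)
    # Band the visit date into its week number for the year.
    # ISO week numbering is an assumption; the numbering scheme is not stated.
    if out.get("Visit_Date") is not None:
        out["Visit_Date"] = out["Visit_Date"].isocalendar()[1]
    # Top code the number of children to '6 and over'.
    if out.get("No_Of_Children") is not None and out["No_Of_Children"] >= 6:
        out["No_Of_Children"] = "6 and over"
    # Top code M2A_SUB_Q4B at 100 (assumed to be a numeric amount in pounds).
    if out.get("M2A_SUB_Q4B") is not None:
        out["M2A_SUB_Q4B"] = min(out["M2A_SUB_Q4B"], 100)
    return out
```

The function works on a copy, so the original record is left unchanged and the controlled and safeguarded versions can be produced from the same source.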

To access this dataset, users need an account with the UKDS and must adhere to their End User Licence Agreement.

Controlled: This dataset is designed for users who are likely to undertake advanced modelling or statistical analysis. Statistical disclosure control has not been undertaken on this dataset, and only the respondent postcode and associated coordinates have been removed. As such, users must be accredited with the UKDS and complete a training course before they can access the data. Their data usage must also be approved by the relevant Data Access Committee.