Guidance

Urgent data quality assurance guidance

Published 21 April 2021

Introduction

Data and analyses are good quality if they are fit for their intended purpose. Quality assurance is a collective term for the activities undertaken to ensure that work is fit for its purpose.

There is an increasing need to produce work at pace for urgent decision making, either for operational or policy purposes. This work must be fit for purpose, even if time is limited.

This guidance does not guarantee quality analysis. It covers the minimum steps you should take for urgent data work. It is intended as a last resort and does not replace full and thorough quality assurance practices.

Use this only where the risk from poor quality is less than the risk of acting slowly.

Overarching themes of urgent quality assurance

How will the data and analysis be used?

You must work closely with the user(s) to understand how they intend to use the data and/or analysis. You cannot gain this understanding in any other way.

You must understand the risks your users are willing to accept within various parts of the data. This will help you to prioritise your time and communicate your findings. But if they cannot accept risks around poor quality, consider carefully if analysis is possible within their timescales.

Prioritise your time, starting with what matters most

You have limited quality assurance resources. Focus on what is most important to your users and what will have the biggest impact on them.

Be transparent and frank

Communicate the frank conclusions of your quality assurance to users. Explain which parts of the data you did not have time to investigate, and for which you therefore cannot quantify potential limitations. This enables users to decide for themselves if they wish to continue to use the data, and what trade-offs they need to make.

Four steps for urgent quality assurance

  1. Prepare the data
  2. Consider the data lifecycle
  3. Assess the data
  4. Communicate

1. Prepare the data

Read the metadata available for the data sources and ensure you understand the datasets. Depending on the way you hold or receive the data, tidy it into a format that will enable reliable manipulation and comparisons. This will likely mean units in rows and attribute data in columns. You will waste a lot of time later in your quality assurance if you do not start by formatting the data in a tidy and consistent way.
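As an illustration only, the sketch below reshapes a small wide-format extract into a tidy layout with one observation per row and one attribute per column. The field names (`region`, `year`, `claims`) and the values are hypothetical, and what counts as a unit will depend on your own data and analysis.

```python
# Hypothetical wide-format extract: one column per year.
wide_rows = [
    {"region": "North", "2019": "120", "2020": "135"},
    {"region": "South", "2019": "98", "2020": "101"},
]


def to_tidy(rows, id_field, variable_name, value_name):
    """Return one row per (unit, variable) pair instead of one column per variable."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the identifier is repeated on every output row
            tidy.append({id_field: row[id_field], variable_name: key, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, id_field="region", variable_name="year", value_name="claims")
for row in tidy_rows:
    print(row)
```

Once the data is in this shape, the checks in step 3 can be applied to each column in the same way.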

Make sure data labels reflect the reality of the data values, for example “Address at last benefit claim” rather than “Current address”, which could mislead. Ensure that labels are consistent, using the same labels for the same data across multiple files.
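One quick way to spot inconsistent labels is to compare the column headers of your files. The sketch below assumes two CSV extracts; the in-memory files and column names are hypothetical stand-ins for your own data.

```python
import csv
import io

# Hypothetical extracts standing in for two files on disk.
file_a = io.StringIO("person_id,address_at_last_claim\n1,AB1 2CD\n")
file_b = io.StringIO("person_id,current_address\n1,AB1 2CD\n")


def read_labels(handle):
    """Return the set of column labels from the header row of a CSV file."""
    return set(next(csv.reader(handle)))


labels_a = read_labels(file_a)
labels_b = read_labels(file_b)

# Labels that appear in only one file may be the same data under different names.
only_in_a = sorted(labels_a - labels_b)
only_in_b = sorted(labels_b - labels_a)
print("Only in file A:", only_in_a)
print("Only in file B:", only_in_b)
```

A mismatch does not prove there is a problem. It tells you which labels to check against the metadata.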

2. Consider the data lifecycle

The data lifecycle includes the following steps:

  • Plan
  • Collect, acquire, ingest
  • Prepare, store and maintain
  • Use and process
  • Share and publish
  • Archive or destroy

Reading the relevant metadata will help you to consider and sketch out the lifecycle.

Ask yourself questions about the data at each point in the lifecycle. For example, in the ‘Plan’ phase consider what purpose the data was designed for. How does this differ from your analysis?

Consider what problems or errors could be introduced in each phase of the lifecycle. Could the data have been input incorrectly during collection? This may be due to a typo, or perhaps the respondent misunderstood what was being asked of them.

Other errors that could be introduced include (but are not limited to):

  • incorrect data linking
  • sampling error
  • automation errors
  • human error during any manual data handling

If you do not know much about the lifecycle of the data, seek out someone who does to ask them about it. Remember each person working on the data has made decisions that could impact it.

3. Assess the data

Start with some basic metrics. How many records were you expecting in the dataset? Are the values of the data within a valid range? For example, if an event is recorded as taking place on 01/01/2100, you would know that this is incorrect.
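A minimal sketch of these two checks follows. The records, the expected record count and the valid date range are all assumptions made for illustration; take your own from the metadata and from your users.

```python
from datetime import date

# Hypothetical records; the second carries the implausible date from the example above.
records = [
    {"event_id": 1, "event_date": date(2020, 3, 14)},
    {"event_id": 2, "event_date": date(2100, 1, 1)},
    {"event_id": 3, "event_date": date(2019, 11, 2)},
]

EXPECTED_RECORDS = 3                # assumption: the supplier told you to expect 3 records
EARLIEST_VALID = date(2000, 1, 1)   # assumption: no events were recorded before this date
LATEST_VALID = date(2021, 4, 21)    # assumption: the date the extract was taken

# Does the number of records match what you expected?
count_matches = len(records) == EXPECTED_RECORDS

# Which records have a date outside the valid range?
out_of_range = [
    r["event_id"]
    for r in records
    if not (EARLIEST_VALID <= r["event_date"] <= LATEST_VALID)
]

print("Record count as expected:", count_matches)
print("Events with dates out of range:", out_of_range)
```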

Check the completeness of your data. Are there missing records? Do particular rows or columns have more missing values than others?
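The sketch below counts missing values by column and by row. It assumes that missing values appear as `None` or an empty string; your data may use other markers such as `-9` or `N/A`, so check the metadata. The records are hypothetical.

```python
# Hypothetical records with some missing values.
records = [
    {"id": 1, "age": 34, "postcode": "AB1 2CD"},
    {"id": 2, "age": None, "postcode": ""},
    {"id": 3, "age": 51, "postcode": None},
]

MISSING = (None, "")  # assumption: these are the only markers of a missing value


def missing_by_column(rows):
    """Count the missing values in each column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value in MISSING:
                counts[column] += 1
    return counts


def missing_by_row(rows, id_field):
    """Count the missing values in each row, keyed by the row identifier."""
    return {
        row[id_field]: sum(1 for value in row.values() if value in MISSING)
        for row in rows
    }


column_counts = missing_by_column(records)
row_counts = missing_by_row(records, id_field="id")
print("Missing by column:", column_counts)
print("Missing by row:", row_counts)
```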

If relevant, plot the data. Are there patterns in the data? What is causing the pattern? Is it something that you anticipated? If not, there could be issues in the data that you should investigate. They might be caused by duplicate records, missing data, imputed data or default values.

Check the data for outliers and abnormalities. Is one event recorded several times?
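The sketch below shows one possible way to run both checks. Duplicates are found by counting how often each event appears. Outliers are flagged with a simple rule based on the median absolute deviation; this rule and its threshold of 5 are illustrative choices, not a recommended standard, and the data is hypothetical.

```python
from collections import Counter
from statistics import median

# Hypothetical events identified by (event id, date); one appears twice.
events = [("E1", "2021-01-05"), ("E2", "2021-01-06"), ("E1", "2021-01-05")]
duplicates = {event: n for event, n in Counter(events).items() if n > 1}

# Hypothetical measurements containing one value far from the rest.
values = [10, 12, 11, 13, 12, 95]


def flag_outliers(data, threshold=5):
    """Flag values further than `threshold` median absolute deviations from the median."""
    centre = median(data)
    mad = median(abs(x - centre) for x in data)
    if mad == 0:
        return []  # the rule cannot separate values when at least half are identical
    return [x for x in data if abs(x - centre) > threshold * mad]


outliers = flag_outliers(values)
print("Duplicated events:", duplicates)
print("Possible outliers:", outliers)
```

A flagged value is not necessarily wrong. Treat it as a prompt to look at the record and, where you can, ask someone who knows the data.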

Consider if the data is up to date. Does it cover the time period you were expecting?
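As a sketch, you could compare the dates in the data against the period you expected and list any months with no records at all. The dates and the expected period below are assumptions for illustration.

```python
from datetime import date

# Hypothetical record dates; nothing was recorded in February.
record_dates = [date(2021, 1, 5), date(2021, 1, 20), date(2021, 3, 2)]

EXPECTED_START = date(2021, 1, 1)   # assumption: the period your users asked about
EXPECTED_END = date(2021, 3, 31)


def months_between(start, end):
    """List every (year, month) pair from start to end inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


observed = {(d.year, d.month) for d in record_dates}
missing_months = [m for m in months_between(EXPECTED_START, EXPECTED_END) if m not in observed]
outside_period = [d for d in record_dates if not (EXPECTED_START <= d <= EXPECTED_END)]

print("Months with no records:", missing_months)
print("Records outside the expected period:", outside_period)
```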

Check the values are consistent:

  • in critical fields within records, for example does a record state that an individual is four years old but also list their occupation as ‘Doctor’?
  • and across datasets, for example are attributes with the same label capturing the same concept?
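A within-record consistency check can be written as a simple rule applied to every record, as in the sketch below. The records and the minimum working age of 16 are assumptions for illustration; agree the rules that matter with your users and subject matter experts.

```python
# Hypothetical records; the first repeats the example above.
records = [
    {"id": 1, "age": 4, "occupation": "Doctor"},
    {"id": 2, "age": 45, "occupation": "Doctor"},
    {"id": 3, "age": 7, "occupation": None},
]

MIN_WORKING_AGE = 16  # assumption: an occupation below this age is treated as implausible


def implausible_occupation(record):
    """Return True if a record lists an occupation for someone below the minimum working age."""
    return record["occupation"] is not None and record["age"] < MIN_WORKING_AGE


flagged_ids = [r["id"] for r in records if implausible_occupation(r)]
print("Records to investigate:", flagged_ids)
```

The rule cannot tell you which field is wrong, only that the two values disagree.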

Check if your findings are roughly in line with other similar analyses on the same subject or from the same data. Engage with subject matter experts, if they are available, to sense check the findings.

Record any assumptions you have made alongside the findings of the assessment so that these can be communicated.

4. Communicate

Communicate with your users throughout the process. Ensure they are aware of:

  • any decisions you took in selecting or excluding data sources and how you determined whether they were fit for this purpose
  • which quality assurance activities you have undertaken and your findings, both positive and negative
  • aspects of the data you did not have time to cover
  • trade-offs made to ensure the timeliness of the data or analysis
  • any relevant metadata, which should be shared or published alongside the data or data analysis

Aim for them to be as informed about the data as you are.

Consider how to communicate your data quality assurance findings differently with different audiences. Explain any technical terms used.

Finally, ensure your quality assurance findings follow the data. For example, if the data or data analysis is used again for another purpose, ensure your findings are not forgotten.

Disclaimer

This guidance should only be used in extreme situations. It does not take the place of full and thorough quality assurance practices.

More information

The government data quality framework

The HM Treasury Aqua Book provides guidance on quality in the production of analysis

International Organization for Standardization’s Quality Management Principles

Feedback

If you have any suggestions for improvement or other feedback on this document, please email dqhub@ons.gov.uk.