Guidance

5. Use robust practices and work within your skillset

How to implement principle 5 of the Data Ethics Framework for the public sector.

To make best use of data we must ensure we have robust and consistent practices.

This involves:

  • working in multidisciplinary teams
  • getting help from experts outside your team
  • ensuring accountability of algorithms
  • avoiding outputs of analysis which could result in unfair decision making
  • designing for reproducibility
  • testing your model under a range of conditions
  • defining acceptable model performance: false negatives and false positives

Public servants must work within their skillset, recognising where they do not have the expertise to use a particular approach, type of data or tool to a high standard.

Read the Government Aqua Book before you design any analytical quality assurance process. It gives guidance on setting up analytical processes and developing the right culture for providing reliable and accurate evidence for policy making.

Within your team, you should establish and document a consistent process for delivering a data science project or new process using data. Having a consistent approach to delivering projects will improve efficiency and simplify management.

Multidisciplinary teams

Practitioners should work in multidisciplinary teams, with access to all necessary subject matter expertise, so that robust processes are established for using data appropriately.

Questions to consider before starting any new project:

  • are data scientists working in multidisciplinary teams to assess how data can be used to meet a user need?
  • do people at all levels of the team understand how and why the proposed use of data is expected to deliver or aid the delivery of a solution to the user need?

Getting help from experts

When using data or designing new processes or tools to do so, it’s essential that you recognise when you do not have all the necessary information or expertise to do something safely or accurately.

Always seek expert help when you do not have all the necessary skills or expertise.

Accountability of algorithms

You should always use the simplest model that achieves the desired outcome.

As machine learning algorithms are designed to learn and develop semi-independently, the processes used can become more difficult to interpret and understand. Teams need to have a reasonable understanding of how the machine learning model, or pipeline of machine learning models, works to meet the user need.

You must be able to explain this to non-specialists.

You can design your machine learning process to improve accountability.

Staged process

This involves tackling prediction tasks as a pipeline of machine learning models, which makes the overall process easier to interpret. The staged process must be clearly documented, with all necessary assumptions and caveats articulated.

Often you do not need to know exactly how a machine learning model has arrived at a particular result if you can show the logical steps taken to reach it. This includes exactly what training data was used and which variables contributed most to the result.
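
The sketch below shows one way a documented stage might look in Python, assuming a scikit-learn workflow; the file name, feature names and model choice are illustrative and not part of this guidance.

```python
# Minimal sketch of a documented, staged modelling step (illustrative only).
# Assumes scikit-learn and pandas are available; the file name and feature
# names are placeholders, not part of the guidance.
import json
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("training_data.csv")              # hypothetical training extract
features = ["age", "income", "prior_contacts"]     # illustrative variables
X, y = df[features], df["outcome"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Record what training data was used and which variables contributed most,
# so this stage of the pipeline can be explained to non-specialists later.
stage_record = {
    "training_data": "training_data.csv",
    "n_training_rows": int(len(X_train)),
    "coefficients": dict(zip(features, model.coef_[0].round(3).tolist())),
    "test_accuracy": round(float(model.score(X_test, y_test)), 3),
}
with open("stage_1_model_card.json", "w") as f:
    json.dump(stage_record, f, indent=2)
```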

Simplification

Teams should always use the simplest model that achieves the intended measurable and quantifiable goal. This must be balanced against any potential loss in the model's accuracy. Pay extra attention to cases where lost accuracy disproportionately affects subgroups of the data that might not be well analysed by a simpler approach.

A more complex machine learning model is more likely to lose interpretability. This will be more or less tolerable depending on the intended outcome.
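
As a hedged illustration, the Python sketch below compares a simpler and a more complex model and checks whether any accuracy loss is concentrated in one subgroup; the dataset, column names and the 'region' grouping variable are assumptions made for the example.

```python
# Sketch: compare a simple and a more complex model, and check whether any
# loss of accuracy from the simpler model is concentrated in a subgroup.
# Column names and the subgroup variable are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("training_data.csv")              # hypothetical data
features = ["age", "income", "prior_contacts"]
X_train, X_test, y_train, y_test, grp_train, grp_test = train_test_split(
    df[features], df["outcome"], df["region"], random_state=42
)

simple = LogisticRegression(max_iter=1000).fit(X_train, y_train)
complex_ = RandomForestClassifier(random_state=42).fit(X_train, y_train)

results = pd.DataFrame({
    "region": grp_test,
    "actual": y_test,
    "simple_pred": simple.predict(X_test),
    "complex_pred": complex_.predict(X_test),
})

# Per-subgroup accuracy: a large gap for one region would suggest the
# simpler model is not analysing that subgroup well.
by_region = results.groupby("region").apply(
    lambda g: pd.Series({
        "simple_acc": (g["simple_pred"] == g["actual"]).mean(),
        "complex_acc": (g["complex_pred"] == g["actual"]).mean(),
    })
)
print(by_region)
```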

Social bias in algorithms

The Equality Act 2010 makes it illegal to discriminate against anyone based on nine protected characteristics.

The Equality Act 2010 includes the public sector equality duty which requires organisations to eliminate discrimination and support the advancement of equality.

Anyone doing analysis or making policy in the public sector must:

  • make sure that any gathered evidence does not inadvertently produce discriminatory decisions
  • recognise opportunities within their role to usefully flag social biases in data

This applies irrespective of the technique.

Analysis is most often done using historical data. This data might contain issues or patterns that policies are trying to mitigate, not replicate.

Proxy variables in machine learning

Some machine learning methods used to inform decisions can inadvertently base outputs on implicit proxies for variables which might be undesirable or even illegal.

Note that removing protected characteristics from the analysis is often not sufficient to eliminate the possibility of discriminatory outcomes based on these variables. This journal article explains a number of statistical reasons for this.

One important reason is that if a protected feature is predictively powerful, you might be able to infer that protected feature from the other data you have. To make sure you’re not implicitly using that feature in decision making, you have to remove statistical traces of it from the whole dataset, rather than just omit it.
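
One way to check for this, sketched below in Python under the assumption of a scikit-learn workflow, is to see how well the remaining features predict the protected characteristic; the column names are illustrative.

```python
# Sketch: check whether a protected characteristic can be inferred from the
# other features (a proxy check). A score well above chance suggests the
# remaining data still carries statistical traces of that characteristic.
# Column names are illustrative assumptions, not part of the guidance.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("training_data.csv")                   # hypothetical data
protected = df["sex"]                                   # protected characteristic
other_features = df.drop(columns=["sex", "outcome"])    # everything else

# If this model predicts the protected characteristic much better than
# chance, the other variables are acting as proxies for it.
proxy_check = LogisticRegression(max_iter=1000)
scores = cross_val_score(proxy_check, other_features, protected, cv=5)
print(f"Mean accuracy predicting the protected characteristic: {scores.mean():.2f}")
```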

This does not prevent analysis being carried out on protected characteristics. Analysis on protected characteristics gives important insight for policy making, but you must make sure it does not inadvertently result in discriminatory actions, for example tailoring service delivery in an unfair way.

Reproducibility

Reproducibility in data-informed services and decision making is essential to:

  • demonstrate accuracy
  • aid transparency and accountability
  • allow others to use and share your work
  • ensure consistency of analysis in an organisation

The 3 requirements for reproducibility are: applying the same logic (code, model or process), to the same data, in the same environment.

The whole workflow should be supported by high quality documentation, including ethical issues raised in this framework and ways to mitigate them.

Using the simplest model possible, with adequate documentation, makes it easier for other teams to understand and use your work.
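
A minimal Python sketch of recording these three elements is shown below; the file names, package list and seed are illustrative assumptions rather than a prescribed format.

```python
# Sketch: record the three elements needed for reproducibility - the logic
# (code version), the data (a checksum of the input file) and the environment
# (Python and package versions). File names are illustrative assumptions.
import hashlib
import json
import platform
from importlib.metadata import version

def file_checksum(path: str) -> str:
    """Return a SHA-256 checksum so the exact input data can be verified later."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

record = {
    "code_version": "analysis_v1.2.0",          # e.g. a git tag or commit hash
    "input_data_sha256": file_checksum("training_data.csv"),
    "python_version": platform.python_version(),
    "package_versions": {pkg: version(pkg) for pkg in ["pandas", "scikit-learn"]},
    "random_seed": 42,
}
with open("reproducibility_record.json", "w") as f:
    json.dump(record, f, indent=2)
```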

Version control

Using version control allows an analyst to create a very clear audit trail. It can also be used to formalise a system of quality assurance (QA), for example by pull request review. There are several tools for version control.
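
For example, assuming the analysis code is tracked in a git repository, a Python script can stamp its outputs with the commit that produced them, linking each result back to the exact code used; this sketch is illustrative rather than a prescribed approach.

```python
# Sketch: stamp analysis outputs with the current git commit so the audit
# trail links each result back to the exact code that produced it.
# Assumes the analysis is tracked in a git repository.
import subprocess

def current_commit() -> str:
    """Return the short hash of the checked-out git commit."""
    return subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

print(f"Results produced by commit {current_commit()}")
```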

Literate programming tools (like Jupyter notebooks and R Markdown) allow researchers to combine code and analysis with narrative, making the analysis much more transparent. The Software Sustainability Institute has done work on developing and publicising best practice in academia, which can be applicable to data science in government.

Software Carpentry and Data Carpentry have useful lessons which advise on good practice methods for version control and processes more broadly.

Reproducible Analytical Pipelines (RAP)

Data scientists at the Government Digital Service (GDS) have trialled the use of tools for creating reproducible workflows called Reproducible Analytical Pipelines (RAP). The RAP approach draws on existing practice from reproducible research, software engineering and DevOps.

RAP aims to aid practitioners during the analytical process by automatically recording:

  • what changes were made
  • who made those changes
  • why those changes were made
  • when those changes were made
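
A minimal illustration of the idea, not a prescribed GDS implementation, is to express the analysis as a single script that can be rerun end to end; the function contents and file names below are assumptions for the example.

```python
# Sketch: a reproducible analytical pipeline expressed as a single script,
# so the whole output can be regenerated from the raw data in one step.
# Function contents and file names are illustrative assumptions.
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Read the raw input data."""
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the documented cleaning and derivation steps."""
    return df.dropna(subset=["outcome"])

def report(df: pd.DataFrame, path: str) -> None:
    """Write the summary statistics that would otherwise be produced by hand."""
    df.describe().to_csv(path)

if __name__ == "__main__":
    report(transform(extract("raw_data.csv")), "summary_statistics.csv")
```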

It is important that you research and think about the best approach for your team. Email gdsdatascience@digital.cabinet-office.gov.uk for advice on developing a pipeline within your organisation.

Test the model

Good data science involves testing your models against reality, using appropriate model evaluation techniques.

Make sure you have a clear methodology for testing your findings from the start. This should include desired levels of accuracy and other performance measures.

Testing algorithmic systems before going live is essential.
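
A hedged Python sketch of such a pre-go-live check is shown below; the accuracy target, file names and column names are assumptions made for the example only.

```python
# Sketch: evaluate a model on a held-out test set against a pre-agreed
# performance target before it goes live. The threshold, file names and
# column names are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

ACCEPTABLE_ACCURACY = 0.85                      # agreed with the team up front
features = ["age", "income", "prior_contacts"]

train = pd.read_csv("training_data.csv")        # hypothetical data extracts
test = pd.read_csv("holdout_test_set.csv")      # data the model has never seen

model = LogisticRegression(max_iter=1000).fit(train[features], train["outcome"])
accuracy = accuracy_score(test["outcome"], model.predict(test[features]))

if accuracy < ACCEPTABLE_ACCURACY:
    raise SystemExit(f"Accuracy {accuracy:.2f} is below the agreed target; do not go live.")
```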

As part of testing the model, you need to develop guidance on how often to update the data that trains it, and how regularly to check on a dynamic or live model that receives data about how it’s performing. This is covered more broadly by principle 7.

Define acceptable model performance: false negatives and false positives

You must decide what is acceptable in terms of false negatives and false positives within your intended system. This will determine the type of model and metrics you choose to use.

What your model is used for will determine the threshold for potential errors. Some false positives or negatives can be disastrous, while others simply result in wasted time and resources.

The target you set will determine the type of algorithm, metrics and acceptable loss function. Model performance as measured by these metrics should be compared to a null model or competing model. You must consider how relevant options or decisions can be explained for the purpose of accountability.
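
The Python sketch below illustrates one way to report false positive and false negative rates and compare a candidate model with a null model; the data, column names and model choices are illustrative assumptions.

```python
# Sketch: report false positive and false negative rates for a candidate
# model and compare it with a null model that always predicts the majority
# class. Data, column names and models are illustrative assumptions,
# and a binary outcome column is assumed.
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

df = pd.read_csv("training_data.csv")
features = ["age", "income", "prior_contacts"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["outcome"], random_state=42
)

candidate = LogisticRegression(max_iter=1000).fit(X_train, y_train)
null_model = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

for name, m in [("candidate", candidate), ("null model", null_model)]:
    tn, fp, fn, tp = confusion_matrix(y_test, m.predict(X_test)).ravel()
    print(f"{name}: false positive rate {fp / (fp + tn):.2f}, "
          f"false negative rate {fn / (fn + tp):.2f}")
```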

Published 13 June 2018